We compare the ability of large language models to perform next-token prediction with their ability to perform previous-token prediction. From an information-theoretic point of view, the two tasks correspond to two factorizations of the same joint measure. In practice, however, we observe something surprising: for every known language and every sufficiently large language model, there is a slight asymmetry between the performance of next-token prediction and that of previous-token prediction. We will discuss this arrow-of-time phenomenon, explain its origin in terms of computational complexity, and outline the perspectives opened by our work.
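To make the information-theoretic claim concrete: next-token and previous-token prediction correspond to the forward and backward chain-rule factorizations of the same joint measure, so at the level of the true distribution the two directions carry exactly the same total information. The following short derivation is a standard chain-rule computation, spelled out here for reference rather than taken from the talk:

```latex
% The same joint measure p(x_1, ..., x_n) admits a forward and a
% backward factorization (chain rule applied in both time directions):
\[
p(x_1,\dots,x_n)
  = \prod_{t=1}^{n} p\!\left(x_t \mid x_1,\dots,x_{t-1}\right)
  = \prod_{t=1}^{n} p\!\left(x_t \mid x_{t+1},\dots,x_n\right).
\]
% Taking -log and expectations over the true measure, the total
% forward and backward cross-entropies coincide:
\[
\mathbb{E}\!\left[-\log p(X_1,\dots,X_n)\right]
  = \sum_{t=1}^{n} H\!\left(X_t \mid X_{<t}\right)
  = \sum_{t=1}^{n} H\!\left(X_t \mid X_{>t}\right).
\]
```

In particular, any asymmetry observed between the two directions cannot come from the information content of the data itself; it must arise from the model class and the learning process, which is where the computational-complexity explanation comes in.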
Our speaker
Clément Hongler has led the Chair of Statistical Field Theory at EPFL since 2014. He is known for his work in statistical mechanics, quantum field theory, and the theory of neural networks.