Language models (LMs) have become central to natural language processing, but our understanding of how they operate is still incomplete. In this talk, I will discuss three aspects that are crucial to completing this picture. First, I will talk about tokenisation—how text is split into tokens before being fed to an LM—and its theoretical and empirical consequences. This will cover the NP-completeness of finding optimal tokenisers, methods for recovering word-level probabilities from token-level models, and recent work on estimating the bias introduced by tokenisers. Second, I will turn to optimisation, investigating the dynamics of memorisation and convergence throughout an LM’s training. Third, I will consider inference, examining the potential and limitations of causal abstraction as a tool for mechanistic interpretability. Finally, I will close the talk by connecting these perspectives, shedding light on how language models see, learn, and process language.
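
The word-level recovery mentioned above rests on a simple idea: a word's probability is the sum over every token sequence that spells it out, not just the probability of its canonical tokenisation. The sketch below illustrates this marginalisation with a toy unigram token model; the vocabulary, probabilities, and function names are purely illustrative assumptions, not the method presented in the talk.

```python
# Toy unigram token model: each token has a fixed, context-free
# probability. Real LMs are autoregressive, so each token's
# probability would condition on the preceding tokens instead.
TOKEN_PROBS = {
    "un": 0.2,
    "believ": 0.1,
    "able": 0.15,
    "unbeliev": 0.05,
    "unbelievable": 0.01,
}

def segmentations(word):
    """Yield every way of splitting `word` into known tokens."""
    if word == "":
        yield []
        return
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if prefix in TOKEN_PROBS:
            for rest in segmentations(word[i:]):
                yield [prefix] + rest

def word_prob(word):
    """Marginalise over all tokenisations that spell out `word`."""
    total = 0.0
    for seg in segmentations(word):
        p = 1.0
        for tok in seg:
            p *= TOKEN_PROBS[tok]
        total += p
    return total
```

For "unbelievable", three segmentations contribute: `["un", "believ", "able"]`, `["unbeliev", "able"]`, and `["unbelievable"]` itself, so reading off only the single-token probability (0.01) underestimates the word's true mass (0.0205) under this toy model. This gap is one way a tokeniser can bias word-level estimates.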