Large language models are increasingly deployed in high-stakes settings, yet their reliability is undermined by hallucinations and hard-to-predict reasoning failures. Unlike many areas of traditional AI, which have established guarantees and debugging practices, LLMs largely lack such foundations. I will present recent work from my lab, part of a broader research program, aimed at closing this gap. First, I will show how properties of the Transformer architecture constrain model abilities and help explain specific failure modes; these insights point toward strategies for improving reliability. Second, I will describe methods for probing and interpreting a model's internal computations, providing a foundation for diagnosing failures. I will conclude with implications for building LLMs that reason more reliably and for the future of NLP.