The RL Probabilist
A blog by Dibya Ghosh on RL, ML, and probability.

Training smol language models

A quick afternoon spent looking at the intrinsic dimensionality of language models, LoRA pre-training, and Fourier transforms that make my brain hurt.
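For a taste of the LoRA idea the post plays with, here is a minimal sketch of a low-rank adapter on a linear layer in PyTorch. This is my own illustration, not the post's training code; the dimensions, rank, and scaling constant are arbitrary choices.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update:
    y = x W^T + (alpha / r) * x (B A)^T, where A is r x d_in and B is d_out x r."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # pretend this is a frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # small random init
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(64, 64)
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64]); only A and B train
```

Only the rank-r factors A and B receive gradients, so the number of trainable parameters scales with r rather than with the full weight matrix.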

Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability

A lot of empirical evidence shows that generalization in RL is hard in practice, but is this an issue with our implementations or something more fundamental? This blog post explores one reason generalization in RL is fundamentally hard: attempting to generalize turns fully-observed RL problems into more challenging partially-observed ones.
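To make the "implicit partial observability" idea concrete, here is a toy two-armed example of my own construction (not from the post): the identity of the rewarding arm is a hidden state, so a policy that remembers its first observed reward beats any memoryless policy fit to a single training environment.

```python
import numpy as np

rng = np.random.default_rng(0)

def average_return(adaptive: bool, trials: int = 100_000) -> float:
    total = 0.0
    for _ in range(trials):
        good_arm = rng.integers(2)    # hidden environment identity
        r1 = float(good_arm == 0)     # step 1: always try arm 0
        if adaptive:
            a2 = 0 if r1 == 1.0 else 1  # step 2: exploit what we observed
        else:
            a2 = 0                      # memoryless: same fixed arm again
        total += r1 + float(a2 == good_arm)
    return total / trials

print("memoryless:", average_return(False))  # ~1.0 expected return
print("adaptive:  ", average_return(True))   # ~1.5 expected return
```

Since acting well requires gathering information about which environment you are in, the test-time problem behaves like a POMDP even though each individual environment is fully observed.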

Learning to Reach Goals via Iterated Supervised Learning

This post provides a simple introduction to my recent paper and the algorithm we propose: Goal-Conditioned Supervised Learning (GCSL).
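As a rough sketch of the GCSL loop, here is a toy tabular instantiation I wrote for illustration on a 1-D chain (the paper's version conditions the policy on the remaining horizon and uses a neural network; both are omitted here): roll out the goal-conditioned policy, relabel every later state in a trajectory as a goal it actually reached, and refit the policy by maximum likelihood on the relabeled data.

```python
import numpy as np

rng = np.random.default_rng(0)
N, HORIZON = 10, 12            # chain states 0..N-1
MOVES = np.array([-1, 0, 1])   # three actions: left, stay, right

# counts[s, g, a] accumulates hindsight-relabeled (state, goal) -> action data;
# the tabular policy is the normalized counts (exact maximum likelihood).
counts = np.ones((N, N, 3))

def sample_action(s: int, g: int) -> int:
    p = counts[s, g] / counts[s, g].sum()
    return rng.choice(3, p=p)

for _ in range(30):  # GCSL: iterate collect -> relabel -> refit
    for _ in range(20):
        g, s = rng.integers(N), rng.integers(N)  # commanded goal, start state
        traj = []
        for _ in range(HORIZON):
            a = sample_action(s, g)
            traj.append((s, a))
            s = int(np.clip(s + MOVES[a], 0, N - 1))
        traj.append((s, -1))  # final state; no action taken from it
        # Hindsight relabeling: every later state is a goal this trajectory
        # provably reached, so (s_t, a_t, goal=s_k) is valid supervised data.
        for t in range(len(traj) - 1):
            s_t, a_t = traj[t]
            for k in range(t + 1, len(traj)):
                counts[s_t, traj[k][0], a_t] += 1

# Evaluate goal-reaching success of the final policy.
hits = 0
for _ in range(500):
    g, s = rng.integers(N), rng.integers(N)
    for _ in range(HORIZON):
        s = int(np.clip(s + MOVES[sample_action(s, g)], 0, N - 1))
    hits += int(s == g)
print(f"success rate: {hits / 500:.2f}")
```

The key trick is that no reward function is needed: the relabeled dataset is "expert" data by construction, because each trajectory demonstrably reached the goals it gets relabeled with.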

Trouble in High-Dimensional Land

Most of the intuitions we build in 2D and 3D break down in higher dimensions, a core challenge for machine learning. So where exactly do they break?
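As a small taste of the phenomenon, here is a minimal numpy experiment of my own (sample sizes are arbitrary): pairwise distances between random points concentrate as the dimension grows, so the "nearest" and "farthest" neighbors become nearly indistinguishable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # number of random points

for d in (2, 3, 100, 10_000):
    x = rng.standard_normal((n, d))
    # Squared pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    # (avoids materializing an n x n x d tensor).
    sq = (x ** 2).sum(axis=1)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2 * x @ x.T, 0, None)
    dists = np.sqrt(d2[np.triu_indices(n, k=1)])
    # In high dimensions this ratio approaches 1: all points look equidistant.
    print(f"d={d:>6}: min/max pairwise distance ratio = {dists.min() / dists.max():.3f}")
```

In 2D the closest pair is dramatically closer than the farthest pair; by d = 10,000 the ratio is close to 1, which is one reason nearest-neighbor intuitions stop being informative.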

KL Divergence for Machine Learning

A writeup introducing KL divergence in the context of machine learning, its key properties, and an interpretation of reinforcement learning and machine learning as minimizing a KL divergence.
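As a quick numeric illustration of the definition and one of its basic properties, asymmetry, here is a short sketch with made-up discrete distributions (not code from the writeup):

```python
import numpy as np

def kl(p, q) -> float:
    """D_KL(p || q) = sum_x p(x) log(p(x) / q(x)) for discrete distributions.
    Assumes q(x) > 0 wherever p(x) > 0; terms with p(x) = 0 contribute zero."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.4, 0.1]
q = [0.4, 0.4, 0.2]
print(kl(p, q))  # ~0.042
print(kl(q, p))  # ~0.049: D_KL(p||q) != D_KL(q||p), so KL is not a metric
```

The connection to learning: fitting a model q to data drawn from p by maximum likelihood is equivalent to minimizing D_KL(p || q) over the model's parameters.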