Advances in Scalable Linear RNNs: DeltaNet and Its Variants
30 Apr 2025
Rough Path Interest Group
Abstract
Linear recurrent models are gaining significant attention for their efficiency in large-scale training compared to their nonlinear counterparts. Notable examples include Mamba, RWKV, GLA, and xLSTM. In this talk, I will introduce DeltaNet, a linear RNN that is strictly more expressive than these models while retaining hardware-efficient training properties. I will motivate DeltaNet from an in-context learning perspective and outline strategies for scaling its training effectively. The talk will also explore DeltaNet's connections to recent advances such as TTT and Titans, along with emerging extensions including Gated DeltaNet, RWKV-7, DeltaProduct, and Mixture-of-Memories (MoM).
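For context, DeltaNet replaces the purely additive state update of standard linear attention with a delta-rule correction, S_t = S_{t-1}(I - beta_t k_t k_t^T) + beta_t v_t k_t^T, which overwrites the value currently associated with the key k_t rather than accumulating on top of it. The NumPy sketch below illustrates a single update and readout under these definitions; the function names and shapes are illustrative assumptions, not the Flash Linear Attention API.

import numpy as np

def deltanet_step(S, k, v, beta):
    # Delta-rule update: S_t = S_{t-1} (I - beta k k^T) + beta v k^T,
    # equivalently S_t = S_{t-1} + beta * (v - S_{t-1} k) k^T.
    # S: (d_v, d_k) associative memory, k: (d_k,) key, v: (d_v,) value, beta: scalar in [0, 1].
    pred = S @ k                              # value the memory currently associates with key k
    return S + beta * np.outer(v - pred, k)   # correct that association toward the true value v

def deltanet_readout(S, q):
    # Query the memory, as in linear attention: o_t = S_t q_t.
    return S @ q

Gated DeltaNet additionally scales S_{t-1} by a learned decay before applying the correction; the hardware-efficient training referred to in the abstract comes from chunkwise-parallel reformulations of this sequential recurrence, which the talk discusses.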
Our speaker
Songlin is a second-year Ph.D. student at MIT CSAIL, advised by Prof. Yoon Kim. Her research focuses on hardware-aware algorithms for efficient sequence modeling, with a particular focus on linear attention models. She is the lead contributor to the Flash Linear Attention library and the founder of the Advances in Sequence Modeling from Algorithmic Perspectives (ASAP) seminar.
To become a member of the Rough Path Interest Group, register here for free.