As the width of Deep Neural Networks (DNNs) tends to infinity, recent works have shown that the training dynamics of (infinite-width) DNNs under gradient flow are captured by a constant kernel called the Neural Tangent Kernel (NTK). In this talk, we introduce a family of NTKs for recurrent neural networks, which we dub the Recurrent Neural Tangent Kernel (RNTK). We discuss the insights that the RNTK provides into the behavior of over-parametrized RNNs, including how different time steps are weighted by the RNTK to form the output under different initialization parameters and nonlinearity choices, and how inputs of different lengths are treated. We demonstrate that the RNTK offers significant performance gains over other kernels, including standard NTKs, across a wide array of time-series and non-time-series datasets.
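To make the "constant kernel" idea concrete, here is a minimal sketch (not from the talk; all names and the one-hidden-layer architecture are illustrative assumptions) of the *empirical* NTK, K(x, x') = ⟨∇θ f(x), ∇θ f(x')⟩, which the infinite-width NTK is the limit of:

```python
import numpy as np

# Illustrative sketch: the empirical NTK of a tiny one-hidden-layer ReLU
# network under the NTK parameterization (weights ~ N(0, 1), output scaled
# by 1/sqrt(width)).  As width grows, this Gram matrix of parameter
# gradients concentrates around a fixed limiting kernel, the NTK.

rng = np.random.default_rng(0)
width, d = 512, 3

W1 = rng.standard_normal((width, d))  # hidden-layer weights
w2 = rng.standard_normal(width)       # output weights

def grads(x):
    """Gradient of f(x) = w2 . relu(W1 x) / sqrt(width) w.r.t. (W1, w2)."""
    h = W1 @ x
    a = np.maximum(h, 0.0)
    dW1 = np.outer(w2 * (h > 0), x) / np.sqrt(width)  # df/dW1
    dw2 = a / np.sqrt(width)                          # df/dw2
    return np.concatenate([dW1.ravel(), dw2])

def empirical_ntk(X):
    G = np.stack([grads(x) for x in X])  # one gradient row per input
    return G @ G.T                       # Gram matrix of gradients

X = rng.standard_normal((4, d))
K = empirical_ntk(X)  # 4x4 symmetric positive semi-definite kernel matrix
```

The RNTK of the talk plays the same role for recurrent networks, with the gradient taken through the unrolled recurrence so that inputs of different lengths still yield a valid kernel.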
Sina is a second-year PhD candidate in the Electrical and Computer Engineering Department at Rice University, advised by Prof. Richard Baraniuk. His current research focuses on deep learning theory. For more information, see his webpage.