In this talk, we will present a higher-order optimization framework grounded in the Optimal Control (OC) principle for training Neural ODEs. We show that a specific continuous-time OC methodology, called Differential Programming, can be adopted to derive backward ODEs for higher-order derivatives at the same O(1) memory cost. This, together with a low-rank representation and Kronecker factorization, leads to an efficient second-order method that converges much faster than first-order optimizers in wall-clock time without hindering test-time performance. The improvement is consistent across applications, e.g., image classification, generative flow, and time-series prediction. Our framework also enables direct architecture optimization, such as optimizing the integration time of Neural ODEs with second-order feedback policies, strengthening the OC perspective as a principled tool for analyzing optimization in deep models.
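For readers unfamiliar with how O(1)-memory backward ODEs work, the following is a minimal sketch of the standard first-order adjoint trick for Neural ODEs (not the speaker's higher-order method): the state trajectory is reconstructed by running the ODE backward alongside the adjoint, so no intermediate states need to be stored. All concrete names here (`W`, `target`, the Euler step size) are illustrative assumptions.

```python
import numpy as np

# Toy Neural ODE: dx/dt = tanh(W x), terminal loss L = 0.5 ||x(T) - target||^2.
# The adjoint method recovers dL/dW without storing the forward trajectory.
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 2)) * 0.5      # "network" parameters (illustrative)
x0 = np.array([1.0, -0.5])
target = np.array([0.0, 1.0])
T, dt = 1.0, 1e-3
n = int(T / dt)

f = lambda x: np.tanh(W @ x)               # ODE right-hand side

def loss(Wm):
    """Forward-Euler rollout of the ODE, used only for the finite-difference check."""
    x = x0.copy()
    for _ in range(n):
        x = x + dt * np.tanh(Wm @ x)
    return 0.5 * np.sum((x - target) ** 2)

# Forward pass: keep only the terminal state -> O(1) memory in the step count.
x = x0.copy()
for _ in range(n):
    x = x + dt * f(x)

# Backward pass: integrate the adjoint a(t) and the state x(t) backward together,
# reconstructing x on the fly instead of storing it.
a = x - target                             # a(T) = dL/dx(T)
gW = np.zeros_like(W)
for _ in range(n):
    s = 1.0 - np.tanh(W @ x) ** 2          # tanh'(Wx)
    gW += dt * np.outer(a * s, x)          # accumulate dL/dW = ∫ aᵀ ∂f/∂W dt
    a = a + dt * (W.T @ (s * a))           # adjoint ODE da/dt = -Jᵀ a, run backward
    x = x - dt * f(x)                      # state ODE run backward in time

# Sanity check against central finite differences.
eps = 1e-5
fd = np.zeros_like(W)
for i in range(2):
    for j in range(2):
        Wp, Wn = W.copy(), W.copy()
        Wp[i, j] += eps
        Wn[i, j] -= eps
        fd[i, j] = (loss(Wp) - loss(Wn)) / (2 * eps)

print(np.max(np.abs(gW - fd)))             # adjoint gradient matches FD to O(dt)
```

The talk's framework extends this idea to second-order quantities: further backward ODEs propagate (factorized) curvature information at the same memory cost.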
Guan-Horng is a machine learning PhD student at the Georgia Institute of Technology, working in the Autonomous Control and Decision Systems Laboratory.
To become a member of the Rough Path Interest Group, register here for free.