Surrogate kernel-based methods offer a flexible solution to structured output prediction by leveraging the kernel trick in both input and output spaces. In contrast to energy-based models, they avoid paying the cost of inference during training, while enjoying statistical guarantees. However, without approximation, these approaches can only be applied to a limited amount of training data. In this paper, we propose to equip surrogate kernel methods with sketching-based approximations, seen as low-rank projections of both the input and output feature maps. We showcase the approach on Input Output Kernel Ridge Regression (also known as Kernel Dependency Estimation) and provide excess risk bounds that can in turn be directly plugged into the final predictive model. An analysis of the time and memory complexity shows that sketching the input kernel mostly reduces training time, while sketching the output kernel reduces inference time. Furthermore, we show that Gaussian and sub-Gaussian sketches are admissible, in the sense that they induce projection operators ensuring a small excess risk. Experiments on different tasks consolidate our findings.
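To give a flavour of input-side sketching, here is a minimal NumPy sketch of Gaussian-sketched kernel ridge regression on a toy scalar problem. All names, the RBF kernel choice, and the sketch size are illustrative assumptions, not the paper's actual implementation; the idea shown is only that a random matrix S restricts the solution to the span of s random combinations of the n canonical kernel features, shrinking the linear system from n×n to s×s.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared distances, then Gaussian (RBF) kernel
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
n, s, lam = 200, 40, 1e-3  # sample size, sketch size, ridge parameter (assumed values)

# Toy 1-D regression data
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

K = rbf_kernel(X, X)

# Gaussian sketch of the input kernel: the coefficient vector is
# searched in the span of S^T, i.e. alpha = S^T beta with beta in R^s
S = rng.standard_normal((s, n)) / np.sqrt(s)
KS = K @ S.T                                        # (n, s)
beta = np.linalg.solve(S @ K @ KS + lam * S @ KS,   # s x s system
                       KS.T @ y)

# In-sample predictions; exact (unsketched) KRR for comparison
y_sketch = KS @ beta
y_exact = K @ np.linalg.solve(K + lam * np.eye(n), y)
```

Because the RBF kernel matrix has fast-decaying spectrum here, a sketch of size s = 40 already tracks the exact solution closely, while the dominant solve costs O(s^3) instead of O(n^3).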
I am a third-year PhD student in the Signal, Statistics and Learning (S2A) team in the Image, Data and Signal (IDS) Department of the Laboratoire Traitement et Communication de l'Information (LTCI) at Télécom Paris, Palaiseau. My PhD research is conducted under the supervision of Florence d'Alché-Buc (Télécom Paris) and Pierre Laforgue (LAILA team, Università degli Studi di Milano). My work focuses on leveraging random projections to improve the scalability of kernel methods to large datasets, for scalar regression, vector-valued regression, and even structured prediction. Previously, I graduated jointly from Mines Saint-Étienne and the M2 MVA Master's program at ENS Paris-Saclay.