Date(s): 01/03/2019
14:00 - 15:00
A couple of years ago, adaptive algorithms such as Adam, RMSProp, AMSGrad and AdaGrad became the default methods of choice for training machine learning models. Practitioners commonly observe that the training loss decays faster than with stochastic gradient descent (SGD), but the underlying reason is still not understood. One motivation for our work was to understand which properties make these methods so well suited to deep learning. In this talk, I will analyze adaptive algorithms by studying their continuous-time counterparts.
I will first explain the connection between these optimization algorithms and their continuous-time differential equations. Then, I will give sufficient conditions guaranteeing that the trajectories converge towards a critical value of the objective, and I will discuss some properties of adaptive algorithms.
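As a hedged illustration of what "continuous-time counterpart" can mean (a standard textbook-style example, not necessarily the construction used in the talk): RMSProp with step size h and decay rate beta = 1 - a*h can be read as a semi-implicit Euler discretization of the system x' = -grad f(x) / (sqrt(v) + eps), v' = a * (grad f(x)**2 - v). The sketch below assumes a hypothetical ill-conditioned quadratic objective and hypothetical helper names (`rmsprop_step`, `ode_field`, `semi_implicit_euler_step`), and checks numerically that one RMSProp step and one discretization step coincide.

```python
import numpy as np

# Hypothetical test objective (an assumption, not from the talk):
# f(x) = 0.5 * sum(C * x**2), whose only critical point is x = 0.
C = np.array([1.0, 100.0])


def f(x):
    return 0.5 * np.sum(C * x**2)


def grad_f(x):
    return C * x


def rmsprop_step(x, v, h, a, eps=1e-8):
    """One RMSProp step with step size h and decay rate beta = 1 - a*h."""
    g = grad_f(x)
    beta = 1.0 - a * h
    v_new = beta * v + (1.0 - beta) * g**2
    x_new = x - h * g / (np.sqrt(v_new) + eps)
    return x_new, v_new


def ode_field(x, v, a, eps=1e-8):
    """Right-hand side of the assumed continuous-time system:
    x' = -grad f(x) / (sqrt(v) + eps),  v' = a * (grad f(x)**2 - v)."""
    g = grad_f(x)
    return -g / (np.sqrt(v) + eps), a * (g**2 - v)


def semi_implicit_euler_step(x, v, h, a, eps=1e-8):
    """Explicit Euler step for v first, then an Euler step for x using the new v."""
    _, dv = ode_field(x, v, a, eps)
    v_new = v + h * dv
    dx, _ = ode_field(x, v_new, a, eps)
    return x + h * dx, v_new


x0 = np.array([1.0, 1.0])
v0 = grad_f(x0) ** 2  # start v away from 0 so the vector field is well defined
h, a = 1e-3, 1.0

x_r, v_r = x0.copy(), v0.copy()  # RMSProp iterates
x_e, v_e = x0.copy(), v0.copy()  # discretized-ODE iterates
for _ in range(200):
    x_r, v_r = rmsprop_step(x_r, v_r, h, a)
    x_e, v_e = semi_implicit_euler_step(x_e, v_e, h, a)

print(np.allclose(x_r, x_e), f(x0), f(x_r))
```

Letting h tend to 0 with a fixed turns the iterates into a trajectory of the ODE. Along that trajectory, d/dt f(x(t)) = -sum(grad f(x)**2 / (sqrt(v) + eps)) <= 0, so the objective is non-increasing, which is the kind of property a convergence analysis in continuous time can exploit.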
This is joint work with A. Belotto Da Silva.