A general system of differential equations to model first-order adaptive algorithms. Application to ADAM.


Date(s) - 01/03/2019
14:00 - 15:00


A couple of years ago, adaptive algorithms such as ADAM, RMSPROP, AMSGRAD and ADAGRAD became the default methods for training machine learning models. Practitioners commonly observe that the training loss decreases faster than with stochastic gradient descent (SGD), but the underlying reason is still not understood. A motivation for our work was to understand which properties make these algorithms so well suited to deep learning. In this talk, I will analyze adaptive algorithms by studying their continuous-time counterparts.
I will first explain the connection between these optimization algorithms and continuous-time differential equations. Then, I will give sufficient conditions guaranteeing that trajectories converge towards a critical value, and I will discuss some properties of adaptive algorithms.
This is joint work with A. Belotto Da Silva.
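The link between a discrete algorithm and an ODE can be illustrated with a minimal sketch. It assumes one commonly used continuous-time model of ADAM, m' = a(∇f(x) - m), v' = b(∇f(x)² - v), x' = -m/(√v + ε), which is not necessarily the general system presented in the talk. Choosing a = (1 - β1)/h and b = (1 - β2)/h makes one forward-Euler step of size h reproduce ADAM's moment recursions exactly. The correspondence is only approximate for the x update, because bias correction is dropped and ADAM uses the freshly updated m. The test objective and the names `f`, `grad_f`, `adam` and `adam_ode_euler` are hypothetical illustrations.

```python
import math


def f(x):
    # Hypothetical test objective (not from the talk): an ill-conditioned quadratic.
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad_f(x):
    # Exact gradient of f; no stochastic noise in this sketch.
    return [x[0], 10.0 * x[1]]


def adam(x0, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Discrete ADAM with bias correction and deterministic gradients."""
    x = list(x0)
    m = [0.0] * len(x)
    v = [0.0] * len(x)
    for k in range(1, steps + 1):
        g = grad_f(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** k)
            v_hat = v[i] / (1 - beta2 ** k)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def adam_ode_euler(x0, h=1e-2, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Explicit Euler integration of the assumed ODE system
         m' = a (grad f(x) - m),  v' = b (grad f(x)**2 - v),  x' = -m / (sqrt(v) + eps)
    with a = (1 - beta1) / h and b = (1 - beta2) / h, so one step of size h
    reproduces ADAM's moment updates (bias correction omitted)."""
    a = (1 - beta1) / h
    b = (1 - beta2) / h
    n = len(x0)
    x = list(x0)
    m = [0.0] * n
    v = [0.0] * n
    for _ in range(steps):
        g = grad_f(x)
        # All right-hand sides are evaluated at the current state (explicit Euler).
        dx = [-m[i] / (math.sqrt(v[i]) + eps) for i in range(n)]
        dm = [a * (g[i] - m[i]) for i in range(n)]
        dv = [b * (g[i] ** 2 - v[i]) for i in range(n)]
        x = [x[i] + h * dx[i] for i in range(n)]
        m = [m[i] + h * dm[i] for i in range(n)]
        v = [v[i] + h * dv[i] for i in range(n)]
    return x


if __name__ == "__main__":
    x0 = [1.0, 1.0]
    print("f(x0)            =", f(x0))
    print("ADAM             =", f(adam(x0)))
    print("Euler on the ODE =", f(adam_ode_euler(x0)))
```

Since m + h·a(g - m) = β1·m + (1 - β1)·g, the learning rate plays the role of the time step h. Read this way, the hyperparameters β1 and β2 become rates a and b in continuous time, which is the kind of correspondence that lets convergence of the discrete iterates be studied through ODE trajectories.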

