Lectures on

Program
Thursday, October 21st
Aurélien Garivier  This course will introduce the notion of concentration of measure and highlight its applications, notably in high-dimensional data processing and machine learning.
The course will start from deviation inequalities for averages of independent variables, and illustrate their interest for the analysis of random graphs and of random projections
for dimension reduction. It will then be shown how other high-dimensional random functions concentrate, and what guarantees this concentration yields for randomized algorithms and
machine learning procedures that learn from large training collections. [Video of the morning session] (restricted access) [Video of the afternoon session] (restricted access)  
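As a minimal numerical sketch of such a deviation inequality (a toy illustration, not material from the course), Hoeffding's inequality states that for n i.i.d. variables taking values in [0, 1], P(|X̄ − μ| ≥ t) ≤ 2 exp(−2nt²); the empirical deviation frequency of Bernoulli sample means indeed stays below this bound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hoeffding's inequality for i.i.d. variables in [0, 1]:
#   P(|X_bar - mu| >= t) <= 2 * exp(-2 * n * t**2)
n, t, trials = 200, 0.1, 20000
p = 0.5  # Bernoulli(p) samples, so mu = p

# Empirical frequency of a deviation of at least t
means = rng.binomial(n, p, size=trials) / n
empirical = np.mean(np.abs(means - p) >= t)

# Hoeffding upper bound on that probability
bound = 2 * np.exp(-2 * n * t**2)
print(empirical, bound)  # the empirical frequency stays below the bound
```

Note that the bound is distribution-free: it holds for any bounded variables, which is why the empirical frequency for Bernoulli means sits well below it.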
8:45  Welcome  
9:00–10:50  Lecture  
10:50–11:10  Break  
11:10–12:15  Tutorial [Exercises]  
12:15–14:00  Lunch  
14:00–15:00  Lecture  
15:00–15:15  Break  
15:15–16:15  Tutorial  
16:15–16:30  Break  
16:30–17:30  Lecture 
Friday, October 22nd
9:30–10:00  Coffee and introduction  
10:00–10:45  Agnès Desolneux  A Wasserstein-type distance in the space of Gaussian mixture models [Slides] [Video of the talk] (restricted access) 
Abstract
In this talk, we introduce a Wasserstein-type distance on the set of Gaussian mixture models. This distance is defined by restricting the set of possible coupling measures in the optimal transport problem to Gaussian mixture models. We derive a very simple discrete formulation for this distance, which makes it suitable for high-dimensional problems. We also study the corresponding multi-marginal and barycenter formulations. We show some properties of this Wasserstein-type distance, and we illustrate its practical use with some examples in image processing. This is joint work with Julie Delon. 
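The discrete formulation mentioned in the abstract reduces the distance to a finite optimal transport problem between the mixture weights, with a cost matrix of squared Wasserstein distances between Gaussian components. A minimal sketch under that reading (the mixtures and the SciPy linear-programming solver are illustrative choices, not the authors' code), in 1D where W2²(N(m₁, s₁²), N(m₂, s₂²)) = (m₁ − m₂)² + (s₁ − s₂)²:

```python
import numpy as np
from scipy.optimize import linprog

# Two 1D Gaussian mixtures, each given by (weights, means, std devs)
pi1, mu1, s1 = np.array([0.4, 0.6]), np.array([-2.0, 3.0]), np.array([1.0, 0.5])
pi2, mu2, s2 = np.array([0.5, 0.5]), np.array([0.0, 4.0]), np.array([1.5, 1.0])

# Cost matrix of squared W2 distances between Gaussian components (1D case):
#   W2^2(N(m1, s1^2), N(m2, s2^2)) = (m1 - m2)^2 + (s1 - s2)^2
C = (mu1[:, None] - mu2[None, :]) ** 2 + (s1[:, None] - s2[None, :]) ** 2

# Discrete optimal transport between the weight vectors: minimize <C, w>
# over coupling matrices w with row sums pi1 and column sums pi2.
K, L = len(pi1), len(pi2)
A_eq = np.zeros((K + L, K * L))
for i in range(K):
    A_eq[i, i * L:(i + 1) * L] = 1.0   # row-sum (first marginal) constraints
for j in range(L):
    A_eq[K + j, j::L] = 1.0            # column-sum (second marginal) constraints
b_eq = np.concatenate([pi1, pi2])
res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
mw2 = float(np.sqrt(res.fun))          # the mixture Wasserstein-type distance
```

The linear program has only K × L variables regardless of the ambient dimension, which is what makes the discrete formulation tractable in high dimension.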

10:45–11:15  François-Xavier Vialard  Breaking the Curse of Dimension in Smooth Optimal Transport Estimation [Slides] 
Abstract
TBD 

11:15–11:45  Break  
11:45–12:15  Elsa Cazelles  A novel notion of barycenter for probability distributions based on optimal weak mass transport [Slides] [Video of the talk] (restricted access) 
Abstract
We introduce weak barycenters of a family of probability distributions, based on the recently developed notion of optimal weak transport of mass. We provide a theoretical analysis of this object and discuss its interpretation in the light of convex ordering between probability measures. In particular, we show that, rather than averaging in a geometric way the input distributions, as the Wasserstein barycenter based on classic optimal transport does, weak barycenters extract common geometric information shared by all the input distributions, encoded as a latent random variable that underlies all of them. We also provide an iterative algorithm to compute a weak barycenter for a finite family of input distributions, and a stochastic algorithm that computes them for arbitrary populations of laws. The latter approach is particularly well suited for the streaming setting, i.e., when distributions are observed sequentially. 

12:15–14:00  Lunch  
14:00–14:45  Lénaïc Chizat  Analysis of gradient descent on wide two-layer ReLU neural networks [Slides] [Video of the talk] (restricted access) 
Abstract
In this talk, we propose an analysis of gradient descent on wide two-layer ReLU neural networks that leads to sharp characterizations of the learned predictor. The main idea is to study the training dynamics when the width of the hidden layer goes to infinity, which is a Wasserstein gradient flow. While this dynamics evolves on a non-convex landscape, we show that for appropriate initializations, its limit, when it exists, is a global minimizer. We also study the implicit regularization of this algorithm when the objective is the unregularized logistic loss, which leads to a max-margin classifier in a certain functional space. We finally discuss what these results tell us about the generalization performance, and in particular how these models compare to kernel methods. 
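The wide regime in the abstract can be previewed on a toy instance (the data, width, and step size below are illustrative assumptions, not the talk's setup): gradient descent on a two-layer ReLU network with the mean-field 1/m output scaling, fitting a small 1D regression problem with the squared loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D regression problem: learn y = |x| from 64 samples
X = rng.uniform(-1.0, 1.0, size=(64, 1))
y = np.abs(X[:, 0])
n = len(y)

# Wide two-layer ReLU network with mean-field 1/m scaling:
#   f(x) = (1/m) * sum_j a_j * relu(w_j * x + b_j)
m = 1000
w = rng.normal(size=(m, 1))
b = rng.normal(size=m)
a = rng.normal(size=m)

def loss():
    pred = np.maximum(X @ w.T + b, 0.0) @ a / m
    return 0.5 * np.mean((pred - y) ** 2)

init_loss = loss()
lr = 0.5 * m  # step size scaled with the width, as in mean-field analyses
for step in range(500):
    pre = X @ w.T + b                        # (n, m) pre-activations
    act = np.maximum(pre, 0.0)               # ReLU
    err = act @ a / m - y                    # residuals of the squared loss
    grad_out = err[:, None] * (pre > 0) * a  # backprop through the ReLU
    a -= lr * (act.T @ err) / (n * m)
    w -= lr * (grad_out.T @ X) / (n * m)
    b -= lr * grad_out.sum(axis=0) / (n * m)

final_loss = loss()
```

The 1/m scaling is the one under which the hidden units behave as particles of an empirical measure, so that the training dynamics approximates a Wasserstein gradient flow as m grows.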

14:45–15:15  Geneviève Robin  Outlier detection in networks with missing links [Slides] [Video of the talk] (restricted access) 
Abstract
Networks are a powerful tool to analyse complex systems: agents are represented as nodes, and pairwise interactions between agents are recorded as edges between these nodes. An important field of application is sociology, where the recent development of online social networks offers unprecedented possibilities. In this example, estimating the underlying probabilities of interactions between agents is a key problem. However, in many examples existing models fail to describe networks containing a small number of outlier nodes with abnormal behaviour, for example fake accounts created to spread fake news. In addition, real-life networks are likely to be incompletely observed, with missing links due to individual non-response or machine failures. Identifying outliers in the presence of missing links is therefore a crucial problem. In this work, we introduce a new algorithm to detect outliers in a network that simultaneously predicts the missing links. Under fairly general assumptions, our algorithm exactly detects the outliers, and achieves the best known error for the prediction of missing links with polynomial computation cost. We also illustrate the method numerically and present two applications in epidemiology and social network analysis. 

15:15–15:45  Break  
15:45–16:15  Nicolas Keriven  Graph neural networks on large random graphs: convergence, stability, universality [Slides] [Video of the talk] (restricted access) 
Abstract
In this talk, we will discuss some theoretical properties of Graph Neural Networks (GNNs) on large graphs. Indeed, most existing analyses of GNNs are purely combinatorial and may fail to reflect their behavior regarding large-scale structures in graphs, such as communities of well-connected nodes or manifold structures. To address this, we assume that the graphs of interest are generated with classical models of random graphs. We first give non-asymptotic convergence bounds of GNNs toward “continuous” equivalents as the number of nodes grows. We then study their stability to small deformations of the underlying random graph model, a crucial property of traditional CNNs. Finally, we study their universality and approximation power, and show how some recent GNNs are more powerful than others. 
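The convergence toward a "continuous" equivalent can be previewed on the simplest possible case (a single linear mean-aggregation step on an Erdős–Rényi graph; this toy model is an assumption for illustration, not the talk's construction): the aggregated node features concentrate around a deterministic limit as the graph grows:

```python
import numpy as np

rng = np.random.default_rng(1)

# One linear mean-aggregation step y_i = (1/n) * sum_j A_ij * x_j on an
# Erdos-Renyi graph G(n, p). As n grows, each y_i concentrates around the
# deterministic "continuous" limit p * mean(x).
def aggregate(n, p):
    A = (rng.random((n, n)) < p).astype(float)
    A = np.triu(A, 1)
    A = A + A.T                        # symmetric adjacency, no self-loops
    x = rng.normal(size=n) + 1.0       # node features with mean ~ 1
    y = A @ x / n
    return np.max(np.abs(y - p * x.mean()))  # worst deviation from the limit

err_small = aggregate(50, 0.3)
err_large = aggregate(2000, 0.3)
print(err_small, err_large)  # the deviation shrinks as the graph grows
```

Nonlinear GNN layers compose such aggregations with pointwise maps, which is why concentration of each step propagates into non-asymptotic bounds for the whole network.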

16:15–16:45  Olga Mula  Model order reduction by interpolation in Wasserstein spaces [Video of the talk] (restricted access) 
Abstract
TBD 

16:45–17:00  Conclusions 