ResNets of All Shapes and Sizes: Quantitative Large-Scale Theory of Training Dynamics

Louis-Pierre CHAINTRON
École Polytechnique Fédérale de Lausanne

Date(s) : 14/04/2026
14h30 - 15h30

We study the convergence of the training dynamics
of residual neural networks (ResNets)
towards their joint infinite depth–width limit.
We focus on ResNets with two-layer perceptron blocks,
whose shape is determined by the depth L,
hidden width M, and embedding dimension D,
and we adopt the residual scaling O(√D/√(LM))
recently identified as necessary for local feature learning.
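
As an illustration of the architecture described above, here is a minimal NumPy sketch of a ResNet whose blocks are two-layer perceptrons, with each residual update multiplied by √D/√(LM). The function names, the tanh activation, and the Gaussian initialisation (input weights of variance 1/D, output weights of variance 1) are assumptions made for this sketch; the abstract does not specify them.

```python
import numpy as np


def residual_scale(L, M, D):
    """Residual scaling sqrt(D) / sqrt(L * M) quoted in the abstract."""
    return np.sqrt(D) / np.sqrt(L * M)


def init_params(L, M, D, seed=0):
    """Hypothetical initialisation (an assumption, not taken from the talk)."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((L, M, D)) / np.sqrt(D)  # input weights, one (M, D) matrix per block
    V = rng.standard_normal((L, D, M))               # output weights, one (D, M) matrix per block
    return U, V


def resnet_forward(x, U, V):
    """Apply L residual blocks to an embedding x of shape (D,)."""
    L, M, D = U.shape
    scale = residual_scale(L, M, D)
    h = x
    for u, v in zip(U, V):
        # Two-layer perceptron block: D -> M (hidden) -> D, added to the residual stream.
        h = h + scale * (v @ np.tanh(u @ h))
    return h


if __name__ == "__main__":
    L, M, D = 4, 8, 16
    U, V = init_params(L, M, D)
    out = resnet_forward(np.ones(D), U, V)
    print(out.shape)
```

With this layout, the shape of the network is entirely determined by the triple (L, M, D), and the scaling factor is the only place where the three sizes interact.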
We show that after a bounded number of training steps,
the error between the finite ResNet and its infinite-size limit
is O(1/L + √D/√(LM) + 1/√D), and numerical experiments suggest
that this bound is tight in the early training phase.
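
To get a feel for the rate above, the following sketch evaluates the three terms 1/L, √D/√(LM) and 1/√D for a given shape and reports the largest one. The constants hidden in the O(·) notation are not given in the abstract, so they are all set to 1 here; this is an assumption, and the labels "depth", "width" and "embedding" are names chosen for this example only.

```python
import math


def error_terms(L, M, D):
    """The three terms of the bound, with all hidden constants set to 1 (assumption)."""
    return {
        "depth": 1.0 / L,
        "width": math.sqrt(D) / math.sqrt(L * M),
        "embedding": 1.0 / math.sqrt(D),
    }


def dominant_term(L, M, D):
    """Name of the largest of the three terms for the shape (L, M, D)."""
    terms = error_terms(L, M, D)
    return max(terms, key=terms.get)


if __name__ == "__main__":
    shape = dict(L=16, M=4096, D=1024)
    print(error_terms(**shape))
    print(dominant_term(**shape))
```

For L = 16, M = 4096, D = 1024 the terms are 0.0625, 0.125 and 0.03125, so the √D/√(LM) term dominates under the unit-constant assumption.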
From a probabilistic viewpoint,
the D → ∞ limit amounts to a mean-field limit
over the coordinates of the embedding,
where some interactions scale as 1/√D,
in contrast to the usual 1/D setting.
Our analysis is a rigorous and quantitative instance
of the Dynamical Mean Field Theory (DMFT)
from statistical physics;
it combines propagation of chaos arguments
with the cavity method at a functional level.