Hyperparameter Tuning Day
October 25, 2012


Program

October 25th

09.00 Welcome coffee
09.45 Gaëlle Loosli Regularization paths for nu-SVM and nu-SVR
Abstract
The use of SVMs by neophyte users is hampered by the need to supply values for control parameters in order to get the best attainable results. Mainly, given clean data, SVM users must make three choices: the type of kernel, its bandwidth, and the regularization parameter. Given the importance of this problem for reaping all the potential benefits of SVMs, much research has been dedicated to ways of helping users set these parameters. Most approaches rely either on outer measures, such as cross-validation, to guide the selection, or on measures embedded in the learning method itself. In place of these empirical approaches, regularization paths have been proposed; they provide a smart and fast way to access all the optimal solutions of a problem, across all compromises between bias and variance in regression, or between bias and regularity in classification. However, having the whole regularization path is not enough: the end user still needs to retrieve from it the best values of the regularization parameters. Instead of selecting these values by k-fold cross-validation, leave-one-out, or other approximations, we propose to include the leave-one-out estimator inside the regularization path, so as to have an estimate of the generalization error at each step.
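As a point of comparison with the empirical approaches mentioned above, here is a minimal sketch of leave-one-out selection of nu for a nu-SVM over a plain grid, using scikit-learn. The dataset, the grid, and the kernel settings are illustrative assumptions; this is the baseline the talk improves on, not the path algorithm itself.

```python
# Illustrative baseline only: grid search on nu with a leave-one-out error
# estimate, the quantity the talk proposes to track along the path instead.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import NuSVC

X, y = make_classification(n_samples=60, n_features=5, random_state=0)

nu_grid = np.linspace(0.05, 0.6, 12)            # candidate values of nu (assumed grid)
loo_errors = []
for nu in nu_grid:
    clf = NuSVC(nu=nu, kernel="rbf", gamma="scale")
    acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
    loo_errors.append(1.0 - acc)                # leave-one-out error estimate

best_nu = nu_grid[int(np.argmin(loo_errors))]
print(f"nu selected by leave-one-out: {best_nu:.2f}")
```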
10.25 Rémi Bonidal Hyper-parameter selection for the M-SVM2 through the regularization path
Abstract
For a support vector machine (SVM), model selection amounts to the selection of the hyper-parameters, which are the kernel function, the values of its parameters, and the amount of regularization. To set the value of the regularization parameter, one can minimize an appropriate objective function over the regularization path. A priori, this requires the availability of two elements: the objective function, and an algorithm fitting the entire regularization path at a reduced cost. As for multi-category classification, the literature provides us with an upper bound on the leave-one-out cross-validation error for the M-SVM2. However, no algorithm was available so far for fitting the entire regularization path of this machine. In this presentation, we extend to the M-SVM2 some results available for the l2-SVM. We derive a method for selecting the value of the regularization parameter that integrates a regularization path algorithm and a model selection criterion. Then, we present three new criteria that are related to the leave-one-out cross-validation error and that share similarities with the well-known Span bound. In terms of model selection, the performance of the new criteria is close to that of the radius-margin bound. Furthermore, the path-following algorithm proves to be computationally interesting when the Gram matrix can be approximated by a low-rank matrix.
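As a side illustration of the last remark, a minimal sketch of a low-rank approximation of an RBF Gram matrix using scikit-learn's Nystroem transformer; this is unrelated to the specific M-SVM2 machinery, and the kernel bandwidth and the rank are arbitrary choices.

```python
# Rank-40 approximation of an RBF Gram matrix via the Nystroem method; the
# bandwidth (gamma) and the rank (n_components) are arbitrary for this sketch.
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))

K = rbf_kernel(X, gamma=0.1)                     # exact 200 x 200 Gram matrix
Z = Nystroem(kernel="rbf", gamma=0.1, n_components=40, random_state=0).fit_transform(X)
K_approx = Z @ Z.T                               # rank-40 approximation of K

rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
print(f"relative Frobenius error of the rank-40 approximation: {rel_err:.3f}")
```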
11.05 Coffee break
11.25 Pierre Machart Optimal Computational Trade-Off of Inexact Proximal Methods
Abstract
Recent advances in machine learning and signal processing have led to more involved optimization problems, while the abundance of data calls for more efficient optimization algorithms. First-order methods are now extensively employed to tackle these issues and, among them, proximal-gradient algorithms are becoming increasingly popular. At the heart of these procedures lies the proximity operator. In favorable cases, analytical forms exist. However, there are many problems where the proximity operator can only be computed numerically, giving rise to what can be referred to as inexact proximal-gradient algorithms. With those algorithms, one needs to set: a) the number of iterations of the procedure, and b) the precision of the approximation of the proximity operator at each iteration. These quantities, which can be seen as hyper-parameters, are the object of study of this talk. By expressing the computational cost of inexact proximal-gradient algorithms, we derive a computationally optimal strategy to set those hyper-parameters.
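A minimal sketch of the two hyper-parameters in question, on a toy lasso problem with a synthetic prox error: an outer iteration count K and a per-iteration precision eps_k for the proximity operator. The error model, its polynomial decay, and the iteration budget are illustrative assumptions, not the strategy derived in the talk.

```python
# Inexact proximal-gradient on a toy lasso problem: the prox is perturbed by a
# controlled error of norm eps_k, decreasing polynomially with the iteration k.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(50)

lam = 0.1
L = np.linalg.norm(A, 2) ** 2                   # Lipschitz constant of the gradient

def soft_threshold(v, t):                        # exact prox of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

K = 200                                          # hyper-parameter a): number of outer iterations
x = np.zeros(100)
for k in range(1, K + 1):
    eps_k = 0.1 / k**2                           # hyper-parameter b): prox precision at iteration k
    grad = A.T @ (A @ x - b)
    v = x - grad / L
    e = rng.standard_normal(100)
    x = soft_threshold(v, lam / L) + eps_k * e / np.linalg.norm(e)   # inexact prox step

obj = 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()
print(f"objective after {K} inexact iterations: {obj:.4f}")
```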
12.05 Lunch (all participants are invited)
13.50 Caroline Chaux ML estimation of hyperparameters in inverse problems with wavelet regularization
Abstract
We are interested in hyperparameter estimation for restoring images degraded by a blur and an additive white Gaussian noise. More precisely, we adopt a variational approach and aim at minimizing a convex criterion composed of i) a quadratic data fidelity term and ii) a regularization term (an l1-norm, for example) penalizing the wavelet-domain coefficients. We propose to estimate the regularization parameters (one per subband) by resorting to a maximum likelihood approach, considering that no reference data are available (incomplete case). The main difficulty is to sample from the prior and posterior distributions because of the pixel interactions introduced by the blur operator. The proposed method carries out this sampling with MCMC (Gibbs sampling and Metropolis-Hastings), and a gradient method is then used to estimate the hyperparameters. Unfortunately, the step size of the gradient descent is hard to determine and clearly plays a prominent role in the convergence speed of the algorithm. Consequently, we additionally compute an adaptive step length based on the Barzilai-Borwein method coupled with a line search strategy. The good performance and behavior of the proposed approach are demonstrated through simulation results.
Joint work with Laure Blanc-Féraud (CNRS, Morpheme Research Group, I3S/INRIA Sophia Antipolis, France), Roberto Cavicchioli and Luca Zanni (Modena University, Italy).
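For reference, a minimal sketch of the Barzilai-Borwein step length mentioned above, applied to a toy quadratic rather than the MCMC-based likelihood of the talk; the problem and the iteration budget are illustrative assumptions.

```python
# Gradient descent with the BB1 step length alpha = (s's) / (s'y), where
# s = x_k - x_{k-1} and y = g_k - g_{k-1}; only the first step is hand-set.
import numpy as np

rng = np.random.default_rng(0)
Q = rng.standard_normal((10, 10)); Q = Q @ Q.T + np.eye(10)   # SPD matrix of a toy quadratic

c = rng.standard_normal(10)

def grad(x):                                    # gradient of 0.5 * x'Qx - c'x
    return Q @ x - c

x = np.zeros(10)
g = grad(x)
alpha = 1e-3                                    # initial step, before BB takes over
for _ in range(50):
    x_new = x - alpha * g
    g_new = grad(x_new)
    s, y = x_new - x, g_new - g
    alpha = (s @ s) / (s @ y)                   # Barzilai-Borwein step length
    x, g = x_new, g_new

print("gradient norm after 50 BB steps:", np.linalg.norm(g))
```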
14.30 Coffee break
14.40 Jean-Patrick Baudry, Bertrand Michel Slope Heuristics: Practical and Theoretical Aspects
Abstract
Model selection is a general paradigm which includes many statistical problems. One of the most fruitful and popular approaches to carry it out is the minimization of a penalized criterion. Birgé and Massart [2] have proposed a promising data-driven method to calibrate such criteria, whose penalties are known up to a multiplicative factor: the "slope heuristics". Theoretical works validate this heuristic method in some situations, and several papers report a promising practical behavior in various frameworks. In this two-part talk, we first introduce the approach, the practical difficulties that occur when applying it, and the solutions we proposed in [1] to overcome them. We will also present the (R and Matlab) packages which implement these solutions. Secondly, we will explain further why such a data-driven approach is necessary to get the most out of some results arising from the model selection paradigm. We will also present an overview of the available theoretical results about the approach and develop a few of them.
[1] Jean-Patrick Baudry, Cathy Maugis, and Bertrand Michel. Slope heuristics: overview and implementation. Stat. Comput., 22:455-470, 2012.
[2] Lucien Birgé and Pascal Massart. Minimal penalties for Gaussian model selection. Probab. Theory Related Fields, 138:33-73, 2007.
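A toy illustration of the dimension-jump calibration behind the slope heuristics; the nested polynomial models, the noise level, and the penalty shape D/n are illustrative assumptions, not taken from [1] or [2]. The idea: scan the multiplicative constant kappa, locate the abrupt drop in the selected dimension, and use twice that value as the final penalty constant.

```python
# Toy regression with nested polynomial models: calibrate the penalty
# pen(D) = kappa * D / n by locating the dimension jump, then use 2 * kappa_min.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = np.linspace(-1.0, 1.0, n)
y = 1.0 + 2.0 * x - 1.5 * x**2 + 0.3 * rng.standard_normal(n)   # true model has dimension 3

dims = range(1, 12)                                   # model of dimension D = polynomial of degree D-1
rss = {D: np.mean((y - np.polyval(np.polyfit(x, y, D - 1), x)) ** 2) for D in dims}

def selected_dim(kappa):
    return min(dims, key=lambda D: rss[D] + kappa * D / n)

kappas = np.linspace(0.0, 1.0, 401)
sel = np.array([selected_dim(k) for k in kappas])
jump = int(np.argmax(np.abs(np.diff(sel))))           # largest drop in the selected dimension
kappa_min = kappas[jump + 1]
print("kappa_min located by the dimension jump:", kappa_min)
print("dimension selected with penalty 2 * kappa_min:", selected_dim(2 * kappa_min))
```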
15.50 Coffee break
16.10 Charles Deledalle Proximal Splitting Derivatives for Risk Estimation
Abstract
We develop a novel framework to compute a projected Generalized Stein Unbiased Risk Estimator (GSURE) for a wide class of sparsely regularized solutions of inverse problems. This class includes arbitrary convex data fidelities with both analysis and synthesis mixed L1-L2 norms. The GSURE requires computing the (weak) derivative of a solution with respect to the observations. However, since the solution is not available in analytical form but rather through iterative schemes such as proximal splitting, we propose to compute the GSURE iteratively by differentiating the sequence of iterates. This provides a sequence of differential mappings which, hopefully, converges to the desired derivative and allows us to compute the GSURE. We illustrate this approach on the automatic selection of the regularization parameter for different variational problems.
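A minimal sketch of the general idea of differentiating the iterates, here with plain ISTA on a toy lasso problem and a single-probe Monte Carlo divergence, not the projected GSURE or the proximal-splitting schemes of the talk: the derivative of each iterate with respect to the observations is propagated alongside the iterate, and the resulting divergence feeds a SURE-type risk estimate.

```python
# ISTA on a toy lasso problem, with the derivative of the iterates w.r.t. the
# observations propagated along a random direction z to estimate the divergence.
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma, lam = 80, 120, 0.05, 0.02
A = rng.standard_normal((n, p)) / np.sqrt(n)
x0 = np.zeros(p); x0[:8] = 1.0
y = A @ x0 + sigma * rng.standard_normal(n)

L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the gradient
z = rng.standard_normal(n)                     # random probe for the divergence

x, dx = np.zeros(p), np.zeros(p)               # iterate and its derivative along z
for _ in range(300):
    v = x - A.T @ (A @ x - y) / L
    dv = dx - A.T @ (A @ dx - z) / L           # same gradient step, differentiated w.r.t. y
    active = np.abs(v) > lam / L               # support kept by the soft-thresholding
    x = np.where(active, v - np.sign(v) * lam / L, 0.0)
    dx = np.where(active, dv, 0.0)             # derivative of the soft-thresholding

div = z @ (A @ dx)                             # Monte Carlo estimate of div_y(A x(y))
sure = np.sum((y - A @ x) ** 2) - n * sigma**2 + 2 * sigma**2 * div
print(f"SURE estimate of the prediction risk: {sure:.4f}")
```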