What is WDRO?

Wasserstein Distributionally Robust Optimization (WDRO) is a mathematical programming framework that can make machine learning models robust to shifts in the data distribution.

Machine Learning models

Let f_\theta(\xi) denote the cost of a prediction parametrized by \theta for an uncertain variable \xi. For instance, in linear regression, \xi=(x,y)\in\mathbb{R}^d\times\mathbb{R}, where x is the feature vector and y is the label, and the cost is f_\theta(\xi) = \frac{1}{2} ( \langle \theta , x \rangle - y )^2.
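As a quick illustration, the squared loss above can be evaluated in a few lines of NumPy. This is only a minimal sketch: the function name `squared_loss` and the numerical values are assumptions made for the example, not part of any library.

```python
import numpy as np

def squared_loss(theta, x, y):
    """Cost f_theta(xi) = 0.5 * (<theta, x> - y)^2 for one sample xi = (x, y)."""
    return 0.5 * (np.dot(theta, x) - y) ** 2

# Hypothetical values with d = 2 (chosen for illustration only).
theta = np.array([1.0, -2.0])
x = np.array([3.0, 1.0])
y = 0.5

# <theta, x> = 3 - 2 = 1, so the residual is 0.5 and the cost is 0.5 * 0.25.
loss = squared_loss(theta, x, y)
print(loss)
```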

In machine learning, the model is usually trained (or fitted, i.e. optimized over \theta) on data samples (\xi_i)_{i=1}^n of the uncertain variable by minimizing the empirical risk, which leads to the problem:

\min_{\theta} \frac{1}{n} \sum_{i=1}^n f_\theta(\xi_i) \tag{1}

Problem (1) is usually called Empirical Risk Minimization (ERM) in the literature.
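For the linear regression cost introduced earlier, ERM reduces to an ordinary least-squares problem, which makes it easy to try out numerically. The sketch below assumes synthetic Gaussian data and uses NumPy's `np.linalg.lstsq`; the helper `empirical_risk` and the variable names are hypothetical and serve only to illustrate the objective.

```python
import numpy as np

def empirical_risk(theta, X, y):
    """Average of f_theta(xi_i) = 0.5 * (<theta, x_i> - y_i)^2 over the n samples."""
    residuals = X @ theta - y
    return 0.5 * np.mean(residuals ** 2)

# Synthetic data (assumed for illustration): n = 200 samples, d = 3 features.
rng = np.random.default_rng(0)
n, d = 200, 3
theta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = X @ theta_true + 0.1 * rng.normal(size=n)

# With the squared loss, minimizing the empirical risk is a least-squares fit.
theta_erm, *_ = np.linalg.lstsq(X, y, rcond=None)

print("ERM solution:", theta_erm)
print("empirical risk at the ERM solution:", empirical_risk(theta_erm, X, y))
```

By construction, the ERM solution has an empirical risk no larger than that of any other parameter, including the `theta_true` used to generate the samples. This says nothing about its cost under a shifted distribution.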

Robustness

WDRO