What is WDRO?

Wasserstein Distributionally Robust Optimization (WDRO) is a mathematical program that can make machine learning models robust to shifts in the data distribution.

Machine Learning models

Let us denote by $f_\theta(\xi)$ the cost of a prediction parametrized by $\theta$ for some uncertain variable $\xi$. For instance, in linear regression, $\xi=(x,y)\in\mathbb{R}^d\times\mathbb{R}$, where $x$ is the data and $y$ the label, and the cost is $f_\theta(\xi) = \frac{1}{2} ( \langle \theta , x \rangle - y )^2$.
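As a minimal illustration, the linear-regression cost can be written directly with NumPy. The function name `squared_loss` is hypothetical (it is not part of any WDRO library API); it simply evaluates $f_\theta(\xi)$ for a single sample $\xi=(x,y)$.

```python
import numpy as np


def squared_loss(theta: np.ndarray, xi: tuple) -> float:
    """Hypothetical helper: f_theta(xi) = 0.5 * (<theta, x> - y)^2 for one sample xi = (x, y)."""
    x, y = xi
    residual = float(np.dot(theta, x)) - y  # <theta, x> - y
    return 0.5 * residual ** 2


theta = np.array([1.0, -2.0])
xi = (np.array([3.0, 1.0]), 0.5)
# <theta, x> = 3 - 2 = 1, residual = 0.5, cost = 0.5 * 0.25
print(squared_loss(theta, xi))  # 0.125
```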

In machine learning, the model is usually trained (or fitted, i.e., optimized over $\theta$) on samples $(\xi_i)_{i=1}^n$ of the uncertain variable by minimizing the empirical risk, which leads to the problem:

$$\min_{\theta} \frac{1}{n} \sum_{i=1}^n f_\theta(\xi_i) \tag{1}$$

Equation (1) is usually called Empirical Risk Minimization (ERM) in the literature.
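For the linear-regression cost, ERM can be sketched with plain gradient descent on the empirical risk. This is only an illustration under assumptions: the synthetic data, step size, and iteration count are arbitrary choices, and gradient descent is simply one possible solver, not necessarily what a WDRO library uses. Since the squared loss gives ordinary least squares, the result can be checked against `np.linalg.lstsq`.

```python
import numpy as np


def empirical_risk(theta, X, y):
    """(1/n) * sum_i 0.5 * (<theta, x_i> - y_i)^2, i.e. the objective of problem (1)."""
    residuals = X @ theta - y
    return 0.5 * np.mean(residuals ** 2)


def empirical_risk_grad(theta, X, y):
    """Gradient of the empirical risk: (1/n) * X^T (X theta - y)."""
    n = X.shape[0]
    return X.T @ (X @ theta - y) / n


def erm_gradient_descent(X, y, step=0.1, n_iter=2000):
    """Minimize the empirical risk by fixed-step gradient descent from theta = 0."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        theta -= step * empirical_risk_grad(theta, X, y)
    return theta


# Synthetic samples xi_i = (x_i, y_i) from a noisy linear model (illustrative only).
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=n)

theta_hat = erm_gradient_descent(X, y)
theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # closed-form least-squares reference
print(theta_hat)
print(theta_ls)
```

Both estimates should agree up to numerical precision, because for this cost the ERM minimizer is exactly the least-squares solution.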

Robustness

WDRO