skwdro.torch module

API

This is the main entry point of the API: use this interface first, before trying the more complicated ones. See the PyTorch interface tutorial to learn more.

skwdro.torch.robustify(loss_: Module | Callable[[...], Tensor], transform_: Module | None, rho: Tensor, xi_batchinit: Tensor, xi_labels_batchinit: Tensor | None, post_sample: bool = True, cost_spec: str | None = None, n_samples: int = 10, seed: int = 42, *, reduction: str | None = None, learning_rate: float | None = None, epsilon: float | None = None, sigma: float | None = None, l2reg: float | None = None, adapt: str | None = 'prodigy', n_iter: int | Tuple[int, int] | None = None, imp_samp: bool = True, loss_reduces_spatial_dims: bool = False) → _DualFormulation

Provide the wrapped version of the primal loss.

Parameters:

loss_: nn.Module|Callable: the primal loss \(L_\theta\). Can be given either as a torch.nn.Module or as a (functional) callable.
transform_: nn.Module|None: the transformation to apply to the (non-label) data before feeding it to the loss. Identity if set to None (default).
rho: Tensor, scalar tensor: Wasserstein radius
xi_batchinit: Tensor, shape (n_samples, n_features): Data points to initialize the samplers and \(\lambda_0\)
xi_labels_batchinit: Optional[Tensor], shape (n_samples, n_features): Labels to initialize the samplers and \(\lambda_0\)
post_sample: bool: whether to use a post-sampled dual loss
cost_spec: str|None: the cost specification in the format (k, p), for a k-norm raised to the power p between samples. None to use the default (2, 2).
n_samples: int: number of \(\zeta\) samples to draw before the gradient descent begins (can be changed if needed between inferences)
seed: int: the seed for the samplers
reduction: str | None: specifies the reduction to apply to the outer expectation of the SkWDRO formula: 'none' | 'mean' | 'sum'. With 'none', no reduction is applied; with 'mean', the sum of the output is divided by the number of elements in the output; with 'sum', the output is summed. Default: None, which translates to 'mean'.
learning_rate: float: the step size for the default descent algorithm linked to the loss function
epsilon: float|None: value of epsilon if hard-coded, None to let the algorithm find it.
sigma: float|None: value of sigma if hard-coded, None to let the algorithm find it.
l2reg: float|None: L2 regularization if needed
adapt: str|None: the adaptive step-size scheme to use, either “prodigy” or “mechanic”.
n_iter: int|tuple[int, int]|None: sets the default number of iterations when the default solving routines are used. Mostly an internal parameter. If an int, it is the number of internal robust optimization steps. If a 2-tuple of ints, it is the number of ERM steps preceding the robust solve, then the number of robust steps. If None, it is filled with default values.
imp_samp: bool: whether to use importance sampling (will work only for (2, 2) costs).
loss_reduces_spatial_dims: bool: flag that can be set to True if the primal loss, with its reduction set to 'none', reduces the last dimension of the batch of losses, e.g. torch.nn.CrossEntropyLoss, which takes one dimension as the channel axis. Defaults to False.
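
As a reading aid for cost_spec, the (k, p) pair can be written as a formula. This is a sketch under the assumption that the cost is the k-norm of the displacement between a data sample \(\xi\) and a sampled point \(\zeta\), raised to the power p. The exact convention should be checked against the library's cost classes.

\[
c(\xi, \zeta) = \lVert \xi - \zeta \rVert_k^{\,p},
\qquad \text{default } (k, p) = (2, 2): \quad c(\xi, \zeta) = \lVert \xi - \zeta \rVert_2^{2}.
\]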
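
The three reduction modes described above can be illustrated on a plain list of per-sample values. The helper reduce_outer below is hypothetical and is not part of skwdro; it only mirrors the documented semantics, including None being treated as 'mean'.

```python
from typing import List, Optional, Union


def reduce_outer(
    values: List[float], reduction: Optional[str] = None
) -> Union[float, List[float]]:
    """Hypothetical helper mirroring the documented reduction modes."""
    # Per the parameter description, None translates to 'mean'.
    mode = "mean" if reduction is None else reduction
    if mode == "none":
        return list(values)  # no reduction: keep one value per sample
    if mode == "sum":
        return sum(values)
    if mode == "mean":
        return sum(values) / len(values)
    raise ValueError(f"unknown reduction: {reduction!r}")


per_sample = [1.0, 2.0, 6.0]
print(reduce_outer(per_sample))          # 3.0
print(reduce_outer(per_sample, "sum"))   # 9.0
print(reduce_outer(per_sample, "none"))  # [1.0, 2.0, 6.0]
```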
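
The accepted forms of n_iter can be summarized with a small sketch. The function split_n_iter is hypothetical (not a skwdro function), and treating a bare int as "zero ERM steps" is an assumption made here for illustration. The defaults used when n_iter is None are chosen by the library and are not reproduced.

```python
from typing import Optional, Tuple, Union


def split_n_iter(
    n_iter: Union[int, Tuple[int, int], None]
) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: return (erm_steps, robust_steps), or None."""
    if n_iter is None:
        # The library fills in its own defaults; they are not guessed here.
        return None
    if isinstance(n_iter, int):
        # Documented: an int is the number of robust optimization steps.
        # Assumption: no ERM steps are run beforehand in this case.
        return (0, n_iter)
    # Documented: a 2-tuple is (ERM steps, robust steps), in that order.
    erm_steps, robust_steps = n_iter
    return (erm_steps, robust_steps)


print(split_n_iter(200))        # (0, 200)
print(split_n_iter((50, 200)))  # (50, 200)
print(split_n_iter(None))       # None
```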
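
To see what "reduces a dimension with reduction set to 'none'" means for loss_reduces_spatial_dims, the following sketch compares the output shapes of two standard PyTorch losses. It uses only torch itself and does not call skwdro; the tensor sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Classification: 4 samples, 3 classes. The class (channel) axis is consumed
# by the loss even though reduction="none" keeps one value per sample.
logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 1])
ce_losses = nn.CrossEntropyLoss(reduction="none")(logits, targets)
print(ce_losses.shape)  # torch.Size([4])

# Regression: an element-wise loss keeps every dimension of its input.
preds = torch.randn(4, 3)
reals = torch.randn(4, 3)
mse_losses = nn.MSELoss(reduction="none")(preds, reals)
print(mse_losses.shape)  # torch.Size([4, 3])
```

A loss behaving like the first case is the situation in which the flag is meant to be set to True, according to the description above.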