skwdro package

Subpackages

Submodules

skwdro.torch module

skwdro.wrap_problem module

skwdro.wrap_problem.decide_on_impsamp(user_query: bool, cost: ParsedCost) -> bool
skwdro.wrap_problem.dualize_primal_loss(loss_: Module | Callable[[...], Tensor], transform_: Module | None, rho: Tensor, xi_batchinit: Tensor, xi_labels_batchinit: Tensor | None, post_sample: bool = True, cost_spec: str | None = None, n_samples: int = 10, seed: int = 42, *, reduction: str | None = None, learning_rate: float | None = None, epsilon: float | None = None, sigma: float | None = None, l2reg: float | None = None, adapt: str | None = 'prodigy', n_iter: int | Tuple[int, int] | None = None, imp_samp: bool = True, loss_reduces_spatial_dims: bool = False) -> _DualFormulation

Wrap the primal loss and return its dual formulation.

Parameters:
loss_: nn.Module|Callable

the primal loss \(L_\theta\). Can be given either as a torch.nn.Module or as a (functional) callable.

transform_: nn.Module|None

the transformation to apply to the (non-label) data before feeding it to the loss. If set to None, the identity is used.

rho: Tensor, scalar tensor

Wasserstein radius

xi_batchinit: Tensor, shape (n_samples, n_features)

Data points to initialize the samplers and \(\lambda_0\)

xi_labels_batchinit: Optional[Tensor], shape (n_samples, n_features)

Labels to initialize the samplers and \(\lambda_0\)

post_sample: bool

whether to use a post-sampled dual loss

cost_spec: str|None

the cost specification in the format (k, p), meaning a k-norm raised to the power p. None selects the default (2, 2).

n_samples: int

number of \(\zeta\) samples to draw before the gradient descent begins (can be changed if needed between inferences)

seed: int

the seed for the samplers

reduction: str | None

specifies the reduction to apply to the outer expectation of the SkWDRO formula: 'none' | 'mean' | 'sum'.

- 'none': no reduction will be applied,
- 'mean': the sum of the output will be divided by the number of elements in the output,
- 'sum': the output will be summed.

Default: None, which translates to 'mean'.
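The three reduction modes can be illustrated on a plain list of per-sample values. This is a minimal sketch, not the library's code: `apply_reduction` is a hypothetical helper that only mirrors the documented semantics, including the rule that None translates to 'mean'.

```python
from typing import List, Optional, Union


def apply_reduction(
    values: List[float], reduction: Optional[str] = None
) -> Union[List[float], float]:
    """Hypothetical helper mirroring the documented reduction modes."""
    # None is documented as translating to 'mean'.
    mode = "mean" if reduction is None else reduction
    if mode == "none":
        # No reduction: return the per-sample values unchanged.
        return list(values)
    if mode == "sum":
        return sum(values)
    if mode == "mean":
        # Sum divided by the number of elements.
        return sum(values) / len(values)
    raise ValueError(f"unknown reduction: {mode!r}")


per_sample = [1.0, 2.0, 3.0]
print(apply_reduction(per_sample))          # mean
print(apply_reduction(per_sample, "sum"))   # sum
print(apply_reduction(per_sample, "none"))  # unchanged list
```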

learning_rate: float|None

the step size for the default descent algorithm linked to the loss function

epsilon: float|None

Value of epsilon if hard-coded; None lets the algorithm choose it.

sigma: float|None

Value of sigma if hard-coded; None lets the algorithm choose it.

l2reg: float|None

L2 regularization if needed

adapt: str|None

the adaptive step-size scheme to use, either "prodigy" or "mechanic".

n_iter: int|tuple[int, int]|None

sets the default number of iterations when the default solving routines are used. Mostly an internal parameter. If an int, it is the number of internal robust optimization steps. If a 2-tuple of ints, it is the number of ERM steps preceding the robust solve, followed by the number of robust steps. If None, a default value is filled in.
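The three accepted forms of n_iter can be read as one (ERM steps, robust steps) pair. The sketch below is only an illustration of that reading: `split_n_iter` is a hypothetical helper, the placeholder default `(0, 100)` is invented for the example, and treating a bare int as "zero ERM steps" is an assumption, since the description does not say how many ERM steps run in that case.

```python
from typing import Optional, Tuple, Union


def split_n_iter(
    n_iter: Optional[Union[int, Tuple[int, int]]],
    default: Tuple[int, int] = (0, 100),  # placeholder, not the library default
) -> Tuple[int, int]:
    """Hypothetical helper returning (erm_steps, robust_steps)."""
    if n_iter is None:
        # None: fall back to a default pair.
        return default
    if isinstance(n_iter, int):
        # A single int counts robust steps only (assumed: no ERM warm-up).
        return (0, n_iter)
    # A 2-tuple: ERM steps first, then robust steps.
    erm_steps, robust_steps = n_iter
    return (int(erm_steps), int(robust_steps))


print(split_n_iter(50))
print(split_n_iter((20, 80)))
print(split_n_iter(None))
```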

imp_samp: bool

whether to use importance sampling (will work only for (2, 2) costs).

loss_reduces_spatial_dims: bool

set to True if the primal loss, with its reduction set to 'none', still reduces the last dimension of the batch of losses, e.g. torch.nn.CrossEntropyLoss, which treats one dimension as the channel axis. Defaults to False.

skwdro.wrap_problem.expert_hyperparams(rho: Tensor, p: float, epsilon: float | None, epsilon_sigma_factor: float, sigma: float | None, sigma_factor: float) -> Tuple[Tensor, Tensor]

Tuning of the hyperparameters for the dual loss.

Parameters:
rho: Tensor, shape (n_samples,)

Wasserstein radius

p: float

power of norm

epsilon: float|None

Value of epsilon if hard-coded; None lets the algorithm choose it.

epsilon_sigma_factor: float

Estimated ratio \(\frac{\epsilon}{\sigma}\)

sigma: float|None

Value of sigma if hard-coded; None lets the algorithm choose it.

sigma_factor: float

Estimated ratio \(\frac{\sigma}{\rho}\)
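Read literally, the two ratios give \(\sigma \approx \text{sigma\_factor} \cdot \rho\) and \(\epsilon \approx \text{epsilon\_sigma\_factor} \cdot \sigma\) whenever the corresponding value is not hard-coded. The sketch below works through that arithmetic with plain floats. It is not the library's implementation: `ratios_to_scales` is a hypothetical name, the role of p is ignored, and the (epsilon, sigma) return order is an assumption.

```python
from typing import Optional, Tuple


def ratios_to_scales(
    rho: float,
    epsilon: Optional[float],
    epsilon_sigma_factor: float,
    sigma: Optional[float],
    sigma_factor: float,
) -> Tuple[float, float]:
    """Hypothetical helper: fill in missing epsilon/sigma from the ratios."""
    if sigma is None:
        # sigma / rho is estimated by sigma_factor.
        sigma = sigma_factor * rho
    if epsilon is None:
        # epsilon / sigma is estimated by epsilon_sigma_factor.
        epsilon = epsilon_sigma_factor * sigma
    # Return order (epsilon, sigma) is an assumption of this sketch.
    return epsilon, sigma


# Both values left to the heuristic.
print(ratios_to_scales(2.0, None, 0.5, None, 0.25))
# Epsilon hard-coded, sigma derived from rho.
print(ratios_to_scales(2.0, 0.1, 0.5, None, 0.25))
```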

skwdro.wrap_problem.power_from_parsed_spec(parsed_spec: ParsedCost | None) -> float

Module contents