skwdro.linear_models package

Module contents

class skwdro.linear_models.LinearRegression(rho=0.01, l2_reg=0.0, fit_intercept=True, cost='t-NLC-2-2', solver='entropic_torch', solver_reg=None, sampler_reg=None, n_zeta_samples: int = 10, random_state: int = 0, opt_cond: skwdro.solvers.optim_cond.OptCondTorch | None = <skwdro.solvers.optim_cond.OptCondTorch object>)

Bases: BaseEstimator, RegressorMixin

A Wasserstein Distributionally Robust linear regression.

The cost function is

\ell(\theta,\xi=(x,y)) = \frac{1}{2}(\langle \theta,x \rangle - y)^2
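The loss above can be evaluated by hand for a single sample. The following is a minimal sketch in plain Python; the helper name `squared_loss` is hypothetical and not part of skwdro, and no intercept term is included, matching the formula as written.

```python
def squared_loss(theta, x, y):
    """Return 0.5 * (<theta, x> - y)**2 for one sample xi = (x, y)."""
    if len(theta) != len(x):
        raise ValueError("theta and x must have the same length")
    # Inner product <theta, x>
    prediction = sum(t * v for t, v in zip(theta, x))
    return 0.5 * (prediction - y) ** 2


theta = [1.0, -2.0]
x = [3.0, 1.0]
# The prediction is 3.0 - 2.0 = 1.0, so the residual against y = 0.0 is 1.0.
loss = squared_loss(theta, x, y=0.0)
print(loss)  # 0.5
```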

The WDRO problem solved at fitting is

\min_{\theta} \max_{\mathbb{Q} : W(\mathbb{P}_n,\mathbb{Q}) \leq \rho} \mathbb{E}_{\xi\sim\mathbb{Q}} \ell(\theta,\xi=(x,y))
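To give some intuition for the constraint on \mathbb{Q}, the sketch below checks whether a candidate sample lies within a Wasserstein ball around an empirical sample. It is a toy illustration only: it assumes one-dimensional data, equally sized samples with uniform weights (where the optimal transport plan matches sorted values), and the convention that the radius bounds W_p itself. The helpers `wasserstein_1d` and `in_ball` are hypothetical and do not reflect how skwdro solves the problem.

```python
def wasserstein_1d(xs, ys, p=2):
    """W_p distance between two equally sized 1-D samples with uniform weights."""
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    # In 1-D, the optimal coupling pairs the i-th smallest points of each sample.
    total = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return (total / len(xs)) ** (1.0 / p)


def in_ball(empirical, candidate, rho, p=2):
    """Return True if the candidate sample is within distance rho (assumed convention)."""
    return wasserstein_1d(empirical, candidate, p) <= rho


empirical = [0.0, 1.0, 2.0]
shifted = [2.5, 0.5, 1.5]  # every point moved by 0.5, listed in another order
print(in_ball(empirical, shifted, rho=0.6))
print(in_ball(empirical, shifted, rho=0.4))
```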

Parameters:
rho: float, default=1e-2

Robustness radius

l2_reg: float, default=0.

l2 regularization

fit_intercept: boolean, default=True

Determines if an intercept is fit or not

cost: str, default="t-NLC-2-2"

Dash-separated code to define the transport cost: "<engine>-<cost id>-<k-norm type>-<power>" for c(x, y):=\|x-y\|_k^p

solver: str, default='entropic_torch'

Solver to be used: 'entropic', 'entropic_torch' (_pre or _post) or 'dedicated'

solver_reg: float or None, default=None

regularization value for the entropic solver

n_zeta_samples: int, default=10

number of adversarial samples to draw

opt_cond: Optional[OptCondTorch]

optimality condition, see OptCondTorch
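The format of the cost parameter can be illustrated by splitting the code into its four fields and evaluating c(x, y) = \|x-y\|_k^p directly. This is a sketch under stated assumptions: `parse_cost_code` and `transport_cost` are hypothetical helpers, not skwdro functions, and the engine and cost id fields are returned without interpretation because this page does not define their meaning.

```python
import math


def parse_cost_code(code):
    """Split "<engine>-<cost id>-<k-norm type>-<power>" into its four fields."""
    engine, cost_id, k, power = code.split("-")
    return engine, cost_id, float(k), float(power)


def transport_cost(x, y, k, power):
    """Evaluate ||x - y||_k ** power for two equally long vectors."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(k):
        norm = max(diffs)  # sup-norm as the limit of k-norms
    else:
        norm = sum(d ** k for d in diffs) ** (1.0 / k)
    return norm ** power


engine, cost_id, k, power = parse_cost_code("t-NLC-2-2")
# Squared Euclidean distance between (0, 0) and (3, 4)
print(transport_cost([0.0, 0.0], [3.0, 4.0], k, power))
```

With the default code, k = 2 and p = 2, so the cost is the squared Euclidean distance.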

Attributes:
coef_: array, shape (n_features,)

parameter vector (\theta in the cost function formula)

intercept_: float

constant term in decision function.

Examples

>>> import numpy as np
>>> from skwdro.linear_models import LinearRegression as RobustLinearRegression
>>> from sklearn.model_selection import train_test_split
>>> d = 10; m = 100
>>> x0 = np.random.randn(d)
>>> X = np.random.randn(m,d)
>>> y = X.dot(x0) +  np.random.randn(m)
>>> X_train, X_test, y_train, y_test = train_test_split(X,y)
>>> rob_lin = RobustLinearRegression(rho=0.1,solver="entropic",fit_intercept=True)
>>> rob_lin.fit(X_train, y_train)
LinearRegression(rho=0.1)
>>> y_pred_rob = rob_lin.predict(X_test)
fit(X, y)

Fits the WDRO regressor.

Parameters:
X: array-like, shape (n_samples, n_features)

The training input samples.

y: array-like, shape (n_samples,)

The target values (real numbers).

Returns:
self: object

Returns self.

predict(X)

Robust prediction.

Parameters:
X: array-like, shape (n_samples, n_features)

The input samples.

Returns:
y: ndarray, shape (n_samples,)

The predicted target values.
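Assuming the fitted model predicts with the usual linear rule, the inner product of coef_ with each sample plus intercept_, the computation can be sketched as follows. The helper `linear_predict` is hypothetical and only mirrors the attributes documented above; it is not the library's implementation.

```python
def linear_predict(X, coef, intercept=0.0):
    """Return <coef, x> + intercept for every row x of X."""
    return [sum(w * v for w, v in zip(coef, row)) + intercept for row in X]


X = [[1.0, 2.0], [0.0, 1.0]]
coef = [0.5, -1.0]      # plays the role of coef_
intercept = 0.25        # plays the role of intercept_
print(linear_predict(X, coef, intercept))  # [-1.25, -0.75]
```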

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> LinearRegression

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
self: object

The updated object.

class skwdro.linear_models.LogisticRegression(rho: float = 0.01, l2_reg: float = 0.0, fit_intercept: bool = True, cost: str = 't-NLC-2-2', solver='entropic_torch', solver_reg: float | None = None, sampler_reg: float | None = None, n_zeta_samples: int = 10, random_state: int = 0, opt_cond: skwdro.solvers.optim_cond.OptCondTorch | None = <skwdro.solvers.optim_cond.OptCondTorch object>)

Bases: BaseEstimator, ClassifierMixin

A Wasserstein Distributionally Robust logistic regression classifier.

The cost function is XXX

Uncertainty is XXX
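The source leaves the loss formula unspecified. For reference, the standard logistic loss for labels in {-1, +1} is \log(1 + \exp(-y \langle \theta, x \rangle)); whether the estimator uses exactly this form is an assumption. A minimal, numerically stable sketch follows, with `logistic_loss` as a hypothetical helper that is not part of skwdro.

```python
import math


def logistic_loss(theta, x, y):
    """Return log(1 + exp(-y * <theta, x>)) for a label y in {-1, +1}."""
    margin = y * sum(t * v for t, v in zip(theta, x))
    # Stable rewrite: log(1 + e^{-m}) = max(-m, 0) + log(1 + e^{-|m|})
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


print(logistic_loss([0.0], [1.0], 1))        # log(2), the loss at zero margin
print(logistic_loss([1000.0], [1.0], -1))    # about 1000, with no overflow
```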

Parameters:
rho: float, default=1e-2

Robustness radius

l2_reg: float, default=0.

l2 regularization

fit_intercept: boolean, default=True

Determines if an intercept is fit or not

cost: str, default="t-NLC-2-2"

Dash-separated code to define the transport cost: "<engine>-<cost id>-<k-norm type>-<power>" for c(x, y):=\|x-y\|_k^p

solver: str, default='entropic_torch'

Solver to be used: 'entropic', 'entropic_torch' (_pre or _post) or 'dedicated'

solver_reg: float or None, default=None

regularization value for the entropic solver

n_zeta_samples: int, default=10

number of adversarial samples to draw

opt_cond: Optional[OptCondTorch]

optimality condition, see OptCondTorch

Attributes:
coef_: array, shape (n_features,)

parameter vector (w in the cost function formula)

intercept_: float

constant term in decision function.

Examples

>>> import numpy as np
>>> from skwdro.linear_models import LogisticRegression
>>> from sklearn.datasets import make_blobs
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_blobs(n_samples=100, centers=2, n_features=2, random_state=0)
>>> y = np.sign(y-0.5)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
>>> estimator = LogisticRegression()
>>> estimator.fit(X_train,y_train)
LogisticRegression()
>>> estimator.predict(X_test)
array([-1., -1., -1.,  1., -1.,  1.,  1., -1., -1.,  1.,  1.,  1., -1.,
        1.,  1.,  1.,  1.,  1., -1., -1., -1.,  1.,  1., -1., -1.,  1.,
       -1.,  1.,  1.,  1.,  1.,  1., -1.])
>>> estimator.score(X_test,y_test)
0.9393939393939394
fit(X, y)

Fits the WDRO classifier.

Parameters:
X: array-like, shape (n_samples, n_features)

The training input samples.

y: array-like, shape (n_samples,)

The target values. An array of int. Only -1 or +1 are currently supported.

Returns:
self: LogisticRegression

Returns self.

predict(X)

Robust prediction.

Parameters:
X: array-like, shape (n_samples, n_features)

The input samples.

Returns:
y: ndarray, shape (n_samples,)

The predicted label (-1 or +1) for each sample.
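For a linear classifier with labels in {-1, +1}, prediction usually reduces to the sign of the linear score. The sketch below assumes that rule and assigns a score of exactly zero to +1 (an arbitrary tie-breaking choice); `predict_labels` is a hypothetical helper, not the library's code.

```python
def predict_labels(X, coef, intercept=0.0):
    """Return +1.0 or -1.0 per row of X from the sign of <coef, x> + intercept."""
    labels = []
    for row in X:
        score = sum(w * v for w, v in zip(coef, row)) + intercept
        # Ties at score == 0 go to +1 here; this is an assumption.
        labels.append(1.0 if score >= 0 else -1.0)
    return labels


X = [[2.0, 0.0], [-1.0, 0.5]]
print(predict_labels(X, coef=[1.0, 1.0], intercept=-0.5))  # [1.0, -1.0]
```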

predict_proba(X)

Robust prediction probability for classes -1 and +1.

Parameters:
X: array-like, shape (n_samples, n_features)

The input samples.

Returns:
p: ndarray, shape (n_samples, 2)

The probability of each class for each of the samples.
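In a standard logistic model, the probability of class +1 is the sigmoid of the linear score, and the probability of class -1 is its complement. The sketch below assumes that model and a column order of (-1, +1); `sigmoid` and `predict_proba_sketch` are hypothetical helpers written for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba_sketch(X, coef, intercept=0.0):
    """Return [P(y=-1 | x), P(y=+1 | x)] for every row x of X."""
    rows = []
    for row in X:
        score = sum(w * v for w, v in zip(coef, row)) + intercept
        p_pos = sigmoid(score)
        rows.append([1.0 - p_pos, p_pos])
    return rows


# A zero score gives equal probability to both classes.
print(predict_proba_sketch([[0.0, 0.0]], coef=[1.0, -1.0]))  # [[0.5, 0.5]]
```

The second column corresponds to the single-column output described for predict_proba_2Class, under the same assumptions.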

predict_proba_2Class(X)

Robust prediction probability for class +1.

Parameters:
X: array-like, shape (n_samples, n_features)

The input samples.

Returns:
p: ndarray, shape (n_samples,)

The probability of class +1 for each of the samples.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> LogisticRegression

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
self: object

The updated object.