API reference

XGBoostLSS - An extension of XGBoost to probabilistic forecasting

datasets

XGBoostLSS - An extension of XGBoost to probabilistic forecasting

data_loader

load_articlake_data()

Returns the arctic lake sediment data: sand, silt, clay compositions of 39 sediment samples at different water depths in an Arctic lake.

Contains the following columns:

sand: numeric
    Vector of percentages of sand.
silt: numeric
    Vector of percentages of silt.
clay: numeric
    Vector of percentages of clay.
depth: numeric
    Vector of water depths (meters) in which samples are taken.

Source

https://rdrr.io/rforge/DirichletReg/

Source code in xgboostlss/datasets/data_loader.py
def load_articlake_data():
    """
    Returns the arctic lake sediment data: sand, silt, clay compositions of 39 sediment samples at different water
    depths in an Arctic lake.

    Contains the following columns:
        sand: numeric
            Vector of percentages of sand.
        silt: numeric
            Vector of percentages of silt.
        clay: numeric
            Vector of percentages of clay
        depth: numeric
            Vector of water depths (meters) in which samples are taken.

    Source
    ------
    https://rdrr.io/rforge/DirichletReg/
    """
    data_path = pkg_resources.resource_stream(__name__, "arcticlake.csv")
    data_df = pd.read_csv(data_path)

    return data_df
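
A minimal usage sketch: loading the compositional arctic lake data and inspecting the sand/silt/clay shares. Only the documented column names are assumed; whether the shares are stored as proportions or percentages is left to the printed output.

from xgboostlss.datasets.data_loader import load_articlake_data

# Load the 39-sample arctic lake sediment data set.
data = load_articlake_data()
print(data.head())

# sand, silt and clay are compositional shares of a total; inspect their row sums.
comp = data[["sand", "silt", "clay"]]
print(comp.sum(axis=1).describe())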

load_simulated_gaussian_data()

Returns train/test DataFrames of a simulated example.

Contains the following columns:

y: int64
    Response.
x: int64
    x-feature.
X1:X10: int64
    Random noise features.

Source code in xgboostlss/datasets/data_loader.py
def load_simulated_gaussian_data():
    """
    Returns train/test dataframe of a simulated example.

    Contains the following columns:
        y              int64: response
        x              int64: x-feature
        X1:X10         int64: random noise features

    """
    train_path = pkg_resources.resource_stream(__name__, "gaussian_train_sim.csv")
    train_df = pd.read_csv(train_path)

    test_path = pkg_resources.resource_stream(__name__, "gaussian_test_sim.csv")
    test_df = pd.read_csv(test_path)

    return train_df, test_df
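
A brief sketch of turning the train/test split into xgboost DMatrix objects. Only the documented "y" response column is assumed; all remaining columns are treated as features.

import xgboost as xgb
from xgboostlss.datasets.data_loader import load_simulated_gaussian_data

train, test = load_simulated_gaussian_data()
X_train, y_train = train.drop(columns=["y"]), train["y"]
X_test = test.drop(columns=["y"])

# DMatrix objects as consumed by the XGBoost training and prediction routines.
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test)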

load_simulated_multivariate_gaussian_data()

Returns a DataFrame of a simulated multivariate Gaussian example.

Contains the following columns:

y: int64
    Response.
x: int64
    x-feature.

Source code in xgboostlss/datasets/data_loader.py
def load_simulated_multivariate_gaussian_data():
    """
    Returns train/test dataframe of a simulated example.

    Contains the following columns:
        y              int64: response
        x              int64: x-feature

    """
    data_path = pkg_resources.resource_stream(__name__, "sim_triv_gaussian.csv")
    data_df = pd.read_csv(data_path)

    return data_df

load_simulated_multivariate_studentT_data()

Returns a DataFrame of a simulated multivariate Student-T example.

Contains the following columns:

y: int64
    Response.
x: int64
    x-feature.

Source code in xgboostlss/datasets/data_loader.py
def load_simulated_multivariate_studentT_data():
    """
    Returns train/test dataframe of a simulated example.

    Contains the following columns:
        y              int64: response
        x              int64: x-feature

    """
    data_path = pkg_resources.resource_stream(__name__, "sim_triv_studentT.csv")
    data_df = pd.read_csv(data_path)

    return data_df

load_simulated_studentT_data()

Returns train/test DataFrames of a simulated example.

Contains the following columns:

y: int64
    Response.
x: int64
    x-feature.
X1:X10: int64
    Random noise features.

Source code in xgboostlss/datasets/data_loader.py
def load_simulated_studentT_data():
    """
    Returns train/test dataframe of a simulated example.

    Contains the following columns:
        y              int64: response
        x              int64: x-feature
        X1:X10         int64: random noise features

    """
    train_path = pkg_resources.resource_stream(__name__, "studentT_train_sim.csv")
    train_df = pd.read_csv(train_path)

    test_path = pkg_resources.resource_stream(__name__, "studentT_test_sim.csv")
    test_df = pd.read_csv(test_path)

    return train_df, test_df

distributions

XGBoostLSS - An extension of XGBoost to probabilistic forecasting

Beta

Beta

Bases: DistributionClass

Beta distribution class.

Distributional Parameters

concentration1: torch.Tensor
    1st concentration parameter of the distribution (often referred to as alpha).
concentration0: torch.Tensor
    2nd concentration parameter of the distribution (often referred to as beta).

Source

https://pytorch.org/docs/stable/distributions.html#beta

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in xgboostlss/distributions/Beta.py
class Beta(DistributionClass):
    """
    Beta distribution class.

    Distributional Parameters
    -------------------------
    concentration1: torch.Tensor
        1st concentration parameter of the distribution (often referred to as alpha).
    concentration0: torch.Tensor
        2nd concentration parameter of the distribution (often referred to as beta).

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#beta

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = Beta_Torch
        param_dict = {"concentration1": response_fn, "concentration0": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
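
A short instantiation sketch using the constructor options documented above; nothing beyond the listed arguments is assumed.

from xgboostlss.distributions.Beta import Beta

# Default: exponential response function, negative log-likelihood loss.
dist = Beta()

# Alternative configuration: softplus response and MAD stabilization.
dist_alt = Beta(stabilization="MAD", response_fn="softplus", loss_fn="nll")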

Cauchy

Cauchy

Bases: DistributionClass

Cauchy distribution class.

Distributional Parameters

loc: torch.Tensor
    Mode or median of the distribution.
scale: torch.Tensor
    Half width at half maximum.

Source

https://pytorch.org/docs/stable/distributions.html#cauchy

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in xgboostlss/distributions/Cauchy.py
class Cauchy(DistributionClass):
    """
    Cauchy distribution class.

    Distributional Parameters
    -------------------------
    loc: torch.Tensor
        Mode or median of the distribution.
    scale: torch.Tensor
        Half width at half maximum.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#cauchy

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = Cauchy_Torch
        param_dict = {"loc": identity_fn, "scale": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )

Dirichlet

Dirichlet

Bases: Multivariate_DistributionClass

Dirichlet distribution class.

The Dirichlet distribution is commonly used for modelling non-negative compositional data, i.e., data that consist of sub-sets that are fractions of some total. Compositional data are typically represented as proportions or percentages summing to 1, so that the Dirichlet extends the univariate beta-distribution to the multivariate case.

Distributional Parameters

concentration: torch.Tensor
    Concentration parameter of the distribution (often referred to as alpha).

Source

https://pytorch.org/docs/stable/distributions.html#dirichlet

Parameters

D: int
    Number of targets.
stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in xgboostlss/distributions/Dirichlet.py
class Dirichlet(Multivariate_DistributionClass):
    """
    Dirichlet distribution class.

    The Dirichlet distribution is commonly used for modelling non-negative compositional data, i.e., data that consist
    of sub-sets that are fractions of some total. Compositional data are typically represented as proportions or
    percentages summing to 1, so that the Dirichlet extends the univariate beta-distribution to the multivariate case.

    Distributional Parameters
    -------------------------
    concentration: torch.Tensor
        Concentration parameter of the distribution (often referred to as alpha).

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#dirichlet

    Parameters
    -------------------------
    D: int
        Number of targets.
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 D: int = 2,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):
        # Input Checks
        if not isinstance(D, int):
            raise ValueError("Invalid dimensionality type. Please choose an integer for D.")
        if D < 2:
            raise ValueError("Invalid dimensionality. Please choose D >= 2.")
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select from 'nll'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn, "relu": relu_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus' or 'relu.")

        # Set the parameters specific to the distribution
        distribution = Dirichlet_Torch
        param_dict = Dirichlet.create_param_dict(n_targets=D, response_fn=response_fn)
        distribution_arg_names = ["concentration"]
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=False,
                         distribution_arg_names=distribution_arg_names,
                         n_targets=D,
                         n_dist_param=len(param_dict),
                         param_dict=param_dict,
                         param_transform=Dirichlet.param_transform,
                         get_dist_params=Dirichlet.get_dist_params,
                         discrete=False,
                         stabilization=stabilization,
                         loss_fn=loss_fn
                         )

    @staticmethod
    def create_param_dict(n_targets: int,
                          response_fn: Callable
                          ) -> Dict:
        """ Function that transforms the distributional parameters to the desired scale.

        Arguments
        ---------
        n_targets: int
            Number of targets.
        response_fn: Callable
            Response function.

        Returns
        -------
        param_dict: Dict
            Dictionary of distributional parameters.
        """
        # Concentration
        param_dict = {"concentration_" + str(i + 1): response_fn for i in range(n_targets)}

        return param_dict

    @staticmethod
    def param_transform(params: List[torch.Tensor],
                        param_dict: Dict,
                        n_targets: int,
                        rank: Optional[int],
                        n_obs: int,
                        ) -> List[torch.Tensor]:
        """ Function that returns a list of parameters for a Dirichlet distribution.

        Arguments
        ---------
        params: List[torch.Tensor]
            List of distributional parameters.
        param_dict: Dict
        n_targets: int
            Number of targets.
        rank: Optional[int]
            Rank of the low-rank form of the covariance matrix.
        n_obs: int
            Number of observations.

        Returns
        -------
        params: List[torch.Tensor]
            List of parameters.
        """
        # Transform Parameters to respective scale
        params = torch.cat([
            response_fun(params[i].reshape(-1, 1)) for i, (dist_param, response_fun) in enumerate(param_dict.items())
        ], dim=1)

        return params

    @staticmethod
    def get_dist_params(n_targets: int,
                        dist_pred: torch.distributions.Distribution,
                        ) -> pd.DataFrame:
        """
        Function that returns the predicted distributional parameters.

        Arguments
        ---------
        n_targets: int
            Number of targets.
        dist_pred: torch.distributions.Distribution
            Predicted distribution.

        Returns
        -------
        dist_params_df: pd.DataFrame
            DataFrame with predicted distributional parameters.
        """
        # Concentration
        dist_params_df = pd.DataFrame(dist_pred.concentration.numpy())
        dist_params_df.columns = [f"concentration_{i + 1}" for i in range(n_targets)]

        # # Normalize to sum to 1
        # dist_params_df = dist_params_df.div(dist_params_df.sum(axis=1), axis=0)

        return dist_params_df

create_param_dict(n_targets, response_fn) staticmethod

Function that creates the dictionary of distributional parameters, mapping each concentration parameter to its response function.

Arguments

n_targets: int
    Number of targets.
response_fn: Callable
    Response function.

Returns

param_dict: Dict
    Dictionary of distributional parameters.

Source code in xgboostlss/distributions/Dirichlet.py
@staticmethod
def create_param_dict(n_targets: int,
                      response_fn: Callable
                      ) -> Dict:
    """ Function that transforms the distributional parameters to the desired scale.

    Arguments
    ---------
    n_targets: int
        Number of targets.
    response_fn: Callable
        Response function.

    Returns
    -------
    param_dict: Dict
        Dictionary of distributional parameters.
    """
    # Concentration
    param_dict = {"concentration_" + str(i + 1): response_fn for i in range(n_targets)}

    return param_dict
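
For illustration, the returned dictionary simply maps one concentration parameter per target to the chosen response function. A small sketch follows; the import path of exp_fn (xgboostlss.utils) is an assumption based on the package layout, not part of this listing.

from xgboostlss.utils import exp_fn  # assumed location of the response functions
from xgboostlss.distributions.Dirichlet import Dirichlet

param_dict = Dirichlet.create_param_dict(n_targets=3, response_fn=exp_fn)
print(list(param_dict.keys()))
# keys: 'concentration_1', 'concentration_2', 'concentration_3'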

get_dist_params(n_targets, dist_pred) staticmethod

Function that returns the predicted distributional parameters.

Arguments

n_targets: int
    Number of targets.
dist_pred: torch.distributions.Distribution
    Predicted distribution.

Returns

dist_params_df: pd.DataFrame
    DataFrame with predicted distributional parameters.

Source code in xgboostlss/distributions/Dirichlet.py
@staticmethod
def get_dist_params(n_targets: int,
                    dist_pred: torch.distributions.Distribution,
                    ) -> pd.DataFrame:
    """
    Function that returns the predicted distributional parameters.

    Arguments
    ---------
    n_targets: int
        Number of targets.
    dist_pred: torch.distributions.Distribution
        Predicted distribution.

    Returns
    -------
    dist_params_df: pd.DataFrame
        DataFrame with predicted distributional parameters.
    """
    # Concentration
    dist_params_df = pd.DataFrame(dist_pred.concentration.numpy())
    dist_params_df.columns = [f"concentration_{i + 1}" for i in range(n_targets)]

    # # Normalize to sum to 1
    # dist_params_df = dist_params_df.div(dist_params_df.sum(axis=1), axis=0)

    return dist_params_df

param_transform(params, param_dict, n_targets, rank, n_obs) staticmethod

Function that returns a list of parameters for a Dirichlet distribution.

Arguments

params: List[torch.Tensor]
    List of distributional parameters.
param_dict: Dict
    Dictionary of distributional parameters.
n_targets: int
    Number of targets.
rank: Optional[int]
    Rank of the low-rank form of the covariance matrix.
n_obs: int
    Number of observations.

Returns

params: List[torch.Tensor]
    List of parameters.

Source code in xgboostlss/distributions/Dirichlet.py
@staticmethod
def param_transform(params: List[torch.Tensor],
                    param_dict: Dict,
                    n_targets: int,
                    rank: Optional[int],
                    n_obs: int,
                    ) -> List[torch.Tensor]:
    """ Function that returns a list of parameters for a Dirichlet distribution.

    Arguments
    ---------
    params: List[torch.Tensor]
        List of distributional parameters.
    param_dict: Dict
    n_targets: int
        Number of targets.
    rank: Optional[int]
        Rank of the low-rank form of the covariance matrix.
    n_obs: int
        Number of observations.

    Returns
    -------
    params: List[torch.Tensor]
        List of parameters.
    """
    # Transform Parameters to respective scale
    params = torch.cat([
        response_fun(params[i].reshape(-1, 1)) for i, (dist_param, response_fun) in enumerate(param_dict.items())
    ], dim=1)

    return params
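
Putting the class together, a minimal construction sketch for a three-part composition such as the sand/silt/clay shares from the arctic lake data. Only the constructor arguments documented above are used.

from xgboostlss.distributions.Dirichlet import Dirichlet

# Three compositional targets, exponential response function, NLL loss.
# Per create_param_dict above, this yields one concentration parameter per target.
dist = Dirichlet(D=3, stabilization="None", response_fn="exp", loss_fn="nll")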

Expectile

Expectile

Bases: DistributionClass

Expectile distribution class.

Distributional Parameters

expectile: List
    List of specified expectiles.

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
expectiles: List
    List of expectiles in increasing order.
penalize_crossing: bool
    Whether to include a penalty term to discourage crossing of expectiles.

Source code in xgboostlss/distributions/Expectile.py
class Expectile(DistributionClass):
    """
    Expectile distribution class.

    Distributional Parameters
    -------------------------
    expectile: List
        List of specified expectiles.

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    expectiles: List
        List of expectiles in increasing order.
    penalize_crossing: bool
        Whether to include a penalty term to discourage crossing of expectiles.
    """
    def __init__(self,
                 stabilization: str = "None",
                 expectiles: List = [0.1, 0.5, 0.9],
                 penalize_crossing: bool = False,
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if not isinstance(expectiles, list):
            raise ValueError("Expectiles must be a list.")
        if not all([0 < expectile < 1 for expectile in expectiles]):
            raise ValueError("Expectiles must be between 0 and 1.")
        if not isinstance(penalize_crossing, bool):
            raise ValueError("penalize_crossing must be a boolean. Please choose from True or False.")

        # Set the parameters specific to the distribution
        distribution = Expectile_Torch
        torch.distributions.Distribution.set_default_validate_args(False)
        expectiles.sort()
        param_dict = {}
        for expectile in expectiles:
            key = f"expectile_{expectile}"
            param_dict[key] = identity_fn

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn="nll",
                         tau=torch.tensor(expectiles),
                         penalize_crossing=penalize_crossing
                         )
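
A short sketch of configuring the expectile objective from the arguments documented above.

from xgboostlss.distributions.Expectile import Expectile

# Model the 5%, 50% and 95% expectiles and penalize crossing estimates.
dist = Expectile(
    stabilization="None",
    expectiles=[0.05, 0.50, 0.95],
    penalize_crossing=True,
)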

Expectile_Torch

Bases: Distribution

PyTorch implementation of expectiles.

Arguments

expectiles : List[torch.Tensor]
    List of expectiles.
penalize_crossing : bool
    Whether to include a penalty term to discourage crossing of expectiles.

Source code in xgboostlss/distributions/Expectile.py
class Expectile_Torch(Distribution):
    """
    PyTorch implementation of expectiles.

    Arguments
    ---------
    expectiles : List[torch.Tensor]
        List of expectiles.
    penalize_crossing : bool
        Whether to include a penalty term to discourage crossing of expectiles.
    """
    def __init__(self,
                 expectiles: List[torch.Tensor],
                 penalize_crossing: bool = False,
                 ):
        super(Expectile_Torch).__init__()
        self.expectiles = expectiles
        self.penalize_crossing = penalize_crossing
        self.__class__.__name__ = "Expectile"

    def log_prob(self, value: torch.Tensor, tau: List[torch.Tensor]) -> torch.Tensor:
        """
        Returns the log of the probability density function evaluated at `value`.

        Arguments
        ---------
        value : torch.Tensor
            Response for which log probability is to be calculated.
        tau : List[torch.Tensor]
            List of asymmetry parameters.

        Returns
        -------
        torch.Tensor
            Log probability of `value`.
        """
        value = value.reshape(-1, 1)
        loss = torch.tensor(0.0, dtype=torch.float32)
        penalty = torch.tensor(0.0, dtype=torch.float32)

        # Calculate loss
        predt_expectiles = []
        for expectile, tau_value in zip(self.expectiles, tau):
            weight = torch.where(value - expectile >= 0, tau_value, 1 - tau_value)
            loss += torch.nansum(weight * (value - expectile) ** 2)
            predt_expectiles.append(expectile.reshape(-1, 1))

        # Penalty term to discourage crossing of expectiles
        if self.penalize_crossing:
            predt_expectiles = torch.cat(predt_expectiles, dim=1)
            penalty = torch.mean(
                (~torch.all(torch.diff(predt_expectiles, dim=1) > 0, dim=1)).float()
            )

        loss = (loss * (1 + penalty)) / len(self.expectiles)

        return -loss

log_prob(value, tau)

Returns the log of the probability density function evaluated at value.

Arguments

value : torch.Tensor
    Response for which log probability is to be calculated.
tau : List[torch.Tensor]
    List of asymmetry parameters.

Returns

torch.Tensor
    Log probability of value.

Source code in xgboostlss/distributions/Expectile.py
def log_prob(self, value: torch.Tensor, tau: List[torch.Tensor]) -> torch.Tensor:
    """
    Returns the log of the probability density function evaluated at `value`.

    Arguments
    ---------
    value : torch.Tensor
        Response for which log probability is to be calculated.
    tau : List[torch.Tensor]
        List of asymmetry parameters.

    Returns
    -------
    torch.Tensor
        Log probability of `value`.
    """
    value = value.reshape(-1, 1)
    loss = torch.tensor(0.0, dtype=torch.float32)
    penalty = torch.tensor(0.0, dtype=torch.float32)

    # Calculate loss
    predt_expectiles = []
    for expectile, tau_value in zip(self.expectiles, tau):
        weight = torch.where(value - expectile >= 0, tau_value, 1 - tau_value)
        loss += torch.nansum(weight * (value - expectile) ** 2)
        predt_expectiles.append(expectile.reshape(-1, 1))

    # Penalty term to discourage crossing of expectiles
    if self.penalize_crossing:
        predt_expectiles = torch.cat(predt_expectiles, dim=1)
        penalty = torch.mean(
            (~torch.all(torch.diff(predt_expectiles, dim=1) > 0, dim=1)).float()
        )

    loss = (loss * (1 + penalty)) / len(self.expectiles)

    return -loss

expectile_norm(tau=0.5, m=0, sd=1)

Calculates expectiles of the Normal distribution for given tau values. For more details and other distributions, see https://rdrr.io/cran/expectreg/man/enorm.html

Arguments


tau : np.ndarray
    Vector of expectiles from the respective distribution.
m : np.ndarray
    Mean of the Normal distribution.
sd : np.ndarray
    Standard deviation of the Normal distribution.

Returns


np.ndarray

Source code in xgboostlss/distributions/Expectile.py
def expectile_norm(tau: np.ndarray = 0.5,
                   m: np.ndarray = 0,
                   sd: np.ndarray = 1):
    """
    Calculates expectiles from Normal distribution for given tau values.
    For more details and distributions see https://rdrr.io/cran/expectreg/man/enorm.html

    Arguments
    _________
    tau : np.ndarray
        Vector of expectiles from the respective distribution.
    m : np.ndarray
        Mean of the Normal distribution.
    sd : np.ndarray
        Standard deviation of the Normal distribution.

    Returns
    _______
    np.ndarray
    """
    tau[tau > 1 or tau < 0] = np.nan
    zz = 0 * tau
    lower = np.array(-10, dtype="float")
    lower = np.repeat(lower[np.newaxis, ...], len(tau), axis=0)
    upper = np.array(10, dtype="float")
    upper = np.repeat(upper[np.newaxis, ...], len(tau), axis=0)
    diff = 1
    index = 0
    while (diff > 1e-10) and (index < 1000):
        root = expectile_pnorm(zz) - tau
        root[np.isnan(root)] = 0
        lower[root < 0] = zz[root < 0]
        upper[root > 0] = zz[root > 0]
        zz = (upper + lower) / 2
        diff = np.nanmax(np.abs(root))
        index = index + 1
    zz[np.isnan(tau)] = np.nan

    return zz * sd + m

expectile_pnorm(tau=0.5, m=0, sd=1)

Normal Expectile Distribution Function. For more details and other distributions, see https://rdrr.io/cran/expectreg/man/enorm.html

Arguments


tau : np.ndarray
    Vector of expectiles from the respective distribution.
m : np.ndarray
    Mean of the Normal distribution.
sd : np.ndarray
    Standard deviation of the Normal distribution.

Returns


tau : np.ndarray
    Expectiles from the Normal distribution.

Source code in xgboostlss/distributions/Expectile.py
def expectile_pnorm(tau: np.ndarray = 0.5,
                    m: np.ndarray = 0,
                    sd: np.ndarray = 1
                    ):
    """
    Normal Expectile Distribution Function.
    For more details and distributions see https://rdrr.io/cran/expectreg/man/enorm.html

    Arguments
    _________
    tau : np.ndarray
        Vector of expectiles from the respective distribution.
    m : np.ndarray
        Mean of the Normal distribution.
    sd : np.ndarray
        Standard deviation of the Normal distribution.

    Returns
    _______
    tau : np.ndarray
        Expectiles from the Normal distribution.
    """
    z = (tau - m) / sd
    p = norm.cdf(z, loc=m, scale=sd)
    d = norm.pdf(z, loc=m, scale=sd)
    u = -d - z * p
    tau = u / (2 * u + z)

    return tau
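
A small numpy sketch of expectile_pnorm, evaluating the Normal expectile distribution function at a few points of a standard Normal; the call uses only the arguments documented above.

import numpy as np
from xgboostlss.distributions.Expectile import expectile_pnorm

# Asymmetry levels corresponding to the points 0.0 and 1.0 of a standard Normal.
points = np.array([0.0, 1.0])
print(expectile_pnorm(points, m=0, sd=1))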

Gamma

Gamma

Bases: DistributionClass

Gamma distribution class.

Distributional Parameters

concentration: torch.Tensor
    Shape parameter of the distribution (often referred to as alpha).
rate: torch.Tensor
    Rate = 1 / scale of the distribution (often referred to as beta).

Source

https://pytorch.org/docs/stable/distributions.html#gamma

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in xgboostlss/distributions/Gamma.py
class Gamma(DistributionClass):
    """
    Gamma distribution class.

     Distributional Parameters
    --------------------------
    concentration: torch.Tensor
        shape parameter of the distribution (often referred to as alpha)
    rate: torch.Tensor
        rate = 1 / scale of the distribution (often referred to as beta)

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#gamma

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = Gamma_Torch
        param_dict = {"concentration": response_fn, "rate": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )

Gaussian

Gaussian

Bases: DistributionClass

Gaussian distribution class.

Distributional Parameters

loc: torch.Tensor
    Mean of the distribution (often referred to as mu).
scale: torch.Tensor
    Standard deviation of the distribution (often referred to as sigma).

Source

https://pytorch.org/docs/stable/distributions.html#normal

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in xgboostlss/distributions/Gaussian.py
class Gaussian(DistributionClass):
    """
    Gaussian distribution class.

    Distributional Parameters
    -------------------------
    loc: torch.Tensor
        Mean of the distribution (often referred to as mu).
    scale: torch.Tensor
        Standard deviation of the distribution (often referred to as sigma).

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#normal

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = Gaussian_Torch
        param_dict = {"loc": identity_fn, "scale": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
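
A hedged end-to-end sketch combining the Gaussian distribution with the simulated data loader. The XGBoostLSS model wrapper (xgboostlss.model.XGBoostLSS), its train/predict methods, and the pred_type="parameters" option are assumptions based on the package's README, not part of this listing.

import xgboost as xgb
from xgboostlss.model import XGBoostLSS  # assumed wrapper location
from xgboostlss.distributions.Gaussian import Gaussian
from xgboostlss.datasets.data_loader import load_simulated_gaussian_data

train, test = load_simulated_gaussian_data()
dtrain = xgb.DMatrix(train.drop(columns=["y"]), label=train["y"])
dtest = xgb.DMatrix(test.drop(columns=["y"]))

# Model loc and scale of a Normal response with XGBoost trees.
xgblss = XGBoostLSS(Gaussian(stabilization="None", response_fn="exp", loss_fn="nll"))
xgblss.train({"eta": 0.1, "max_depth": 3}, dtrain, num_boost_round=100)

# Predicted distributional parameters (one loc/scale pair per test row).
pred_params = xgblss.predict(dtest, pred_type="parameters")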

Gumbel

Gumbel

Bases: DistributionClass

Gumbel distribution class.

Distributional Parameters

loc: torch.Tensor
    Location parameter of the distribution.
scale: torch.Tensor
    Scale parameter of the distribution.

Source

https://pytorch.org/docs/stable/distributions.html#gumbel

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in xgboostlss/distributions/Gumbel.py
class Gumbel(DistributionClass):
    """
    Gumbel distribution class.

    Distributional Parameters
    -------------------------
    loc: torch.Tensor
        Location parameter of the distribution.
    scale: torch.Tensor
        Scale parameter of the distribution.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#gumbel

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = Gumbel_Torch
        param_dict = {"loc": identity_fn, "scale": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )

Laplace

Laplace

Bases: DistributionClass

Laplace distribution class.

Distributional Parameters

loc: torch.Tensor
    Mean of the distribution.
scale: torch.Tensor
    Scale of the distribution.

Source

https://pytorch.org/docs/stable/distributions.html#laplace

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in xgboostlss/distributions/Laplace.py
class Laplace(DistributionClass):
    """
    Laplace distribution class.

    Distributional Parameters
    -------------------------
    loc: torch.Tensor
        Mean of the distribution.
    scale: torch.Tensor
        Scale of the distribution.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#laplace

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = Laplace_Torch
        param_dict = {"loc": identity_fn, "scale": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )

LogNormal

LogNormal

Bases: DistributionClass

LogNormal distribution class.

Distributional Parameters

loc: torch.Tensor
    Mean of log of distribution.
scale: torch.Tensor
    Standard deviation of log of the distribution.

Source

https://pytorch.org/docs/stable/distributions.html#lognormal

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in xgboostlss/distributions/LogNormal.py
class LogNormal(DistributionClass):
    """
    LogNormal distribution class.

    Distributional Parameters
    -------------------------
    loc: torch.Tensor
        Mean of log of distribution.
    scale: torch.Tensor
        Standard deviation of log of the distribution.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#lognormal

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = LogNormal_Torch
        param_dict = {"loc": identity_fn, "scale": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )

MVN

MVN

Bases: Multivariate_DistributionClass

Multivariate Normal distribution class.

The multivariate normal distribution is parameterized by a mean vector and a lower-triangular matrix L with positive-valued diagonal entries, such that Σ=LL'. This triangular matrix can be obtained via, e.g., a Cholesky decomposition of the covariance.

Distributional Parameters

loc: torch.Tensor
    Mean of the distribution (often referred to as mu).
scale_tril: torch.Tensor
    Lower-triangular factor of covariance, with positive-valued diagonal.

Source

https://pytorch.org/docs/stable/distributions.html#multivariatenormal

Parameters

D: int
    Number of targets.
stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in xgboostlss/distributions/MVN.py
class MVN(Multivariate_DistributionClass):
    """
    Multivariate Normal distribution class. 

    The multivariate normal distribution is parameterized by a mean vector and a lower-triangular matrix L with
    positive-valued diagonal entries, such that Σ=LL'. This triangular matrix can be obtained via, e.g., a Cholesky
    decomposition of the covariance.

    Distributional Parameters
    -------------------------
    loc: torch.Tensor
        Mean of the distribution (often referred to as mu).
    scale_tril: torch.Tensor
        Lower-triangular factor of covariance, with positive-valued diagonal.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#multivariatenormal

    Parameters
    -------------------------
    D: int
        Number of targets.
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 D: int = 2,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):
        # Input Checks
        if not isinstance(D, int):
            raise ValueError("Invalid dimensionality type. Please choose an integer for D.")
        if D < 2:
            raise ValueError("Invalid dimensionality. Please choose D >= 2.")
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select from 'nll'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn, "relu": relu_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus' or 'relu.")

        # Set the parameters specific to the distribution
        distribution = MultivariateNormal_Torch
        param_dict = MVN.create_param_dict(n_targets=D, response_fn=response_fn)
        distribution_arg_names = ["loc", "scale_tril"]
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=False,
                         distribution_arg_names=distribution_arg_names,
                         n_targets=D,
                         n_dist_param=len(param_dict),
                         param_dict=param_dict,
                         param_transform=MVN.param_transform,
                         get_dist_params=MVN.get_dist_params,
                         discrete=False,
                         stabilization=stabilization,
                         loss_fn=loss_fn
                         )

    @staticmethod
    def create_param_dict(n_targets: int,
                          response_fn: Callable
                          ) -> Dict:
        """ Function that transforms the distributional parameters to the desired scale.

        Arguments
        ---------
        n_targets: int
            Number of targets.
        response_fn: Callable
            Response function.

        Returns
        -------
        param_dict: Dict
            Dictionary of distributional parameters.
        """
        # Location
        param_dict = {"location_" + str(i + 1): identity_fn for i in range(n_targets)}

        # Tril
        tril_indices = torch.tril_indices(row=n_targets, col=n_targets, offset=0)
        tril_idx = (tril_indices.detach().numpy()) + 1
        n_tril = int((n_targets * (n_targets + 1)) / 2)
        tril_diag = tril_idx[0] == tril_idx[1]

        tril_dict = {}
        for i in range(n_tril):
            if tril_diag[i]:
                tril_dict.update({"scale_tril_diag_" + str(tril_idx[:, i][1]): response_fn})
            else:
                tril_dict.update({"scale_tril_offdiag_" + str(tril_idx[:, i][1]) + str(tril_idx[:, i][0]): identity_fn})

        param_dict.update(tril_dict)

        return param_dict

    @staticmethod
    def param_transform(params: List[torch.Tensor],
                        param_dict: Dict,
                        n_targets: int,
                        rank: Optional[int],
                        n_obs: int,
                        ) -> List[torch.Tensor]:
        """ Function that returns a list of parameters for a multivariate normal distribution, parameterized
        by a location vector and the lower triangular matrix of the covariance matrix (Cholesky).

        Arguments
        ---------
        params: List[torch.Tensor]
            List of distributional parameters.
        param_dict: Dict
        n_targets: int
            Number of targets.
        rank: Optional[int]
            Rank of the low-rank form of the covariance matrix.
        n_obs: int
            Number of observations.

        Returns
        -------
        params: List[torch.Tensor]
            List of parameters.
        """
        # Transform Parameters to respective scale
        params = [
            response_fun(params[i].reshape(-1, 1)) for i, (dist_param, response_fun) in enumerate(param_dict.items())
        ]

        # Location
        loc = torch.cat(params[:n_targets], axis=1)

        # Scale Tril
        tril_predt = torch.cat(params[n_targets:], axis=1)
        tril_indices = torch.tril_indices(row=n_targets, col=n_targets, offset=0)
        scale_tril = torch.zeros(n_obs, n_targets, n_targets, dtype=tril_predt.dtype)
        scale_tril[:, tril_indices[0], tril_indices[1]] = tril_predt

        params = [loc, scale_tril]

        return params

    @staticmethod
    def get_dist_params(n_targets: int,
                        dist_pred: torch.distributions.Distribution,
                        ) -> pd.DataFrame:
        """
        Function that returns the predicted distributional parameters.

        Arguments
        ---------
        n_targets: int
            Number of targets.
        dist_pred: torch.distributions.Distribution
            Predicted distribution.

        Returns
        -------
        dist_params_df: pd.DataFrame
            DataFrame with predicted distributional parameters.
        """

        # Location
        location_df = pd.DataFrame(dist_pred.loc.numpy())
        location_df.columns = [f"location_{i + 1}" for i in range(n_targets)]

        # Scale
        scale_df = pd.DataFrame(dist_pred.stddev.detach().numpy())
        scale_df.columns = [f"scale_{i + 1}" for i in range(n_targets)]

        # Rho
        n_obs = location_df.shape[0]
        n_rho = int((n_targets * (n_targets - 1)) / 2)
        cov_mat = dist_pred.covariance_matrix
        rho_df = pd.DataFrame(
            np.concatenate([MVN.covariance_to_correlation(cov_mat[i]).reshape(-1, n_rho) for i in range(n_obs)], axis=0)
        )
        rho_idx = list(combinations(range(1, n_targets + 1), 2))
        rho_df.columns = [f"rho_{''.join(map(str, rho_idx[i]))}" for i in range(rho_df.shape[1])]

        # Concatenate
        dist_params_df = pd.concat([location_df, scale_df, rho_df], axis=1)

        return dist_params_df

    @staticmethod
    def covariance_to_correlation(cov_mat: torch.Tensor) -> np.ndarray:
        """ Function that calculates the correlation matrix from the covariance matrix.

        Arguments
        ---------
        cov_mat: torch.Tensor
            Covariance matrix.

        Returns
        -------
        cor_mat: np.ndarray
            Correlation matrix.
        """
        cov_mat = np.array(cov_mat)
        diag = np.sqrt(np.diag(np.diag(cov_mat)))
        diag_inv = np.linalg.inv(diag)
        cor_mat = diag_inv @ cov_mat @ diag_inv
        cor_mat = cor_mat[np.tril_indices_from(cor_mat, k=-1)]

        return cor_mat
covariance_to_correlation(cov_mat) staticmethod

Function that calculates the correlation matrix from the covariance matrix.

Arguments

cov_mat: torch.Tensor Covariance matrix.

Returns

cor_mat: np.ndarray Correlation matrix.
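
To make the return value concrete, here is the same computation applied to a small 2x2 covariance matrix with plain NumPy (independent of XGBoostLSS):

import numpy as np

cov_mat = np.array([[4.0, 1.0],
                    [1.0, 9.0]])           # Var(X1)=4, Var(X2)=9, Cov(X1, X2)=1

diag = np.sqrt(np.diag(np.diag(cov_mat)))  # diag(2, 3): standard deviations on the diagonal
diag_inv = np.linalg.inv(diag)
cor_mat = diag_inv @ cov_mat @ diag_inv    # full correlation matrix
rho = cor_mat[np.tril_indices_from(cor_mat, k=-1)]
print(rho)                                 # [0.16666667] -> rho = 1 / (2 * 3)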

Source code in xgboostlss/distributions/MVN.py
@staticmethod
def covariance_to_correlation(cov_mat: torch.Tensor) -> np.ndarray:
    """ Function that calculates the correlation matrix from the covariance matrix.

    Arguments
    ---------
    cov_mat: torch.Tensor
        Covariance matrix.

    Returns
    -------
    cor_mat: np.ndarray
        Correlation matrix.
    """
    cov_mat = np.array(cov_mat)
    diag = np.sqrt(np.diag(np.diag(cov_mat)))
    diag_inv = np.linalg.inv(diag)
    cor_mat = diag_inv @ cov_mat @ diag_inv
    cor_mat = cor_mat[np.tril_indices_from(cor_mat, k=-1)]

    return cor_mat
create_param_dict(n_targets, response_fn) staticmethod

Function that transforms the distributional parameters to the desired scale.

Arguments

n_targets: int Number of targets. response_fn: Callable Response function.

Returns

param_dict: Dict Dictionary of distributional parameters.
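
A sketch of the resulting dictionary for two targets; the `exp_fn` import path is an assumption based on the response functions referenced in this module, and the key names follow from the code below:

from xgboostlss.utils import exp_fn        # assumed location of the response functions
from xgboostlss.distributions.MVN import MVN

param_dict = MVN.create_param_dict(n_targets=2, response_fn=exp_fn)
print(list(param_dict.keys()))
# ['location_1', 'location_2',
#  'scale_tril_diag_1', 'scale_tril_offdiag_12', 'scale_tril_diag_2']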

Source code in xgboostlss/distributions/MVN.py
@staticmethod
def create_param_dict(n_targets: int,
                      response_fn: Callable
                      ) -> Dict:
    """ Function that transforms the distributional parameters to the desired scale.

    Arguments
    ---------
    n_targets: int
        Number of targets.
    response_fn: Callable
        Response function.

    Returns
    -------
    param_dict: Dict
        Dictionary of distributional parameters.
    """
    # Location
    param_dict = {"location_" + str(i + 1): identity_fn for i in range(n_targets)}

    # Tril
    tril_indices = torch.tril_indices(row=n_targets, col=n_targets, offset=0)
    tril_idx = (tril_indices.detach().numpy()) + 1
    n_tril = int((n_targets * (n_targets + 1)) / 2)
    tril_diag = tril_idx[0] == tril_idx[1]

    tril_dict = {}
    for i in range(n_tril):
        if tril_diag[i]:
            tril_dict.update({"scale_tril_diag_" + str(tril_idx[:, i][1]): response_fn})
        else:
            tril_dict.update({"scale_tril_offdiag_" + str(tril_idx[:, i][1]) + str(tril_idx[:, i][0]): identity_fn})

    param_dict.update(tril_dict)

    return param_dict
get_dist_params(n_targets, dist_pred) staticmethod

Function that returns the predicted distributional parameters.

Arguments

n_targets: int Number of targets. dist_pred: torch.distributions.Distribution Predicted distribution.

Returns

dist_params_df: pd.DataFrame DataFrame with predicted distributional parameters.
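
The correlation columns are named after the target pairs they relate to; for example, with three targets the naming scheme used in the code below yields:

from itertools import combinations

n_targets = 3
rho_idx = list(combinations(range(1, n_targets + 1), 2))
rho_cols = [f"rho_{''.join(map(str, idx))}" for idx in rho_idx]
print(rho_cols)   # ['rho_12', 'rho_13', 'rho_23']
# Full column layout: location_1..3, scale_1..3, rho_12, rho_13, rho_23 (one row per observation).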

Source code in xgboostlss/distributions/MVN.py
@staticmethod
def get_dist_params(n_targets: int,
                    dist_pred: torch.distributions.Distribution,
                    ) -> pd.DataFrame:
    """
    Function that returns the predicted distributional parameters.

    Arguments
    ---------
    n_targets: int
        Number of targets.
    dist_pred: torch.distributions.Distribution
        Predicted distribution.

    Returns
    -------
    dist_params_df: pd.DataFrame
        DataFrame with predicted distributional parameters.
    """

    # Location
    location_df = pd.DataFrame(dist_pred.loc.numpy())
    location_df.columns = [f"location_{i + 1}" for i in range(n_targets)]

    # Scale
    scale_df = pd.DataFrame(dist_pred.stddev.detach().numpy())
    scale_df.columns = [f"scale_{i + 1}" for i in range(n_targets)]

    # Rho
    n_obs = location_df.shape[0]
    n_rho = int((n_targets * (n_targets - 1)) / 2)
    cov_mat = dist_pred.covariance_matrix
    rho_df = pd.DataFrame(
        np.concatenate([MVN.covariance_to_correlation(cov_mat[i]).reshape(-1, n_rho) for i in range(n_obs)], axis=0)
    )
    rho_idx = list(combinations(range(1, n_targets + 1), 2))
    rho_df.columns = [f"rho_{''.join(map(str, rho_idx[i]))}" for i in range(rho_df.shape[1])]

    # Concatenate
    dist_params_df = pd.concat([location_df, scale_df, rho_df], axis=1)

    return dist_params_df
param_transform(params, param_dict, n_targets, rank, n_obs) staticmethod

Function that returns a list of parameters for a multivariate normal distribution, parameterized by a location vector and the lower triangular matrix of the covariance matrix (Cholesky).

Arguments

params: List[torch.Tensor]
    List of distributional parameters.
param_dict: Dict
n_targets: int
    Number of targets.
rank: Optional[int]
    Rank of the low-rank form of the covariance matrix.
n_obs: int
    Number of observations.

Returns

params: List[torch.Tensor] List of parameters.
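
The scatter of the flat predictions into the Cholesky factor can be reproduced in isolation; a small sketch for D=2 with arbitrary values:

import torch

n_obs, n_targets = 4, 2
# Per observation: [scale_tril_diag_1, scale_tril_offdiag_12, scale_tril_diag_2]
tril_predt = torch.tensor([[1.0, 0.5, 2.0]]).repeat(n_obs, 1)
tril_indices = torch.tril_indices(row=n_targets, col=n_targets, offset=0)
scale_tril = torch.zeros(n_obs, n_targets, n_targets)
scale_tril[:, tril_indices[0], tril_indices[1]] = tril_predt
print(scale_tril[0])
# tensor([[1.0000, 0.0000],
#         [0.5000, 2.0000]])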

Source code in xgboostlss/distributions/MVN.py
@staticmethod
def param_transform(params: List[torch.Tensor],
                    param_dict: Dict,
                    n_targets: int,
                    rank: Optional[int],
                    n_obs: int,
                    ) -> List[torch.Tensor]:
    """ Function that returns a list of parameters for a multivariate normal distribution, parameterized
    by a location vector and the lower triangular matrix of the covariance matrix (Cholesky).

    Arguments
    ---------
    params: List[torch.Tensor]
        List of distributional parameters.
    param_dict: Dict
    n_targets: int
        Number of targets.
    rank: Optional[int]
        Rank of the low-rank form of the covariance matrix.
    n_obs: int
        Number of observations.

    Returns
    -------
    params: List[torch.Tensor]
        List of parameters.
    """
    # Transform Parameters to respective scale
    params = [
        response_fun(params[i].reshape(-1, 1)) for i, (dist_param, response_fun) in enumerate(param_dict.items())
    ]

    # Location
    loc = torch.cat(params[:n_targets], axis=1)

    # Scale Tril
    tril_predt = torch.cat(params[n_targets:], axis=1)
    tril_indices = torch.tril_indices(row=n_targets, col=n_targets, offset=0)
    scale_tril = torch.zeros(n_obs, n_targets, n_targets, dtype=tril_predt.dtype)
    scale_tril[:, tril_indices[0], tril_indices[1]] = tril_predt

    params = [loc, scale_tril]

    return params

MVN_LoRa

MVN_LoRa

Bases: Multivariate_DistributionClass

Multivariate Normal distribution class.

Creates a multivariate normal distribution with covariance matrix having a low-rank form parameterized by cov_factor and cov_diag:

`covariance_matrix = cov_factor @ cov_factor.T + cov_diag`
Distributional Parameters

loc: torch.Tensor
    Mean of the distribution (often referred to as mu).
cov_factor: torch.Tensor
    Factor part of low-rank form of covariance matrix.
cov_diag: torch.Tensor
    Diagonal part of low-rank form of covariance matrix.

Source

https://pytorch.org/docs/stable/distributions.html#lowrankmultivariatenormal

Parameters

D: int
    Number of targets.
rank: int
    Rank of the low-rank form of the covariance matrix.
stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).
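
A short sketch of when the low-rank form pays off and how the covariance is assembled. For D=10 targets, the full Cholesky parameterization of MVN needs 10 + 55 = 65 parameters per observation, while MVN_LoRa with rank=2 needs 10 + 20 + 10 = 40. The covariance reconstruction below is plain PyTorch; the constructor arguments are the ones documented here:

import torch
from xgboostlss.distributions.MVN_LoRa import MVN_LoRa

dist = MVN_LoRa(D=10, rank=2, stabilization="None", response_fn="softplus", loss_fn="nll")

# How the covariance is assembled from the low-rank factors (for a single observation):
cov_factor = torch.randn(10, 2)            # factor part, shape (D, rank)
cov_diag = torch.rand(10) + 0.1            # positive diagonal part, shape (D,)
covariance_matrix = cov_factor @ cov_factor.T + torch.diag(cov_diag)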

Source code in xgboostlss/distributions/MVN_LoRa.py
class MVN_LoRa(Multivariate_DistributionClass):
    """
    Multivariate Normal distribution class.

    Creates a multivariate normal distribution with covariance matrix having a low-rank form parameterized by
    `cov_factor` and `cov_diag`:

        `covariance_matrix = cov_factor @ cov_factor.T + cov_diag`


    Distributional Parameters
    -------------------------
    loc: torch.Tensor
        Mean of the distribution (often referred to as mu).
    cov_factor: torch.Tensor
        Factor part of low-rank form of covariance matrix.
    cov_diag: torch.Tensor
        Diagonal part of low-rank form of covariance matrix.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#lowrankmultivariatenormal

    Parameters
    -------------------------
    D: int
        Number of targets.
    rank: int
        Rank of the low-rank form of the covariance matrix.
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 D: int = 2,
                 rank: int = 2,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):
        # Input Checks
        if not isinstance(D, int):
            raise ValueError("Invalid dimensionality type. Please choose an integer for D.")
        if D < 2:
            raise ValueError("Invalid dimensionality. Please choose D >= 2.")
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select from 'nll'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn, "relu": relu_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus' or 'relu.")

        # Set the parameters specific to the distribution
        distribution = LowRankMultivariateNormal_Torch
        param_dict = MVN_LoRa.create_param_dict(n_targets=D, rank=rank, response_fn=response_fn)
        distribution_arg_names = ["loc", "cov_factor", "cov_diag"]
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=False,
                         distribution_arg_names=distribution_arg_names,
                         n_targets=D,
                         rank=rank,
                         n_dist_param=len(param_dict),
                         param_dict=param_dict,
                         param_transform=MVN_LoRa.param_transform,
                         get_dist_params=MVN_LoRa.get_dist_params,
                         discrete=False,
                         stabilization=stabilization,
                         loss_fn=loss_fn
                         )

    @staticmethod
    def create_param_dict(n_targets: int,
                          rank: int,
                          response_fn: Callable
                          ) -> Dict:
        """ Function that transforms the distributional parameters to the desired scale.

        Arguments
        ---------
        n_targets: int
            Number of targets.
        rank: int
            Rank of the low-rank form of the covariance matrix.
        response_fn: Callable
            Response function.

        Returns
        -------
        param_dict: Dict
            Dictionary of distributional parameters.
        """
        # Location
        param_dict = {"location_" + str(i + 1): identity_fn for i in range(n_targets)}

        # Low Rank Factor
        cov_factor_dict = {"cov_factor_" + str(i + 1): identity_fn for i in range(n_targets * rank)}
        param_dict.update(cov_factor_dict)

        # Low Rank Diagonal
        cov_diag_dict = {"cov_diag_" + str(i + 1): response_fn for i in range(n_targets)}
        param_dict.update(cov_diag_dict)

        return param_dict

    @staticmethod
    def param_transform(params: List[torch.Tensor],
                        param_dict: Dict,
                        n_targets: int,
                        rank: int,
                        n_obs: Optional[int],
                        ) -> List[torch.Tensor]:
        """ Function that returns a list of parameters for a multivariate normal distribution, parameterized
        by a covariance matrix having a low-rank form parameterized by `cov_factor` and `cov_diag`:

        `covariance_matrix = cov_factor @ cov_factor.T + cov_diag`

        Arguments
        ---------
        params: List[torch.Tensor]
            List of distributional parameters.
        param_dict: Dict
        n_targets: int
            Number of targets.
        rank: int
            Rank of the low-rank form of the covariance matrix.
        n_obs: Optional[int],
            Number of observations.

        Returns
        -------
        params: List[torch.Tensor]
            List of parameters.
        """
        # Transform Parameters to respective scale
        n_params = len(params)
        params = [
            response_fun(params[i].reshape(-1, 1)) for i, (dist_param, response_fun) in enumerate(param_dict.items())
        ]

        # Location
        loc = torch.cat(params[:n_targets], axis=1)

        # Low Rank Factor
        cov_factor = torch.cat(
            params[n_targets:(n_params - n_targets)], axis=1
        ).reshape(-1, n_targets, rank)

        # Low Rank Diagonal
        cov_diag = torch.cat(params[-n_targets:], axis=1)

        params = [loc, cov_factor, cov_diag]

        return params

    @staticmethod
    def get_dist_params(n_targets: int,
                        dist_pred: torch.distributions.Distribution,
                        ) -> pd.DataFrame:
        """
        Function that returns the predicted distributional parameters.

        Arguments
        ---------
        n_targets: int
            Number of targets.
        dist_pred: torch.distributions.Distribution
            Predicted distribution.

        Returns
        -------
        dist_params_df: pd.DataFrame
            DataFrame with predicted distributional parameters.
        """

        # Location
        location_df = pd.DataFrame(dist_pred.loc.numpy())
        location_df.columns = [f"location_{i + 1}" for i in range(n_targets)]

        # Sigma
        scale_df = pd.DataFrame(dist_pred.stddev.detach().numpy())
        scale_df.columns = [f"scale_{i + 1}" for i in range(n_targets)]

        # Rho
        n_obs = location_df.shape[0]
        n_rho = int((n_targets * (n_targets - 1)) / 2)
        cov_mat = dist_pred.covariance_matrix
        rho_df = pd.DataFrame(
            np.concatenate(
                [MVN_LoRa.covariance_to_correlation(cov_mat[i]).reshape(-1, n_rho) for i in range(n_obs)],
                axis=0)
        )
        rho_idx = list(combinations(range(1, n_targets + 1), 2))
        rho_df.columns = [f"rho_{''.join(map(str, rho_idx[i]))}" for i in range(rho_df.shape[1])]

        # Concatenate
        dist_params_df = pd.concat([location_df, scale_df, rho_df], axis=1)

        return dist_params_df

    @staticmethod
    def covariance_to_correlation(cov_mat: torch.Tensor) -> np.ndarray:
        """ Function that calculates the correlation matrix from the covariance matrix.

        Arguments
        ---------
        cov_mat: torch.Tensor
            Covariance matrix.

        Returns
        -------
        cor_mat: np.ndarray
            Correlation matrix.
        """
        cov_mat = np.array(cov_mat)
        diag = np.sqrt(np.diag(np.diag(cov_mat)))
        diag_inv = np.linalg.inv(diag)
        cor_mat = diag_inv @ cov_mat @ diag_inv
        cor_mat = cor_mat[np.tril_indices_from(cor_mat, k=-1)]

        return cor_mat
covariance_to_correlation(cov_mat) staticmethod

Function that calculates the correlation matrix from the covariance matrix.

Arguments

cov_mat: torch.Tensor Covariance matrix.

Returns

cor_mat: np.ndarray Correlation matrix.

Source code in xgboostlss/distributions/MVN_LoRa.py
@staticmethod
def covariance_to_correlation(cov_mat: torch.Tensor) -> np.ndarray:
    """ Function that calculates the correlation matrix from the covariance matrix.

    Arguments
    ---------
    cov_mat: torch.Tensor
        Covariance matrix.

    Returns
    -------
    cor_mat: np.ndarray
        Correlation matrix.
    """
    cov_mat = np.array(cov_mat)
    diag = np.sqrt(np.diag(np.diag(cov_mat)))
    diag_inv = np.linalg.inv(diag)
    cor_mat = diag_inv @ cov_mat @ diag_inv
    cor_mat = cor_mat[np.tril_indices_from(cor_mat, k=-1)]

    return cor_mat
create_param_dict(n_targets, rank, response_fn) staticmethod

Function that transforms the distributional parameters to the desired scale.

Arguments

n_targets: int Number of targets. rank: int Rank of the low-rank form of the covariance matrix. response_fn: Callable Response function.

Returns

param_dict: Dict Dictionary of distributional parameters.
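
A sketch of the resulting dictionary for two targets and rank 2; the `softplus_fn` import path is an assumption based on the response functions referenced in this module, and the key names follow from the code below:

from xgboostlss.utils import softplus_fn   # assumed location of the response functions
from xgboostlss.distributions.MVN_LoRa import MVN_LoRa

param_dict = MVN_LoRa.create_param_dict(n_targets=2, rank=2, response_fn=softplus_fn)
print(list(param_dict.keys()))
# ['location_1', 'location_2',
#  'cov_factor_1', 'cov_factor_2', 'cov_factor_3', 'cov_factor_4',
#  'cov_diag_1', 'cov_diag_2']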

Source code in xgboostlss/distributions/MVN_LoRa.py
@staticmethod
def create_param_dict(n_targets: int,
                      rank: int,
                      response_fn: Callable
                      ) -> Dict:
    """ Function that transforms the distributional parameters to the desired scale.

    Arguments
    ---------
    n_targets: int
        Number of targets.
    rank: int
        Rank of the low-rank form of the covariance matrix.
    response_fn: Callable
        Response function.

    Returns
    -------
    param_dict: Dict
        Dictionary of distributional parameters.
    """
    # Location
    param_dict = {"location_" + str(i + 1): identity_fn for i in range(n_targets)}

    # Low Rank Factor
    cov_factor_dict = {"cov_factor_" + str(i + 1): identity_fn for i in range(n_targets * rank)}
    param_dict.update(cov_factor_dict)

    # Low Rank Diagonal
    cov_diag_dict = {"cov_diag_" + str(i + 1): response_fn for i in range(n_targets)}
    param_dict.update(cov_diag_dict)

    return param_dict
get_dist_params(n_targets, dist_pred) staticmethod

Function that returns the predicted distributional parameters.

Arguments

n_targets: int Number of targets. dist_pred: torch.distributions.Distribution Predicted distribution.

Returns

dist_params_df: pd.DataFrame DataFrame with predicted distributional parameters.

Source code in xgboostlss/distributions/MVN_LoRa.py
@staticmethod
def get_dist_params(n_targets: int,
                    dist_pred: torch.distributions.Distribution,
                    ) -> pd.DataFrame:
    """
    Function that returns the predicted distributional parameters.

    Arguments
    ---------
    n_targets: int
        Number of targets.
    dist_pred: torch.distributions.Distribution
        Predicted distribution.

    Returns
    -------
    dist_params_df: pd.DataFrame
        DataFrame with predicted distributional parameters.
    """

    # Location
    location_df = pd.DataFrame(dist_pred.loc.numpy())
    location_df.columns = [f"location_{i + 1}" for i in range(n_targets)]

    # Sigma
    scale_df = pd.DataFrame(dist_pred.stddev.detach().numpy())
    scale_df.columns = [f"scale_{i + 1}" for i in range(n_targets)]

    # Rho
    n_obs = location_df.shape[0]
    n_rho = int((n_targets * (n_targets - 1)) / 2)
    cov_mat = dist_pred.covariance_matrix
    rho_df = pd.DataFrame(
        np.concatenate(
            [MVN_LoRa.covariance_to_correlation(cov_mat[i]).reshape(-1, n_rho) for i in range(n_obs)],
            axis=0)
    )
    rho_idx = list(combinations(range(1, n_targets + 1), 2))
    rho_df.columns = [f"rho_{''.join(map(str, rho_idx[i]))}" for i in range(rho_df.shape[1])]

    # Concatenate
    dist_params_df = pd.concat([location_df, scale_df, rho_df], axis=1)

    return dist_params_df
param_transform(params, param_dict, n_targets, rank, n_obs) staticmethod

Function that returns a list of parameters for a multivariate normal distribution, parameterized by a covariance matrix having a low-rank form parameterized by cov_factor and cov_diag:

covariance_matrix = cov_factor @ cov_factor.T + cov_diag

Arguments

params: List[torch.Tensor]
    List of distributional parameters.
param_dict: Dict
n_targets: int
    Number of targets.
rank: int
    Rank of the low-rank form of the covariance matrix.
n_obs: Optional[int]
    Number of observations.

Returns

params: List[torch.Tensor] List of parameters.

Source code in xgboostlss/distributions/MVN_LoRa.py
@staticmethod
def param_transform(params: List[torch.Tensor],
                    param_dict: Dict,
                    n_targets: int,
                    rank: int,
                    n_obs: Optional[int],
                    ) -> List[torch.Tensor]:
    """ Function that returns a list of parameters for a multivariate normal distribution, parameterized
    by a covariance matrix having a low-rank form parameterized by `cov_factor` and `cov_diag`:

    `covariance_matrix = cov_factor @ cov_factor.T + cov_diag`

    Arguments
    ---------
    params: List[torch.Tensor]
        List of distributional parameters.
    param_dict: Dict
    n_targets: int
        Number of targets.
    rank: int
        Rank of the low-rank form of the covariance matrix.
    n_obs: Optional[int],
        Number of observations.

    Returns
    -------
    params: List[torch.Tensor]
        List of parameters.
    """
    # Transform Parameters to respective scale
    n_params = len(params)
    params = [
        response_fun(params[i].reshape(-1, 1)) for i, (dist_param, response_fun) in enumerate(param_dict.items())
    ]

    # Location
    loc = torch.cat(params[:n_targets], axis=1)

    # Low Rank Factor
    cov_factor = torch.cat(
        params[n_targets:(n_params - n_targets)], axis=1
    ).reshape(-1, n_targets, rank)

    # Low Rank Diagonal
    cov_diag = torch.cat(params[-n_targets:], axis=1)

    params = [loc, cov_factor, cov_diag]

    return params

MVT

MVT

Bases: Multivariate_DistributionClass

Multivariate Student-T distribution class.

The multivariate Student-T distribution is parameterized by a degrees-of-freedom vector df, a mean vector, and a lower-triangular matrix L with positive-valued diagonal entries, such that Σ=LL'. This triangular matrix can be obtained via, e.g., a Cholesky decomposition of the covariance.

Distributional Parameters

df: torch.Tensor
    Degrees of freedom.
loc: torch.Tensor
    Mean of the distribution (often referred to as mu).
scale_tril: torch.Tensor
    Lower-triangular factor of covariance, with positive-valued diagonal.

Source

https://docs.pyro.ai/en/stable/distributions.html#multivariatestudentt

Parameters

D: int
    Number of targets.
stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).
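
A minimal construction sketch; the per-observation parameter count follows the layout built by create_param_dict below (one df, D location parameters, D(D+1)/2 Cholesky entries):

from xgboostlss.distributions.MVT import MVT

# Trivariate Student-T: 1 df + 3 location + 6 Cholesky entries = 10 parameters per observation.
dist = MVT(D=3, stabilization="None", response_fn="exp", loss_fn="nll")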

Source code in xgboostlss/distributions/MVT.py
class MVT(Multivariate_DistributionClass):
    """
    Multivariate Student-T distribution class.

    The multivariate Student-T distribution is parameterized by a degree of freedom df vector, a mean vector and a
    lower-triangular matrix L with positive-valued diagonal entries, such that Σ=LL'. This triangular matrix can be
    obtained via, e.g., a Cholesky decomposition of the covariance.

    Distributional Parameters
    -------------------------
    df: torch.Tensor
        Degrees of freedom.
    loc: torch.Tensor
        Mean of the distribution (often referred to as mu).
    scale_tril: torch.Tensor
        Lower-triangular factor of covariance, with positive-valued diagonal.

    Source
    -------------------------
    https://docs.pyro.ai/en/stable/distributions.html#multivariatestudentt

    Parameters
    -------------------------
    D: int
        Number of targets.
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 D: int = 2,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):
        # Input Checks
        if not isinstance(D, int):
            raise ValueError("Invalid dimensionality type. Please choose an integer for D.")
        if D < 2:
            raise ValueError("Invalid dimensionality. Please choose D >= 2.")
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select from 'nll'.")

        # Specify Response Functions
        response_functions = {
            "exp": (exp_fn, exp_fn_df),
            "softplus": (softplus_fn, softplus_fn_df)
        }
        if response_fn in response_functions:
            response_fn, response_fn_df = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus' or 'relu.")

        # Set the parameters specific to the distribution
        distribution = MultivariateStudentT_Torch
        param_dict = MVT.create_param_dict(n_targets=D, response_fn=response_fn, response_fn_df=response_fn_df)
        distribution_arg_names = ["df", "loc", "scale_tril"]
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=False,
                         distribution_arg_names=distribution_arg_names,
                         n_targets=D,
                         n_dist_param=len(param_dict),
                         param_dict=param_dict,
                         param_transform=MVT.param_transform,
                         get_dist_params=MVT.get_dist_params,
                         discrete=False,
                         stabilization=stabilization,
                         loss_fn=loss_fn
                         )

    @staticmethod
    def create_param_dict(n_targets: int,
                          response_fn: Callable,
                          response_fn_df: Callable
                          ) -> Dict:
        """ Function that transforms the distributional parameters to the desired scale.

        Arguments
        ---------
        n_targets: int
            Number of targets.
        response_fn: Callable
            Response function.
        response_fn_df: Callable
            Response function for the degrees of freedom.

        Returns
        -------
        param_dict: Dict
            Dictionary of distributional parameters.
        """

        # Df
        param_dict = {"df": response_fn_df}

        # Location
        loc_dict = {"location_" + str(i + 1): identity_fn for i in range(n_targets)}
        param_dict.update(loc_dict)

        # Tril
        tril_indices = torch.tril_indices(row=n_targets, col=n_targets, offset=0)
        tril_idx = (tril_indices.detach().numpy()) + 1
        n_tril = int((n_targets * (n_targets + 1)) / 2)
        tril_diag = tril_idx[0] == tril_idx[1]

        tril_dict = {}

        for i in range(n_tril):
            if tril_diag[i]:
                tril_dict.update({"scale_tril_diag_" + str(tril_idx[:, i][1]): response_fn})
            else:
                tril_dict.update({"scale_tril_offdiag_" + str(tril_idx[:, i][1]) + str(tril_idx[:, i][0]): identity_fn})

        param_dict.update(tril_dict)

        return param_dict

    @staticmethod
    def param_transform(params: List[torch.Tensor],
                        param_dict: Dict,
                        n_targets: int,
                        rank: Optional[int],
                        n_obs: int,
                        ) -> List[torch.Tensor]:
        """ Function that returns a list of parameters for a multivariate Student-T, parameterized
        by a location vector and the lower triangular matrix of the covariance matrix (Cholesky).

        Arguments
        ---------
        params: List[torch.Tensor]
            List of distributional parameters.
        param_dict: Dict
        n_targets: int
            Number of targets.
        rank: Optional[int]
            Rank of the low-rank form of the covariance matrix.
        n_obs: int
            Number of observations.

        Returns
        -------
        params: List[torch.Tensor]
            List of parameters.
        """
        # Transform Parameters to respective scale
        params = [
            response_fun(params[i].reshape(-1, 1)) for i, (dist_param, response_fun) in enumerate(param_dict.items())
        ]

        # Df
        df = params[0].reshape(-1, )

        # Location
        loc = torch.cat(params[1:(n_targets + 1)], axis=1)

        # Scale Tril
        tril_predt = torch.cat(params[(n_targets + 1):], axis=1)
        tril_indices = torch.tril_indices(row=n_targets, col=n_targets, offset=0)
        scale_tril = torch.zeros(n_obs, n_targets, n_targets, dtype=tril_predt.dtype)
        scale_tril[:, tril_indices[0], tril_indices[1]] = tril_predt

        params = [df, loc, scale_tril]

        return params

    @staticmethod
    def get_dist_params(n_targets: int,
                        dist_pred: torch.distributions.Distribution,
                        ) -> pd.DataFrame:
        """
        Function that returns the predicted distributional parameters.

        Arguments
        ---------
        n_targets: int
            Number of targets.
        dist_pred: torch.distributions.Distribution
            Predicted distribution.

        Returns
        -------
        dist_params_df: pd.DataFrame
            DataFrame with predicted distributional parameters.
        """
        # Df
        Df_df = pd.DataFrame(dist_pred.df.detach().numpy())
        Df_df.columns = ["df"]

        # Location
        location_df = pd.DataFrame(dist_pred.loc.numpy())
        location_df.columns = [f"location_{i + 1}" for i in range(n_targets)]

        # Scale
        scale_df = pd.DataFrame(dist_pred.stddev.detach().numpy())
        scale_df.columns = [f"scale_{i + 1}" for i in range(n_targets)]

        # Rho
        n_obs = location_df.shape[0]
        n_rho = int((n_targets * (n_targets - 1)) / 2)
        # The covariance is df / (df - 2) * covariance_matrix
        df = torch.broadcast_to(dist_pred.df.reshape(-1, 1).unsqueeze(-1), dist_pred.covariance_matrix.shape)
        cov_mat = dist_pred.covariance_matrix * (df / (df - 2))
        rho_df = pd.DataFrame(
            np.concatenate([MVT.covariance_to_correlation(cov_mat[i]).reshape(-1, n_rho) for i in range(n_obs)], axis=0)
        )
        rho_idx = list(combinations(range(1, n_targets + 1), 2))
        rho_df.columns = [f"rho_{''.join(map(str, rho_idx[i]))}" for i in range(rho_df.shape[1])]

        # Concatenate
        dist_params_df = pd.concat([Df_df, location_df, scale_df, rho_df], axis=1)

        return dist_params_df

    @staticmethod
    def covariance_to_correlation(cov_mat: torch.Tensor) -> np.ndarray:
        """ Function that calculates the correlation matrix from the covariance matrix.

        Arguments
        ---------
        cov_mat: torch.Tensor
            Covariance matrix.

        Returns
        -------
        cor_mat: np.ndarray
            Correlation matrix.
        """
        cov_mat = np.array(cov_mat)
        diag = np.sqrt(np.diag(np.diag(cov_mat)))
        diag_inv = np.linalg.inv(diag)
        cor_mat = diag_inv @ cov_mat @ diag_inv
        cor_mat = cor_mat[np.tril_indices_from(cor_mat, k=-1)]

        return cor_mat
covariance_to_correlation(cov_mat) staticmethod

Function that calculates the correlation matrix from the covariance matrix.

Arguments

cov_mat: torch.Tensor Covariance matrix.

Returns

cor_mat: np.ndarray Correlation matrix.

Source code in xgboostlss/distributions/MVT.py
@staticmethod
def covariance_to_correlation(cov_mat: torch.Tensor) -> np.ndarray:
    """ Function that calculates the correlation matrix from the covariance matrix.

    Arguments
    ---------
    cov_mat: torch.Tensor
        Covariance matrix.

    Returns
    -------
    cor_mat: np.ndarray
        Correlation matrix.
    """
    cov_mat = np.array(cov_mat)
    diag = np.sqrt(np.diag(np.diag(cov_mat)))
    diag_inv = np.linalg.inv(diag)
    cor_mat = diag_inv @ cov_mat @ diag_inv
    cor_mat = cor_mat[np.tril_indices_from(cor_mat, k=-1)]

    return cor_mat
create_param_dict(n_targets, response_fn, response_fn_df) staticmethod

Function that transforms the distributional parameters to the desired scale.

Arguments

n_targets: int Number of targets. response_fn: Callable Response function. response_fn_df: Callable Response function for the degrees of freedom.

Returns

param_dict: Dict Dictionary of distributional parameters.

Source code in xgboostlss/distributions/MVT.py
@staticmethod
def create_param_dict(n_targets: int,
                      response_fn: Callable,
                      response_fn_df: Callable
                      ) -> Dict:
    """ Function that transforms the distributional parameters to the desired scale.

    Arguments
    ---------
    n_targets: int
        Number of targets.
    response_fn: Callable
        Response function.
    response_fn_df: Callable
        Response function for the degrees of freedom.

    Returns
    -------
    param_dict: Dict
        Dictionary of distributional parameters.
    """

    # Df
    param_dict = {"df": response_fn_df}

    # Location
    loc_dict = {"location_" + str(i + 1): identity_fn for i in range(n_targets)}
    param_dict.update(loc_dict)

    # Tril
    tril_indices = torch.tril_indices(row=n_targets, col=n_targets, offset=0)
    tril_idx = (tril_indices.detach().numpy()) + 1
    n_tril = int((n_targets * (n_targets + 1)) / 2)
    tril_diag = tril_idx[0] == tril_idx[1]

    tril_dict = {}

    for i in range(n_tril):
        if tril_diag[i]:
            tril_dict.update({"scale_tril_diag_" + str(tril_idx[:, i][1]): response_fn})
        else:
            tril_dict.update({"scale_tril_offdiag_" + str(tril_idx[:, i][1]) + str(tril_idx[:, i][0]): identity_fn})

    param_dict.update(tril_dict)

    return param_dict
get_dist_params(n_targets, dist_pred) staticmethod

Function that returns the predicted distributional parameters.

Arguments

n_targets: int Number of targets. dist_pred: torch.distributions.Distribution Predicted distribution.

Returns

dist_params_df: pd.DataFrame DataFrame with predicted distributional parameters.

Source code in xgboostlss/distributions/MVT.py
@staticmethod
def get_dist_params(n_targets: int,
                    dist_pred: torch.distributions.Distribution,
                    ) -> pd.DataFrame:
    """
    Function that returns the predicted distributional parameters.

    Arguments
    ---------
    n_targets: int
        Number of targets.
    dist_pred: torch.distributions.Distribution
        Predicted distribution.

    Returns
    -------
    dist_params_df: pd.DataFrame
        DataFrame with predicted distributional parameters.
    """
    # Df
    Df_df = pd.DataFrame(dist_pred.df.detach().numpy())
    Df_df.columns = ["df"]

    # Location
    location_df = pd.DataFrame(dist_pred.loc.numpy())
    location_df.columns = [f"location_{i + 1}" for i in range(n_targets)]

    # Scale
    scale_df = pd.DataFrame(dist_pred.stddev.detach().numpy())
    scale_df.columns = [f"scale_{i + 1}" for i in range(n_targets)]

    # Rho
    n_obs = location_df.shape[0]
    n_rho = int((n_targets * (n_targets - 1)) / 2)
    # The covariance is df / (df - 2) * covariance_matrix
    df = torch.broadcast_to(dist_pred.df.reshape(-1, 1).unsqueeze(-1), dist_pred.covariance_matrix.shape)
    cov_mat = dist_pred.covariance_matrix * (df / (df - 2))
    rho_df = pd.DataFrame(
        np.concatenate([MVT.covariance_to_correlation(cov_mat[i]).reshape(-1, n_rho) for i in range(n_obs)], axis=0)
    )
    rho_idx = list(combinations(range(1, n_targets + 1), 2))
    rho_df.columns = [f"rho_{''.join(map(str, rho_idx[i]))}" for i in range(rho_df.shape[1])]

    # Concatenate
    dist_params_df = pd.concat([Df_df, location_df, scale_df, rho_df], axis=1)

    return dist_params_df
param_transform(params, param_dict, n_targets, rank, n_obs) staticmethod

Function that returns a list of parameters for a multivariate Student-T, parameterized by a location vector and the lower triangular matrix of the covariance matrix (Cholesky).

Arguments

params: List[torch.Tensor]
    List of distributional parameters.
param_dict: Dict
n_targets: int
    Number of targets.
rank: Optional[int]
    Rank of the low-rank form of the covariance matrix.
n_obs: int
    Number of observations.

Returns

params: List[torch.Tensor] List of parameters.

Source code in xgboostlss/distributions/MVT.py
@staticmethod
def param_transform(params: List[torch.Tensor],
                    param_dict: Dict,
                    n_targets: int,
                    rank: Optional[int],
                    n_obs: int,
                    ) -> List[torch.Tensor]:
    """ Function that returns a list of parameters for a multivariate Student-T, parameterized
    by a location vector and the lower triangular matrix of the covariance matrix (Cholesky).

    Arguments
    ---------
    params: List[torch.Tensor]
        List of distributional parameters.
    param_dict: Dict
    n_targets: int
        Number of targets.
    rank: Optional[int]
        Rank of the low-rank form of the covariance matrix.
    n_obs: int
        Number of observations.

    Returns
    -------
    params: List[torch.Tensor]
        List of parameters.
    """
    # Transform Parameters to respective scale
    params = [
        response_fun(params[i].reshape(-1, 1)) for i, (dist_param, response_fun) in enumerate(param_dict.items())
    ]

    # Df
    df = params[0].reshape(-1, )

    # Location
    loc = torch.cat(params[1:(n_targets + 1)], axis=1)

    # Scale Tril
    tril_predt = torch.cat(params[(n_targets + 1):], axis=1)
    tril_indices = torch.tril_indices(row=n_targets, col=n_targets, offset=0)
    scale_tril = torch.zeros(n_obs, n_targets, n_targets, dtype=tril_predt.dtype)
    scale_tril[:, tril_indices[0], tril_indices[1]] = tril_predt

    params = [df, loc, scale_tril]

    return params

Mixture

Mixture

Bases: MixtureDistributionClass

Mixture-Density distribution class.

Implements a mixture-density distribution for univariate targets, where all components are from different parameterizations of the same distribution-type. A mixture-density distribution is a concept used to model a complex distribution that arises from combining multiple simpler distributions. The Mixture-Density distribution is parameterized by a categorical selecting distribution (over M components) and M-component distributions. For more information on the Mixture-Density distribution, see:

Bishop, C. M. (1994). Mixture density networks. Technical Report NCRG/4288, Aston University, Birmingham, UK.
Distributional Parameters

Inherits the distributional parameters from the component distributions.

Source

https://pytorch.org/docs/stable/distributions.html#mixturesamefamily

Parameters

component_distribution: torch.distributions.Distribution
    Distribution class for the components of the mixture distribution. Has to be one of the available univariate distributions of the package.
M: int
    Number of components in the mixture distribution.
hessian_mode: str
    Mode for computing the Hessian. Must be one of the following:

    - "individual": Each parameter is treated as a separate tensor. As a result, the Hessian corresponds to the
    second-order derivative with respect to that specific parameter only. The resulting Hessians capture the
    curvature of the loss w.r.t. each individual parameter. This is usually more runtime intensive, but can
    be more accurate.

    - "grouped": Each tensor contains all parameters for a specific parameter-type, e.g., for a Gaussian-Mixture
    with M=2, loc=[loc_1, loc_2], scale=[scale_1, scale_2], and mix_prob=[mix_prob_1, mix_prob_2]. When
    computing the Hessian, the derivatives for all parameters in the respective tensor are calculated jointly.
    The resulting Hessians capture the curvature of the loss w.r.t. the entire parameter-type. This is usually
    less runtime intensive, but can be less accurate.

tau: float
    Non-negative scalar temperature. The Gumbel-softmax distribution is a continuous distribution over the simplex, which can be thought of as a "soft" version of a categorical distribution. It provides a way to draw samples from a categorical distribution in a differentiable way, which is useful in gradient-based optimization problems. To sample from a Gumbel-Softmax distribution, one uses the Gumbel-max trick: add Gumbel noise to the logits and apply the softmax. Formally, given a vector z, the Gumbel-softmax function s(z, τ)_i for component i at temperature τ is defined as:

        s(z, τ)_i = exp((z_i + g_i) / τ) / Σ_{j=1}^{M} exp((z_j + g_j) / τ)

    where g_i is a sample from the Gumbel(0, 1) distribution. The temperature τ controls the sharpness of the output distribution: as τ approaches 0, the mixing probabilities become more discrete, and as τ approaches infinity, they become more uniform. For more information we refer to

    Jang, E., Gu, Shixiang and Poole, B. "Categorical Reparameterization with Gumbel-Softmax", ICLR, 2017.
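
A construction sketch, assuming the package's univariate Gaussian class as the component distribution (the `xgboostlss.distributions.Gaussian` import and its default arguments are assumptions for illustration); the second snippet only illustrates the effect of the temperature with PyTorch's own Gumbel-softmax:

from xgboostlss.distributions.Gaussian import Gaussian   # assumed univariate component class
from xgboostlss.distributions.Mixture import Mixture

# Three-component Gaussian mixture: 3 loc + 3 scale + 3 mixing probabilities = 9 parameters per observation.
mix_dist = Mixture(
    Gaussian(),               # component distribution; must use loss_fn="nll"
    M=3,
    hessian_mode="grouped",   # joint Hessian per parameter-type: faster, possibly less accurate
    tau=1.0,                  # Gumbel-softmax temperature for the mixing probabilities
)

# Effect of the temperature (illustrative, using torch.nn.functional.gumbel_softmax):
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 0.5, -1.0])
print(F.gumbel_softmax(logits, tau=0.1))   # nearly one-hot (discrete) mixing probabilities
print(F.gumbel_softmax(logits, tau=10.0))  # nearly uniform mixing probabilities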
Source code in xgboostlss/distributions/Mixture.py
class Mixture(MixtureDistributionClass):
    """
    Mixture-Density distribution class.

    Implements a mixture-density distribution for univariate targets, where all components are from different
    parameterizations of the same distribution-type. A mixture-density distribution is a concept used to model a
    complex distribution that arises from combining multiple simpler distributions. The Mixture-Density distribution
    is parameterized by a categorical selecting distribution (over M components) and M-component distributions. For more
    information on the Mixture-Density distribution, see:

        Bishop, C. M. (1994). Mixture density networks. Technical Report NCRG/4288, Aston University, Birmingham, UK.


    Distributional Parameters
    -------------------------
    Inherits the distributional parameters from the component distributions.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#mixturesamefamily

    Parameters
    -------------------------
    component_distribution: torch.distributions.Distribution
        Distribution class for the components of the mixture distribution. Has to be one of the available
        univariate distributions of the package.
    M: int
        Number of components in the mixture distribution.
    hessian_mode: str
        Mode for computing the Hessian. Must be one of the following:

            - "individual": Each parameter is treated as a separate tensor. As a result, the Hessian corresponds to the
            second-order derivative with respect to that specific parameter only. The resulting Hessians capture the
            curvature of the loss w.r.t. each individual parameter. This is usually more runtime intensive, but can
            be more accurate.

            - "grouped": Each tensor contains all parameters for a specific parameter-type, e.g., for a Gaussian-Mixture
            with M=2, loc=[loc_1, loc_2], scale=[scale_1, scale_2], and mix_prob=[mix_prob_1, mix_prob_2]. When
            computing the Hessian, the derivatives for all parameters in the respective tensor are calculated jointly.
            The resulting Hessians capture the curvature of the loss w.r.t. the entire parameter-type. This is usually
            less runtime intensive, but can be less accurate.
    tau: float, non-negative scalar temperature.
        The Gumbel-softmax distribution is a continuous distribution over the simplex, which can be thought of as a "soft"
        version of a categorical distribution. It’s a way to draw samples from a categorical distribution in a
        differentiable way. The motivation behind using the Gumbel-Softmax is to make the discrete sampling process of
        categorical variables differentiable, which is useful in gradient-based optimization problems. To sample from a
        Gumbel-Softmax distribution, one would use the Gumbel-max trick: add a Gumbel noise to logits and apply the softmax.
        Formally, given a vector z, the Gumbel-softmax function s(z,tau)_i for a component i at temperature tau is
        defined as:

            s(z,tau)_i = frac{e^{(z_i + g_i) / tau}}{sum_{j=1}^M e^{(z_j + g_j) / tau}}

        where g_i is a sample from the Gumbel(0, 1) distribution. The parameter tau (temperature) controls the sharpness
        of the output distribution. As tau approaches 0, the mixing probabilities become more discrete, and as tau
        approaches infty, the mixing probabilities become more uniform. For more information we refer to

            Jang, E., Gu, Shixiang and Poole, B. "Categorical Reparameterization with Gumbel-Softmax", ICLR, 2017.
    """
    def __init__(self,
                 component_distribution: torch.distributions.Distribution,
                 M: int = 2,
                 hessian_mode: str = "individual",
                 tau: float = 1.0
                 ):

        # Input Checks
        mixt_dist = get_component_distributions()
        if str(component_distribution.__class__).split(".")[-2] not in mixt_dist:
            raise ValueError(f"component_distribution must be one of the following: {mixt_dist}.")
        if not isinstance(M, int):
            raise ValueError("M must be an integer.")
        if M < 2:
            raise ValueError("M must be greater than 1.")
        if component_distribution.loss_fn != "nll":
            raise ValueError("Loss for component_distribution must be 'nll'.")
        if not isinstance(hessian_mode, str):
            raise ValueError("hessian_mode must be a string.")
        if hessian_mode not in ["individual", "grouped"]:
            raise ValueError("hessian_mode must be either 'individual' or 'grouped'.")
        if not isinstance(tau, float):
            raise ValueError("tau must be a float.")
        if tau <= 0:
            raise ValueError("tau must be greater than 0.")

        # Set the parameters specific to the distribution
        param_dict = component_distribution.param_dict
        preset_gumbel_fn = partial(gumbel_softmax_fn, tau=tau)
        param_dict.update({"mix_prob": preset_gumbel_fn})
        distribution_arg_names = [f"{key}_{i}" for key in param_dict for i in range(1, M + 1)]
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=component_distribution,
                         M=M,
                         temperature=tau,
                         hessian_mode=hessian_mode,
                         univariate=True,
                         discrete=component_distribution.discrete,
                         n_dist_param=len(distribution_arg_names),
                         stabilization=component_distribution.stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=distribution_arg_names,
                         loss_fn=component_distribution.loss_fn
                         )
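
The snippet below is a minimal usage sketch, not part of the library source. It assumes the mixture class documented above is importable as xgboostlss.distributions.Mixture.Mixture and that a Gaussian component class is available as xgboostlss.distributions.Gaussian.Gaussian, mirroring the file layout of the other distribution modules in this reference.

# Minimal sketch; import paths are assumed to follow the module layout used
# elsewhere in this reference.
from xgboostlss.distributions.Gaussian import Gaussian   # component distribution
from xgboostlss.distributions.Mixture import Mixture     # mixture class documented above

# Two-component Gaussian mixture: the booster models loc_1, loc_2, scale_1,
# scale_2, mix_prob_1 and mix_prob_2. "individual" Hessians are typically more
# accurate but slower than "grouped"; tau controls how sharp the Gumbel-softmax
# mixing probabilities are.
mixture_dist = Mixture(
    Gaussian(stabilization="None", response_fn="exp", loss_fn="nll"),
    M=2,
    hessian_mode="individual",
    tau=1.0,
)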

NegativeBinomial

NegativeBinomial

Bases: DistributionClass

NegativeBinomial distribution class.

Distributional Parameters

total_count: torch.Tensor
    Non-negative number of negative Bernoulli trials to stop.
probs: torch.Tensor
    Event probabilities of success in the half open interval [0, 1).
logits: torch.Tensor
    Event log-odds for probabilities of success.

Source

https://pytorch.org/docs/stable/distributions.html#negativebinomial

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn_total_count: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
response_fn_probs: str
    Response function for transforming the distributional parameters to the correct support. Options are "sigmoid" (sigmoid).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in xgboostlss/distributions/NegativeBinomial.py
class NegativeBinomial(DistributionClass):
    """
    NegativeBinomial distribution class.

    Distributional Parameters
    -------------------------
    total_count: torch.Tensor
        Non-negative number of negative Bernoulli trials to stop.
    probs: torch.Tensor
        Event probabilities of success in the half open interval [0, 1).
    logits: torch.Tensor
        Event log-odds for probabilities of success.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#negativebinomial

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn_total_count: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
    response_fn_probs: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "sigmoid" (sigmoid).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn_total_count: str = "relu",
                 response_fn_probs: str = "sigmoid",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select 'nll'.")

        #  Specify Response Functions for total_count
        response_functions_total_count = {"exp": exp_fn, "softplus": softplus_fn, "relu": relu_fn}
        if response_fn_total_count in response_functions_total_count:
            response_fn_total_count = response_functions_total_count[response_fn_total_count]
        else:
            raise ValueError(
                "Invalid response function for total_count. Please choose from 'exp', 'softplus' or 'relu'.")

        #  Specify Response Functions for probs
        response_functions_probs = {"sigmoid": sigmoid_fn}
        if response_fn_probs in response_functions_probs:
            response_fn_probs = response_functions_probs[response_fn_probs]
        else:
            raise ValueError(
                "Invalid response function for probs. Please select 'sigmoid'.")

        # Set the parameters specific to the distribution
        distribution = NegativeBinomial_Torch
        param_dict = {"total_count": response_fn_total_count, "probs": response_fn_probs}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=True,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
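
The following is a minimal end-to-end sketch for count data, not part of the library source. It assumes the model wrapper is importable as xgboostlss.model.XGBoostLSS and exposes train and predict methods (with the pred_type options listed under predict_dist further below); the synthetic data is purely illustrative.

import numpy as np
import xgboost as xgb
from xgboostlss.distributions.NegativeBinomial import NegativeBinomial
from xgboostlss.model import XGBoostLSS   # assumed import path for the model wrapper

# Toy count data for illustration only.
rng = np.random.default_rng(123)
X = rng.normal(size=(500, 4))
y = rng.poisson(lam=np.exp(X[:, 0]))
dtrain = xgb.DMatrix(X, label=y)

xgblss = XGBoostLSS(
    NegativeBinomial(stabilization="None",
                     response_fn_total_count="relu",
                     response_fn_probs="sigmoid",
                     loss_fn="nll")
)
xgblss.train({"eta": 0.1, "max_depth": 3}, dtrain, num_boost_round=50)

# Per-observation distributional parameters (total_count, probs).
pred_params = xgblss.predict(dtrain, pred_type="parameters")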

Poisson

Poisson

Bases: DistributionClass

Poisson distribution class.

Distributional Parameters

rate: torch.Tensor
    Rate parameter of the distribution (often referred to as lambda).

Source

https://pytorch.org/docs/stable/distributions.html#poisson

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in xgboostlss/distributions/Poisson.py
class Poisson(DistributionClass):
    """
    Poisson distribution class.

    Distributional Parameters
    -------------------------
    rate: torch.Tensor
        Rate parameter of the distribution (often referred to as lambda).

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#poisson

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "relu",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select 'nll'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn, "relu": relu_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function for total_count. Please choose from 'exp', 'softplus' or 'relu'.")

        # Set the parameters specific to the distribution
        distribution = Poisson_Torch
        param_dict = {"rate": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=True,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
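
A construction sketch (assumptions as in the NegativeBinomial example above; the wrapper usage is identical):

from xgboostlss.distributions.Poisson import Poisson

# The rate parameter must be non-negative, so a positive-valued response function
# ("exp", "softplus" or "relu") maps the raw boosting scores to the correct support.
dist = Poisson(stabilization="None", response_fn="softplus", loss_fn="nll")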

SplineFlow

SplineFlow

Bases: NormalizingFlowClass

Spline Flow class.

The spline flow is a normalizing flow based on element-wise rational spline bijections of linear and quadratic order (Durkan et al., 2019; Dolatabadi et al., 2020). Rational splines are functions composed of segments that are the ratio of two polynomials. Rational splines offer an excellent combination of functional flexibility whilst maintaining a numerically stable inverse.

For more details, see:
- Durkan, C., Bekasov, A., Murray, I. and Papamakarios, G. Neural Spline Flows. NeurIPS 2019.
- Dolatabadi, H. M., Erfani, S. and Leckie, C. Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.

Source

https://docs.pyro.ai/en/stable/distributions.html#pyro.distributions.transforms.Spline

Arguments

target_support: str
    The target support. Options are
    - "real": [-inf, inf]
    - "positive": [0, inf]
    - "positive_integer": [0, 1, 2, 3, ...]
    - "unit_interval": [0, 1]
count_bins: int
    The number of segments comprising the spline.
bound: float
    The quantity "K" determining the bounding box, [-K,K] x [-K,K] of the spline. By adjusting the "K" value, you can control the size of the bounding box and consequently control the range of inputs that the spline transform operates on. Larger values of "K" will result in a wider valid range for the spline transformation, while smaller values will restrict the valid range to a smaller region. Should be chosen based on the range of the data.
order: str
    The order of the spline. Options are "linear" or "quadratic".
stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD" or "L2".
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in xgboostlss/distributions/SplineFlow.py
class SplineFlow(NormalizingFlowClass):
    """
    Spline Flow class.

    The spline flow is a normalizing flow based on element-wise rational spline bijections of linear and quadratic
    order (Durkan et al., 2019; Dolatabadi et al., 2020). Rational splines are functions composed of segments
    that are the ratio of two polynomials. Rational splines offer an excellent combination of functional flexibility
    whilst maintaining a numerically stable inverse.

    For more details, see:
    - Durkan, C., Bekasov, A., Murray, I. and Papamakarios, G. Neural Spline Flows. NeurIPS 2019.
    - Dolatabadi, H. M., Erfani, S. and Leckie, C., Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.


    Source
    ---------
    https://docs.pyro.ai/en/stable/distributions.html#pyro.distributions.transforms.Spline


    Arguments
    ---------
    target_support: str
        The target support. Options are
            - "real": [-inf, inf]
            - "positive": [0, inf]
            - "positive_integer": [0, 1, 2, 3, ...]
            - "unit_interval": [0, 1]
    count_bins: int
        The number of segments comprising the spline.
    bound: float
        The quantity "K" determining the bounding box, [-K,K] x [-K,K] of the spline. By adjusting the
        "K" value, you can control the size of the bounding box and consequently control the range of inputs that
        the spline transform operates on. Larger values of "K" will result in a wider valid range for the spline
        transformation, while smaller values will restrict the valid range to a smaller region. Should be chosen
        based on the range of the data.
    order: str
        The order of the spline. Options are "linear" or "quadratic".
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD" or "L2".
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 target_support: str = "real",
                 count_bins: int = 8,
                 bound: float = 3.0,
                 order: str = "linear",
                 stabilization: str = "None",
                 loss_fn: str = "nll"
                 ):

        # Specify Target Transform
        if not isinstance(target_support, str):
            raise ValueError("target_support must be a string.")

        transforms = {
            "real": (identity_transform, False),
            "positive": (SoftplusTransform(), False),
            "positive_integer": (SoftplusTransform(), True),
            "unit_interval": (SigmoidTransform(), False)
        }

        if target_support in transforms:
            target_transform, discrete = transforms[target_support]
        else:
            raise ValueError(
                "Invalid target_support. Options are 'real', 'positive', 'positive_integer', or 'unit_interval'.")

        # Check if count_bins is valid
        if not isinstance(count_bins, int):
            raise ValueError("count_bins must be an integer.")
        if count_bins <= 0:
            raise ValueError("count_bins must be a positive integer > 0.")

        # Check if bound is float
        if not isinstance(bound, float):
            raise ValueError("bound must be a float.")

        # Number of parameters
        if not isinstance(order, str):
            raise ValueError("order must be a string.")

        order_params = {
            "quadratic": 2 * count_bins + (count_bins - 1),
            "linear": 3 * count_bins + (count_bins - 1)
        }

        if order in order_params:
            n_params = order_params[order]
        else:
            raise ValueError("Invalid order specification. Options are 'linear' or 'quadratic'.")

        # Check if stabilization method is valid.
        if not isinstance(stabilization, str):
            raise ValueError("stabilization must be a string.")
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Options are 'None', 'MAD' or 'L2'.")

        # Check if loss function is valid.
        if not isinstance(loss_fn, str):
            raise ValueError("loss_fn must be a string.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss_fn. Options are 'nll' or 'crps'.")

        # Specify parameter dictionary
        param_dict = {f"param_{i + 1}": identity_fn for i in range(n_params)}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Normalizing Flow Class
        super().__init__(base_dist=Normal,                     # Base distribution, currently only Normal is supported.
                         flow_transform=Spline,
                         count_bins=count_bins,
                         bound=bound,
                         order=order,
                         n_dist_param=n_params,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         target_transform=target_transform,
                         discrete=discrete,
                         univariate=True,
                         stabilization=stabilization,
                         loss_fn=loss_fn
                         )
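
A construction sketch, not part of the library source; it only illustrates how the number of flow parameters follows from count_bins and order via the order_params mapping shown above.

from xgboostlss.distributions.SplineFlow import SplineFlow

# Linear-order spline flow on positive support, e.g. for a right-skewed response.
# With count_bins=8 and order="linear", the booster models
# 3 * 8 + (8 - 1) = 31 flow parameters per observation.
flow = SplineFlow(target_support="positive",
                  count_bins=8,
                  bound=3.0,
                  order="linear",
                  stabilization="None",
                  loss_fn="nll")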

StudentT

StudentT

Bases: DistributionClass

Student-T Distribution Class

Distributional Parameters

df: torch.Tensor
    Degrees of freedom.
loc: torch.Tensor
    Mean of the distribution.
scale: torch.Tensor
    Scale of the distribution.

Source

https://pytorch.org/docs/stable/distributions.html#studentt

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in xgboostlss/distributions/StudentT.py
class StudentT(DistributionClass):
    """
    Student-T Distribution Class

    Distributional Parameters
    -------------------------
    df: torch.Tensor
        Degrees of freedom.
    loc: torch.Tensor
        Mean of the distribution.
    scale: torch.Tensor
        Scale of the distribution.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#studentt

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {
            "exp": (exp_fn, exp_fn_df),
            "softplus": (softplus_fn, softplus_fn_df)
        }
        if response_fn in response_functions:
            response_fn, response_fn_df = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = StudentT_Torch
        param_dict = {"df": response_fn_df, "loc": identity_fn, "scale": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
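
A construction sketch contrasting the two loss options; as noted in the docstring, "crps" fixes the Hessian to 1.

from xgboostlss.distributions.StudentT import StudentT

# NLL uses exact second-order information; CRPS ignores curvature (Hessian = 1).
dist_nll = StudentT(stabilization="None", response_fn="exp", loss_fn="nll")
dist_crps = StudentT(stabilization="MAD", response_fn="softplus", loss_fn="crps")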

Weibull

Weibull

Bases: DistributionClass

Weibull distribution class.

Distributional Parameters

scale: torch.Tensor
    Scale parameter of distribution (lambda).
concentration: torch.Tensor
    Concentration parameter of distribution (k/shape).

Source

https://pytorch.org/docs/stable/distributions.html#weibull

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in xgboostlss/distributions/Weibull.py
class Weibull(DistributionClass):
    """
    Weibull distribution class.

    Distributional Parameters
    -------------------------
    scale: torch.Tensor
        Scale parameter of distribution (lambda).
    concentration: torch.Tensor
        Concentration parameter of distribution (k/shape).

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#weibull

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = Weibull_Torch
        param_dict = {"scale": response_fn, "concentration": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )

ZABeta

ZABeta

Bases: DistributionClass

Zero-Adjusted Beta distribution class.

The zero-adjusted Beta distribution is similar to the Beta distribution but allows zeros as y values.

Distributional Parameters

concentration1: torch.Tensor
    1st concentration parameter of the distribution (often referred to as alpha).
concentration0: torch.Tensor
    2nd concentration parameter of the distribution (often referred to as beta).
gate: torch.Tensor
    Probability of zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in xgboostlss/distributions/ZABeta.py
class ZABeta(DistributionClass):
    """
    Zero-Adjusted Beta distribution class.

    The zero-adjusted Beta distribution is similar to the Beta distribution but allows zeros as y values.

    Distributional Parameters
    -------------------------
    concentration1: torch.Tensor
        1st concentration parameter of the distribution (often referred to as alpha).
    concentration0: torch.Tensor
        2nd concentration parameter of the distribution (often referred to as beta).
    gate: torch.Tensor
        Probability of zeros given via a Bernoulli distribution.

    Source
    -------------------------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select 'nll'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = ZeroAdjustedBeta_Torch
        param_dict = {"concentration1": response_fn, "concentration0": response_fn, "gate": sigmoid_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )

ZAGamma

ZAGamma

Bases: DistributionClass

Zero-Adjusted Gamma distribution class.

The zero-adjusted Gamma distribution is similar to the Gamma distribution but allows zeros as y values.

Distributional Parameters

concentration: torch.Tensor
    Shape parameter of the distribution (often referred to as alpha).
rate: torch.Tensor
    Rate = 1 / scale of the distribution (often referred to as beta).
gate: torch.Tensor
    Probability of zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in xgboostlss/distributions/ZAGamma.py
class ZAGamma(DistributionClass):
    """
    Zero-Adjusted Gamma distribution class.

    The zero-adjusted Gamma distribution is similar to the Gamma distribution but allows zeros as y values.

    Distributional Parameters
    -------------------------
    concentration: torch.Tensor
        shape parameter of the distribution (often referred to as alpha)
    rate: torch.Tensor
        rate = 1 / scale of the distribution (often referred to as beta)
    gate: torch.Tensor
        Probability of zeros given via a Bernoulli distribution.

    Source
    -------------------------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select 'nll'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = ZeroAdjustedGamma_Torch
        param_dict = {"concentration": response_fn, "rate": response_fn, "gate": sigmoid_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
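
A construction sketch for a non-negative response with exact zeros (e.g. rainfall amounts): gate models the zero probability via a fixed sigmoid response, while concentration and rate describe the strictly positive part using the response function chosen here.

from xgboostlss.distributions.ZAGamma import ZAGamma

# Zero-adjusted Gamma: y >= 0 with a point mass at zero.
dist = ZAGamma(stabilization="None", response_fn="softplus", loss_fn="nll")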

ZALN

ZALN

Bases: DistributionClass

Zero-Adjusted LogNormal distribution class.

The zero-adjusted Log-Normal distribution is similar to the Log-Normal distribution but allows zeros as y values.

Distributional Parameters

loc: torch.Tensor
    Mean of log of distribution.
scale: torch.Tensor
    Standard deviation of log of the distribution.
gate: torch.Tensor
    Probability of zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in xgboostlss/distributions/ZALN.py
class ZALN(DistributionClass):
    """
    Zero-Adjusted LogNormal distribution class.

    The zero-adjusted Log-Normal distribution is similar to the Log-Normal distribution but allows zeros as y values.

    Distributional Parameters
    -------------------------
    loc: torch.Tensor
        Mean of log of distribution.
    scale: torch.Tensor
        Standard deviation of log of the distribution.
    gate: torch.Tensor
        Probability of zeros given via a Bernoulli distribution.

    Source
    -------------------------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select 'nll'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = ZeroAdjustedLogNormal_Torch
        param_dict = {"loc": identity_fn, "scale": response_fn,  "gate": sigmoid_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )

ZINB

ZINB

Bases: DistributionClass

Zero-Inflated Negative Binomial distribution class.

Distributional Parameters

total_count: torch.Tensor
    Non-negative number of negative Bernoulli trials to stop.
probs: torch.Tensor
    Event probabilities of success in the half open interval [0, 1).
gate: torch.Tensor
    Probability of extra zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn_total_count: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
response_fn_probs: str
    Response function for transforming the distributional parameters to the correct support. Options are "sigmoid" (sigmoid).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in xgboostlss/distributions/ZINB.py
class ZINB(DistributionClass):
    """
    Zero-Inflated Negative Binomial distribution class.

    Distributional Parameters
    -------------------------
    total_count: torch.Tensor
        Non-negative number of negative Bernoulli trials to stop.
    probs: torch.Tensor
        Event probabilities of success in the half open interval [0, 1).
    gate: torch.Tensor
        Probability of extra zeros given via a Bernoulli distribution.

    Source
    -------------------------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn_total_count: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
    response_fn_probs: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "sigmoid" (sigmoid).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn_total_count: str = "relu",
                 response_fn_probs: str = "sigmoid",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select 'nll'.")

        #  Specify Response Functions for total_count
        response_functions_total_count = {"exp": exp_fn, "softplus": softplus_fn, "relu": relu_fn}
        if response_fn_total_count in response_functions_total_count:
            response_fn_total_count = response_functions_total_count[response_fn_total_count]
        else:
            raise ValueError(
                "Invalid response function for total_count. Please choose from 'exp', 'softplus' or 'relu'.")

        #  Specify Response Functions for probs
        response_functions_probs = {"sigmoid": sigmoid_fn}
        if response_fn_probs in response_functions_probs:
            response_fn_probs = response_functions_probs[response_fn_probs]
        else:
            raise ValueError(
                "Invalid response function for probs. Please select 'sigmoid'.")

        # Set the parameters specific to the distribution
        distribution = ZeroInflatedNegativeBinomial_Torch
        param_dict = {"total_count": response_fn_total_count, "probs": response_fn_probs, "gate": sigmoid_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=True,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
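
A construction sketch; relative to NegativeBinomial, only the gate parameter (fixed sigmoid response) for structural zeros is added.

from xgboostlss.distributions.ZINB import ZINB

# Over-dispersed counts with excess zeros.
dist = ZINB(stabilization="None",
            response_fn_total_count="softplus",
            response_fn_probs="sigmoid",
            loss_fn="nll")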

ZIPoisson

ZIPoisson

Bases: DistributionClass

Zero-Inflated Poisson distribution class.

Distributional Parameters

rate: torch.Tensor
    Rate parameter of the distribution (often referred to as lambda).
gate: torch.Tensor
    Probability of extra zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L121

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in xgboostlss/distributions/ZIPoisson.py
class ZIPoisson(DistributionClass):
    """
    Zero-Inflated Poisson distribution class.

    Distributional Parameters
    -------------------------
    rate: torch.Tensor
        Rate parameter of the distribution (often referred to as lambda).
    gate: torch.Tensor
        Probability of extra zeros given via a Bernoulli distribution.

    Source
    -------------------------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L121

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "relu",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select 'nll'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn, "relu": relu_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function for total_count. Please choose from 'exp', 'softplus' or 'relu'.")

        # Set the parameters specific to the distribution
        distribution = ZeroInflatedPoisson_Torch
        param_dict = {"rate": response_fn, "gate": sigmoid_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=True,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )

distribution_utils

DistributionClass

Generic class that contains general functions for univariate distributions.

Arguments

distribution: torch.distributions.Distribution
    PyTorch Distribution class.
univariate: bool
    Whether the distribution is univariate or multivariate.
discrete: bool
    Whether the support of the distribution is discrete or continuous.
n_dist_param: int
    Number of distributional parameters.
stabilization: str
    Stabilization method.
param_dict: Dict[str, Any]
    Dictionary that maps distributional parameters to their response scale.
distribution_arg_names: List
    List of distributional parameter names.
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.
tau: List
    List of expectiles. Only used for Expectile distribution.
penalize_crossing: bool
    Whether to include a penalty term to discourage crossing of expectiles. Only used for Expectile distribution.

Source code in xgboostlss/distributions/distribution_utils.py
class DistributionClass:
    """
    Generic class that contains general functions for univariate distributions.

    Arguments
    ---------
    distribution: torch.distributions.Distribution
        PyTorch Distribution class.
    univariate: bool
        Whether the distribution is univariate or multivariate.
    discrete: bool
        Whether the support of the distribution is discrete or continuous.
    n_dist_param: int
        Number of distributional parameters.
    stabilization: str
        Stabilization method.
    param_dict: Dict[str, Any]
        Dictionary that maps distributional parameters to their response scale.
    distribution_arg_names: List
        List of distributional parameter names.
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    tau: List
        List of expectiles. Only used for Expectile distribution.
    penalize_crossing: bool
        Whether to include a penalty term to discourage crossing of expectiles. Only used for Expectile distribution.
    """
    def __init__(self,
                 distribution: torch.distributions.Distribution = None,
                 univariate: bool = True,
                 discrete: bool = False,
                 n_dist_param: int = None,
                 stabilization: str = "None",
                 param_dict: Dict[str, Any] = None,
                 distribution_arg_names: List = None,
                 loss_fn: str = "nll",
                 tau: Optional[List[torch.Tensor]] = None,
                 penalize_crossing: bool = False,
                 ):

        self.distribution = distribution
        self.univariate = univariate
        self.discrete = discrete
        self.n_dist_param = n_dist_param
        self.stabilization = stabilization
        self.param_dict = param_dict
        self.distribution_arg_names = distribution_arg_names
        self.loss_fn = loss_fn
        self.tau = tau
        self.penalize_crossing = penalize_crossing

    def objective_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[np.ndarray, np.ndarray]:

        """
        Function to estimate gradients and hessians of distributional parameters.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        data: xgb.DMatrix
            Data used for training.

        Returns
        -------
        grad: np.ndarray
            Gradient.
        hess: np.ndarray
            Hessian.
        """
        # Target
        target = torch.tensor(data.get_label().reshape(-1, 1))

        # Weights
        if data.get_weight().size == 0:
            # Use 1 as weight if no weights are specified
            weights = torch.ones_like(target, dtype=target.dtype).numpy()
        else:
            weights = data.get_weight().reshape(-1, 1)

        # Start values (needed to replace NaNs in predt)
        start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

        # Calculate gradients and hessians
        predt, loss = self.get_params_loss(predt, target, start_values, requires_grad=True)
        grad, hess = self.compute_gradients_and_hessians(loss, predt, weights)

        return grad, hess

    def metric_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[str, np.ndarray]:
        """
        Function that evaluates the predictions using the specified loss function.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        data: xgb.DMatrix
            Data used for training.

        Returns
        -------
        name: str
            Name of the evaluation metric.
        loss: float
            Loss value.
        """
        # Target
        target = torch.tensor(data.get_label().reshape(-1, 1))

        # Start values (needed to replace NaNs in predt)
        start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

        # Calculate loss
        _, loss = self.get_params_loss(predt, target, start_values, requires_grad=False)

        return self.loss_fn, loss

    def loss_fn_start_values(self,
                             params: torch.Tensor,
                             target: torch.Tensor) -> torch.Tensor:
        """
        Function that calculates the loss for a given set of distributional parameters. Only used for calculating
        the loss for the start values.

        Parameter
        ---------
        params: torch.Tensor
            Distributional parameters.
        target: torch.Tensor
            Target values.

        Returns
        -------
        loss: torch.Tensor
            Loss value.
        """
        # Replace NaNs and infinity values with 0.5
        nan_inf_idx = torch.isnan(torch.stack(params)) | torch.isinf(torch.stack(params))
        params = torch.where(nan_inf_idx, torch.tensor(0.5), torch.stack(params))

        # Transform parameters to response scale
        params = [
            response_fn(params[i].reshape(-1, 1)) for i, response_fn in enumerate(self.param_dict.values())
        ]

        # Specify Distribution and Loss
        if self.tau is None:
            dist = self.distribution(*params)
            loss = -torch.nansum(dist.log_prob(target))
        else:
            dist = self.distribution(params, self.penalize_crossing)
            loss = -torch.nansum(dist.log_prob(target, self.tau))

        return loss

    def calculate_start_values(self,
                               target: np.ndarray,
                               max_iter: int = 50
                               ) -> Tuple[float, np.ndarray]:
        """
        Function that calculates the starting values for each distributional parameter.

        Arguments
        ---------
        target: np.ndarray
            Data from which starting values are calculated.
        max_iter: int
            Maximum number of iterations.

        Returns
        -------
        loss: float
            Loss value.
        start_values: np.ndarray
            Starting values for each distributional parameter.
        """
        # Convert target to torch.tensor
        target = torch.tensor(target).reshape(-1, 1)

        # Initialize parameters
        params = [torch.tensor(0.5, requires_grad=True) for _ in range(self.n_dist_param)]

        # Specify optimizer
        optimizer = LBFGS(params, lr=0.1, max_iter=np.min([int(max_iter/4), 20]), line_search_fn="strong_wolfe")

        # Define learning rate scheduler
        lr_scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=10)

        # Define closure
        def closure():
            optimizer.zero_grad()
            loss = self.loss_fn_start_values(params, target)
            loss.backward()
            return loss

        # Optimize parameters
        loss_vals = []
        for epoch in range(max_iter):
            loss = optimizer.step(closure)
            lr_scheduler.step(loss)
            loss_vals.append(loss.item())

        # Get final loss
        loss = np.array(loss_vals[-1])

        # Get start values
        start_values = np.array([params[i].detach() for i in range(self.n_dist_param)])

        # Replace any remaining NaNs or infinity values with 0.5
        start_values = np.nan_to_num(start_values, nan=0.5, posinf=0.5, neginf=0.5)

        return loss, start_values

    def get_params_loss(self,
                        predt: np.ndarray,
                        target: torch.Tensor,
                        start_values: List[float],
                        requires_grad: bool = False,
                        ) -> Tuple[List[torch.Tensor], np.ndarray]:
        """
        Function that returns the predicted parameters and the loss.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        target: torch.Tensor
            Target values.
        start_values: List
            Starting values for each distributional parameter.
        requires_grad: bool
            Whether to add to the computational graph or not.

        Returns
        -------
        predt: List of torch.Tensors
            Predicted parameters.
        loss: torch.Tensor
            Loss value.
        """
        # Predicted Parameters
        predt = predt.reshape(-1, self.n_dist_param)

        # Replace NaNs and infinity values with unconditional start values
        nan_inf_mask = np.isnan(predt) | np.isinf(predt)
        predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])

        # Convert to torch.tensor
        predt = [
            torch.tensor(predt[:, i].reshape(-1, 1), requires_grad=requires_grad) for i in range(self.n_dist_param)
        ]

        # Predicted Parameters transformed to response scale
        predt_transformed = [
            response_fn(predt[i].reshape(-1, 1)) for i, response_fn in enumerate(self.param_dict.values())
        ]

        # Specify Distribution and Loss
        if self.tau is None:
            dist_kwargs = dict(zip(self.distribution_arg_names, predt_transformed))
            dist_fit = self.distribution(**dist_kwargs)
            if self.loss_fn == "nll":
                loss = -torch.nansum(dist_fit.log_prob(target))
            elif self.loss_fn == "crps":
                torch.manual_seed(123)
                dist_samples = dist_fit.rsample((30,)).squeeze(-1)
                loss = torch.nansum(self.crps_score(target, dist_samples))
            else:
                raise ValueError("Invalid loss function. Please select 'nll' or 'crps'.")
        else:
            dist_fit = self.distribution(predt_transformed, self.penalize_crossing)
            loss = -torch.nansum(dist_fit.log_prob(target, self.tau))

        return predt, loss

    def draw_samples(self,
                     predt_params: pd.DataFrame,
                     n_samples: int = 1000,
                     seed: int = 123
                     ) -> pd.DataFrame:
        """
        Function that draws n_samples from a predicted distribution.

        Arguments
        ---------
        predt_params: pd.DataFrame
            pd.DataFrame with predicted distributional parameters.
        n_samples: int
            Number of sample to draw from predicted response distribution.
        seed: int
            Manual seed.

        Returns
        -------
        pred_dist: pd.DataFrame
            DataFrame with n_samples drawn from predicted response distribution.

        """
        torch.manual_seed(seed)

        if self.tau is None:
            pred_params = torch.tensor(predt_params.values)
            dist_kwargs = {arg_name: param for arg_name, param in zip(self.distribution_arg_names, pred_params.T)}
            dist_pred = self.distribution(**dist_kwargs)
            dist_samples = dist_pred.sample((n_samples,)).squeeze().detach().numpy().T
            dist_samples = pd.DataFrame(dist_samples)
            dist_samples.columns = [str("y_sample") + str(i) for i in range(dist_samples.shape[1])]
        else:
            dist_samples = None

        if self.discrete:
            dist_samples = dist_samples.astype(int)

        return dist_samples

    def predict_dist(self,
                     booster: xgb.Booster,
                     start_values: np.ndarray,
                     data: xgb.DMatrix,
                     pred_type: str = "parameters",
                     n_samples: int = 1000,
                     quantiles: list = [0.1, 0.5, 0.9],
                     seed: int = 123
                     ) -> pd.DataFrame:
        """
        Function that predicts from the trained model.

        Arguments
        ---------
        booster : xgb.Booster
            Trained model.
        start_values : np.ndarray
            Starting values for each distributional parameter.
        data : xgb.DMatrix
            Data to predict from.
        pred_type : str
            Type of prediction:
            - "samples" draws n_samples from the predicted distribution.
            - "quantiles" calculates the quantiles from the predicted distribution.
            - "parameters" returns the predicted distributional parameters.
            - "expectiles" returns the predicted expectiles.
        n_samples : int
            Number of samples to draw from the predicted distribution.
        quantiles : List[float]
            List of quantiles to calculate from the predicted distribution.
        seed : int
            Seed for random number generator used to draw samples from the predicted distribution.

        Returns
        -------
        pred : pd.DataFrame
            Predictions.
        """
        # Set base_margin as starting point for each distributional parameter. Requires base_score=0 in parameters.
        base_margin_test = (np.ones(shape=(data.num_row(), 1))) * start_values
        data.set_base_margin(base_margin_test.flatten())

        predt = np.array(booster.predict(data, output_margin=True)).reshape(-1, self.n_dist_param)
        predt = torch.tensor(predt, dtype=torch.float32)

        # Transform predicted parameters to response scale
        dist_params_predt = np.concatenate(
            [
                response_fun(
                    predt[:, i].reshape(-1, 1)).numpy() for i, (dist_param, response_fun) in
                enumerate(self.param_dict.items())
            ],
            axis=1,
        )
        dist_params_predt = pd.DataFrame(dist_params_predt)
        dist_params_predt.columns = self.param_dict.keys()

        # Draw samples from predicted response distribution
        pred_samples_df = self.draw_samples(predt_params=dist_params_predt,
                                            n_samples=n_samples,
                                            seed=seed)

        if pred_type == "parameters":
            return dist_params_predt

        elif pred_type == "expectiles":
            return dist_params_predt

        elif pred_type == "samples":
            return pred_samples_df

        elif pred_type == "quantiles":
            # Calculate quantiles from predicted response distribution
            pred_quant_df = pred_samples_df.quantile(quantiles, axis=1).T
            pred_quant_df.columns = [str("quant_") + str(quantiles[i]) for i in range(len(quantiles))]
            if self.discrete:
                pred_quant_df = pred_quant_df.astype(int)
            return pred_quant_df

    def compute_gradients_and_hessians(self,
                                       loss: torch.tensor,
                                       predt: torch.tensor,
                                       weights: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:

        """
        Calculates gradients and hessians.

        Output gradients and hessians have shape (n_samples*n_outputs, 1).

        Arguments:
        ---------
        loss: torch.Tensor
            Loss.
        predt: torch.Tensor
            List of predicted parameters.
        weights: np.ndarray
            Weights.

        Returns:
        -------
        grad: torch.Tensor
            Gradients.
        hess: torch.Tensor
            Hessians.
        """
        if self.loss_fn == "nll":
            # Gradient and Hessian
            grad = autograd(loss, inputs=predt, create_graph=True)
            hess = [autograd(grad[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grad))]
        elif self.loss_fn == "crps":
            # Gradient and Hessian
            grad = autograd(loss, inputs=predt, create_graph=True)
            hess = [torch.ones_like(grad[i]) for i in range(len(grad))]

        # Stabilization of Derivatives
        if self.stabilization != "None":
            grad = [self.stabilize_derivative(grad[i], type=self.stabilization) for i in range(len(grad))]
            hess = [self.stabilize_derivative(hess[i], type=self.stabilization) for i in range(len(hess))]

        # Reshape
        grad = torch.cat(grad, axis=1).detach().numpy()
        hess = torch.cat(hess, axis=1).detach().numpy()

        # Weighting
        grad *= weights
        hess *= weights

        # Flatten
        grad = grad.flatten()
        hess = hess.flatten()

        return grad, hess

    def stabilize_derivative(self, input_der: torch.Tensor, type: str = "MAD") -> torch.Tensor:
        """
        Function that stabilizes Gradients and Hessians.

        As XGBoostLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important
        that these are comparable in magnitude for all distributional parameters. Due to imbalances regarding the ranges,
        the estimation might become unstable so that it does not converge (or converge very slowly) to the optimal solution.
        Another way to improve convergence might be to standardize the response variable. This is especially useful if the
        range of the response differs strongly from the range of the Gradients and Hessians. Both, the stabilization and
        the standardization of the response are not always advised but need to be carefully considered.
        Source: https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

        Parameters
        ----------
        input_der : torch.Tensor
            Input derivative, either Gradient or Hessian.
        type: str
            Stabilization method. Can be either "None", "MAD" or "L2".

        Returns
        -------
        stab_der : torch.Tensor
            Stabilized Gradient or Hessian.
        """

        if type == "MAD":
            input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
            div = torch.nanmedian(torch.abs(input_der - torch.nanmedian(input_der)))
            div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
            stab_der = input_der / div

        if type == "L2":
            input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
            div = torch.sqrt(torch.nanmean(input_der.pow(2)))
            div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
            div = torch.where(div > torch.tensor(10000.0), torch.tensor(10000.0), div)
            stab_der = input_der / div

        if type == "None":
            stab_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))

        return stab_der


    def crps_score(self, y: torch.tensor, yhat_dist: torch.tensor) -> torch.tensor:
        """
        Function that calculates the Continuous Ranked Probability Score (CRPS) for a given set of predicted samples.

        Parameters
        ----------
        y: torch.Tensor
            Response variable of shape (n_observations,1).
        yhat_dist: torch.Tensor
            Predicted samples of shape (n_samples, n_observations).

        Returns
        -------
        crps: torch.Tensor
            CRPS score.

        References
        ----------
        Gneiting, Tilmann & Raftery, Adrian. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation.
        Journal of the American Statistical Association. 102. 359-378.

        Source
        ------
        https://github.com/elephaint/pgbm/blob/main/pgbm/torch/pgbm_dist.py#L549
        """
        # Get the number of observations
        n_samples = yhat_dist.shape[0]

        # Sort the forecasts in ascending order
        yhat_dist_sorted, _ = torch.sort(yhat_dist, 0)

        # Create temporary tensors
        y_cdf = torch.zeros_like(y)
        yhat_cdf = torch.zeros_like(y)
        yhat_prev = torch.zeros_like(y)
        crps = torch.zeros_like(y)

        # Loop over the predicted samples generated per observation
        for yhat in yhat_dist_sorted:
            yhat = yhat.reshape(-1, 1)
            flag = (y_cdf == 0) * (y < yhat)
            crps += flag * ((y - yhat_prev) * yhat_cdf ** 2)
            crps += flag * ((yhat - y) * (yhat_cdf - 1) ** 2)
            crps += (~flag) * ((yhat - yhat_prev) * (yhat_cdf - y_cdf) ** 2)
            y_cdf += flag
            yhat_cdf += 1 / n_samples
            yhat_prev = yhat

        # In case y_cdf == 0 after the loop
        flag = (y_cdf == 0)
        crps += flag * (y - yhat)

        return crps

    def dist_select(self,
                    target: np.ndarray,
                    candidate_distributions: List,
                    max_iter: int = 100,
                    plot: bool = False,
                    figure_size: tuple = (10, 5),
                    ) -> pd.DataFrame:
        """
        Function that selects the most suitable distribution among the candidate_distributions for the target variable,
        based on the NegLogLikelihood (lower is better).

        Parameters
        ----------
        target: np.ndarray
            Response variable.
        candidate_distributions: List
            List of candidate distributions.
        max_iter: int
            Maximum number of iterations for the optimization.
        plot: bool
            If True, a density plot of the actual and fitted distribution is created.
        figure_size: tuple
            Figure size of the density plot.

        Returns
        -------
        fit_df: pd.DataFrame
            Dataframe with the loss values of the fitted candidate distributions.
        """
        dist_list = []
        total_iterations = len(candidate_distributions)
        with tqdm(total=total_iterations, desc="Fitting candidate distributions") as pbar:
            for i in range(len(candidate_distributions)):
                dist_name = candidate_distributions[i].__name__.split(".")[2]
                pbar.set_description(f"Fitting {dist_name} distribution")
                dist_sel = getattr(candidate_distributions[i], dist_name)()
                try:
                    loss, params = dist_sel.calculate_start_values(target=target.reshape(-1, 1), max_iter=max_iter)
                    fit_df = pd.DataFrame.from_dict(
                        {self.loss_fn: loss.reshape(-1,),
                         "distribution": str(dist_name),
                         "params": [params]
                         }
                    )
                except Exception as e:
                    warnings.warn(f"Error fitting {dist_name} distribution: {str(e)}")
                    fit_df = pd.DataFrame(
                        {self.loss_fn: np.nan,
                         "distribution": str(dist_name),
                         "params": [np.nan] * self.n_dist_param
                         }
                    )
                dist_list.append(fit_df)
                pbar.update(1)
            pbar.set_description(f"Fitting of candidate distributions completed")
            fit_df = pd.concat(dist_list).sort_values(by=self.loss_fn, ascending=True)
            fit_df["rank"] = fit_df[self.loss_fn].rank().astype(int)
            fit_df.set_index(fit_df["rank"], inplace=True)
        if plot:
            # Select best distribution
            best_dist = fit_df[fit_df["rank"] == 1].reset_index(drop=True)
            for dist in candidate_distributions:
                if dist.__name__.split(".")[2] == best_dist["distribution"].values[0]:
                    best_dist_sel = dist
                    break
            best_dist_sel = getattr(best_dist_sel, best_dist["distribution"].values[0])()
            params = torch.tensor(best_dist["params"][0]).reshape(-1, best_dist_sel.n_dist_param)

            # Transform parameters to the response scale and draw samples
            fitted_params = np.concatenate(
                [
                    response_fun(params[:, i].reshape(-1, 1)).numpy()
                    for i, (dist_param, response_fun) in enumerate(best_dist_sel.param_dict.items())
                ],
                axis=1,
            )
            fitted_params = pd.DataFrame(fitted_params, columns=best_dist_sel.param_dict.keys())
            n_samples = np.max([10000, target.shape[0]])
            n_samples = np.where(n_samples > 500000, 100000, n_samples)
            dist_samples = best_dist_sel.draw_samples(fitted_params,
                                                      n_samples=n_samples,
                                                      seed=123).values

            # Plot actual and fitted distribution
            plt.figure(figsize=figure_size)
            sns.kdeplot(target.reshape(-1, ), label="Actual")
            sns.kdeplot(dist_samples.reshape(-1, ), label=f"Best-Fit: {best_dist['distribution'].values[0]}")
            plt.legend()
            plt.title("Actual vs. Best-Fit Density", fontweight="bold", fontsize=16)
            plt.show()

        fit_df.drop(columns=["rank", "params"], inplace=True)

        return fit_df
calculate_start_values(target, max_iter=50)

Function that calculates the starting values for each distributional parameter.

Arguments

target: np.ndarray
    Data from which starting values are calculated.
max_iter: int
    Maximum number of iterations.

Returns

loss: float
    Loss value.
start_values: np.ndarray
    Starting values for each distributional parameter.
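
A minimal usage sketch, assuming a univariate distribution class such as Gaussian from xgboostlss.distributions.Gaussian:

import numpy as np
from xgboostlss.distributions.Gaussian import Gaussian  # assumed distribution class

# Simulated response
y = np.random.default_rng(123).normal(loc=10.0, scale=2.0, size=500)

dist = Gaussian()
loss, start_values = dist.calculate_start_values(target=y.reshape(-1, 1), max_iter=50)
# loss: final negative log-likelihood of the unconditional fit
# start_values: one value per distributional parameter, later used as the base_margin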

Source code in xgboostlss/distributions/distribution_utils.py
def calculate_start_values(self,
                           target: np.ndarray,
                           max_iter: int = 50
                           ) -> Tuple[float, np.ndarray]:
    """
    Function that calculates the starting values for each distributional parameter.

    Arguments
    ---------
    target: np.ndarray
        Data from which starting values are calculated.
    max_iter: int
        Maximum number of iterations.

    Returns
    -------
    loss: float
        Loss value.
    start_values: np.ndarray
        Starting values for each distributional parameter.
    """
    # Convert target to torch.tensor
    target = torch.tensor(target).reshape(-1, 1)

    # Initialize parameters
    params = [torch.tensor(0.5, requires_grad=True) for _ in range(self.n_dist_param)]

    # Specify optimizer
    optimizer = LBFGS(params, lr=0.1, max_iter=np.min([int(max_iter/4), 20]), line_search_fn="strong_wolfe")

    # Define learning rate scheduler
    lr_scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=10)

    # Define closure
    def closure():
        optimizer.zero_grad()
        loss = self.loss_fn_start_values(params, target)
        loss.backward()
        return loss

    # Optimize parameters
    loss_vals = []
    for epoch in range(max_iter):
        loss = optimizer.step(closure)
        lr_scheduler.step(loss)
        loss_vals.append(loss.item())

    # Get final loss
    loss = np.array(loss_vals[-1])

    # Get start values
    start_values = np.array([params[i].detach() for i in range(self.n_dist_param)])

    # Replace any remaining NaNs or infinity values with 0.5
    start_values = np.nan_to_num(start_values, nan=0.5, posinf=0.5, neginf=0.5)

    return loss, start_values
compute_gradients_and_hessians(loss, predt, weights)

Calculates gradients and hessians.

Output gradients and hessians have shape (n_samples*n_outputs, 1).

Arguments:

loss: torch.Tensor
    Loss.
predt: torch.Tensor
    List of predicted parameters.
weights: np.ndarray
    Weights.

Returns:

grad: torch.Tensor
    Gradients.
hess: torch.Tensor
    Hessians.
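
For illustration, the generic PyTorch sketch below mirrors the autograd pattern used for the "nll" case (first derivatives of the summed loss, diagonal second derivatives by differentiating the summed gradient again); it is a toy example, not a call into the library.

import torch
from torch.autograd import grad as autograd

# Toy Gaussian NLL with two parameter blocks (loc, log-scale)
target = torch.tensor([[1.2], [0.7], [1.9]])
loc = torch.zeros(3, 1, requires_grad=True)
log_scale = torch.zeros(3, 1, requires_grad=True)

loss = -torch.nansum(torch.distributions.Normal(loc, torch.exp(log_scale)).log_prob(target))

params = (loc, log_scale)
grad = autograd(loss, inputs=params, create_graph=True)
hess = [autograd(grad[i].nansum(), inputs=params[i], retain_graph=True)[0] for i in range(len(grad))]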

Source code in xgboostlss/distributions/distribution_utils.py
def compute_gradients_and_hessians(self,
                                   loss: torch.tensor,
                                   predt: torch.tensor,
                                   weights: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:

    """
    Calculates gradients and hessians.

    Output gradients and hessians have shape (n_samples*n_outputs, 1).

    Arguments:
    ---------
    loss: torch.Tensor
        Loss.
    predt: torch.Tensor
        List of predicted parameters.
    weights: np.ndarray
        Weights.

    Returns:
    -------
    grad: torch.Tensor
        Gradients.
    hess: torch.Tensor
        Hessians.
    """
    if self.loss_fn == "nll":
        # Gradient and Hessian
        grad = autograd(loss, inputs=predt, create_graph=True)
        hess = [autograd(grad[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grad))]
    elif self.loss_fn == "crps":
        # Gradient and Hessian
        grad = autograd(loss, inputs=predt, create_graph=True)
        hess = [torch.ones_like(grad[i]) for i in range(len(grad))]

    # Stabilization of Derivatives
    if self.stabilization != "None":
        grad = [self.stabilize_derivative(grad[i], type=self.stabilization) for i in range(len(grad))]
        hess = [self.stabilize_derivative(hess[i], type=self.stabilization) for i in range(len(hess))]

    # Reshape
    grad = torch.cat(grad, axis=1).detach().numpy()
    hess = torch.cat(hess, axis=1).detach().numpy()

    # Weighting
    grad *= weights
    hess *= weights

    # Flatten
    grad = grad.flatten()
    hess = hess.flatten()

    return grad, hess
crps_score(y, yhat_dist)

Function that calculates the Continuous Ranked Probability Score (CRPS) for a given set of predicted samples.

Parameters

y: torch.Tensor
    Response variable of shape (n_observations, 1).
yhat_dist: torch.Tensor
    Predicted samples of shape (n_samples, n_observations).

Returns

crps: torch.Tensor CRPS score.

References

Gneiting, Tilmann & Raftery, Adrian. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association. 102. 359-378.

Source

https://github.com/elephaint/pgbm/blob/main/pgbm/torch/pgbm_dist.py#L549
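
A small self-contained sketch of a direct call, using the shapes documented above (the Gaussian class is an assumed example; any distribution instance exposes crps_score):

import torch
from xgboostlss.distributions.Gaussian import Gaussian  # assumed

dist = Gaussian()
y = torch.tensor([[0.1], [1.5], [-0.3]])   # (n_observations, 1)
yhat_dist = torch.randn(30, 3)             # (n_samples, n_observations)
crps = dist.crps_score(y, yhat_dist)       # per-observation CRPS, shape (n_observations, 1)
print(crps.mean())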

Source code in xgboostlss/distributions/distribution_utils.py
def crps_score(self, y: torch.tensor, yhat_dist: torch.tensor) -> torch.tensor:
    """
    Function that calculates the Continuous Ranked Probability Score (CRPS) for a given set of predicted samples.

    Parameters
    ----------
    y: torch.Tensor
        Response variable of shape (n_observations,1).
    yhat_dist: torch.Tensor
        Predicted samples of shape (n_samples, n_observations).

    Returns
    -------
    crps: torch.Tensor
        CRPS score.

    References
    ----------
    Gneiting, Tilmann & Raftery, Adrian. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation.
    Journal of the American Statistical Association. 102. 359-378.

    Source
    ------
    https://github.com/elephaint/pgbm/blob/main/pgbm/torch/pgbm_dist.py#L549
    """
    # Get the number of observations
    n_samples = yhat_dist.shape[0]

    # Sort the forecasts in ascending order
    yhat_dist_sorted, _ = torch.sort(yhat_dist, 0)

    # Create temporary tensors
    y_cdf = torch.zeros_like(y)
    yhat_cdf = torch.zeros_like(y)
    yhat_prev = torch.zeros_like(y)
    crps = torch.zeros_like(y)

    # Loop over the predicted samples generated per observation
    for yhat in yhat_dist_sorted:
        yhat = yhat.reshape(-1, 1)
        flag = (y_cdf == 0) * (y < yhat)
        crps += flag * ((y - yhat_prev) * yhat_cdf ** 2)
        crps += flag * ((yhat - y) * (yhat_cdf - 1) ** 2)
        crps += (~flag) * ((yhat - yhat_prev) * (yhat_cdf - y_cdf) ** 2)
        y_cdf += flag
        yhat_cdf += 1 / n_samples
        yhat_prev = yhat

    # In case y_cdf == 0 after the loop
    flag = (y_cdf == 0)
    crps += flag * (y - yhat)

    return crps
dist_select(target, candidate_distributions, max_iter=100, plot=False, figure_size=(10, 5))

Function that selects the most suitable distribution among the candidate_distributions for the target variable, based on the NegLogLikelihood (lower is better).

Parameters

target: np.ndarray
    Response variable.
candidate_distributions: List
    List of candidate distributions.
max_iter: int
    Maximum number of iterations for the optimization.
plot: bool
    If True, a density plot of the actual and fitted distribution is created.
figure_size: tuple
    Figure size of the density plot.

Returns

fit_df: pd.DataFrame Dataframe with the loss values of the fitted candidate distributions.
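
A hedged usage sketch: the candidate_distributions entries are the distribution modules (not instances), so that dist_select can resolve each class from its module name. The import paths below are assumptions.

import numpy as np
from xgboostlss.distributions.Gaussian import Gaussian
from xgboostlss.distributions import Gaussian as Gaussian_mod, Gamma as Gamma_mod, StudentT as StudentT_mod

y = np.random.default_rng(7).gamma(shape=2.0, scale=3.0, size=1000)

dist = Gaussian()  # any distribution instance exposes dist_select
fit_df = dist.dist_select(
    target=y,
    candidate_distributions=[Gaussian_mod, Gamma_mod, StudentT_mod],
    max_iter=100,
    plot=False,
)
print(fit_df)  # candidates ranked by negative log-likelihood (lower is better)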

Source code in xgboostlss/distributions/distribution_utils.py
def dist_select(self,
                target: np.ndarray,
                candidate_distributions: List,
                max_iter: int = 100,
                plot: bool = False,
                figure_size: tuple = (10, 5),
                ) -> pd.DataFrame:
    """
    Function that selects the most suitable distribution among the candidate_distributions for the target variable,
    based on the NegLogLikelihood (lower is better).

    Parameters
    ----------
    target: np.ndarray
        Response variable.
    candidate_distributions: List
        List of candidate distributions.
    max_iter: int
        Maximum number of iterations for the optimization.
    plot: bool
        If True, a density plot of the actual and fitted distribution is created.
    figure_size: tuple
        Figure size of the density plot.

    Returns
    -------
    fit_df: pd.DataFrame
        Dataframe with the loss values of the fitted candidate distributions.
    """
    dist_list = []
    total_iterations = len(candidate_distributions)
    with tqdm(total=total_iterations, desc="Fitting candidate distributions") as pbar:
        for i in range(len(candidate_distributions)):
            dist_name = candidate_distributions[i].__name__.split(".")[2]
            pbar.set_description(f"Fitting {dist_name} distribution")
            dist_sel = getattr(candidate_distributions[i], dist_name)()
            try:
                loss, params = dist_sel.calculate_start_values(target=target.reshape(-1, 1), max_iter=max_iter)
                fit_df = pd.DataFrame.from_dict(
                    {self.loss_fn: loss.reshape(-1,),
                     "distribution": str(dist_name),
                     "params": [params]
                     }
                )
            except Exception as e:
                warnings.warn(f"Error fitting {dist_name} distribution: {str(e)}")
                fit_df = pd.DataFrame(
                    {self.loss_fn: np.nan,
                     "distribution": str(dist_name),
                     "params": [np.nan] * self.n_dist_param
                     }
                )
            dist_list.append(fit_df)
            pbar.update(1)
        pbar.set_description(f"Fitting of candidate distributions completed")
        fit_df = pd.concat(dist_list).sort_values(by=self.loss_fn, ascending=True)
        fit_df["rank"] = fit_df[self.loss_fn].rank().astype(int)
        fit_df.set_index(fit_df["rank"], inplace=True)
    if plot:
        # Select best distribution
        best_dist = fit_df[fit_df["rank"] == 1].reset_index(drop=True)
        for dist in candidate_distributions:
            if dist.__name__.split(".")[2] == best_dist["distribution"].values[0]:
                best_dist_sel = dist
                break
        best_dist_sel = getattr(best_dist_sel, best_dist["distribution"].values[0])()
        params = torch.tensor(best_dist["params"][0]).reshape(-1, best_dist_sel.n_dist_param)

        # Transform parameters to the response scale and draw samples
        fitted_params = np.concatenate(
            [
                response_fun(params[:, i].reshape(-1, 1)).numpy()
                for i, (dist_param, response_fun) in enumerate(best_dist_sel.param_dict.items())
            ],
            axis=1,
        )
        fitted_params = pd.DataFrame(fitted_params, columns=best_dist_sel.param_dict.keys())
        n_samples = np.max([10000, target.shape[0]])
        n_samples = np.where(n_samples > 500000, 100000, n_samples)
        dist_samples = best_dist_sel.draw_samples(fitted_params,
                                                  n_samples=n_samples,
                                                  seed=123).values

        # Plot actual and fitted distribution
        plt.figure(figsize=figure_size)
        sns.kdeplot(target.reshape(-1, ), label="Actual")
        sns.kdeplot(dist_samples.reshape(-1, ), label=f"Best-Fit: {best_dist['distribution'].values[0]}")
        plt.legend()
        plt.title("Actual vs. Best-Fit Density", fontweight="bold", fontsize=16)
        plt.show()

    fit_df.drop(columns=["rank", "params"], inplace=True)

    return fit_df
draw_samples(predt_params, n_samples=1000, seed=123)

Function that draws n_samples from a predicted distribution.

Arguments

predt_params: pd.DataFrame
    pd.DataFrame with predicted distributional parameters.
n_samples: int
    Number of samples to draw from the predicted response distribution.
seed: int
    Manual seed.

Returns

pred_dist: pd.DataFrame DataFrame with n_samples drawn from predicted response distribution.
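
draw_samples is usually called internally by predict_dist, but it can also be used directly on a DataFrame of response-scale parameters. The Gaussian parameter names ("loc", "scale") below are assumptions for illustration.

import pandas as pd
from xgboostlss.distributions.Gaussian import Gaussian  # assumed

dist = Gaussian()
pred_params = pd.DataFrame({"loc": [0.0, 2.0], "scale": [1.0, 0.5]})  # one row per observation
samples = dist.draw_samples(predt_params=pred_params, n_samples=5, seed=123)
# -> DataFrame with columns y_sample0 ... y_sample4, one row per observation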

Source code in xgboostlss/distributions/distribution_utils.py
def draw_samples(self,
                 predt_params: pd.DataFrame,
                 n_samples: int = 1000,
                 seed: int = 123
                 ) -> pd.DataFrame:
    """
    Function that draws n_samples from a predicted distribution.

    Arguments
    ---------
    predt_params: pd.DataFrame
        pd.DataFrame with predicted distributional parameters.
    n_samples: int
        Number of sample to draw from predicted response distribution.
    seed: int
        Manual seed.

    Returns
    -------
    pred_dist: pd.DataFrame
        DataFrame with n_samples drawn from predicted response distribution.

    """
    torch.manual_seed(seed)

    if self.tau is None:
        pred_params = torch.tensor(predt_params.values)
        dist_kwargs = {arg_name: param for arg_name, param in zip(self.distribution_arg_names, pred_params.T)}
        dist_pred = self.distribution(**dist_kwargs)
        dist_samples = dist_pred.sample((n_samples,)).squeeze().detach().numpy().T
        dist_samples = pd.DataFrame(dist_samples)
        dist_samples.columns = [str("y_sample") + str(i) for i in range(dist_samples.shape[1])]
    else:
        dist_samples = None

    if self.discrete:
        dist_samples = dist_samples.astype(int)

    return dist_samples
get_params_loss(predt, target, start_values, requires_grad=False)

Function that returns the predicted parameters and the loss.

Arguments

predt: np.ndarray
    Predicted values.
target: torch.Tensor
    Target values.
start_values: List
    Starting values for each distributional parameter.
requires_grad: bool
    Whether to add to the computational graph or not.

Returns

predt: List of torch.Tensors
    Predicted parameters.
loss: torch.Tensor
    Loss value.

Source code in xgboostlss/distributions/distribution_utils.py
def get_params_loss(self,
                    predt: np.ndarray,
                    target: torch.Tensor,
                    start_values: List[float],
                    requires_grad: bool = False,
                    ) -> Tuple[List[torch.Tensor], np.ndarray]:
    """
    Function that returns the predicted parameters and the loss.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    target: torch.Tensor
        Target values.
    start_values: List
        Starting values for each distributional parameter.
    requires_grad: bool
        Whether to add to the computational graph or not.

    Returns
    -------
    predt: List of torch.Tensors
        Predicted parameters.
    loss: torch.Tensor
        Loss value.
    """
    # Predicted Parameters
    predt = predt.reshape(-1, self.n_dist_param)

    # Replace NaNs and infinity values with unconditional start values
    nan_inf_mask = np.isnan(predt) | np.isinf(predt)
    predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])

    # Convert to torch.tensor
    predt = [
        torch.tensor(predt[:, i].reshape(-1, 1), requires_grad=requires_grad) for i in range(self.n_dist_param)
    ]

    # Predicted Parameters transformed to response scale
    predt_transformed = [
        response_fn(predt[i].reshape(-1, 1)) for i, response_fn in enumerate(self.param_dict.values())
    ]

    # Specify Distribution and Loss
    if self.tau is None:
        dist_kwargs = dict(zip(self.distribution_arg_names, predt_transformed))
        dist_fit = self.distribution(**dist_kwargs)
        if self.loss_fn == "nll":
            loss = -torch.nansum(dist_fit.log_prob(target))
        elif self.loss_fn == "crps":
            torch.manual_seed(123)
            dist_samples = dist_fit.rsample((30,)).squeeze(-1)
            loss = torch.nansum(self.crps_score(target, dist_samples))
        else:
            raise ValueError("Invalid loss function. Please select 'nll' or 'crps'.")
    else:
        dist_fit = self.distribution(predt_transformed, self.penalize_crossing)
        loss = -torch.nansum(dist_fit.log_prob(target, self.tau))

    return predt, loss
loss_fn_start_values(params, target)

Function that calculates the loss for a given set of distributional parameters. Only used for calculating the loss for the start values.

Parameters

params: torch.Tensor
    Distributional parameters.
target: torch.Tensor
    Target values.

Returns

loss: torch.Tensor Loss value.

Source code in xgboostlss/distributions/distribution_utils.py
def loss_fn_start_values(self,
                         params: torch.Tensor,
                         target: torch.Tensor) -> torch.Tensor:
    """
    Function that calculates the loss for a given set of distributional parameters. Only used for calculating
    the loss for the start values.

    Parameter
    ---------
    params: torch.Tensor
        Distributional parameters.
    target: torch.Tensor
        Target values.

    Returns
    -------
    loss: torch.Tensor
        Loss value.
    """
    # Replace NaNs and infinity values with 0.5
    nan_inf_idx = torch.isnan(torch.stack(params)) | torch.isinf(torch.stack(params))
    params = torch.where(nan_inf_idx, torch.tensor(0.5), torch.stack(params))

    # Transform parameters to response scale
    params = [
        response_fn(params[i].reshape(-1, 1)) for i, response_fn in enumerate(self.param_dict.values())
    ]

    # Specify Distribution and Loss
    if self.tau is None:
        dist = self.distribution(*params)
        loss = -torch.nansum(dist.log_prob(target))
    else:
        dist = self.distribution(params, self.penalize_crossing)
        loss = -torch.nansum(dist.log_prob(target, self.tau))

    return loss
metric_fn(predt, data)

Function that evaluates the predictions using the specified loss function.

Arguments

predt: np.ndarray
    Predicted values.
data: xgb.DMatrix
    Data used for training.

Returns

name: str
    Name of the evaluation metric.
loss: float
    Loss value.

Source code in xgboostlss/distributions/distribution_utils.py
def metric_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[str, np.ndarray]:
    """
    Function that evaluates the predictions using the specified loss function.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    data: xgb.DMatrix
        Data used for training.

    Returns
    -------
    name: str
        Name of the evaluation metric.
    loss: float
        Loss value.
    """
    # Target
    target = torch.tensor(data.get_label().reshape(-1, 1))

    # Start values (needed to replace NaNs in predt)
    start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

    # Calculate loss
    _, loss = self.get_params_loss(predt, target, start_values, requires_grad=False)

    return self.loss_fn, loss
objective_fn(predt, data)

Function to estimate gradients and hessians of distributional parameters.

Arguments

predt: np.ndarray
    Predicted values.
data: xgb.DMatrix
    Data used for training.

Returns

grad: np.ndarray
    Gradient.
hess: np.ndarray
    Hessian.
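
objective_fn and metric_fn are intended to be handed to xgb.train as custom objective and evaluation metric. In practice the XGBoostLSS model wrapper does this and also configures the booster for one output per distributional parameter; the sketch below is illustrative only and omits that configuration.

import numpy as np
import xgboost as xgb
from xgboostlss.distributions.Gaussian import Gaussian  # assumed

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = rng.normal(loc=1.0, scale=2.0, size=500)

dist = Gaussian()
_, start_values = dist.calculate_start_values(y.reshape(-1, 1))

dtrain = xgb.DMatrix(X, label=y)
# objective_fn/metric_fn read the per-parameter start values from the base_margin
dtrain.set_base_margin((np.ones((dtrain.num_row(), 1)) * start_values).flatten())

booster = xgb.train(
    params={"learning_rate": 0.1, "base_score": 0.0, "disable_default_eval_metric": True},
    dtrain=dtrain,
    num_boost_round=50,
    obj=dist.objective_fn,
    custom_metric=dist.metric_fn,
)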

Source code in xgboostlss/distributions/distribution_utils.py
def objective_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[np.ndarray, np.ndarray]:

    """
    Function to estimate gradients and hessians of distributional parameters.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    data: xgb.DMatrix
        Data used for training.

    Returns
    -------
    grad: np.ndarray
        Gradient.
    hess: np.ndarray
        Hessian.
    """
    # Target
    target = torch.tensor(data.get_label().reshape(-1, 1))

    # Weights
    if data.get_weight().size == 0:
        # Use 1 as weight if no weights are specified
        weights = torch.ones_like(target, dtype=target.dtype).numpy()
    else:
        weights = data.get_weight().reshape(-1, 1)

    # Start values (needed to replace NaNs in predt)
    start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

    # Calculate gradients and hessians
    predt, loss = self.get_params_loss(predt, target, start_values, requires_grad=True)
    grad, hess = self.compute_gradients_and_hessians(loss, predt, weights)

    return grad, hess
predict_dist(booster, start_values, data, pred_type='parameters', n_samples=1000, quantiles=[0.1, 0.5, 0.9], seed=123)

Function that predicts from the trained model.

Arguments

booster : xgb.Booster
    Trained model.
start_values : np.ndarray
    Starting values for each distributional parameter.
data : xgb.DMatrix
    Data to predict from.
pred_type : str
    Type of prediction:
    - "samples" draws n_samples from the predicted distribution.
    - "quantiles" calculates the quantiles from the predicted distribution.
    - "parameters" returns the predicted distributional parameters.
    - "expectiles" returns the predicted expectiles.
n_samples : int
    Number of samples to draw from the predicted distribution.
quantiles : List[float]
    List of quantiles to calculate from the predicted distribution.
seed : int
    Seed for random number generator used to draw samples from the predicted distribution.

Returns

pred : pd.DataFrame Predictions.
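
Continuing the hypothetical training sketch from the objective_fn section (booster, start_values, dist, X are assumed to exist), prediction could look like:

dtest = xgb.DMatrix(X)  # in practice, unseen test data

pred_params = dist.predict_dist(booster, start_values, dtest, pred_type="parameters")
pred_quantiles = dist.predict_dist(booster, start_values, dtest,
                                   pred_type="quantiles", quantiles=[0.05, 0.5, 0.95])
pred_samples = dist.predict_dist(booster, start_values, dtest,
                                 pred_type="samples", n_samples=1000, seed=123)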

Source code in xgboostlss/distributions/distribution_utils.py
def predict_dist(self,
                 booster: xgb.Booster,
                 start_values: np.ndarray,
                 data: xgb.DMatrix,
                 pred_type: str = "parameters",
                 n_samples: int = 1000,
                 quantiles: list = [0.1, 0.5, 0.9],
                 seed: int = 123
                 ) -> pd.DataFrame:
    """
    Function that predicts from the trained model.

    Arguments
    ---------
    booster : xgb.Booster
        Trained model.
    start_values : np.ndarray
        Starting values for each distributional parameter.
    data : xgb.DMatrix
        Data to predict from.
    pred_type : str
        Type of prediction:
        - "samples" draws n_samples from the predicted distribution.
        - "quantiles" calculates the quantiles from the predicted distribution.
        - "parameters" returns the predicted distributional parameters.
        - "expectiles" returns the predicted expectiles.
    n_samples : int
        Number of samples to draw from the predicted distribution.
    quantiles : List[float]
        List of quantiles to calculate from the predicted distribution.
    seed : int
        Seed for random number generator used to draw samples from the predicted distribution.

    Returns
    -------
    pred : pd.DataFrame
        Predictions.
    """
    # Set base_margin as starting point for each distributional parameter. Requires base_score=0 in parameters.
    base_margin_test = (np.ones(shape=(data.num_row(), 1))) * start_values
    data.set_base_margin(base_margin_test.flatten())

    predt = np.array(booster.predict(data, output_margin=True)).reshape(-1, self.n_dist_param)
    predt = torch.tensor(predt, dtype=torch.float32)

    # Transform predicted parameters to response scale
    dist_params_predt = np.concatenate(
        [
            response_fun(
                predt[:, i].reshape(-1, 1)).numpy() for i, (dist_param, response_fun) in
            enumerate(self.param_dict.items())
        ],
        axis=1,
    )
    dist_params_predt = pd.DataFrame(dist_params_predt)
    dist_params_predt.columns = self.param_dict.keys()

    # Draw samples from predicted response distribution
    pred_samples_df = self.draw_samples(predt_params=dist_params_predt,
                                        n_samples=n_samples,
                                        seed=seed)

    if pred_type == "parameters":
        return dist_params_predt

    elif pred_type == "expectiles":
        return dist_params_predt

    elif pred_type == "samples":
        return pred_samples_df

    elif pred_type == "quantiles":
        # Calculate quantiles from predicted response distribution
        pred_quant_df = pred_samples_df.quantile(quantiles, axis=1).T
        pred_quant_df.columns = [str("quant_") + str(quantiles[i]) for i in range(len(quantiles))]
        if self.discrete:
            pred_quant_df = pred_quant_df.astype(int)
        return pred_quant_df
stabilize_derivative(input_der, type='MAD')

Function that stabilizes Gradients and Hessians.

As XGBoostLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important that these are comparable in magnitude for all distributional parameters. Due to imbalances regarding the ranges, the estimation might become unstable so that it does not converge (or converge very slowly) to the optimal solution. Another way to improve convergence might be to standardize the response variable. This is especially useful if the range of the response differs strongly from the range of the Gradients and Hessians. Both, the stabilization and the standardization of the response are not always advised but need to be carefully considered. Source: https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

Parameters

input_der : torch.Tensor
    Input derivative, either Gradient or Hessian.
type: str
    Stabilization method. Can be either "None", "MAD" or "L2".

Returns

stab_der : torch.Tensor Stabilized Gradient or Hessian.
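
A quick illustration of the "MAD" option (the Gaussian class is again an assumed example): NaNs are replaced by the mean, then the tensor is divided by its median absolute deviation, floored at 1e-04.

import torch
from xgboostlss.distributions.Gaussian import Gaussian  # assumed

dist = Gaussian()
grad = torch.tensor([[0.1], [250.0], [float("nan")], [-3.0]])
stab_grad = dist.stabilize_derivative(grad, type="MAD")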

Source code in xgboostlss/distributions/distribution_utils.py
def stabilize_derivative(self, input_der: torch.Tensor, type: str = "MAD") -> torch.Tensor:
    """
    Function that stabilizes Gradients and Hessians.

    As XGBoostLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important
    that these are comparable in magnitude for all distributional parameters. Due to imbalances regarding the ranges,
    the estimation might become unstable so that it does not converge (or converge very slowly) to the optimal solution.
    Another way to improve convergence might be to standardize the response variable. This is especially useful if the
    range of the response differs strongly from the range of the Gradients and Hessians. Both, the stabilization and
    the standardization of the response are not always advised but need to be carefully considered.
    Source: https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

    Parameters
    ----------
    input_der : torch.Tensor
        Input derivative, either Gradient or Hessian.
    type: str
        Stabilization method. Can be either "None", "MAD" or "L2".

    Returns
    -------
    stab_der : torch.Tensor
        Stabilized Gradient or Hessian.
    """

    if type == "MAD":
        input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
        div = torch.nanmedian(torch.abs(input_der - torch.nanmedian(input_der)))
        div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
        stab_der = input_der / div

    if type == "L2":
        input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
        div = torch.sqrt(torch.nanmean(input_der.pow(2)))
        div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
        div = torch.where(div > torch.tensor(10000.0), torch.tensor(10000.0), div)
        stab_der = input_der / div

    if type == "None":
        stab_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))

    return stab_der

flow_utils

NormalizingFlowClass

Generic class that contains general functions for normalizing flows.

Arguments

base_dist: torch.distributions.Distribution
    PyTorch Distribution class. Currently only Normal is supported.
flow_transform: Transform
    Specify the normalizing flow transform.
count_bins: Optional[int]
    The number of segments comprising the spline. Only used if flow_transform is Spline.
bound: Optional[float]
    The quantity "K" determining the bounding box, [-K,K] x [-K,K] of the spline. By adjusting the "K" value, you can control the size of the bounding box and consequently control the range of inputs that the spline transform operates on. Larger values of "K" will result in a wider valid range for the spline transformation, while smaller values will restrict the valid range to a smaller region. Should be chosen based on the range of the data. Only used if flow_transform is Spline.
order: Optional[str]
    The order of the spline. Options are "linear" or "quadratic". Only used if flow_transform is Spline.
n_dist_param: int
    Number of parameters.
param_dict: Dict[str, Any]
    Dictionary that maps parameters to their response scale.
distribution_arg_names: List
    List of distributional parameter names.
target_transform: Transform
    Specify the target transform.
discrete: bool
    Whether the target is discrete or not.
univariate: bool
    Whether the distribution is univariate or multivariate.
stabilization: str
    Stabilization method. Options are "None", "MAD" or "L2".
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.
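
NormalizingFlowClass is typically not instantiated directly; a user-facing subclass (e.g. a spline flow) fills in these arguments. The sketch below assumes such a subclass under xgboostlss.distributions.SplineFlow; both the import path and the argument names shown are illustrative assumptions.

from xgboostlss.distributions.SplineFlow import SplineFlow  # assumed subclass

flow = SplineFlow(
    target_support="real",   # assumed argument: support of the response
    count_bins=8,            # number of spline segments
    bound=3.0,               # bounding box "K" of the spline
    order="quadratic",       # spline order
    stabilization="None",    # "None", "MAD" or "L2"
    loss_fn="nll",           # "nll" or "crps"
)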

Source code in xgboostlss/distributions/flow_utils.py
class NormalizingFlowClass:
    """
    Generic class that contains general functions for normalizing flows.

    Arguments
    ---------
    base_dist: torch.distributions.Distribution
        PyTorch Distribution class. Currently only Normal is supported.
    flow_transform: Transform
        Specify the normalizing flow transform.
    count_bins: Optional[int]
        The number of segments comprising the spline. Only used if flow_transform is Spline.
    bound: Optional[float]
        The quantity "K" determining the bounding box, [-K,K] x [-K,K] of the spline. By adjusting the
        "K" value, you can control the size of the bounding box and consequently control the range of inputs that
        the spline transform operates on. Larger values of "K" will result in a wider valid range for the spline
        transformation, while smaller values will restrict the valid range to a smaller region. Should be chosen
        based on the range of the data. Only used if flow_transform is Spline.
    order: Optional[str]
        The order of the spline. Options are "linear" or "quadratic". Only used if flow_transform is Spline.
    n_dist_param: int
        Number of parameters.
    param_dict: Dict[str, Any]
        Dictionary that maps parameters to their response scale.
    distribution_arg_names: List
        List of distributional parameter names.
    target_transform: Transform
        Specify the target transform.
    discrete: bool
        Whether the target is discrete or not.
    univariate: bool
        Whether the distribution is univariate or multivariate.
    stabilization: str
        Stabilization method. Options are "None", "MAD" or "L2".
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 base_dist: torch.distributions.Distribution = None,
                 flow_transform: Transform = None,
                 count_bins: Optional[int] = 8,
                 bound: Optional[float] = 3.0,
                 order: Optional[str] = "quadratic",
                 n_dist_param: int = None,
                 param_dict: Dict[str, Any] = None,
                 distribution_arg_names: List = None,
                 target_transform: Transform = None,
                 discrete: bool = False,
                 univariate: bool = True,
                 stabilization: str = "None",
                 loss_fn: str = "nll",
                 ):

        self.base_dist = base_dist
        self.flow_transform = flow_transform
        self.count_bins = count_bins
        self.bound = bound
        self.order = order
        self.n_dist_param = n_dist_param
        self.param_dict = param_dict
        self.distribution_arg_names = distribution_arg_names
        self.target_transform = target_transform
        self.discrete = discrete
        self.univariate = univariate
        self.stabilization = stabilization
        self.loss_fn = loss_fn

    def objective_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[np.ndarray, np.ndarray]:

        """
        Function to estimate gradients and hessians of normalizing flow parameters.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        data: xgb.DMatrix
            Data used for training.

        Returns
        -------
        grad: np.ndarray
            Gradient.
        hess: np.ndarray
            Hessian.
        """
        # Target
        target = torch.tensor(data.get_label().reshape(-1, 1))

        # Weights
        if data.get_weight().size == 0:
            # Use 1 as weight if no weights are specified
            weights = torch.ones_like(target, dtype=target.dtype).numpy()
        else:
            weights = data.get_weight().reshape(-1, 1)

        # Start values (needed to replace NaNs in predt)
        start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

        # Calculate gradients and hessians
        predt, loss = self.get_params_loss(predt, target, start_values)
        grad, hess = self.compute_gradients_and_hessians(loss, predt, weights)

        return grad, hess

    def metric_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[str, np.ndarray]:
        """
        Function that evaluates the predictions using the specified loss function.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        data: xgb.DMatrix
            Data used for training.

        Returns
        -------
        name: str
            Name of the evaluation metric.
        loss: float
            Loss value.
        """
        # Target
        target = torch.tensor(data.get_label().reshape(-1, 1))

        # Start values (needed to replace NaNs in predt)
        start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

        # Calculate loss
        _, loss = self.get_params_loss(predt, target, start_values)

        return self.loss_fn, loss

    def calculate_start_values(self,
                               target: np.ndarray,
                               max_iter: int = 50
                               ) -> Tuple[float, np.ndarray]:
        """
        Function that calculates starting values for each parameter.

        Arguments
        ---------
        target: np.ndarray
            Data from which starting values are calculated.
        max_iter: int
            Maximum number of iterations.

        Returns
        -------
        loss: float
            Loss value.
        start_values: np.ndarray
            Starting values for each parameter.
        """
        # Convert target to torch.tensor
        target = torch.tensor(target).reshape(-1, 1)

        # Create Normalizing Flow
        flow_dist = self.create_spline_flow(input_dim=1)

        # Specify optimizer
        optimizer = LBFGS(flow_dist.transforms[0].parameters(),
                          lr=0.3,
                          max_iter=np.min([int(max_iter/4), 50]),
                          line_search_fn="strong_wolfe")

        # Define learning rate scheduler
        lr_scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=5)

        # Define closure
        def closure():
            optimizer.zero_grad()
            loss = -torch.nansum(flow_dist.log_prob(target))
            loss.backward()
            flow_dist.clear_cache()
            return loss

        # Optimize parameters
        loss_vals = []
        tolerance = 1e-5           # Tolerance level for loss change
        patience = 5               # Patience level for loss change
        best_loss = float("inf")
        epochs_without_change = 0

        for epoch in range(max_iter):
            optimizer.zero_grad()
            loss = optimizer.step(closure)
            lr_scheduler.step(loss)
            loss_vals.append(loss.item())

            # Stopping criterion (no improvement in loss)
            if loss.item() < best_loss - tolerance:
                best_loss = loss.item()
                epochs_without_change = 0
            else:
                epochs_without_change += 1

            if epochs_without_change >= patience:
                break

        # Get final loss
        loss = np.array(loss_vals[-1])

        # Get start values
        start_values = list(flow_dist.transforms[0].parameters())
        start_values = torch.cat([param.view(-1) for param in start_values]).detach().numpy()

        # Replace any remaining NaNs or infinity values with 0.5
        start_values = np.nan_to_num(start_values, nan=0.5, posinf=0.5, neginf=0.5)

        return loss, start_values

    def get_params_loss(self,
                        predt: np.ndarray,
                        target: torch.Tensor,
                        start_values: List[float],
                        requires_grad: bool = False,
                        ) -> Tuple[List[torch.Tensor], np.ndarray]:
        """
        Function that returns the predicted parameters and the loss.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        target: torch.Tensor
            Target values.
        start_values: List
            Starting values for each parameter.

        Returns
        -------
        predt: torch.Tensor
            Predicted parameters.
        loss: torch.Tensor
            Loss value.
        """
        # Reshape Target
        target = target.view(-1)

        # Predicted Parameters
        predt = predt.reshape(-1, self.n_dist_param)

        # Replace NaNs and infinity values with unconditional start values
        nan_inf_mask = np.isnan(predt) | np.isinf(predt)
        predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])

        # Convert to torch.tensor
        predt = torch.tensor(predt, dtype=torch.float32)

        # Specify Normalizing Flow
        flow_dist = self.create_spline_flow(target.shape[0])

        # Replace parameters with estimated ones
        params, flow_dist = self.replace_parameters(predt, flow_dist)

        # Calculate loss
        if self.loss_fn == "nll":
            loss = -torch.nansum(flow_dist.log_prob(target))
        elif self.loss_fn == "crps":
            torch.manual_seed(123)
            dist_samples = flow_dist.rsample((30,)).squeeze(-1)
            loss = torch.nansum(self.crps_score(target, dist_samples))
        else:
            raise ValueError("Invalid loss function. Please select 'nll' or 'crps'.")

        return params, loss

    def create_spline_flow(self,
                           input_dim: int = None,
                           ) -> Transform:

        """
        Function that constructs a Normalizing Flow.

        Arguments
        ---------
        input_dim: int
            Input dimension.

        Returns
        -------
        spline_flow: Transform
            Normalizing Flow.
        """

        # Create flow distribution (currently only Normal)
        loc, scale = torch.zeros(input_dim), torch.ones(input_dim)
        flow_dist = self.base_dist(loc, scale)

        # Create Spline Transform
        torch.manual_seed(123)
        spline_transform = self.flow_transform(input_dim,
                                               count_bins=self.count_bins,
                                               bound=self.bound,
                                               order=self.order)

        # Create Normalizing Flow
        spline_flow = TransformedDistribution(flow_dist, [spline_transform, self.target_transform])

        return spline_flow

    def replace_parameters(self,
                           params: torch.Tensor,
                           flow_dist: Transform,
                           ) -> Tuple[List, Transform]:
        """
        Replace parameters with estimated ones.

        Arguments
        ---------
        params: torch.Tensor
            Estimated parameters.
        flow_dist: Transform
            Normalizing Flow.

        Returns
        -------
        params_list: List
            List of estimated parameters.
        flow_dist: Transform
            Normalizing Flow with estimated parameters.
        """

        # Split parameters into list
        if self.order == "quadratic":
            params_list = torch.split(
                params, [self.count_bins, self.count_bins, self.count_bins - 1],
                dim=1)
        elif self.order == "linear":
            params_list = torch.split(
                params, [self.count_bins, self.count_bins, self.count_bins - 1, self.count_bins],
                dim=1)

        # Replace parameters
        for param, new_value in zip(flow_dist.transforms[0].parameters(), params_list):
            param.data = new_value

        # Get parameters (including require_grad=True)
        params_list = list(flow_dist.transforms[0].parameters())

        return params_list, flow_dist

    def draw_samples(self,
                     predt_params: pd.DataFrame,
                     n_samples: int = 1000,
                     seed: int = 123
                     ) -> pd.DataFrame:
        """
        Function that draws n_samples from a predicted distribution.

        Arguments
        ---------
        predt_params: pd.DataFrame
            pd.DataFrame with predicted distributional parameters.
        n_samples: int
            Number of samples to draw from the predicted response distribution.
        seed: int
            Manual seed.

        Returns
        -------
        pred_dist: pd.DataFrame
            DataFrame with n_samples drawn from predicted response distribution.

        """

        torch.manual_seed(seed)

        # Specify Normalizing Flow
        pred_params = torch.tensor(predt_params.values)
        flow_dist_pred = self.create_spline_flow(pred_params.shape[0])

        # Replace parameters with estimated ones
        _, flow_dist_pred = self.replace_parameters(pred_params, flow_dist_pred)

        # Draw samples
        flow_samples = pd.DataFrame(flow_dist_pred.sample((n_samples,)).squeeze().detach().numpy().T)
        flow_samples.columns = [str("y_sample") + str(i) for i in range(flow_samples.shape[1])]

        if self.discrete:
            flow_samples = flow_samples.astype(int)

        return flow_samples

    def predict_dist(self,
                     booster: xgb.Booster,
                     start_values: np.ndarray,
                     data: xgb.DMatrix,
                     pred_type: str = "parameters",
                     n_samples: int = 1000,
                     quantiles: list = [0.1, 0.5, 0.9],
                     seed: int = 123
                     ) -> pd.DataFrame:
        """
        Function that predicts from the trained model.

        Arguments
        ---------
        booster : xgb.Booster
            Trained model.
        start_values : np.ndarray
            Starting values for each distributional parameter.
        data : xgb.DMatrix
            Data to predict from.
        pred_type : str
            Type of prediction:
            - "samples" draws n_samples from the predicted distribution.
            - "quantiles" calculates the quantiles from the predicted distribution.
            - "parameters" returns the predicted distributional parameters.
        n_samples : int
            Number of samples to draw from the predicted distribution.
        quantiles : List[float]
            List of quantiles to calculate from the predicted distribution.
        seed : int
            Seed for random number generator used to draw samples from the predicted distribution.

        Returns
        -------
        pred : pd.DataFrame
            Predictions.
        """
        # Set base_margin as starting point for each distributional parameter. Requires base_score=0 in parameters.
        base_margin_predt = (np.ones(shape=(data.num_row(), 1))) * start_values
        data.set_base_margin(base_margin_predt.flatten())

        # Predict distributional parameters
        dist_params_predt = pd.DataFrame(
            np.array(booster.predict(data, output_margin=True)).reshape(-1, self.n_dist_param)
        )
        dist_params_predt.columns = self.param_dict.keys()

        # Draw samples from predicted response distribution
        pred_samples_df = self.draw_samples(predt_params=dist_params_predt,
                                            n_samples=n_samples,
                                            seed=seed)

        if pred_type == "parameters":
            return dist_params_predt

        elif pred_type == "samples":
            return pred_samples_df

        elif pred_type == "quantiles":
            # Calculate quantiles from predicted response distribution
            pred_quant_df = pred_samples_df.quantile(quantiles, axis=1).T
            pred_quant_df.columns = [str("quant_") + str(quantiles[i]) for i in range(len(quantiles))]
            if self.discrete:
                pred_quant_df = pred_quant_df.astype(int)
            return pred_quant_df

    def compute_gradients_and_hessians(self,
                                       loss: torch.tensor,
                                       predt: torch.tensor,
                                       weights: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:

        """
        Calculates gradients and hessians.

        Output gradients and hessians have shape (n_samples*n_outputs, 1).

        Arguments:
        ---------
        loss: torch.Tensor
            Loss.
        predt: torch.Tensor
            List of predicted parameters.
        weights: np.ndarray
            Weights.

        Returns:
        -------
        grad: torch.Tensor
            Gradients.
        hess: torch.Tensor
            Hessians.
        """
        if self.loss_fn == "nll":
            # Gradient and Hessian
            grad = autograd(loss, inputs=predt, create_graph=True)
            hess = [autograd(grad[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grad))]
        elif self.loss_fn == "crps":
            # Gradient and Hessian
            grad = autograd(loss, inputs=predt, create_graph=True)
            hess = [torch.ones_like(grad[i]) for i in range(len(grad))]

        # Stabilization of Derivatives
        if self.stabilization != "None":
            grad = [self.stabilize_derivative(grad[i], type=self.stabilization) for i in range(len(grad))]
            hess = [self.stabilize_derivative(hess[i], type=self.stabilization) for i in range(len(hess))]

        # Reshape
        grad = torch.cat(grad, axis=1).detach().numpy()
        hess = torch.cat(hess, axis=1).detach().numpy()

        # Weighting
        grad *= weights
        hess *= weights

        # Flatten
        grad = grad.flatten()
        hess = hess.flatten()

        return grad, hess

    def stabilize_derivative(self, input_der: torch.Tensor, type: str = "MAD") -> torch.Tensor:
        """
        Function that stabilizes Gradients and Hessians.

        Since parameters are estimated by optimizing Gradients and Hessians, it is important that these are comparable
        in magnitude for all parameters. Due to imbalances regarding the ranges, the estimation might become unstable
        so that it does not converge (or converges very slowly) to the optimal solution. Another way to improve
        convergence might be to standardize the response variable. This is especially useful if the range of the
        response differs strongly from the range of the Gradients and Hessians. Both the stabilization and the
        standardization of the response are not always advisable and should be considered carefully.

        Source
        ---------
        https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

        Arguments
        ---------
        input_der : torch.Tensor
            Input derivative, either Gradient or Hessian.
        type: str
            Stabilization method. Can be either "None", "MAD" or "L2".

        Returns
        ---------
        stab_der : torch.Tensor
            Stabilized Gradient or Hessian.
        """

        if type == "MAD":
            input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
            div = torch.nanmedian(torch.abs(input_der - torch.nanmedian(input_der)))
            div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
            stab_der = input_der / div

        if type == "L2":
            input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
            div = torch.sqrt(torch.nanmean(input_der.pow(2)))
            div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
            div = torch.where(div > torch.tensor(10000.0), torch.tensor(10000.0), div)
            stab_der = input_der / div

        if type == "None":
            stab_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))

        return stab_der

    def crps_score(self, y: torch.tensor, yhat_dist: torch.tensor) -> torch.tensor:
        """
        Function that calculates the Continuous Ranked Probability Score (CRPS) for a given set of predicted samples.

        Arguments
        ---------
        y: torch.Tensor
            Response variable of shape (n_observations,1).
        yhat_dist: torch.Tensor
            Predicted samples of shape (n_samples, n_observations).

        Returns
        ---------
        crps: torch.Tensor
            CRPS score.

        References
        ---------
        Gneiting, Tilmann & Raftery, Adrian. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation.
        Journal of the American Statistical Association. 102. 359-378.

        Source
        ---------
        https://github.com/elephaint/pgbm/blob/main/pgbm/torch/pgbm_dist.py#L549
        """
        # Get the number of observations
        n_samples = yhat_dist.shape[0]

        # Sort the forecasts in ascending order
        yhat_dist_sorted, _ = torch.sort(yhat_dist, 0)

        # Create temporary tensors
        y_cdf = torch.zeros_like(y)
        yhat_cdf = torch.zeros_like(y)
        yhat_prev = torch.zeros_like(y)
        crps = torch.zeros_like(y)

        # Loop over the predicted samples generated per observation
        for yhat in yhat_dist_sorted:
            yhat = yhat.reshape(-1, 1)
            flag = (y_cdf == 0) * (y < yhat)
            crps += flag * ((y - yhat_prev) * yhat_cdf ** 2)
            crps += flag * ((yhat - y) * (yhat_cdf - 1) ** 2)
            crps += (~flag) * ((yhat - yhat_prev) * (yhat_cdf - y_cdf) ** 2)
            y_cdf += flag
            yhat_cdf += 1 / n_samples
            yhat_prev = yhat

        # In case y_cdf == 0 after the loop
        flag = (y_cdf == 0)
        crps += flag * (y - yhat)

        return crps

    def flow_select(self,
                    target: np.ndarray,
                    candidate_flows: List,
                    max_iter: int = 100,
                    plot: bool = False,
                    figure_size: tuple = (10, 5),
                    ) -> pd.DataFrame:
        """
        Function that selects the most suitable normalizing flow specification among the candidate_flows for the
        target variable, based on the NegLogLikelihood (lower is better).

        Parameters
        ----------
        target: np.ndarray
            Response variable.
        candidate_flows: List
            List of candidate normalizing flow specifications.
        max_iter: int
            Maximum number of iterations for the optimization.
        plot: bool
            If True, a density plot of the actual and fitted distribution is created.
        figure_size: tuple
            Figure size of the density plot.

        Returns
        -------
        fit_df: pd.DataFrame
            DataFrame with the loss values of the fitted normalizing flows.
        """
        flow_list = []
        total_iterations = len(candidate_flows)

        with tqdm(total=total_iterations, desc="Fitting candidate normalizing flows") as pbar:
            for flow in candidate_flows:
                flow_name = str(flow.__class__).split(".")[-1].split("'>")[0]
                flow_spec = f"(count_bins: {flow.count_bins}, order: {flow.order})"
                flow_name = flow_name + flow_spec
                pbar.set_description(f"Fitting {flow_name}")
                flow_sel = flow
                try:
                    loss, params = flow_sel.calculate_start_values(target=target, max_iter=max_iter)
                    fit_df = pd.DataFrame.from_dict(
                        {flow_sel.loss_fn: loss.reshape(-1, ),
                         "NormFlow": str(flow_name),
                         "params": [params]
                         }
                    )
                except Exception as e:
                    warnings.warn(f"Error fitting {flow_sel} NormFlow: {str(e)}")
                    fit_df = pd.DataFrame(
                        {flow_sel.loss_fn: np.nan,
                         "NormFlow": str(flow_sel),
                         "params": [np.nan] * flow_sel.n_dist_param
                         }
                    )
                flow_list.append(fit_df)
                pbar.update(1)
            pbar.set_description(f"Fitting of candidate normalizing flows completed")
            fit_df = pd.concat(flow_list).sort_values(by=flow_sel.loss_fn, ascending=True)
            fit_df["rank"] = fit_df[flow_sel.loss_fn].rank().astype(int)
            fit_df.set_index(fit_df["rank"], inplace=True)
        if plot:
            # Select normalizing flow with the lowest loss
            best_flow = fit_df[fit_df["rank"] == 1].reset_index(drop=True)
            for flow in candidate_flows:
                flow_name = str(flow.__class__).split(".")[-1].split("'>")[0]
                flow_spec = f"(count_bins: {flow.count_bins}, order: {flow.order})"
                flow_name = flow_name + flow_spec
                if flow_name == best_flow["NormFlow"].values[0]:
                    best_flow_sel = flow
                    break

            # Draw samples from distribution
            flow_params = torch.tensor(best_flow["params"][0]).reshape(1, -1)
            flow_dist_sel = best_flow_sel.create_spline_flow(input_dim=1)
            _, flow_dist_sel = best_flow_sel.replace_parameters(flow_params, flow_dist_sel)
            n_samples = np.max([10000, target.shape[0]])
            n_samples = np.where(n_samples > 500000, 100000, n_samples)
            flow_samples = pd.DataFrame(flow_dist_sel.sample((n_samples,)).squeeze().detach().numpy().T).values

            # Plot actual and fitted distribution
            plt.figure(figsize=figure_size)
            sns.kdeplot(target.reshape(-1, ), label="Actual")
            sns.kdeplot(flow_samples.reshape(-1, ), label=f"Best-Fit: {best_flow['NormFlow'].values[0]}")
            plt.legend()
            plt.title("Actual vs. Best-Fit Density", fontweight="bold", fontsize=16)
            plt.show()

        fit_df.drop(columns=["rank", "params"], inplace=True)

        return fit_df
calculate_start_values(target, max_iter=50)

Function that calculates starting values for each parameter.

Arguments

target: np.ndarray Data from which starting values are calculated. max_iter: int Maximum number of iterations.

Returns

loss: float Loss value. start_values: np.ndarray Starting values for each parameter.

Source code in xgboostlss/distributions/flow_utils.py
def calculate_start_values(self,
                           target: np.ndarray,
                           max_iter: int = 50
                           ) -> Tuple[float, np.ndarray]:
    """
    Function that calculates starting values for each parameter.

    Arguments
    ---------
    target: np.ndarray
        Data from which starting values are calculated.
    max_iter: int
        Maximum number of iterations.

    Returns
    -------
    loss: float
        Loss value.
    start_values: np.ndarray
        Starting values for each parameter.
    """
    # Convert target to torch.tensor
    target = torch.tensor(target).reshape(-1, 1)

    # Create Normalizing Flow
    flow_dist = self.create_spline_flow(input_dim=1)

    # Specify optimizer
    optimizer = LBFGS(flow_dist.transforms[0].parameters(),
                      lr=0.3,
                      max_iter=np.min([int(max_iter/4), 50]),
                      line_search_fn="strong_wolfe")

    # Define learning rate scheduler
    lr_scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=5)

    # Define closure
    def closure():
        optimizer.zero_grad()
        loss = -torch.nansum(flow_dist.log_prob(target))
        loss.backward()
        flow_dist.clear_cache()
        return loss

    # Optimize parameters
    loss_vals = []
    tolerance = 1e-5           # Tolerance level for loss change
    patience = 5               # Patience level for loss change
    best_loss = float("inf")
    epochs_without_change = 0

    for epoch in range(max_iter):
        optimizer.zero_grad()
        loss = optimizer.step(closure)
        lr_scheduler.step(loss)
        loss_vals.append(loss.item())

        # Stopping criterion (no improvement in loss)
        if loss.item() < best_loss - tolerance:
            best_loss = loss.item()
            epochs_without_change = 0
        else:
            epochs_without_change += 1

        if epochs_without_change >= patience:
            break

    # Get final loss
    loss = np.array(loss_vals[-1])

    # Get start values
    start_values = list(flow_dist.transforms[0].parameters())
    start_values = torch.cat([param.view(-1) for param in start_values]).detach().numpy()

    # Replace any remaining NaNs or infinity values with 0.5
    start_values = np.nan_to_num(start_values, nan=0.5, posinf=0.5, neginf=0.5)

    return loss, start_values
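
The optimization pattern above, L-BFGS with a closure, a ReduceLROnPlateau scheduler and patience-based early stopping, can be reproduced on any set of unconditional parameters. Below is a minimal, self-contained sketch of the same loop for a plain Normal distribution fitted to a toy target; the names loc and log_scale are illustrative and not part of the XGBoostLSS API.

import numpy as np
import torch
from torch.optim import LBFGS
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Toy target and two unconditional parameters (illustrative names).
target = torch.randn(500) * 2.0 + 3.0
loc = torch.zeros(1, requires_grad=True)
log_scale = torch.zeros(1, requires_grad=True)

optimizer = LBFGS([loc, log_scale], lr=0.3, max_iter=12, line_search_fn="strong_wolfe")
lr_scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=5)

def closure():
    optimizer.zero_grad()
    dist = torch.distributions.Normal(loc, torch.exp(log_scale))
    loss = -torch.nansum(dist.log_prob(target))
    loss.backward()
    return loss

best_loss, epochs_without_change = float("inf"), 0
for epoch in range(50):
    loss = optimizer.step(closure)
    lr_scheduler.step(loss)
    # Patience-based early stopping, mirroring calculate_start_values.
    if loss.item() < best_loss - 1e-5:
        best_loss, epochs_without_change = loss.item(), 0
    else:
        epochs_without_change += 1
    if epochs_without_change >= 5:
        break

# Flatten the fitted parameters and guard against NaN/Inf, as the method above does.
start_values = np.nan_to_num(
    torch.cat([loc.view(-1), log_scale.view(-1)]).detach().numpy(),
    nan=0.5, posinf=0.5, neginf=0.5,
)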
compute_gradients_and_hessians(loss, predt, weights)

Calculates gradients and hessians.

Output gradients and hessians have shape (n_samples*n_outputs, 1).

Arguments:

loss: torch.Tensor Loss. predt: torch.Tensor List of predicted parameters. weights: np.ndarray Weights.

Returns:

grad: torch.Tensor Gradients. hess: torch.Tensor Hessians.

Source code in xgboostlss/distributions/flow_utils.py
def compute_gradients_and_hessians(self,
                                   loss: torch.tensor,
                                   predt: torch.tensor,
                                   weights: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:

    """
    Calculates gradients and hessians.

    Output gradients and hessians have shape (n_samples*n_outputs, 1).

    Arguments:
    ---------
    loss: torch.Tensor
        Loss.
    predt: torch.Tensor
        List of predicted parameters.
    weights: np.ndarray
        Weights.

    Returns:
    -------
    grad: torch.Tensor
        Gradients.
    hess: torch.Tensor
        Hessians.
    """
    if self.loss_fn == "nll":
        # Gradient and Hessian
        grad = autograd(loss, inputs=predt, create_graph=True)
        hess = [autograd(grad[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grad))]
    elif self.loss_fn == "crps":
        # Gradient and Hessian
        grad = autograd(loss, inputs=predt, create_graph=True)
        hess = [torch.ones_like(grad[i]) for i in range(len(grad))]

    # Stabilization of Derivatives
    if self.stabilization != "None":
        grad = [self.stabilize_derivative(grad[i], type=self.stabilization) for i in range(len(grad))]
        hess = [self.stabilize_derivative(hess[i], type=self.stabilization) for i in range(len(hess))]

    # Reshape
    grad = torch.cat(grad, axis=1).detach().numpy()
    hess = torch.cat(hess, axis=1).detach().numpy()

    # Weighting
    grad *= weights
    hess *= weights

    # Flatten
    grad = grad.flatten()
    hess = hess.flatten()

    return grad, hess
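
A rough, self-contained sketch of the autograd pattern used above (not the XGBoostLSS API itself): gradients are taken with create_graph=True so that a second differentiation of each gradient yields the per-parameter curvature.

import torch
from torch.autograd import grad as autograd

# Two illustrative parameter tensors (one column each) and a toy Gaussian NLL-style loss.
predt = [torch.randn(5, 1, requires_grad=True) for _ in range(2)]
target = torch.randn(5, 1)
loss = torch.nansum((target - predt[0]) ** 2 / (2 * torch.exp(predt[1])) + 0.5 * predt[1])

# First derivatives w.r.t. every parameter tensor; keep the graph for second derivatives.
grad = autograd(loss, inputs=predt, create_graph=True)

# Second derivatives, one tensor per parameter, as in the "nll" branch above.
hess = [autograd(grad[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grad))]

# Stack into (n_samples, n_parameters) and flatten, which is the layout XGBoost expects.
grad_np = torch.cat(grad, dim=1).detach().numpy().flatten()
hess_np = torch.cat(hess, dim=1).detach().numpy().flatten()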
create_spline_flow(input_dim=None)

Function that constructs a Normalizing Flow.

Arguments

input_dim: int Input dimension.

Returns

spline_flow: Transform Normalizing Flow.

Source code in xgboostlss/distributions/flow_utils.py
def create_spline_flow(self,
                       input_dim: int = None,
                       ) -> Transform:

    """
    Function that constructs a Normalizing Flow.

    Arguments
    ---------
    input_dim: int
        Input dimension.

    Returns
    -------
    spline_flow: Transform
        Normalizing Flow.
    """

    # Create flow distribution (currently only Normal)
    loc, scale = torch.zeros(input_dim), torch.ones(input_dim)
    flow_dist = self.base_dist(loc, scale)

    # Create Spline Transform
    torch.manual_seed(123)
    spline_transform = self.flow_transform(input_dim,
                                           count_bins=self.count_bins,
                                           bound=self.bound,
                                           order=self.order)

    # Create Normalizing Flow
    spline_flow = TransformedDistribution(flow_dist, [spline_transform, self.target_transform])

    return spline_flow
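
The construction above composes a base distribution with a spline transform and a target transform. A stand-alone sketch, assuming Pyro's Spline transform (pyro-ppl installed), whose count_bins, bound and order arguments match the ones used here; the target transform is omitted for brevity.

import torch
from torch.distributions import Normal, TransformedDistribution
from pyro.distributions.transforms import Spline  # assumption: pyro-ppl provides the spline transform

input_dim = 1
base_dist = Normal(torch.zeros(input_dim), torch.ones(input_dim))

torch.manual_seed(123)
spline_transform = Spline(input_dim, count_bins=8, bound=3.0, order="linear")

# XGBoostLSS additionally appends a response-specific target transform; left out in this sketch.
spline_flow = TransformedDistribution(base_dist, [spline_transform])

y = torch.linspace(-2.0, 2.0, 5).reshape(-1, 1)
log_density = spline_flow.log_prob(y)  # flexible, learnable density over the target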
crps_score(y, yhat_dist)

Function that calculates the Continuous Ranked Probability Score (CRPS) for a given set of predicted samples.

Arguments

y: torch.Tensor Response variable of shape (n_observations,1). yhat_dist: torch.Tensor Predicted samples of shape (n_samples, n_observations).

Returns

crps: torch.Tensor CRPS score.

References

Gneiting, Tilmann & Raftery, Adrian. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association. 102. 359-378.

Source

https://github.com/elephaint/pgbm/blob/main/pgbm/torch/pgbm_dist.py#L549

Source code in xgboostlss/distributions/flow_utils.py
def crps_score(self, y: torch.tensor, yhat_dist: torch.tensor) -> torch.tensor:
    """
    Function that calculates the Continuous Ranked Probability Score (CRPS) for a given set of predicted samples.

    Arguments
    ---------
    y: torch.Tensor
        Response variable of shape (n_observations,1).
    yhat_dist: torch.Tensor
        Predicted samples of shape (n_samples, n_observations).

    Returns
    ---------
    crps: torch.Tensor
        CRPS score.

    References
    ---------
    Gneiting, Tilmann & Raftery, Adrian. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation.
    Journal of the American Statistical Association. 102. 359-378.

    Source
    ---------
    https://github.com/elephaint/pgbm/blob/main/pgbm/torch/pgbm_dist.py#L549
    """
    # Get the number of observations
    n_samples = yhat_dist.shape[0]

    # Sort the forecasts in ascending order
    yhat_dist_sorted, _ = torch.sort(yhat_dist, 0)

    # Create temporary tensors
    y_cdf = torch.zeros_like(y)
    yhat_cdf = torch.zeros_like(y)
    yhat_prev = torch.zeros_like(y)
    crps = torch.zeros_like(y)

    # Loop over the predicted samples generated per observation
    for yhat in yhat_dist_sorted:
        yhat = yhat.reshape(-1, 1)
        flag = (y_cdf == 0) * (y < yhat)
        crps += flag * ((y - yhat_prev) * yhat_cdf ** 2)
        crps += flag * ((yhat - y) * (yhat_cdf - 1) ** 2)
        crps += (~flag) * ((yhat - yhat_prev) * (yhat_cdf - y_cdf) ** 2)
        y_cdf += flag
        yhat_cdf += 1 / n_samples
        yhat_prev = yhat

    # In case y_cdf == 0 after the loop
    flag = (y_cdf == 0)
    crps += flag * (y - yhat)

    return crps
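
The loop above evaluates the CRPS of the empirical distribution of the drawn samples. As a cross-check, the same quantity can be computed from the sample (energy) form CRPS = E|X - y| - 0.5 * E|X - X'| (Gneiting & Raftery, 2007); the sketch below is self-contained and should agree with crps_score up to floating-point error.

import torch

torch.manual_seed(0)
n_obs, n_samples = 4, 200
y = torch.randn(n_obs, 1)                  # shape (n_observations, 1)
yhat_dist = torch.randn(n_samples, n_obs)  # shape (n_samples, n_observations)

# E|X - y| per observation.
term1 = (yhat_dist - y.T).abs().mean(dim=0)

# 0.5 * E|X - X'| per observation, using all sample pairs.
pairwise = (yhat_dist.unsqueeze(0) - yhat_dist.unsqueeze(1)).abs()
term2 = 0.5 * pairwise.mean(dim=(0, 1))

crps_energy = (term1 - term2).reshape(-1, 1)  # comparable to crps_score(y, yhat_dist)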
draw_samples(predt_params, n_samples=1000, seed=123)

Function that draws n_samples from a predicted distribution.

Arguments

predt_params: pd.DataFrame pd.DataFrame with predicted distributional parameters. n_samples: int Number of samples to draw from the predicted response distribution. seed: int Manual seed.

Returns

pred_dist: pd.DataFrame DataFrame with n_samples drawn from predicted response distribution.

Source code in xgboostlss/distributions/flow_utils.py
def draw_samples(self,
                 predt_params: pd.DataFrame,
                 n_samples: int = 1000,
                 seed: int = 123
                 ) -> pd.DataFrame:
    """
    Function that draws n_samples from a predicted distribution.

    Arguments
    ---------
    predt_params: pd.DataFrame
        pd.DataFrame with predicted distributional parameters.
    n_samples: int
        Number of samples to draw from the predicted response distribution.
    seed: int
        Manual seed.

    Returns
    -------
    pred_dist: pd.DataFrame
        DataFrame with n_samples drawn from predicted response distribution.

    """

    torch.manual_seed(seed)

    # Specify Normalizing Flow
    pred_params = torch.tensor(predt_params.values)
    flow_dist_pred = self.create_spline_flow(pred_params.shape[0])

    # Replace parameters with estimated ones
    _, flow_dist_pred = self.replace_parameters(pred_params, flow_dist_pred)

    # Draw samples
    flow_samples = pd.DataFrame(flow_dist_pred.sample((n_samples,)).squeeze().detach().numpy().T)
    flow_samples.columns = [str("y_sample") + str(i) for i in range(flow_samples.shape[1])]

    if self.discrete:
        flow_samples = flow_samples.astype(int)

    return flow_samples
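
The reshaping done above, n_samples draws per observation transposed into one y_sample column per draw, can be illustrated with a plain Normal standing in for the fitted flow; the column naming mirrors the method.

import pandas as pd
import torch

torch.manual_seed(123)
n_obs, n_samples = 3, 5

# Stand-in for the fitted flow: one distribution per observation (batch shape n_obs).
dist = torch.distributions.Normal(torch.tensor([0.0, 1.0, 2.0]), torch.ones(n_obs))

# sample((n_samples,)) has shape (n_samples, n_obs); transpose to one row per observation.
samples = dist.sample((n_samples,)).numpy().T
pred_samples = pd.DataFrame(samples, columns=[f"y_sample{i}" for i in range(n_samples)])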
flow_select(target, candidate_flows, max_iter=100, plot=False, figure_size=(10, 5))

Function that selects the most suitable normalizing flow specification among the candidate_flows for the target variable, based on the NegLogLikelihood (lower is better).

Parameters

target: np.ndarray Response variable. candidate_flows: List List of candidate normalizing flow specifications. max_iter: int Maximum number of iterations for the optimization. plot: bool If True, a density plot of the actual and fitted distribution is created. figure_size: tuple Figure size of the density plot.

Returns

fit_df: pd.DataFrame DataFrame with the loss values of the fitted normalizing flows.

Source code in xgboostlss/distributions/flow_utils.py
def flow_select(self,
                target: np.ndarray,
                candidate_flows: List,
                max_iter: int = 100,
                plot: bool = False,
                figure_size: tuple = (10, 5),
                ) -> pd.DataFrame:
    """
    Function that selects the most suitable normalizing flow specification among the candidate_flows for the
    target variable, based on the NegLogLikelihood (lower is better).

    Parameters
    ----------
    target: np.ndarray
        Response variable.
    candidate_flows: List
        List of candidate normalizing flow specifications.
    max_iter: int
        Maximum number of iterations for the optimization.
    plot: bool
        If True, a density plot of the actual and fitted distribution is created.
    figure_size: tuple
        Figure size of the density plot.

    Returns
    -------
    fit_df: pd.DataFrame
        DataFrame with the loss values of the fitted normalizing flows.
    """
    flow_list = []
    total_iterations = len(candidate_flows)

    with tqdm(total=total_iterations, desc="Fitting candidate normalizing flows") as pbar:
        for flow in candidate_flows:
            flow_name = str(flow.__class__).split(".")[-1].split("'>")[0]
            flow_spec = f"(count_bins: {flow.count_bins}, order: {flow.order})"
            flow_name = flow_name + flow_spec
            pbar.set_description(f"Fitting {flow_name}")
            flow_sel = flow
            try:
                loss, params = flow_sel.calculate_start_values(target=target, max_iter=max_iter)
                fit_df = pd.DataFrame.from_dict(
                    {flow_sel.loss_fn: loss.reshape(-1, ),
                     "NormFlow": str(flow_name),
                     "params": [params]
                     }
                )
            except Exception as e:
                warnings.warn(f"Error fitting {flow_sel} NormFlow: {str(e)}")
                fit_df = pd.DataFrame(
                    {flow_sel.loss_fn: np.nan,
                     "NormFlow": str(flow_sel),
                     "params": [np.nan] * flow_sel.n_dist_param
                     }
                )
            flow_list.append(fit_df)
            pbar.update(1)
        pbar.set_description(f"Fitting of candidate normalizing flows completed")
        fit_df = pd.concat(flow_list).sort_values(by=flow_sel.loss_fn, ascending=True)
        fit_df["rank"] = fit_df[flow_sel.loss_fn].rank().astype(int)
        fit_df.set_index(fit_df["rank"], inplace=True)
    if plot:
        # Select normalizing flow with the lowest loss
        best_flow = fit_df[fit_df["rank"] == 1].reset_index(drop=True)
        for flow in candidate_flows:
            flow_name = str(flow.__class__).split(".")[-1].split("'>")[0]
            flow_spec = f"(count_bins: {flow.count_bins}, order: {flow.order})"
            flow_name = flow_name + flow_spec
            if flow_name == best_flow["NormFlow"].values[0]:
                best_flow_sel = flow
                break

        # Draw samples from distribution
        flow_params = torch.tensor(best_flow["params"][0]).reshape(1, -1)
        flow_dist_sel = best_flow_sel.create_spline_flow(input_dim=1)
        _, flow_dist_sel = best_flow_sel.replace_parameters(flow_params, flow_dist_sel)
        n_samples = np.max([10000, target.shape[0]])
        n_samples = np.where(n_samples > 500000, 100000, n_samples)
        flow_samples = pd.DataFrame(flow_dist_sel.sample((n_samples,)).squeeze().detach().numpy().T).values

        # Plot actual and fitted distribution
        plt.figure(figsize=figure_size)
        sns.kdeplot(target.reshape(-1, ), label="Actual")
        sns.kdeplot(flow_samples.reshape(-1, ), label=f"Best-Fit: {best_flow['NormFlow'].values[0]}")
        plt.legend()
        plt.title("Actual vs. Best-Fit Density", fontweight="bold", fontsize=16)
        plt.show()

    fit_df.drop(columns=["rank", "params"], inplace=True)

    return fit_df
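
A typical call mirrors the XGBoostLSS examples: build a list of candidate spline-flow specifications and pass them to flow_select, invoked from any of the candidates. The import path and constructor arguments below are illustrative and may differ from the installed version.

import numpy as np
from xgboostlss.distributions.SplineFlow import SplineFlow  # hypothetical import path

y_train = np.random.default_rng(123).normal(loc=10.0, scale=2.0, size=500)

# Candidate spline-flow specifications; only count_bins and order are varied here.
candidate_flows = [
    SplineFlow(count_bins=2, order="linear"),
    SplineFlow(count_bins=4, order="linear"),
    SplineFlow(count_bins=2, order="quadratic"),
    SplineFlow(count_bins=4, order="quadratic"),
]

flow_nll = candidate_flows[0].flow_select(
    target=y_train, candidate_flows=candidate_flows, max_iter=50, plot=False
)
print(flow_nll)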
get_params_loss(predt, target, start_values, requires_grad=False)

Function that returns the predicted parameters and the loss.

Arguments

predt: np.ndarray Predicted values. target: torch.Tensor Target values. start_values: List Starting values for each parameter.

Returns

predt: torch.Tensor Predicted parameters. loss: torch.Tensor Loss value.

Source code in xgboostlss/distributions/flow_utils.py
def get_params_loss(self,
                    predt: np.ndarray,
                    target: torch.Tensor,
                    start_values: List[float],
                    requires_grad: bool = False,
                    ) -> Tuple[List[torch.Tensor], np.ndarray]:
    """
    Function that returns the predicted parameters and the loss.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    target: torch.Tensor
        Target values.
    start_values: List
        Starting values for each parameter.

    Returns
    -------
    predt: torch.Tensor
        Predicted parameters.
    loss: torch.Tensor
        Loss value.
    """
    # Reshape Target
    target = target.view(-1)

    # Predicted Parameters
    predt = predt.reshape(-1, self.n_dist_param)

    # Replace NaNs and infinity values with unconditional start values
    nan_inf_mask = np.isnan(predt) | np.isinf(predt)
    predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])

    # Convert to torch.tensor
    predt = torch.tensor(predt, dtype=torch.float32)

    # Specify Normalizing Flow
    flow_dist = self.create_spline_flow(target.shape[0])

    # Replace parameters with estimated ones
    params, flow_dist = self.replace_parameters(predt, flow_dist)

    # Calculate loss
    if self.loss_fn == "nll":
        loss = -torch.nansum(flow_dist.log_prob(target))
    elif self.loss_fn == "crps":
        torch.manual_seed(123)
        dist_samples = flow_dist.rsample((30,)).squeeze(-1)
        loss = torch.nansum(self.crps_score(target, dist_samples))
    else:
        raise ValueError("Invalid loss function. Please select 'nll' or 'crps'.")

    return params, loss
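
The NaN/Inf handling above replaces every invalid booster output with the unconditional start value of the corresponding parameter column. In isolation, the masking step looks like this (toy numbers):

import numpy as np

n_dist_param = 3
start_values = [0.1, 0.2, 0.3]           # one unconditional value per parameter column

# Raw booster output reshaped to (n_obs, n_dist_param), containing a NaN and an Inf.
predt = np.array([[1.0, np.nan, 2.0],
                  [np.inf, 0.5, -1.0]])

# The column index of each invalid entry selects the matching start value.
nan_inf_mask = np.isnan(predt) | np.isinf(predt)
predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])
# predt is now [[1.0, 0.2, 2.0], [0.1, 0.5, -1.0]]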
metric_fn(predt, data)

Function that evaluates the predictions using the specified loss function.

Arguments

predt: np.ndarray Predicted values. data: xgb.DMatrix Data used for training.

Returns

name: str Name of the evaluation metric. loss: float Loss value.

Source code in xgboostlss/distributions/flow_utils.py
def metric_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[str, np.ndarray]:
    """
    Function that evaluates the predictions using the specified loss function.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    data: xgb.DMatrix
        Data used for training.

    Returns
    -------
    name: str
        Name of the evaluation metric.
    loss: float
        Loss value.
    """
    # Target
    target = torch.tensor(data.get_label().reshape(-1, 1))

    # Start values (needed to replace NaNs in predt)
    start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

    # Calculate loss
    _, loss = self.get_params_loss(predt, target, start_values)

    return self.loss_fn, loss
objective_fn(predt, data)

Function to estimate gradients and hessians of normalizing flow parameters.

Arguments

predt: np.ndarray Predicted values. data: xgb.DMatrix Data used for training.

Returns

grad: np.ndarray Gradient. hess: np.ndarray Hessian.

Source code in xgboostlss/distributions/flow_utils.py
def objective_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[np.ndarray, np.ndarray]:

    """
    Function to estimate gradients and hessians of normalizing flow parameters.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    data: xgb.DMatrix
        Data used for training.

    Returns
    -------
    grad: np.ndarray
        Gradient.
    hess: np.ndarray
        Hessian.
    """
    # Target
    target = torch.tensor(data.get_label().reshape(-1, 1))

    # Weights
    if data.get_weight().size == 0:
        # Use 1 as weight if no weights are specified
        weights = torch.ones_like(target, dtype=target.dtype).numpy()
    else:
        weights = data.get_weight().reshape(-1, 1)

    # Start values (needed to replace NaNs in predt)
    start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

    # Calculate gradients and hessians
    predt, loss = self.get_params_loss(predt, target, start_values)
    grad, hess = self.compute_gradients_and_hessians(loss, predt, weights)

    return grad, hess
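
objective_fn and metric_fn (documented above) have the signatures XGBoost expects for a custom objective and a custom evaluation metric. Training is normally driven by the XGBoostLSS model wrapper, but a rough sketch of the wiring with the native API looks as follows; the flow object, the base-margin start values and base_score=0 follow the conventions documented in this class, everything else is illustrative, and the multi-output configuration (one output per distributional parameter) that the wrapper sets up internally is omitted here.

import numpy as np
import xgboost as xgb

# Assumed to exist already: a spline-flow specification `flow` plus training data X_train, y_train.
loss, start_values = flow.calculate_start_values(target=y_train, max_iter=50)

dtrain = xgb.DMatrix(X_train, label=y_train)
# Each distributional parameter starts from its unconditional estimate via the base margin.
base_margin = np.ones((dtrain.num_row(), 1)) * start_values
dtrain.set_base_margin(base_margin.flatten())

params = {"eta": 0.1, "max_depth": 3, "base_score": 0.0, "disable_default_eval_metric": True}
booster = xgb.train(
    params,
    dtrain,
    num_boost_round=100,
    obj=flow.objective_fn,         # gradients and hessians of the flow parameters
    custom_metric=flow.metric_fn,  # NLL (or CRPS) reported on the evaluation sets
    evals=[(dtrain, "train")],
)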
predict_dist(booster, start_values, data, pred_type='parameters', n_samples=1000, quantiles=[0.1, 0.5, 0.9], seed=123)

Function that predicts from the trained model.

Arguments

booster : xgb.Booster Trained model. start_values : np.ndarray Starting values for each distributional parameter. data : xgb.DMatrix Data to predict from. pred_type : str Type of prediction: - "samples" draws n_samples from the predicted distribution. - "quantiles" calculates the quantiles from the predicted distribution. - "parameters" returns the predicted distributional parameters. n_samples : int Number of samples to draw from the predicted distribution. quantiles : List[float] List of quantiles to calculate from the predicted distribution. seed : int Seed for random number generator used to draw samples from the predicted distribution.

Returns

pred : pd.DataFrame Predictions.

Source code in xgboostlss/distributions/flow_utils.py
def predict_dist(self,
                 booster: xgb.Booster,
                 start_values: np.ndarray,
                 data: xgb.DMatrix,
                 pred_type: str = "parameters",
                 n_samples: int = 1000,
                 quantiles: list = [0.1, 0.5, 0.9],
                 seed: int = 123
                 ) -> pd.DataFrame:
    """
    Function that predicts from the trained model.

    Arguments
    ---------
    booster : xgb.Booster
        Trained model.
    start_values : np.ndarray
        Starting values for each distributional parameter.
    data : xgb.DMatrix
        Data to predict from.
    pred_type : str
        Type of prediction:
        - "samples" draws n_samples from the predicted distribution.
        - "quantiles" calculates the quantiles from the predicted distribution.
        - "parameters" returns the predicted distributional parameters.
    n_samples : int
        Number of samples to draw from the predicted distribution.
    quantiles : List[float]
        List of quantiles to calculate from the predicted distribution.
    seed : int
        Seed for random number generator used to draw samples from the predicted distribution.

    Returns
    -------
    pred : pd.DataFrame
        Predictions.
    """
    # Set base_margin as starting point for each distributional parameter. Requires base_score=0 in parameters.
    base_margin_predt = (np.ones(shape=(data.num_row(), 1))) * start_values
    data.set_base_margin(base_margin_predt.flatten())

    # Predict distributional parameters
    dist_params_predt = pd.DataFrame(
        np.array(booster.predict(data, output_margin=True)).reshape(-1, self.n_dist_param)
    )
    dist_params_predt.columns = self.param_dict.keys()

    # Draw samples from predicted response distribution
    pred_samples_df = self.draw_samples(predt_params=dist_params_predt,
                                        n_samples=n_samples,
                                        seed=seed)

    if pred_type == "parameters":
        return dist_params_predt

    elif pred_type == "samples":
        return pred_samples_df

    elif pred_type == "quantiles":
        # Calculate quantiles from predicted response distribution
        pred_quant_df = pred_samples_df.quantile(quantiles, axis=1).T
        pred_quant_df.columns = [str("quant_") + str(quantiles[i]) for i in range(len(quantiles))]
        if self.discrete:
            pred_quant_df = pred_quant_df.astype(int)
        return pred_quant_df
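
Given a trained booster and the start values used during training, the three pred_type options return parameters, samples or quantiles. A hedged usage sketch, with booster, flow, start_values and X_test assumed to exist from a previous training run:

import xgboost as xgb

dtest = xgb.DMatrix(X_test)

# Predicted spline parameters, one column per distributional parameter.
pred_params = flow.predict_dist(booster, start_values, dtest, pred_type="parameters")

# 1000 samples per observation from the predicted response distribution.
pred_samples = flow.predict_dist(booster, start_values, dtest,
                                 pred_type="samples", n_samples=1000, seed=123)

# Selected quantiles computed from those samples.
pred_quantiles = flow.predict_dist(booster, start_values, dtest,
                                   pred_type="quantiles", n_samples=1000,
                                   quantiles=[0.05, 0.5, 0.95])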
replace_parameters(params, flow_dist)

Replace parameters with estimated ones.

Arguments

params: torch.Tensor Estimated parameters. flow_dist: Transform Normalizing Flow.

Returns

params_list: List List of estimated parameters. flow_dist: Transform Normalizing Flow with estimated parameters.

Source code in xgboostlss/distributions/flow_utils.py
def replace_parameters(self,
                       params: torch.Tensor,
                       flow_dist: Transform,
                       ) -> Tuple[List, Transform]:
    """
    Replace parameters with estimated ones.

    Arguments
    ---------
    params: torch.Tensor
        Estimated parameters.
    flow_dist: Transform
        Normalizing Flow.

    Returns
    -------
    params_list: List
        List of estimated parameters.
    flow_dist: Transform
        Normalizing Flow with estimated parameters.
    """

    # Split parameters into list
    if self.order == "quadratic":
        params_list = torch.split(
            params, [self.count_bins, self.count_bins, self.count_bins - 1],
            dim=1)
    elif self.order == "linear":
        params_list = torch.split(
            params, [self.count_bins, self.count_bins, self.count_bins - 1, self.count_bins],
            dim=1)

    # Replace parameters
    for param, new_value in zip(flow_dist.transforms[0].parameters(), params_list):
        param.data = new_value

    # Get parameters (including require_grad=True)
    params_list = list(flow_dist.transforms[0].parameters())

    return params_list, flow_dist
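
The split sizes above follow the per-observation parameter layout of the rational spline: count_bins widths, count_bins heights, count_bins - 1 derivatives and, for the linear order, count_bins additional lambdas (Pyro's parameterisation). A toy check of the bookkeeping for count_bins = 8:

import torch

count_bins, n_obs = 8, 4
n_dist_param = 4 * count_bins - 1  # linear order: 8 + 8 + 7 + 8 = 31 parameters per observation

params = torch.randn(n_obs, n_dist_param)
params_list = torch.split(params, [count_bins, count_bins, count_bins - 1, count_bins], dim=1)

# [torch.Size([4, 8]), torch.Size([4, 8]), torch.Size([4, 7]), torch.Size([4, 8])]
print([p.shape for p in params_list])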
stabilize_derivative(input_der, type='MAD')

Function that stabilizes Gradients and Hessians.

Since parameters are estimated by optimizing Gradients and Hessians, it is important that these are comparable in magnitude for all parameters. Due to imbalances regarding the ranges, the estimation might become unstable so that it does not converge (or converges very slowly) to the optimal solution. Another way to improve convergence might be to standardize the response variable. This is especially useful if the range of the response differs strongly from the range of the Gradients and Hessians. Both the stabilization and the standardization of the response are not always advisable and should be considered carefully.

Source

https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

Arguments

input_der : torch.Tensor Input derivative, either Gradient or Hessian. type: str Stabilization method. Can be either "None", "MAD" or "L2".

Returns

stab_der : torch.Tensor Stabilized Gradient or Hessian.

Source code in xgboostlss/distributions/flow_utils.py
def stabilize_derivative(self, input_der: torch.Tensor, type: str = "MAD") -> torch.Tensor:
    """
    Function that stabilizes Gradients and Hessians.

    Since parameters are estimated by optimizing Gradients and Hessians, it is important that these are comparable
    in magnitude for all parameters. Due to imbalances regarding the ranges, the estimation might become unstable
    so that it does not converge (or converges very slowly) to the optimal solution. Another way to improve
    convergence might be to standardize the response variable. This is especially useful if the range of the
    response differs strongly from the range of the Gradients and Hessians. Both the stabilization and the
    standardization of the response are not always advisable and should be considered carefully.

    Source
    ---------
    https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

    Arguments
    ---------
    input_der : torch.Tensor
        Input derivative, either Gradient or Hessian.
    type: str
        Stabilization method. Can be either "None", "MAD" or "L2".

    Returns
    ---------
    stab_der : torch.Tensor
        Stabilized Gradient or Hessian.
    """

    if type == "MAD":
        input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
        div = torch.nanmedian(torch.abs(input_der - torch.nanmedian(input_der)))
        div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
        stab_der = input_der / div

    if type == "L2":
        input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
        div = torch.sqrt(torch.nanmean(input_der.pow(2)))
        div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
        div = torch.where(div > torch.tensor(10000.0), torch.tensor(10000.0), div)
        stab_der = input_der / div

    if type == "None":
        stab_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))

    return stab_der
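
The effect of the two stabilization options is easiest to see on a toy gradient vector: both divide by a robust scale estimate, the median absolute deviation or the root mean square, clipped away from zero, so that parameters with very different gradient magnitudes contribute comparably. A small stand-alone sketch (torch.clamp is used here in place of the torch.where clipping above):

import torch

grad = torch.tensor([0.02, -0.01, 250.0, -300.0, float("nan")])

# Replace NaNs with the mean of the remaining entries, as stabilize_derivative does.
grad = torch.nan_to_num(grad, nan=float(torch.nanmean(grad)))

# MAD stabilization: divide by the median absolute deviation, floored at 1e-4.
mad = torch.nanmedian(torch.abs(grad - torch.nanmedian(grad)))
grad_mad = grad / torch.clamp(mad, min=1e-4)

# L2 stabilization: divide by the root mean square, clipped to [1e-4, 1e4].
rms = torch.sqrt(torch.nanmean(grad.pow(2)))
grad_l2 = grad / torch.clamp(rms, min=1e-4, max=1e4)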

mixture_distribution_utils

MixtureDistributionClass

Generic class that contains general functions for mixed-density distributions.

Arguments

distribution: torch.distributions.Distribution PyTorch Distribution class. M: int Number of components in the mixture distribution. temperature: float Temperature for the Gumbel-Softmax distribution. hessian_mode: str Mode for computing the Hessian. Must be one of the following:

    - "individual": Each parameter is treated as a separate tensor. As a result, when the Hessian is calculated
    for each gradient element, this corresponds to the second derivative with respect to that specific tensor
    element only. This means the resulting Hessians capture the curvature of the loss w.r.t. each individual
    parameter. This is usually more runtime intensive, but can also be more accurate.

    - "grouped": Each parameter is a tensor containing all values for a specific parameter type,
    e.g., loc, scale, or mixture probabilities for a Gaussian Mixture. When computing the Hessian for each
    gradient element, the Hessian matrix for all the values in the respective tensor are calculated together.
    The resulting Hessians capture the curvature of the loss w.r.t. the entire parameter type tensor. This is
    usually less runtime intensive, but can be less accurate.

univariate: bool Whether the distribution is univariate or multivariate. discrete: bool Whether the support of the distribution is discrete or continuous. n_dist_param: int Number of distributional parameters. stabilization: str Stabilization method. param_dict: Dict[str, Any] Dictionary that maps distributional parameters to their response scale. distribution_arg_names: List List of distributional parameter names. loss_fn: str Loss function. Options are "nll" (negative log-likelihood).
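
The difference between the two hessian_mode settings can be sketched with plain autograd outside of the class: "individual" differentiates each gradient entry with respect to its own scalar element (exact element-wise curvature, more backward calls), whereas "grouped" differentiates the summed gradient of a whole parameter tensor at once (cheaper, but it folds cross-derivatives between elements into the result). A minimal illustration with a two-component Gaussian mixture and a single observation; all names are illustrative.

import torch
from torch.autograd import grad as autograd

# Toy parameters of a two-component Gaussian mixture for one observation.
locs = torch.tensor([0.0, 2.0], requires_grad=True)
log_scales = torch.tensor([0.0, 0.0], requires_grad=True)
logits = torch.tensor([0.0, 0.0], requires_grad=True)
y = torch.tensor([1.2])

mix = torch.distributions.MixtureSameFamily(
    torch.distributions.Categorical(logits=logits),
    torch.distributions.Normal(locs, torch.exp(log_scales)),
)
loss = -mix.log_prob(y).sum()

grad_loc = autograd(loss, locs, create_graph=True)[0]

# "grouped" style: one extra backward pass for the whole location tensor;
# the result mixes in cross-derivatives between the two locations.
hess_loc_grouped = autograd(grad_loc.sum(), locs, retain_graph=True)[0]

# "individual" style: one backward pass per scalar location; exact diagonal curvature.
hess_loc_individual = torch.stack(
    [autograd(grad_loc[i], locs, retain_graph=True)[0][i] for i in range(locs.numel())]
)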

Source code in xgboostlss/distributions/mixture_distribution_utils.py
class MixtureDistributionClass:
    """
    Generic class that contains general functions for mixed-density distributions.

    Arguments
    ---------
    distribution: torch.distributions.Distribution
        PyTorch Distribution class.
    M: int
        Number of components in the mixture distribution.
    temperature: float
        Temperature for the Gumbel-Softmax distribution.
    hessian_mode: str
        Mode for computing the Hessian. Must be one of the following:

            - "individual": Each parameter is treated as a separate tensor. As a result, when the Hessian is calculated
            for each gradient element, this corresponds to the second derivative with respect to that specific tensor
            element only. This means the resulting Hessians capture the curvature of the loss w.r.t. each individual
            parameter. This is usually more runtime intensive, but can also be more accurate.

            - "grouped": Each parameter is a tensor containing all values for a specific parameter type,
            e.g., loc, scale, or mixture probabilities for a Gaussian Mixture. When computing the Hessian for each
            gradient element, the Hessian matrix for all the values in the respective tensor are calculated together.
            The resulting Hessians capture the curvature of the loss w.r.t. the entire parameter type tensor. This is
            usually less runtime intensive, but can be less accurate.
    univariate: bool
        Whether the distribution is univariate or multivariate.
    discrete: bool
        Whether the support of the distribution is discrete or continuous.
    n_dist_param: int
        Number of distributional parameters.
    stabilization: str
        Stabilization method.
    param_dict: Dict[str, Any]
        Dictionary that maps distributional parameters to their response scale.
    distribution_arg_names: List
        List of distributional parameter names.
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 distribution: torch.distributions.Distribution = None,
                 M: int = 2,
                 temperature: float = 1.0,
                 hessian_mode: str = "individual",
                 univariate: bool = True,
                 discrete: bool = False,
                 n_dist_param: int = None,
                 stabilization: str = "None",
                 param_dict: Dict[str, Any] = None,
                 distribution_arg_names: List = None,
                 loss_fn: str = "nll",
                 ):

        self.distribution = distribution
        self.M = M
        self.temperature = temperature
        self.hessian_mode = hessian_mode
        self.univariate = univariate
        self.discrete = discrete
        self.n_dist_param = n_dist_param
        self.stabilization = stabilization
        self.param_dict = param_dict
        self.distribution_arg_names = distribution_arg_names
        self.loss_fn = loss_fn

    def objective_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[np.ndarray, np.ndarray]:

        """
        Function to estimate gradients and hessians of distributional parameters.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        data: xgb.DMatrix
            Data used for training.

        Returns
        -------
        grad: np.ndarray
            Gradient.
        hess: np.ndarray
            Hessian.
        """
        # Target
        target = torch.tensor(data.get_label().reshape(-1, 1), dtype=torch.float32)

        # Weights
        if data.get_weight().size == 0:
            # Use 1 as weight if no weights are specified
            weights = np.ones_like(target, dtype="float32")
        else:
            weights = data.get_weight().reshape(-1, 1)

        # Start values (needed to replace NaNs in predt)
        start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

        # Calculate gradients and hessians
        predt, loss = self.get_params_loss(predt, target.flatten(), start_values, requires_grad=True)
        grad, hess = self.compute_gradients_and_hessians(loss, predt, weights)

        return grad, hess

    def metric_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[str, np.ndarray]:
        """
        Function that evaluates the predictions using the specified loss function.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        data: xgb.DMatrix
            Data used for training.

        Returns
        -------
        name: str
            Name of the evaluation metric.
        loss: float
            Loss value.
        """
        # Target
        target = torch.tensor(data.get_label().reshape(-1, 1), dtype=torch.float32)

        # Start values (needed to replace NaNs in predt)
        start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

        # Calculate loss
        _, loss = self.get_params_loss(predt, target.flatten(), start_values, requires_grad=False)

        return self.loss_fn, loss

    def create_mixture_distribution(self,
                                    params: List[torch.Tensor],
                                    ) -> torch.distributions.Distribution:
        """
        Function that creates a mixture distribution.

        Arguments
        ---------
        params: torch.Tensor
            Distributional parameters.

        Returns
        -------
        dist: torch.distributions.Distribution
            Mixture distribution.
        """

        # Create Mixture Distribution
        mixture_cat = Categorical(probs=params[-1])
        mixture_comp = self.distribution.distribution(*params[:-1])
        mixture_dist = MixtureSameFamily(mixture_cat, mixture_comp)

        return mixture_dist

    def loss_fn_start_values(self,
                             params: torch.Tensor,
                             target: torch.Tensor) -> torch.Tensor:
        """
        Function that calculates the loss for a given set of distributional parameters. Only used for calculating
        the loss for the start values.

        Parameters
        ----------
        params: torch.Tensor
            Distributional parameters.
        target: torch.Tensor
            Target values.

        Returns
        -------
        loss: torch.Tensor
            Loss value.
        """
        # Replace NaNs and infinity values with 0.5
        nan_inf_idx = torch.isnan(torch.stack(params)) | torch.isinf(torch.stack(params))
        params = torch.where(nan_inf_idx, torch.tensor(0.5), torch.stack(params)).reshape(1, -1)
        params = torch.split(params, self.M, dim=1)

        # Transform parameters to response scale
        params = [response_fn(params[i]) for i, response_fn in enumerate(self.param_dict.values())]

        # Specify Distribution and Loss
        dist = self.create_mixture_distribution(params)
        loss = -torch.nansum(dist.log_prob(target))

        return loss

    def calculate_start_values(self,
                               target: np.ndarray,
                               max_iter: int = 50
                               ) -> Tuple[float, np.ndarray]:
        """
        Function that calculates the starting values for each distributional parameter.

        Arguments
        ---------
        target: np.ndarray
            Data from which starting values are calculated.
        max_iter: int
            Maximum number of iterations.

        Returns
        -------
        loss: float
            Loss value.
        start_values: np.ndarray
            Starting values for each distributional parameter.
        """
        # Convert target to torch.tensor
        target = torch.tensor(target, dtype=torch.float32).flatten()

        # Initialize parameters
        params = [torch.tensor(0.5, requires_grad=True) for _ in range(self.n_dist_param)]

        # Specify optimizer
        optimizer = LBFGS(params, lr=0.1, max_iter=np.min([int(max_iter/4), 20]), line_search_fn="strong_wolfe")

        # Define learning rate scheduler
        lr_scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=10)

        # Define closure
        def closure():
            optimizer.zero_grad()
            loss = self.loss_fn_start_values(params, target)
            loss.backward()
            return loss

        # Optimize parameters
        loss_vals = []
        tolerance = 1e-5
        patience = 5
        best_loss = float("inf")
        epochs_without_change = 0

        for epoch in range(max_iter):
            optimizer.zero_grad()
            loss = optimizer.step(closure)
            lr_scheduler.step(loss)
            loss_vals.append(loss.item())

            # Stopping criterion (no improvement in loss)
            if loss.item() < best_loss - tolerance:
                best_loss = loss.item()
                epochs_without_change = 0
            else:
                epochs_without_change += 1

            if epochs_without_change >= patience:
                break

        # Get final loss
        loss = np.array(loss_vals[-1])

        # Get start values
        start_values = np.array([params[i].detach() for i in range(self.n_dist_param)])

        # Replace any remaining NaNs or infinity values with 0.5
        start_values = np.nan_to_num(start_values, nan=0.5, posinf=0.5, neginf=0.5)

        return loss, start_values

    def get_params_loss(self,
                        predt: np.ndarray,
                        target: torch.Tensor,
                        start_values: List[float],
                        requires_grad: bool = False,
                        ) -> Tuple[List[torch.Tensor], np.ndarray]:
        """
        Function that returns the predicted parameters and the loss.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        target: torch.Tensor
            Target values.
        start_values: List
            Starting values for each distributional parameter.
        requires_grad: bool
            Whether to add to the computational graph or not.

        Returns
        -------
        predt: List of torch.Tensors
            Predicted parameters.
        loss: torch.Tensor
            Loss value.
        """
        # Predicted Parameters
        predt = predt.reshape(-1, self.n_dist_param)

        # Replace NaNs and infinity values with unconditional start values
        nan_inf_mask = np.isnan(predt) | np.isinf(predt)
        predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])

        if self.hessian_mode == "grouped":
            # Convert to torch.Tensor: splits the parameters into tensors for each parameter-type
            predt = torch.split(torch.tensor(predt, requires_grad=requires_grad), self.M, dim=1)
            # Transform parameters to response scale
            predt_transformed = [response_fn(predt[i]) for i, response_fn in enumerate(self.param_dict.values())]

        else:
            # Convert to torch.Tensor: splits the parameters into tensors for each parameter individually
            predt = torch.split(torch.tensor(predt, requires_grad=requires_grad), 1, dim=1)
            # Transform parameters to response scale
            keys = list(self.param_dict.keys())
            max_index = len(self.param_dict) * self.M
            index_ranges = []
            for i in range(0, max_index, self.M):
                if i + self.M >= max_index:
                    index_ranges.append((i, None))
                    break
                index_ranges.append((i, i + self.M))

            predt_transformed = []
            for key, (start, end) in zip(keys, index_ranges):
                predt_transformed.append(self.param_dict[key](torch.cat(predt[start:end], dim=1)))

        # Specify Distribution and Loss
        dist_fit = self.create_mixture_distribution(predt_transformed)
        loss = -torch.nansum(dist_fit.log_prob(target))

        return predt, loss

    def draw_samples(self,
                     predt_params: pd.DataFrame,
                     n_samples: int = 1000,
                     seed: int = 123
                     ) -> pd.DataFrame:
        """
        Function that draws n_samples from a predicted distribution.

        Arguments
        ---------
        predt_params: pd.DataFrame
            pd.DataFrame with predicted distributional parameters.
        n_samples: int
            Number of samples to draw from the predicted response distribution.
        seed: int
            Manual seed.

        Returns
        -------
        pred_dist: pd.DataFrame
            DataFrame with n_samples drawn from predicted response distribution.

        """
        torch.manual_seed(seed)

        pred_params = torch.tensor(predt_params.values).reshape(-1, self.n_dist_param)
        pred_params = torch.split(pred_params, self.M, dim=1)
        dist_pred = self.create_mixture_distribution(pred_params)
        dist_samples = dist_pred.sample((n_samples,)).squeeze().detach().numpy().T
        dist_samples = pd.DataFrame(dist_samples)
        dist_samples.columns = [str("y_sample") + str(i) for i in range(dist_samples.shape[1])]

        if self.discrete:
            dist_samples = dist_samples.astype(int)

        return dist_samples

    def predict_dist(self,
                     booster: xgb.Booster,
                     start_values: np.ndarray,
                     data: xgb.DMatrix,
                     pred_type: str = "parameters",
                     n_samples: int = 1000,
                     quantiles: list = [0.1, 0.5, 0.9],
                     seed: int = 123
                     ) -> pd.DataFrame:
        """
        Function that predicts from the trained model.

        Arguments
        ---------
        booster : xgb.Booster
            Trained model.
        start_values : np.ndarray
            Starting values for each distributional parameter.
        data : xgb.DMatrix
            Data to predict from.
        pred_type : str
            Type of prediction:
            - "samples" draws n_samples from the predicted distribution.
            - "quantiles" calculates the quantiles from the predicted distribution.
            - "parameters" returns the predicted distributional parameters.
        n_samples : int
            Number of samples to draw from the predicted distribution.
        quantiles : List[float]
            List of quantiles to calculate from the predicted distribution.
        seed : int
            Seed for random number generator used to draw samples from the predicted distribution.

        Returns
        -------
        pred : pd.DataFrame
            Predictions.
        """
        # Set base_margin as starting point for each distributional parameter. Requires base_score=0 in parameters.
        base_margin_predt = (np.ones(shape=(data.num_row(), 1))) * start_values
        data.set_base_margin(base_margin_predt.flatten())

        predt = np.array(booster.predict(data, output_margin=True)).reshape(-1, self.n_dist_param)
        predt = torch.split(torch.tensor(predt, dtype=torch.float32), self.M, dim=1)

        # Transform predicted parameters to response scale
        dist_params_predt = np.concatenate(
            [
                response_fun(predt[i]).numpy() for i, (dist_param, response_fun) in enumerate(self.param_dict.items())
            ],
            axis=1,
        )
        dist_params_predt = pd.DataFrame(dist_params_predt)
        dist_params_predt.columns = self.distribution_arg_names

        # Draw samples from predicted response distribution
        pred_samples_df = self.draw_samples(predt_params=dist_params_predt,
                                            n_samples=n_samples,
                                            seed=seed)

        if pred_type == "parameters":
            return dist_params_predt

        elif pred_type == "samples":
            return pred_samples_df

        elif pred_type == "quantiles":
            # Calculate quantiles from predicted response distribution
            pred_quant_df = pred_samples_df.quantile(quantiles, axis=1).T
            pred_quant_df.columns = [str("quant_") + str(quantiles[i]) for i in range(len(quantiles))]
            if self.discrete:
                pred_quant_df = pred_quant_df.astype(int)
            return pred_quant_df

    def compute_gradients_and_hessians(self,
                                       loss: torch.Tensor,
                                       predt: List[torch.Tensor],
                                       weights: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:

        """
        Calculates gradients and hessians.

        Output gradients and hessians have shape (n_samples*n_outputs, 1).

        Arguments:
        ---------
        loss: torch.Tensor
            Loss.
        predt: torch.Tensor
            List of predicted parameters.
        weights: np.ndarray
            Weights.

        Returns:
        -------
        grad: torch.Tensor
            Gradients.
        hess: torch.Tensor
            Hessians.
        """
        # Gradient and Hessian
        grad = autograd(loss, inputs=predt, create_graph=True)
        hess = [autograd(grad[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grad))]

        # Stabilization of Derivatives
        if self.stabilization != "None":
            grad = [self.stabilize_derivative(grad[i], type=self.stabilization) for i in range(len(grad))]
            hess = [self.stabilize_derivative(hess[i], type=self.stabilization) for i in range(len(hess))]

        # Reshape
        grad = torch.cat(grad, axis=1).detach().squeeze(-1).numpy()
        hess = torch.cat(hess, axis=1).detach().squeeze(-1).numpy()

        # Weighting
        grad *= weights
        hess *= weights

        # Flatten
        grad = grad.flatten()
        hess = hess.flatten()

        return grad, hess

    def stabilize_derivative(self, input_der: torch.Tensor, type: str = "MAD") -> torch.Tensor:
        """
        Function that stabilizes Gradients and Hessians.

        As XGBoostLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important
        that these are comparable in magnitude for all distributional parameters. Due to imbalances regarding the ranges,
        the estimation might become unstable so that it does not converge (or converges only very slowly) to the optimal solution.
        Another way to improve convergence might be to standardize the response variable. This is especially useful if the
        range of the response differs strongly from the range of the Gradients and Hessians. Both the stabilization and
        the standardization of the response are not always advised but need to be carefully considered.
        Source: https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

        Parameters
        ----------
        input_der : torch.Tensor
            Input derivative, either Gradient or Hessian.
        type: str
            Stabilization method. Can be either "None", "MAD" or "L2".

        Returns
        -------
        stab_der : torch.Tensor
            Stabilized Gradient or Hessian.
        """

        if type == "MAD":
            input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
            div = torch.nanmedian(torch.abs(input_der - torch.nanmedian(input_der)))
            div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
            stab_der = input_der / div

        if type == "L2":
            input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
            div = torch.sqrt(torch.nanmean(input_der.pow(2)))
            div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
            div = torch.where(div > torch.tensor(10000.0), torch.tensor(10000.0), div)
            stab_der = input_der / div

        if type == "None":
            stab_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))

        return stab_der

    def dist_select(self,
                    target: np.ndarray,
                    candidate_distributions: List,
                    max_iter: int = 100,
                    plot: bool = False,
                    figure_size: tuple = (8, 5),
                    ) -> pd.DataFrame:
        """
        Function that selects the most suitable distribution among the candidate_distributions for the target variable,
        based on the NegLogLikelihood (lower is better).

        Parameters
        ----------
        target: np.ndarray
            Response variable.
        candidate_distributions: List
            List of candidate distributions.
        max_iter: int
            Maximum number of iterations for the optimization.
        plot: bool
            If True, a density plot of the actual and fitted distribution is created.
        figure_size: tuple
            Figure size of the density plot.

        Returns
        -------
        fit_df: pd.DataFrame
            Dataframe with the loss values of the fitted candidate distributions.
        """
        dist_list = []
        total_iterations = len(candidate_distributions)
        with tqdm(total=total_iterations, desc="Fitting candidate distributions") as pbar:
            for i in range(len(candidate_distributions)):
                dist_name = candidate_distributions[i].distribution.__class__.__name__
                n_mix = candidate_distributions[i].M
                tau = candidate_distributions[i].temperature
                dist_name = f"Mixture({dist_name}, tau={tau}, M={n_mix})"
                pbar.set_description(f"Fitting {dist_name} distribution")
                try:
                    loss, params = candidate_distributions[i].calculate_start_values(target=target, max_iter=max_iter)
                    fit_df = pd.DataFrame.from_dict(
                        {candidate_distributions[i].loss_fn: loss.reshape(-1, ),
                         "distribution": str(dist_name),
                         "params": [params],
                         "dist_pos": i,
                         "M": candidate_distributions[i].M
                         }
                    )
                except Exception as e:
                    warnings.warn(f"Error fitting {dist_name} distribution: {str(e)}")
                    fit_df = pd.DataFrame(
                        {candidate_distributions[i].loss_fn: np.nan,
                         "distribution": str(dist_name),
                         "params": [np.nan] * self.n_dist_param,
                         "dist_pos": i,
                         "M": candidate_distributions[i].M
                         }
                    )
                dist_list.append(fit_df)
                pbar.update(1)
            pbar.set_description(f"Fitting of candidate distributions completed")
            fit_df = pd.concat(dist_list).sort_values(by=candidate_distributions[i].loss_fn, ascending=True)
            fit_df["rank"] = fit_df[candidate_distributions[i].loss_fn].rank().astype(int)
            fit_df.set_index(fit_df["rank"], inplace=True)

        if plot:
            # Select best distribution
            best_dist = fit_df[fit_df["rank"] == fit_df["rank"].min()].reset_index(drop=True).iloc[[0]]
            best_dist_pos = int(best_dist["dist_pos"].values[0])
            best_dist_sel = candidate_distributions[best_dist_pos]
            params = torch.tensor(best_dist["params"][0]).reshape(1, -1)
            params = torch.split(params, best_dist_sel.M, dim=1)

            fitted_params = np.concatenate(
                [
                    response_fun(params[i]).numpy()
                    for i, (dist_param, response_fun) in enumerate(best_dist_sel.param_dict.items())
                ],
                axis=1,
            )

            fitted_params = pd.DataFrame(fitted_params, columns=best_dist_sel.distribution_arg_names)
            n_samples = np.max([10000, target.shape[0]])
            n_samples = np.where(n_samples > 500000, 100000, n_samples)
            dist_samples = best_dist_sel.draw_samples(fitted_params,
                                                      n_samples=n_samples,
                                                      seed=123).values

            # Plot actual and fitted distribution
            plt.figure(figsize=figure_size)
            sns.kdeplot(target.reshape(-1,), label="Actual")
            sns.kdeplot(dist_samples.reshape(-1,), label=f"Best-Fit: {best_dist['distribution'].values[0]}")
            plt.legend()
            plt.title("Actual vs. Best-Fit Density", fontweight="bold", fontsize=16)
            plt.show()

        fit_df.drop(columns=["rank", "params", "dist_pos", "M"], inplace=True)

        return fit_df
calculate_start_values(target, max_iter=50)

Function that calculates the starting values for each distributional parameter.

Arguments

target: np.ndarray Data from which starting values are calculated. max_iter: int Maximum number of iterations.

Returns

loss: float Loss value. start_values: np.ndarray Starting values for each distributional parameter.

Source code in xgboostlss/distributions/mixture_distribution_utils.py
def calculate_start_values(self,
                           target: np.ndarray,
                           max_iter: int = 50
                           ) -> Tuple[float, np.ndarray]:
    """
    Function that calculates the starting values for each distributional parameter.

    Arguments
    ---------
    target: np.ndarray
        Data from which starting values are calculated.
    max_iter: int
        Maximum number of iterations.

    Returns
    -------
    loss: float
        Loss value.
    start_values: np.ndarray
        Starting values for each distributional parameter.
    """
    # Convert target to torch.tensor
    target = torch.tensor(target, dtype=torch.float32).flatten()

    # Initialize parameters
    params = [torch.tensor(0.5, requires_grad=True) for _ in range(self.n_dist_param)]

    # Specify optimizer
    optimizer = LBFGS(params, lr=0.1, max_iter=np.min([int(max_iter/4), 20]), line_search_fn="strong_wolfe")

    # Define learning rate scheduler
    lr_scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=10)

    # Define closure
    def closure():
        optimizer.zero_grad()
        loss = self.loss_fn_start_values(params, target)
        loss.backward()
        return loss

    # Optimize parameters
    loss_vals = []
    tolerance = 1e-5
    patience = 5
    best_loss = float("inf")
    epochs_without_change = 0

    for epoch in range(max_iter):
        optimizer.zero_grad()
        loss = optimizer.step(closure)
        lr_scheduler.step(loss)
        loss_vals.append(loss.item())

        # Stopping criterion (no improvement in loss)
        if loss.item() < best_loss - tolerance:
            best_loss = loss.item()
            epochs_without_change = 0
        else:
            epochs_without_change += 1

        if epochs_without_change >= patience:
            break

    # Get final loss
    loss = np.array(loss_vals[-1])

    # Get start values
    start_values = np.array([params[i].detach() for i in range(self.n_dist_param)])

    # Replace any remaining NaNs or infinity values with 0.5
    start_values = np.nan_to_num(start_values, nan=0.5, posinf=0.5, neginf=0.5)

    return loss, start_values
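
For intuition, the optimization pattern above can be reproduced in isolation. The following is a minimal sketch, assuming a single Gaussian rather than a mixture and purely illustrative names: unconditional location and log-scale are fitted to a synthetic target with LBFGS and a strong-Wolfe line search, mirroring the closure-based loop above.

# Stripped-down sketch of the start-value optimization above, assuming a single
# Gaussian instead of a mixture: fit unconditional loc / log-scale to a target
# with LBFGS. Names and values are illustrative only.
import torch
from torch.distributions import Normal
from torch.optim import LBFGS

torch.manual_seed(0)
y = 3.0 + 0.5 * torch.randn(1000)                       # synthetic target

params = [torch.tensor(0.5, requires_grad=True),        # loc (identity link)
          torch.tensor(0.5, requires_grad=True)]        # scale (log link)

def nll(params, target):
    dist = Normal(params[0], torch.exp(params[1]))      # map scale to the positive reals
    return -torch.nansum(dist.log_prob(target))

optimizer = LBFGS(params, lr=0.1, max_iter=20, line_search_fn="strong_wolfe")

def closure():
    optimizer.zero_grad()
    loss = nll(params, y)
    loss.backward()
    return loss

for _ in range(10):                                      # a few outer steps suffice here
    loss = optimizer.step(closure)

print(round(loss.item(), 2),
      round(params[0].item(), 2),                        # close to 3.0
      round(torch.exp(params[1]).item(), 2))             # close to 0.5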
compute_gradients_and_hessians(loss, predt, weights)

Calculates gradients and hessians.

Output gradients and hessians have shape (n_samples*n_outputs, 1).

Arguments:

loss: torch.Tensor Loss. predt: torch.Tensor List of predicted parameters. weights: np.ndarray Weights.

Returns:

grad: torch.Tensor Gradients. hess: torch.Tensor Hessians.

Source code in xgboostlss/distributions/mixture_distribution_utils.py
def compute_gradients_and_hessians(self,
                                   loss: torch.Tensor,
                                   predt: List[torch.Tensor],
                                   weights: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:

    """
    Calculates gradients and hessians.

    Output gradients and hessians have shape (n_samples*n_outputs, 1).

    Arguments:
    ---------
    loss: torch.Tensor
        Loss.
    predt: torch.Tensor
        List of predicted parameters.
    weights: np.ndarray
        Weights.

    Returns:
    -------
    grad: torch.Tensor
        Gradients.
    hess: torch.Tensor
        Hessians.
    """
    # Gradient and Hessian
    grad = autograd(loss, inputs=predt, create_graph=True)
    hess = [autograd(grad[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grad))]

    # Stabilization of Derivatives
    if self.stabilization != "None":
        grad = [self.stabilize_derivative(grad[i], type=self.stabilization) for i in range(len(grad))]
        hess = [self.stabilize_derivative(hess[i], type=self.stabilization) for i in range(len(hess))]

    # Reshape
    grad = torch.cat(grad, axis=1).detach().squeeze(-1).numpy()
    hess = torch.cat(hess, axis=1).detach().squeeze(-1).numpy()

    # Weighting
    grad *= weights
    hess *= weights

    # Flatten
    grad = grad.flatten()
    hess = hess.flatten()

    return grad, hess
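
The gradient/Hessian computation relies on a two-pass use of torch.autograd.grad: a first pass with create_graph=True yields per-parameter gradients of the negative log-likelihood, and a second pass over each summed gradient yields the corresponding second derivatives. A minimal sketch with a plain Gaussian and illustrative names:

# Sketch of the two-pass autograd pattern used above: first-order gradients of the
# NLL w.r.t. each parameter tensor, then a second pass over each summed gradient
# to obtain the second derivatives used as Hessian entries. Illustrative only.
import torch
from torch.autograd import grad as autograd
from torch.distributions import Normal

target = torch.randn(100)
loc = torch.zeros(100, 1, requires_grad=True)
log_scale = torch.zeros(100, 1, requires_grad=True)

loss = -torch.nansum(Normal(loc, torch.exp(log_scale)).log_prob(target.reshape(-1, 1)))

params = [loc, log_scale]
grads = autograd(loss, inputs=params, create_graph=True)            # d loss / d param
hess = [autograd(grads[i].nansum(), inputs=params[i], retain_graph=True)[0]
        for i in range(len(grads))]                                 # second derivatives

print(grads[0].shape, hess[0].shape)   # both torch.Size([100, 1]), one entry per observation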
create_mixture_distribution(params)

Function that creates a mixture distribution.

Arguments

params: torch.Tensor Distributional parameters.

Returns

dist: torch.distributions.Distribution Mixture distribution.

Source code in xgboostlss/distributions/mixture_distribution_utils.py
def create_mixture_distribution(self,
                                params: List[torch.Tensor],
                                ) -> torch.distributions.Distribution:
    """
    Function that creates a mixture distribution.

    Arguments
    ---------
    params: torch.Tensor
        Distributional parameters.

    Returns
    -------
    dist: torch.distributions.Distribution
        Mixture distribution.
    """

    # Create Mixture Distribution
    mixture_cat = Categorical(probs=params[-1])
    mixture_comp = self.distribution.distribution(*params[:-1])
    mixture_dist = MixtureSameFamily(mixture_cat, mixture_comp)

    return mixture_dist
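
The parameter list passed to this method is expected to hold the component parameters first and the mixing probabilities last, each of shape (n_obs, M). A minimal PyTorch sketch with illustrative values for a two-component Gaussian mixture:

# Minimal sketch of the parameter layout expected above: component parameters
# first, mixing probabilities last, each of shape (n_obs, M). Illustrative only.
import torch
from torch.distributions import Categorical, MixtureSameFamily, Normal

n_obs, M = 4, 2
loc = torch.tensor([[-2.0, 2.0]]).repeat(n_obs, 1)       # component means,    (n_obs, M)
scale = torch.ones(n_obs, M)                              # component scales,   (n_obs, M)
probs = torch.tensor([[0.3, 0.7]]).repeat(n_obs, 1)       # mixing weights,     (n_obs, M)

params = [loc, scale, probs]                               # probabilities go last
mixture = MixtureSameFamily(Categorical(probs=params[-1]), Normal(*params[:-1]))

print(mixture.log_prob(torch.zeros(n_obs)))                # one log-density per observation
print(mixture.sample((3,)).shape)                          # torch.Size([3, 4])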
dist_select(target, candidate_distributions, max_iter=100, plot=False, figure_size=(8, 5))

Function that selects the most suitable distribution among the candidate_distributions for the target variable, based on the NegLogLikelihood (lower is better).

Parameters

target: np.ndarray
    Response variable.
candidate_distributions: List
    List of candidate distributions.
max_iter: int
    Maximum number of iterations for the optimization.
plot: bool
    If True, a density plot of the actual and fitted distribution is created.
figure_size: tuple
    Figure size of the density plot.

Returns

fit_df: pd.DataFrame Dataframe with the loss values of the fitted candidate distributions.

Source code in xgboostlss/distributions/mixture_distribution_utils.py
def dist_select(self,
                target: np.ndarray,
                candidate_distributions: List,
                max_iter: int = 100,
                plot: bool = False,
                figure_size: tuple = (8, 5),
                ) -> pd.DataFrame:
    """
    Function that selects the most suitable distribution among the candidate_distributions for the target variable,
    based on the NegLogLikelihood (lower is better).

    Parameters
    ----------
    target: np.ndarray
        Response variable.
    candidate_distributions: List
        List of candidate distributions.
    max_iter: int
        Maximum number of iterations for the optimization.
    plot: bool
        If True, a density plot of the actual and fitted distribution is created.
    figure_size: tuple
        Figure size of the density plot.

    Returns
    -------
    fit_df: pd.DataFrame
        Dataframe with the loss values of the fitted candidate distributions.
    """
    dist_list = []
    total_iterations = len(candidate_distributions)
    with tqdm(total=total_iterations, desc="Fitting candidate distributions") as pbar:
        for i in range(len(candidate_distributions)):
            dist_name = candidate_distributions[i].distribution.__class__.__name__
            n_mix = candidate_distributions[i].M
            tau = candidate_distributions[i].temperature
            dist_name = f"Mixture({dist_name}, tau={tau}, M={n_mix})"
            pbar.set_description(f"Fitting {dist_name} distribution")
            try:
                loss, params = candidate_distributions[i].calculate_start_values(target=target, max_iter=max_iter)
                fit_df = pd.DataFrame.from_dict(
                    {candidate_distributions[i].loss_fn: loss.reshape(-1, ),
                     "distribution": str(dist_name),
                     "params": [params],
                     "dist_pos": i,
                     "M": candidate_distributions[i].M
                     }
                )
            except Exception as e:
                warnings.warn(f"Error fitting {dist_name} distribution: {str(e)}")
                fit_df = pd.DataFrame(
                    {candidate_distributions[i].loss_fn: np.nan,
                     "distribution": str(dist_name),
                     "params": [np.nan] * self.n_dist_param,
                     "dist_pos": i,
                     "M": candidate_distributions[i].M
                     }
                )
            dist_list.append(fit_df)
            pbar.update(1)
        pbar.set_description(f"Fitting of candidate distributions completed")
        fit_df = pd.concat(dist_list).sort_values(by=candidate_distributions[i].loss_fn, ascending=True)
        fit_df["rank"] = fit_df[candidate_distributions[i].loss_fn].rank().astype(int)
        fit_df.set_index(fit_df["rank"], inplace=True)

    if plot:
        # Select best distribution
        best_dist = fit_df[fit_df["rank"] == fit_df["rank"].min()].reset_index(drop=True).iloc[[0]]
        best_dist_pos = int(best_dist["dist_pos"].values[0])
        best_dist_sel = candidate_distributions[best_dist_pos]
        params = torch.tensor(best_dist["params"][0]).reshape(1, -1)
        params = torch.split(params, best_dist_sel.M, dim=1)

        fitted_params = np.concatenate(
            [
                response_fun(params[i]).numpy()
                for i, (dist_param, response_fun) in enumerate(best_dist_sel.param_dict.items())
            ],
            axis=1,
        )

        fitted_params = pd.DataFrame(fitted_params, columns=best_dist_sel.distribution_arg_names)
        n_samples = np.max([10000, target.shape[0]])
        n_samples = np.where(n_samples > 500000, 100000, n_samples)
        dist_samples = best_dist_sel.draw_samples(fitted_params,
                                                  n_samples=n_samples,
                                                  seed=123).values

        # Plot actual and fitted distribution
        plt.figure(figsize=figure_size)
        sns.kdeplot(target.reshape(-1,), label="Actual")
        sns.kdeplot(dist_samples.reshape(-1,), label=f"Best-Fit: {best_dist['distribution'].values[0]}")
        plt.legend()
        plt.title("Actual vs. Best-Fit Density", fontweight="bold", fontsize=16)
        plt.show()

    fit_df.drop(columns=["rank", "params", "dist_pos", "M"], inplace=True)

    return fit_df
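
A hedged usage sketch for dist_select. The Mixture and Gaussian wrappers, their import paths, and their constructor arguments are assumptions based on the XGBoostLSS examples and may differ across versions; y is an illustrative response array.

# Hedged usage sketch for dist_select. The Mixture/Gaussian wrappers, their import
# paths and constructor arguments are assumptions; `y` is illustrative data.
import numpy as np
from xgboostlss.distributions.Gaussian import Gaussian
from xgboostlss.distributions.Mixture import Mixture
from xgboostlss.distributions.mixture_distribution_utils import MixtureDistributionClass

y = np.random.default_rng(123).normal(size=1000)

candidates = [
    Mixture(Gaussian(), M=2),
    Mixture(Gaussian(), M=3),
]

selector = MixtureDistributionClass()
fit_df = selector.dist_select(target=y, candidate_distributions=candidates,
                              max_iter=50, plot=False)
print(fit_df)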
draw_samples(predt_params, n_samples=1000, seed=123)

Function that draws n_samples from a predicted distribution.

Arguments

predt_params: pd.DataFrame pd.DataFrame with predicted distributional parameters. n_samples: int Number of samples to draw from the predicted response distribution. seed: int Manual seed.

Returns

pred_dist: pd.DataFrame DataFrame with n_samples drawn from predicted response distribution.

Source code in xgboostlss/distributions/mixture_distribution_utils.py
def draw_samples(self,
                 predt_params: pd.DataFrame,
                 n_samples: int = 1000,
                 seed: int = 123
                 ) -> pd.DataFrame:
    """
    Function that draws n_samples from a predicted distribution.

    Arguments
    ---------
    predt_params: pd.DataFrame
        pd.DataFrame with predicted distributional parameters.
    n_samples: int
        Number of samples to draw from the predicted response distribution.
    seed: int
        Manual seed.

    Returns
    -------
    pred_dist: pd.DataFrame
        DataFrame with n_samples drawn from predicted response distribution.

    """
    torch.manual_seed(seed)

    pred_params = torch.tensor(predt_params.values).reshape(-1, self.n_dist_param)
    pred_params = torch.split(pred_params, self.M, dim=1)
    dist_pred = self.create_mixture_distribution(pred_params)
    dist_samples = dist_pred.sample((n_samples,)).squeeze().detach().numpy().T
    dist_samples = pd.DataFrame(dist_samples)
    dist_samples.columns = [str("y_sample") + str(i) for i in range(dist_samples.shape[1])]

    if self.discrete:
        dist_samples = dist_samples.astype(int)

    return dist_samples
get_params_loss(predt, target, start_values, requires_grad=False)

Function that returns the predicted parameters and the loss.

Arguments

predt: np.ndarray
    Predicted values.
target: torch.Tensor
    Target values.
start_values: List
    Starting values for each distributional parameter.
requires_grad: bool
    Whether to add to the computational graph or not.

Returns

predt: List of torch.Tensors Predicted parameters. loss: torch.Tensor Loss value.

Source code in xgboostlss/distributions/mixture_distribution_utils.py
def get_params_loss(self,
                    predt: np.ndarray,
                    target: torch.Tensor,
                    start_values: List[float],
                    requires_grad: bool = False,
                    ) -> Tuple[List[torch.Tensor], np.ndarray]:
    """
    Function that returns the predicted parameters and the loss.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    target: torch.Tensor
        Target values.
    start_values: List
        Starting values for each distributional parameter.
    requires_grad: bool
        Whether to add to the computational graph or not.

    Returns
    -------
    predt: List of torch.Tensors
        Predicted parameters.
    loss: torch.Tensor
        Loss value.
    """
    # Predicted Parameters
    predt = predt.reshape(-1, self.n_dist_param)

    # Replace NaNs and infinity values with unconditional start values
    nan_inf_mask = np.isnan(predt) | np.isinf(predt)
    predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])

    if self.hessian_mode == "grouped":
        # Convert to torch.Tensor: splits the parameters into tensors for each parameter-type
        predt = torch.split(torch.tensor(predt, requires_grad=requires_grad), self.M, dim=1)
        # Transform parameters to response scale
        predt_transformed = [response_fn(predt[i]) for i, response_fn in enumerate(self.param_dict.values())]

    else:
        # Convert to torch.Tensor: splits the parameters into tensors for each parameter individually
        predt = torch.split(torch.tensor(predt, requires_grad=requires_grad), 1, dim=1)
        # Transform parameters to response scale
        keys = list(self.param_dict.keys())
        max_index = len(self.param_dict) * self.M
        index_ranges = []
        for i in range(0, max_index, self.M):
            if i + self.M >= max_index:
                index_ranges.append((i, None))
                break
            index_ranges.append((i, i + self.M))

        predt_transformed = []
        for key, (start, end) in zip(keys, index_ranges):
            predt_transformed.append(self.param_dict[key](torch.cat(predt[start:end], dim=1)))

    # Specify Distribution and Loss
    dist_fit = self.create_mixture_distribution(predt_transformed)
    loss = -torch.nansum(dist_fit.log_prob(target))

    return predt, loss
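
The difference between the two hessian_mode settings comes down to how the raw booster output is split before the autograd passes. A minimal sketch for a Gaussian mixture with M = 3 components (illustrative shapes only):

# Sketch of the two splitting schemes used above for a Gaussian mixture with
# M = 3 components (n_dist_param = 3 * M = 9): "grouped" yields one tensor per
# parameter type, "individual" one tensor per single parameter. Illustrative only.
import torch

M, n_obs = 3, 5
predt = torch.randn(n_obs, 3 * M)                     # raw booster output

grouped = torch.split(predt, M, dim=1)                # 3 tensors of shape (n_obs, M)
individual = torch.split(predt, 1, dim=1)             # 9 tensors of shape (n_obs, 1)

print(len(grouped), grouped[0].shape)                 # 3 torch.Size([5, 3])
print(len(individual), individual[0].shape)           # 9 torch.Size([5, 1])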
loss_fn_start_values(params, target)

Function that calculates the loss for a given set of distributional parameters. Only used for calculating the loss for the start values.

Parameters

params: torch.Tensor Distributional parameters. target: torch.Tensor Target values.

Returns

loss: torch.Tensor Loss value.

Source code in xgboostlss/distributions/mixture_distribution_utils.py
def loss_fn_start_values(self,
                         params: torch.Tensor,
                         target: torch.Tensor) -> torch.Tensor:
    """
    Function that calculates the loss for a given set of distributional parameters. Only used for calculating
    the loss for the start values.

    Parameters
    ----------
    params: torch.Tensor
        Distributional parameters.
    target: torch.Tensor
        Target values.

    Returns
    -------
    loss: torch.Tensor
        Loss value.
    """
    # Replace NaNs and infinity values with 0.5
    nan_inf_idx = torch.isnan(torch.stack(params)) | torch.isinf(torch.stack(params))
    params = torch.where(nan_inf_idx, torch.tensor(0.5), torch.stack(params)).reshape(1, -1)
    params = torch.split(params, self.M, dim=1)

    # Transform parameters to response scale
    params = [response_fn(params[i]) for i, response_fn in enumerate(self.param_dict.values())]

    # Specify Distribution and Loss
    dist = self.create_mixture_distribution(params)
    loss = -torch.nansum(dist.log_prob(target))

    return loss
metric_fn(predt, data)

Function that evaluates the predictions using the specified loss function.

Arguments

predt: np.ndarray Predicted values. data: xgb.DMatrix Data used for training.

Returns

name: str Name of the evaluation metric. loss: float Loss value.

Source code in xgboostlss/distributions/mixture_distribution_utils.py
def metric_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[str, np.ndarray]:
    """
    Function that evaluates the predictions using the specified loss function.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    data: xgb.DMatrix
        Data used for training.

    Returns
    -------
    name: str
        Name of the evaluation metric.
    loss: float
        Loss value.
    """
    # Target
    target = torch.tensor(data.get_label().reshape(-1, 1), dtype=torch.float32)

    # Start values (needed to replace NaNs in predt)
    start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

    # Calculate loss
    _, loss = self.get_params_loss(predt, target.flatten(), start_values, requires_grad=False)

    return self.loss_fn, loss
objective_fn(predt, data)

Function to estimate gradients and hessians of distributional parameters.

Arguments

predt: np.ndarray Predicted values. data: xgb.DMatrix Data used for training.

Returns

grad: np.ndarray Gradient. hess: np.ndarray Hessian.

Source code in xgboostlss/distributions/mixture_distribution_utils.py
def objective_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[np.ndarray, np.ndarray]:

    """
    Function to estimate gradients and hessians of distributional parameters.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    data: xgb.DMatrix
        Data used for training.

    Returns
    -------
    grad: np.ndarray
        Gradient.
    hess: np.ndarray
        Hessian.
    """
    # Target
    target = torch.tensor(data.get_label().reshape(-1, 1), dtype=torch.float32)

    # Weights
    if data.get_weight().size == 0:
        # Use 1 as weight if no weights are specified
        weights = np.ones_like(target, dtype="float32")
    else:
        weights = data.get_weight().reshape(-1, 1)

    # Start values (needed to replace NaNs in predt)
    start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

    # Calculate gradients and hessians
    predt, loss = self.get_params_loss(predt, target.flatten(), start_values, requires_grad=True)
    grad, hess = self.compute_gradients_and_hessians(loss, predt, weights)

    return grad, hess
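
objective_fn and metric_fn are designed to be passed to xgb.train as a custom objective and evaluation metric. The sketch below is hedged: mix_dist is assumed to be a configured mixture specification (for example, the one built in the dist_select sketch above), X and y are illustrative data, and the multi-output booster configuration normally handled by the xgboostlss.model.XGBoostLSS wrapper is omitted.

# Hedged sketch: plugging objective_fn / metric_fn into xgb.train as custom
# objective and evaluation metric. `mix_dist`, `X`, and `y` are assumed to exist.
# NOTE: the XGBoostLSS model wrapper additionally configures one booster output
# per distributional parameter; that setup is omitted in this sketch.
import numpy as np
import xgboost as xgb

dtrain = xgb.DMatrix(X, label=y)

# Unconditional start values become the base margin (requires base_score=0).
_, start_values = mix_dist.calculate_start_values(target=y, max_iter=50)
base_margin = np.ones((dtrain.num_row(), 1)) * start_values
dtrain.set_base_margin(base_margin.flatten())

params = {"eta": 0.1, "max_depth": 3, "base_score": 0.0, "disable_default_eval_metric": True}
bst = xgb.train(
    params,
    dtrain,
    num_boost_round=100,
    obj=mix_dist.objective_fn,           # gradients / hessians of the NLL
    custom_metric=mix_dist.metric_fn,    # NLL as evaluation metric
    evals=[(dtrain, "train")],
)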
predict_dist(booster, start_values, data, pred_type='parameters', n_samples=1000, quantiles=[0.1, 0.5, 0.9], seed=123)

Function that predicts from the trained model.

Arguments

booster : xgb.Booster
    Trained model.
start_values : np.ndarray
    Starting values for each distributional parameter.
data : xgb.DMatrix
    Data to predict from.
pred_type : str
    Type of prediction:
    - "samples" draws n_samples from the predicted distribution.
    - "quantiles" calculates the quantiles from the predicted distribution.
    - "parameters" returns the predicted distributional parameters.
n_samples : int
    Number of samples to draw from the predicted distribution.
quantiles : List[float]
    List of quantiles to calculate from the predicted distribution.
seed : int
    Seed for random number generator used to draw samples from the predicted distribution.

Returns

pred : pd.DataFrame Predictions.

Source code in xgboostlss/distributions/mixture_distribution_utils.py
def predict_dist(self,
                 booster: xgb.Booster,
                 start_values: np.ndarray,
                 data: xgb.DMatrix,
                 pred_type: str = "parameters",
                 n_samples: int = 1000,
                 quantiles: list = [0.1, 0.5, 0.9],
                 seed: int = 123
                 ) -> pd.DataFrame:
    """
    Function that predicts from the trained model.

    Arguments
    ---------
    booster : xgb.Booster
        Trained model.
    start_values : np.ndarray
        Starting values for each distributional parameter.
    data : xgb.DMatrix
        Data to predict from.
    pred_type : str
        Type of prediction:
        - "samples" draws n_samples from the predicted distribution.
        - "quantiles" calculates the quantiles from the predicted distribution.
        - "parameters" returns the predicted distributional parameters.
    n_samples : int
        Number of samples to draw from the predicted distribution.
    quantiles : List[float]
        List of quantiles to calculate from the predicted distribution.
    seed : int
        Seed for random number generator used to draw samples from the predicted distribution.

    Returns
    -------
    pred : pd.DataFrame
        Predictions.
    """
    # Set base_margin as starting point for each distributional parameter. Requires base_score=0 in parameters.
    base_margin_predt = (np.ones(shape=(data.num_row(), 1))) * start_values
    data.set_base_margin(base_margin_predt.flatten())

    predt = np.array(booster.predict(data, output_margin=True)).reshape(-1, self.n_dist_param)
    predt = torch.split(torch.tensor(predt, dtype=torch.float32), self.M, dim=1)

    # Transform predicted parameters to response scale
    dist_params_predt = np.concatenate(
        [
            response_fun(predt[i]).numpy() for i, (dist_param, response_fun) in enumerate(self.param_dict.items())
        ],
        axis=1,
    )
    dist_params_predt = pd.DataFrame(dist_params_predt)
    dist_params_predt.columns = self.distribution_arg_names

    # Draw samples from predicted response distribution
    pred_samples_df = self.draw_samples(predt_params=dist_params_predt,
                                        n_samples=n_samples,
                                        seed=seed)

    if pred_type == "parameters":
        return dist_params_predt

    elif pred_type == "samples":
        return pred_samples_df

    elif pred_type == "quantiles":
        # Calculate quantiles from predicted response distribution
        pred_quant_df = pred_samples_df.quantile(quantiles, axis=1).T
        pred_quant_df.columns = [str("quant_") + str(quantiles[i]) for i in range(len(quantiles))]
        if self.discrete:
            pred_quant_df = pred_quant_df.astype(int)
        return pred_quant_df
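
A hedged usage sketch for predict_dist, assuming mix_dist, bst, start_values, and dtest exist from a training step like the one sketched after objective_fn above; dtest is an xgb.DMatrix holding the features to predict on.

# Hedged usage sketch for predict_dist; `mix_dist`, `bst`, `start_values`, and
# `dtest` are assumed to come from a prior training step.
params_df = mix_dist.predict_dist(booster=bst, start_values=start_values,
                                  data=dtest, pred_type="parameters")

samples_df = mix_dist.predict_dist(booster=bst, start_values=start_values,
                                   data=dtest, pred_type="samples",
                                   n_samples=1000, seed=123)

quantiles_df = mix_dist.predict_dist(booster=bst, start_values=start_values,
                                     data=dtest, pred_type="quantiles",
                                     quantiles=[0.05, 0.5, 0.95])
print(quantiles_df.columns.tolist())   # ['quant_0.05', 'quant_0.5', 'quant_0.95']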
stabilize_derivative(input_der, type='MAD')

Function that stabilizes Gradients and Hessians.

As XGBoostLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important that these are comparable in magnitude for all distributional parameters. Due to imbalances regarding the ranges, the estimation might become unstable so that it does not converge (or converges only very slowly) to the optimal solution. Another way to improve convergence might be to standardize the response variable. This is especially useful if the range of the response differs strongly from the range of the Gradients and Hessians. Both the stabilization and the standardization of the response are not always advised but need to be carefully considered. Source: https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

Parameters

input_der : torch.Tensor Input derivative, either Gradient or Hessian. type: str Stabilization method. Can be either "None", "MAD" or "L2".

Returns

stab_der : torch.Tensor Stabilized Gradient or Hessian.

Source code in xgboostlss/distributions/mixture_distribution_utils.py
def stabilize_derivative(self, input_der: torch.Tensor, type: str = "MAD") -> torch.Tensor:
    """
    Function that stabilizes Gradients and Hessians.

    As XGBoostLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important
    that these are comparable in magnitude for all distributional parameters. Due to imbalances regarding the ranges,
    the estimation might become unstable so that it does not converge (or converges only very slowly) to the optimal solution.
    Another way to improve convergence might be to standardize the response variable. This is especially useful if the
    range of the response differs strongly from the range of the Gradients and Hessians. Both the stabilization and
    the standardization of the response are not always advised but need to be carefully considered.
    Source: https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

    Parameters
    ----------
    input_der : torch.Tensor
        Input derivative, either Gradient or Hessian.
    type: str
        Stabilization method. Can be either "None", "MAD" or "L2".

    Returns
    -------
    stab_der : torch.Tensor
        Stabilized Gradient or Hessian.
    """

    if type == "MAD":
        input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
        div = torch.nanmedian(torch.abs(input_der - torch.nanmedian(input_der)))
        div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
        stab_der = input_der / div

    if type == "L2":
        input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
        div = torch.sqrt(torch.nanmean(input_der.pow(2)))
        div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
        div = torch.where(div > torch.tensor(10000.0), torch.tensor(10000.0), div)
        stab_der = input_der / div

    if type == "None":
        stab_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))

    return stab_der
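
A small illustration of the MAD stabilization, using purely illustrative gradient values: dividing each derivative vector by its median absolute deviation puts gradients of very different magnitude on a comparable scale (torch.clamp is used here in place of the torch.where guard above).

# Tiny illustration of the MAD stabilization: dividing by the median absolute
# deviation puts gradients of very different magnitude on a comparable scale.
# Values are purely illustrative.
import torch

grad_loc = torch.tensor([[1e-3], [2e-3], [-1e-3], [4e-3]])
grad_scale = torch.tensor([[150.0], [-90.0], [220.0], [60.0]])

def mad_stabilize(g):
    div = torch.nanmedian(torch.abs(g - torch.nanmedian(g)))
    div = torch.clamp(div, min=1e-04)                      # same lower bound as above
    return g / div

print(mad_stabilize(grad_loc).abs().median(),              # both medians end up near 1
      mad_stabilize(grad_scale).abs().median())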

get_component_distributions()

Function that returns component distributions for creating a mixing distribution.

Arguments

None

Returns

distns: List List of all available distributions.

Source code in xgboostlss/distributions/mixture_distribution_utils.py
def get_component_distributions():
    """
    Function that returns component distributions for creating a mixing distribution.

    Arguments
    ---------
    None

    Returns
    -------
    distns: List
        List of all available distributions.
    """
    # Get all distribution names
    mixture_distns = [dist for dist in dir(distributions) if dist[0].isupper()]

    # Remove specific distributions
    distns_remove = [
        "Dirichlet",
        "Expectile",
        "MVN",
        "MVN_LoRa",
        "MVT",
        "Mixture",
        "SplineFlow"
    ]

    mixture_distns = [item for item in mixture_distns if item not in distns_remove]

    return mixture_distns
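
Usage sketch: listing the distributions that can serve as mixture components. The exact entries returned depend on the installed XGBoostLSS version.

# List the distributions that can be used as mixture components.
from xgboostlss.distributions.mixture_distribution_utils import get_component_distributions

print(get_component_distributions())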

multivariate_distribution_utils

Multivariate_DistributionClass

Generic class that contains general functions for multivariate distributions.

Arguments

distribution: torch.distributions.Distribution
    PyTorch Distribution class.
univariate: bool
    Whether the distribution is univariate or multivariate.
distribution_arg_names: List
    List of distributional parameter names.
n_targets: int
    Number of targets.
rank: Optional[int]
    Rank of the low-rank form of the covariance matrix.
n_dist_param: int
    Number of distributional parameters.
param_dict: Dict[str, Any]
    Dictionary that maps distributional parameters to their response scale.
param_transform: Callable
    Function that transforms the distributional parameters into the required format.
get_dist_params: Callable
    Function that returns the distributional parameters.
discrete: bool
    Whether the support of the distribution is discrete or continuous.
stabilization: str
    Stabilization method.
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in xgboostlss/distributions/multivariate_distribution_utils.py
class Multivariate_DistributionClass:
    """
    Generic class that contains general functions for multivariate distributions.

    Arguments
    ---------
    distribution: torch.distributions.Distribution
        PyTorch Distribution class.
    univariate: bool
        Whether the distribution is univariate or multivariate.
    distribution_arg_names: List
        List of distributional parameter names.
    n_targets: int
        Number of targets.
    rank: Optional[int]
        Rank of the low-rank form of the covariance matrix.
    n_dist_param: int
        Number of distributional parameters.
    param_dict: Dict[str, Any]
        Dictionary that maps distributional parameters to their response scale.
    param_transform: Callable
        Function that transforms the distributional parameters into the required format.
    get_dist_params: Callable
        Function that returns the distributional parameters.
    discrete: bool
        Whether the support of the distribution is discrete or continuous.
    stabilization: str
        Stabilization method.
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 distribution: torch.distributions.Distribution = None,
                 univariate: bool = False,
                 distribution_arg_names: List = None,
                 n_targets: int = 2,
                 rank: Optional[int] = None,
                 n_dist_param: int = None,
                 param_dict: Dict[str, Any] = None,
                 param_transform: Callable = None,
                 get_dist_params: Callable = None,
                 discrete: bool = False,
                 stabilization: str = "None",
                 loss_fn: str = "nll",
                 ):

        self.distribution = distribution
        self.univariate = univariate
        self.distribution_arg_names = distribution_arg_names
        self.n_targets = n_targets
        self.rank = rank
        self.n_dist_param = n_dist_param
        self.param_dict = param_dict
        self.param_transform = param_transform
        self.get_dist_params = get_dist_params
        self.discrete = discrete
        self.stabilization = stabilization
        self.loss_fn = loss_fn

    def objective_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[np.ndarray, np.ndarray]:

        """
        Function to estimate gradients and hessians of distributional parameters.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        data: xgb.DMatrix
            Data used for training.

        Returns
        -------
        grad: np.ndarray
            Gradient.
        hess: np.ndarray
            Hessian.
        """
        # Target
        target = torch.tensor(data.get_label().reshape(-1, self.n_dist_param))[:, :self.n_targets]

        # Weights
        if data.get_weight().size == 0:
            # Use 1 as weight if no weights are specified
            weights = torch.ones_like(target[:, 0], dtype=target.dtype).numpy().reshape(-1, 1)
        else:
            weights = data.get_weight().reshape(-1, 1)

        # Start values (needed to replace NaNs in predt)
        start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

        # Calculate gradients and hessians
        predt, loss = self.get_params_loss(predt, target, start_values, requires_grad=True)
        grad, hess = self.compute_gradients_and_hessians(loss, predt, weights)

        return grad, hess

    def metric_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[str, np.ndarray]:
        """
        Function that evaluates the predictions using the specified loss function.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        data: xgb.DMatrix
            Data used for training.

        Returns
        -------
        name: str
            Name of the evaluation metric.
        loss: float
            Loss value.
        """
        # Target
        target = torch.tensor(data.get_label().reshape(-1, self.n_dist_param))[:, :self.n_targets]

        # Start values (needed to replace NaNs in predt)
        start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

        # Calculate loss
        _, loss = self.get_params_loss(predt, target, start_values, requires_grad=False)

        return self.loss_fn, loss

    def loss_fn_start_values(self,
                             params: torch.Tensor,
                             target: torch.Tensor) -> torch.Tensor:
        """
        Function that calculates the loss for a given set of distributional parameters. Only used for calculating
        the loss for the start values.

        Parameter
        ---------
        params: torch.Tensor
            Distributional parameters.
        target: torch.Tensor
            Target values.

        Returns
        -------
        loss: torch.Tensor
            Loss value.
        """
        # Replace NaNs and infinity values with 0.5
        params = [
            torch.where(torch.isnan(tensor) | torch.isinf(tensor), torch.tensor(0.5), tensor) for tensor in params
        ]

        # Transform parameters to response scale
        params = self.param_transform(params, self.param_dict, self.n_targets, rank=self.rank, n_obs=1)

        # Specify Distribution and Loss
        if self.distribution.__name__ == "Dirichlet":
            dist_kwargs = dict(zip(self.distribution_arg_names, [params]))
        else:
            dist_kwargs = dict(zip(self.distribution_arg_names, params))
        dist_fit = self.distribution(**dist_kwargs)
        loss = -torch.nansum(dist_fit.log_prob(target))

        return loss

    def calculate_start_values(self,
                               target: np.ndarray,
                               max_iter: int = 50
                               ) -> Tuple[float, np.ndarray]:
        """
        Function that calculates the starting values for each distributional parameter.

        Arguments
        ---------
        target: np.ndarray
            Data from which starting values are calculated.
        max_iter: int
            Maximum number of iterations.

        Returns
        -------
        loss: float
            Loss value.
        start_values: np.ndarray
            Starting values for each distributional parameter.
        """
        # Convert target to torch.tensor
        target = torch.tensor(target.reshape(-1, self.n_dist_param))[:, :self.n_targets]

        # Initialize parameters
        params = [
            torch.tensor(0.5, dtype=torch.float64).reshape(-1, 1).requires_grad_(True) for _ in range(self.n_dist_param)
        ]

        # Specify optimizer
        optimizer = LBFGS(params, lr=0.1, max_iter=np.min([int(max_iter/4), 20]), line_search_fn="strong_wolfe")

        # Define learning rate scheduler
        lr_scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=10)

        # Define closure
        def closure():
            optimizer.zero_grad()
            loss = self.loss_fn_start_values(params, target)
            loss.backward()
            return loss

        # Optimize parameters
        loss_vals = []
        for epoch in range(max_iter):
            loss = optimizer.step(closure)
            lr_scheduler.step(loss)
            loss_vals.append(loss.item())

        # Get final loss
        loss = np.array(loss_vals[-1])

        # Get start values
        start_values = np.array([params[i][0].detach().numpy() for i in range(self.n_dist_param)])

        # Replace any remaining NaNs or infinity values with 0.5
        start_values = np.nan_to_num(start_values, nan=0.5, posinf=0.5, neginf=0.5).reshape(-1,)

        return loss, start_values

    def get_params_loss(self,
                        predt: np.ndarray,
                        target: torch.Tensor,
                        start_values: List[float],
                        requires_grad: bool = False,
                        ) -> Tuple[List[torch.Tensor], np.ndarray]:
        """
        Function that returns the predicted parameters and the loss.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        target: torch.Tensor
            Target values.
        start_values: List
            Starting values for each distributional parameter.
        requires_grad: bool
            Whether to add to the computational graph or not.

        Returns
        -------
        predt: torch.Tensor
            Predicted parameters.
        loss: torch.Tensor
            Loss value.
        """
        # Number of observations
        n_obs = target.shape[0]

        # Predicted Parameters
        predt = predt.reshape(-1, self.n_dist_param)

        # Replace NaNs and infinity values with unconditional start values
        nan_inf_mask = np.isnan(predt) | np.isinf(predt)
        predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])

        # Convert to torch.tensor
        predt = [
            torch.tensor(predt[:, i].reshape(-1, 1), requires_grad=requires_grad) for i in range(self.n_dist_param)
        ]

        # Predicted Parameters transformed to response scale
        predt_transformed = self.param_transform(predt, self.param_dict, self.n_targets, rank=self.rank, n_obs=n_obs)

        # Specify Distribution and Loss
        if self.distribution.__name__ == "Dirichlet":
            dist_kwargs = dict(zip(self.distribution_arg_names, [predt_transformed]))
        else:
            dist_kwargs = dict(zip(self.distribution_arg_names, predt_transformed))
        dist_fit = self.distribution(**dist_kwargs)
        loss = -torch.nansum(dist_fit.log_prob(target))

        return predt, loss

    def draw_samples(self,
                     dist_pred: torch.distributions.Distribution,
                     n_samples: int = 1000,
                     seed: int = 123
                     ) -> pd.DataFrame:
        """
        Function that draws n_samples from a predicted distribution.

        Arguments
        ---------
        dist_pred: torch.distributions.Distribution
            Predicted distribution.
        n_samples: int
            Number of samples to draw from the predicted response distribution.
        seed: int
            Manual seed.

        Returns
        -------
        pred_dist: pd.DataFrame
            DataFrame with n_samples drawn from predicted response distribution.

        """
        torch.manual_seed(seed)
        dist_samples = dist_pred.sample((n_samples,)).detach().numpy().T
        if self.discrete:
            dist_samples = dist_samples.astype(int)

        samples_list = []
        for i in range(self.n_targets):
            target_df = pd.DataFrame.from_dict({"target": [f"y{i + 1}" for _ in range(dist_samples.shape[1])]})
            df_samples = pd.DataFrame(dist_samples[i, :])
            df_samples.columns = [str("y_sample") + str(i) for i in range(n_samples)]
            samples_list.append(pd.concat([target_df, df_samples], axis=1))

        samples_df = pd.concat(samples_list, axis=0).reset_index(drop=True)

        return samples_df

    def predict_dist(self,
                     booster: xgb.Booster,
                     start_values: np.ndarray,
                     data: xgb.DMatrix,
                     pred_type: str = "parameters",
                     n_samples: int = 1000,
                     quantiles: list = [0.1, 0.5, 0.9],
                     seed: int = 123
                     ) -> pd.DataFrame:
        """
        Function that predicts from the trained model.

        Arguments
        ---------
        booster : xgb.Booster
            Trained model.
        start_values : np.ndarray
            Starting values for each distributional parameter.
        data : xgb.DMatrix
            Data to predict from.
        pred_type : str
            Type of prediction:
            - "samples" draws n_samples from the predicted distribution.
            - "quantiles" calculates the quantiles from the predicted distribution.
            - "parameters" returns the predicted distributional parameters.
            - "expectiles" returns the predicted expectiles.
        n_samples : int
            Number of samples to draw from the predicted distribution.
        quantiles : List[float]
            List of quantiles to calculate from the predicted distribution.
        seed : int
            Seed for random number generator used to draw samples from the predicted distribution.

        Returns
        -------
        pred : pd.DataFrame
            Predictions.
        """
        # Set base_margin as starting point for each distributional parameter. Requires base_score=0 in parameters.
        base_margin_pred = (np.ones(shape=(data.num_row(), 1))) * start_values
        data.set_base_margin(base_margin_pred.flatten())

        # Predict from model
        n_obs = data.num_row()
        predt = np.array(booster.predict(data, output_margin=True)).reshape(-1, self.n_dist_param)
        predt = [torch.tensor(predt[:, i].reshape(-1, 1), dtype=torch.float32) for i in range(self.n_dist_param)]
        dist_params_predt = self.param_transform(predt, self.param_dict, self.n_targets, rank=self.rank, n_obs=n_obs)

        # Predicted Distributional Parameters
        if self.distribution.__name__ == "Dirichlet":
            dist_kwargs = dict(zip(self.distribution_arg_names, [dist_params_predt]))
        else:
            dist_kwargs = dict(zip(self.distribution_arg_names, dist_params_predt))
        dist_pred = self.distribution(**dist_kwargs)

        # Draw samples from predicted response distribution
        pred_samples_df = self.draw_samples(dist_pred=dist_pred, n_samples=n_samples, seed=seed)

        # Get predicted distributional parameters
        predt_params_df = self.get_dist_params(n_targets=self.n_targets, dist_pred=dist_pred)

        if pred_type == "parameters":
            return predt_params_df

        elif pred_type == "samples":
            return pred_samples_df

        elif pred_type == "quantiles":
            # Calculate quantiles from predicted response distribution
            targets = pred_samples_df["target"]
            pred_quant_df = pred_samples_df.drop(columns="target")
            pred_quant_df = pred_quant_df.quantile(quantiles, axis=1).T
            pred_quant_df.columns = [str("quant_") + str(quantiles[i]) for i in range(len(quantiles))]
            if self.discrete:
                pred_quant_df = pred_quant_df.astype(int)
            pred_quant_df = pd.concat([targets, pred_quant_df], axis=1)

            return pred_quant_df

    def compute_gradients_and_hessians(self,
                                       loss: torch.tensor,
                                       predt: torch.tensor,
                                       weights: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:

        """
        Calculates gradients and hessians.

        Output gradients and hessians have shape (n_samples*n_outputs, 1).

        Arguments:
        ---------
        loss: torch.Tensor
            Loss.
        predt: torch.Tensor
            List of predicted parameters.
        weights: np.ndarray
            Weights.

        Returns:
        -------
        grad: torch.Tensor
            Gradients.
        hess: torch.Tensor
            Hessians.
        """
        # Calculate gradients and hessians
        grad = autograd(loss, inputs=predt, create_graph=True)
        hess = [autograd(grad[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grad))]

        # Stabilization of Derivatives
        if self.stabilization != "None":
            grad = [self.stabilize_derivative(grad[i], type=self.stabilization) for i in range(len(grad))]
            hess = [self.stabilize_derivative(hess[i], type=self.stabilization) for i in range(len(hess))]

        # Reshape
        grad = torch.cat(grad, axis=1).detach().numpy()
        hess = torch.cat(hess, axis=1).detach().numpy()

        # Weighting
        grad *= weights
        hess *= weights

        # Flatten
        grad = grad.flatten()
        hess = hess.flatten()

        return grad, hess

    def stabilize_derivative(self, input_der: torch.Tensor, type: str = "MAD") -> torch.Tensor:
        """
        Function that stabilizes Gradients and Hessians.

        As XGBoostLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important
        that these are comparable in magnitude for all distributional parameters. Due to imbalances regarding the ranges,
        the estimation might become unstable so that it does not converge (or converge very slowly) to the optimal solution.
        Another way to improve convergence might be to standardize the response variable. This is especially useful if the
        range of the response differs strongly from the range of the Gradients and Hessians. Both, the stabilization and
        the standardization of the response are not always advised but need to be carefully considered.
        Source: https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

        Parameters
        ----------
        input_der : torch.Tensor
            Input derivative, either Gradient or Hessian.
        type: str
            Stabilization method. Can be either "None", "MAD" or "L2".

        Returns
        -------
        stab_der : torch.Tensor
            Stabilized Gradient or Hessian.
        """

        if type == "MAD":
            input_der = torch.nan_to_num(input_der, nan=float(torch.nansum(input_der)))
            div = torch.nanmedian(torch.abs(input_der - torch.nanmedian(input_der)))
            div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
            stab_der = input_der / div

        if type == "L2":
            input_der = torch.nan_to_num(input_der, nan=float(torch.nansum(input_der)))
            div = torch.sqrt(torch.nansum(input_der.pow(2)))
            div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
            div = torch.where(div > torch.tensor(10000.0), torch.tensor(10000.0), div)
            stab_der = input_der / div

        if type == "None":
            stab_der = torch.nan_to_num(input_der, nan=float(torch.nansum(input_der)))

        return stab_der


    def dist_select(self,
                    target: np.ndarray,
                    candidate_distributions: List,
                    max_iter: int = 100,
                    plot: bool = False,
                    ncol: int = 3,
                    height: float = 4,
                    sharex: bool = True,
                    sharey: bool = True,
                    ) -> pd.DataFrame:
        """
        Function that selects the most suitable distribution among the candidate_distributions for the target variable,
        based on the NegLogLikelihood (lower is better).

        Parameters
        ----------
        target: np.ndarray
            Response variable.
        candidate_distributions: List
            List of candidate distributions.
        max_iter: int
            Maximum number of iterations for the optimization.
        plot: bool
            If True, a density plot of the actual and fitted distribution is created.
        ncol: int
            Number of columns for the facetting of the density plots.
        height: float
            Height (in inches) of each facet.
        sharex: bool
            Whether to share the x-axis across the facets.
        sharey: bool
            Whether to share the y-axis across the facets.

        Returns
        -------
        fit_df: pd.DataFrame
            Dataframe with the loss values of the fitted candidate distributions.
        """
        dist_list = []
        total_iterations = len(candidate_distributions)

        with tqdm(total=total_iterations, desc="Fitting candidate distributions") as pbar:
            for i in range(len(candidate_distributions)):
                dist_name = candidate_distributions[i].__class__.__name__
                if dist_name == "MVN_LoRa":
                    dist_name = dist_name + f"(rank={candidate_distributions[i].rank})"
                pbar.set_description(f"Fitting {dist_name} distribution")
                dist_sel = candidate_distributions[i]
                target_expand = dist_sel.target_append(target, dist_sel.n_targets, dist_sel.n_dist_param)
                try:
                    loss, params = dist_sel.calculate_start_values(target=target_expand, max_iter=max_iter)
                    fit_df = pd.DataFrame.from_dict(
                        {dist_sel.loss_fn: loss.reshape(-1,),
                         "distribution": str(dist_name),
                         "params": [params]
                         }
                    )
                except Exception as e:
                    warnings.warn(f"Error fitting {dist_name} distribution: {str(e)}")
                    fit_df = pd.DataFrame(
                        {dist_sel.loss_fn: np.nan,
                         "distribution": str(dist_name),
                         "params": [np.nan] * dist_sel.n_dist_param
                        }
                    )
                dist_list.append(fit_df)
                pbar.update(1)
            pbar.set_description(f"Fitting of candidate distributions completed")
            fit_df = pd.concat(dist_list).sort_values(by=dist_sel.loss_fn, ascending=True)
            fit_df["rank"] = fit_df[dist_sel.loss_fn].rank().astype(int)
            fit_df.set_index(fit_df["rank"], inplace=True)
        if plot:
            warnings.simplefilter(action='ignore', category=UserWarning)
            # Select distribution
            best_dist = fit_df[fit_df["rank"] == 1].reset_index(drop=True)
            for dist in candidate_distributions:
                dist_name = dist.__class__.__name__
                if dist_name == "MVN_LoRa":
                    dist_name = dist_name + f"(rank={dist.rank})"
                if dist_name == best_dist["distribution"].values[0]:
                    best_dist_sel = dist
                    break

            # Draw samples from distribution
            dist_params = [
                torch.tensor(best_dist["params"][0][i].reshape(-1, 1)) for i in range(best_dist_sel.n_dist_param)
            ]
            dist_params = best_dist_sel.param_transform(dist_params,
                                                        best_dist_sel.param_dict,
                                                        n_targets=best_dist_sel.n_targets,
                                                        rank=best_dist_sel.rank,
                                                        n_obs=1)

            if best_dist["distribution"][0] == "Dirichlet":
                dist_kwargs = dict(zip(best_dist_sel.distribution_arg_names, [dist_params]))
            else:
                dist_kwargs = dict(zip(best_dist_sel.distribution_arg_names, dist_params))
            dist_fit = best_dist_sel.distribution(**dist_kwargs)
            n_samples = np.max([1000, target.shape[0]])
            n_samples = np.where(n_samples > 10000, 1000, n_samples)
            df_samples = best_dist_sel.draw_samples(dist_fit, n_samples=n_samples, seed=123)

            # Plot actual and fitted distribution
            df_samples["type"] = f"Best-Fit: {best_dist['distribution'].values[0]}"
            df_samples = df_samples.melt(id_vars=["target", "type"]).drop(columns="variable")

            df_actual = pd.DataFrame(target)
            df_actual.columns = [f"y{i + 1}" for i in range(best_dist_sel.n_targets)]
            df_actual["type"] = "Actual"
            df_actual = df_actual.melt(id_vars="type", var_name="target")[df_samples.columns]

            plot_df = pd.concat([df_actual, df_samples])

            g = sns.FacetGrid(plot_df,
                              col="target",
                              hue="type",
                              col_wrap=ncol,
                              height=height,
                              sharex=sharex,
                              sharey=sharey,
                              )
            g.map(sns.kdeplot, "value", lw=2.5)
            handles, labels = g.axes[0].get_legend_handles_labels()
            g.fig.legend(handles, labels, loc='upper center', ncol=len(labels), title="", bbox_to_anchor=(0.5, 0.92))
            g.fig.suptitle("Actual vs. Best-Fit Density", weight="bold", fontsize=16)
            g.fig.tight_layout(rect=[0, 0, 1, 0.9])

        fit_df.drop(columns=["rank", "params"], inplace=True)

        return fit_df

    def target_append(self,
                      target: np.ndarray,
                      n_targets: int,
                      n_dist_param: int
                      ) -> np.ndarray:
        """
        Function that appends target to the number of specified parameters.

        Arguments
        ---------
        target: np.ndarray
            Target variables.
        n_targets: int
            Number of targets.
        n_dist_param: int
            Number of distribution parameters.

        Returns
        -------
        label: np.ndarray
            Array with appended targets.
        """
        label = target.reshape(-1, n_targets)
        n_obs = label.shape[0]
        n_fill = n_dist_param - n_targets
        np_fill = np.ones((n_obs, n_fill))
        label_append = np.concatenate([label, np_fill], axis=1).reshape(-1, n_dist_param)

        return label_append
calculate_start_values(target, max_iter=50)

Function that calculates the starting values for each distributional parameter.

Arguments

target: np.ndarray Data from which starting values are calculated. max_iter: int Maximum number of iterations.

Returns

loss: float Loss value. start_values: np.ndarray Starting values for each distributional parameter.

Source code in xgboostlss/distributions/multivariate_distribution_utils.py
def calculate_start_values(self,
                           target: np.ndarray,
                           max_iter: int = 50
                           ) -> Tuple[float, np.ndarray]:
    """
    Function that calculates the starting values for each distributional parameter.

    Arguments
    ---------
    target: np.ndarray
        Data from which starting values are calculated.
    max_iter: int
        Maximum number of iterations.

    Returns
    -------
    loss: float
        Loss value.
    start_values: np.ndarray
        Starting values for each distributional parameter.
    """
    # Convert target to torch.tensor
    target = torch.tensor(target.reshape(-1, self.n_dist_param))[:, :self.n_targets]

    # Initialize parameters
    params = [
        torch.tensor(0.5, dtype=torch.float64).reshape(-1, 1).requires_grad_(True) for _ in range(self.n_dist_param)
    ]

    # Specify optimizer
    optimizer = LBFGS(params, lr=0.1, max_iter=np.min([int(max_iter/4), 20]), line_search_fn="strong_wolfe")

    # Define learning rate scheduler
    lr_scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=10)

    # Define closure
    def closure():
        optimizer.zero_grad()
        loss = self.loss_fn_start_values(params, target)
        loss.backward()
        return loss

    # Optimize parameters
    loss_vals = []
    for epoch in range(max_iter):
        loss = optimizer.step(closure)
        lr_scheduler.step(loss)
        loss_vals.append(loss.item())

    # Get final loss
    loss = np.array(loss_vals[-1])

    # Get start values
    start_values = np.array([params[i][0].detach().numpy() for i in range(self.n_dist_param)])

    # Replace any remaining NaNs or infinity values with 0.5
    start_values = np.nan_to_num(start_values, nan=0.5, posinf=0.5, neginf=0.5).reshape(-1,)

    return loss, start_values
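
A minimal usage sketch for estimating unconditional start values (the MVN import and its constructor arguments are illustrative assumptions; any subclass of Multivariate_DistributionClass works the same way):

import numpy as np
from xgboostlss.distributions.MVN import MVN  # illustrative import; adjust to your installation

# Trivariate Gaussian response, e.g. from load_simulated_multivariate_gaussian_data()
y = np.random.default_rng(123).normal(size=(500, 3))

dist = MVN(D=3)  # hypothetical constructor arguments
y_expand = dist.target_append(y, dist.n_targets, dist.n_dist_param)

loss, start_values = dist.calculate_start_values(target=y_expand, max_iter=50)
print(loss, start_values.shape)  # one start value per distributional parameter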
compute_gradients_and_hessians(loss, predt, weights)

Calculates gradients and hessians.

Output gradients and hessians have shape (n_samples*n_outputs, 1).

Arguments:

loss: torch.Tensor Loss. predt: torch.Tensor List of predicted parameters. weights: np.ndarray Weights.

Returns:

grad: torch.Tensor Gradients. hess: torch.Tensor Hessians.

Source code in xgboostlss/distributions/multivariate_distribution_utils.py
def compute_gradients_and_hessians(self,
                                   loss: torch.tensor,
                                   predt: torch.tensor,
                                   weights: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:

    """
    Calculates gradients and hessians.

    Output gradients and hessians have shape (n_samples*n_outputs, 1).

    Arguments:
    ---------
    loss: torch.Tensor
        Loss.
    predt: torch.Tensor
        List of predicted parameters.
    weights: np.ndarray
        Weights.

    Returns:
    -------
    grad: torch.Tensor
        Gradients.
    hess: torch.Tensor
        Hessians.
    """
    # Calculate gradients and hessians
    grad = autograd(loss, inputs=predt, create_graph=True)
    hess = [autograd(grad[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grad))]

    # Stabilization of Derivatives
    if self.stabilization != "None":
        grad = [self.stabilize_derivative(grad[i], type=self.stabilization) for i in range(len(grad))]
        hess = [self.stabilize_derivative(hess[i], type=self.stabilization) for i in range(len(hess))]

    # Reshape
    grad = torch.cat(grad, axis=1).detach().numpy()
    hess = torch.cat(hess, axis=1).detach().numpy()

    # Weighting
    grad *= weights
    hess *= weights

    # Flatten
    grad = grad.flatten()
    hess = hess.flatten()

    return grad, hess
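
The heavy lifting here is two passes of torch.autograd.grad: one for the gradient of the scalar loss with respect to each parameter tensor, and one over the summed gradient for the (diagonal) Hessian. A standalone sketch of that pattern with a stand-in loss, not tied to any particular distribution:

import torch
from torch.autograd import grad as autograd

# Two "parameter" tensors, as in the list `predt` built by get_params_loss
predt = [torch.randn(5, 1, requires_grad=True) for _ in range(2)]
loss = (predt[0] ** 2 + torch.exp(predt[1])).sum()  # stand-in for the negative log-likelihood

grads = autograd(loss, inputs=predt, create_graph=True)  # one gradient tensor per parameter
hess = [autograd(grads[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grads))]

grad_flat = torch.cat(grads, dim=1).detach().numpy().flatten()
hess_flat = torch.cat(hess, dim=1).detach().numpy().flatten()
print(grad_flat.shape, hess_flat.shape)  # (n_obs * n_params,) each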
dist_select(target, candidate_distributions, max_iter=100, plot=False, ncol=3, height=4, sharex=True, sharey=True)

Function that selects the most suitable distribution among the candidate_distributions for the target variable, based on the NegLogLikelihood (lower is better).

Parameters

target: np.ndarray Response variable. candidate_distributions: List List of candidate distributions. max_iter: int Maximum number of iterations for the optimization. plot: bool If True, a density plot of the actual and fitted distribution is created. ncol: int Number of columns for the facetting of the density plots. height: float Height (in inches) of each facet. sharex: bool Whether to share the x-axis across the facets. sharey: bool Whether to share the y-axis across the facets.

Returns

fit_df: pd.DataFrame Dataframe with the loss values of the fitted candidate distributions.

Source code in xgboostlss/distributions/multivariate_distribution_utils.py
def dist_select(self,
                target: np.ndarray,
                candidate_distributions: List,
                max_iter: int = 100,
                plot: bool = False,
                ncol: int = 3,
                height: float = 4,
                sharex: bool = True,
                sharey: bool = True,
                ) -> pd.DataFrame:
    """
    Function that selects the most suitable distribution among the candidate_distributions for the target variable,
    based on the NegLogLikelihood (lower is better).

    Parameters
    ----------
    target: np.ndarray
        Response variable.
    candidate_distributions: List
        List of candidate distributions.
    max_iter: int
        Maximum number of iterations for the optimization.
    plot: bool
        If True, a density plot of the actual and fitted distribution is created.
    ncol: int
        Number of columns for the facetting of the density plots.
    height: float
        Height (in inches) of each facet.
    sharex: bool
        Whether to share the x-axis across the facets.
    sharey: bool
        Whether to share the y-axis across the facets.

    Returns
    -------
    fit_df: pd.DataFrame
        Dataframe with the loss values of the fitted candidate distributions.
    """
    dist_list = []
    total_iterations = len(candidate_distributions)

    with tqdm(total=total_iterations, desc="Fitting candidate distributions") as pbar:
        for i in range(len(candidate_distributions)):
            dist_name = candidate_distributions[i].__class__.__name__
            if dist_name == "MVN_LoRa":
                dist_name = dist_name + f"(rank={candidate_distributions[i].rank})"
            pbar.set_description(f"Fitting {dist_name} distribution")
            dist_sel = candidate_distributions[i]
            target_expand = dist_sel.target_append(target, dist_sel.n_targets, dist_sel.n_dist_param)
            try:
                loss, params = dist_sel.calculate_start_values(target=target_expand, max_iter=max_iter)
                fit_df = pd.DataFrame.from_dict(
                    {dist_sel.loss_fn: loss.reshape(-1,),
                     "distribution": str(dist_name),
                     "params": [params]
                     }
                )
            except Exception as e:
                warnings.warn(f"Error fitting {dist_name} distribution: {str(e)}")
                fit_df = pd.DataFrame(
                    {dist_sel.loss_fn: np.nan,
                     "distribution": str(dist_name),
                     "params": [np.nan] * dist_sel.n_dist_param
                    }
                )
            dist_list.append(fit_df)
            pbar.update(1)
        pbar.set_description(f"Fitting of candidate distributions completed")
        fit_df = pd.concat(dist_list).sort_values(by=dist_sel.loss_fn, ascending=True)
        fit_df["rank"] = fit_df[dist_sel.loss_fn].rank().astype(int)
        fit_df.set_index(fit_df["rank"], inplace=True)
    if plot:
        warnings.simplefilter(action='ignore', category=UserWarning)
        # Select distribution
        best_dist = fit_df[fit_df["rank"] == 1].reset_index(drop=True)
        for dist in candidate_distributions:
            dist_name = dist.__class__.__name__
            if dist_name == "MVN_LoRa":
                dist_name = dist_name + f"(rank={dist.rank})"
            if dist_name == best_dist["distribution"].values[0]:
                best_dist_sel = dist
                break

        # Draw samples from distribution
        dist_params = [
            torch.tensor(best_dist["params"][0][i].reshape(-1, 1)) for i in range(best_dist_sel.n_dist_param)
        ]
        dist_params = best_dist_sel.param_transform(dist_params,
                                                    best_dist_sel.param_dict,
                                                    n_targets=best_dist_sel.n_targets,
                                                    rank=best_dist_sel.rank,
                                                    n_obs=1)

        if best_dist["distribution"][0] == "Dirichlet":
            dist_kwargs = dict(zip(best_dist_sel.distribution_arg_names, [dist_params]))
        else:
            dist_kwargs = dict(zip(best_dist_sel.distribution_arg_names, dist_params))
        dist_fit = best_dist_sel.distribution(**dist_kwargs)
        n_samples = np.max([1000, target.shape[0]])
        n_samples = np.where(n_samples > 10000, 1000, n_samples)
        df_samples = best_dist_sel.draw_samples(dist_fit, n_samples=n_samples, seed=123)

        # Plot actual and fitted distribution
        df_samples["type"] = f"Best-Fit: {best_dist['distribution'].values[0]}"
        df_samples = df_samples.melt(id_vars=["target", "type"]).drop(columns="variable")

        df_actual = pd.DataFrame(target)
        df_actual.columns = [f"y{i + 1}" for i in range(best_dist_sel.n_targets)]
        df_actual["type"] = "Actual"
        df_actual = df_actual.melt(id_vars="type", var_name="target")[df_samples.columns]

        plot_df = pd.concat([df_actual, df_samples])

        g = sns.FacetGrid(plot_df,
                          col="target",
                          hue="type",
                          col_wrap=ncol,
                          height=height,
                          sharex=sharex,
                          sharey=sharey,
                          )
        g.map(sns.kdeplot, "value", lw=2.5)
        handles, labels = g.axes[0].get_legend_handles_labels()
        g.fig.legend(handles, labels, loc='upper center', ncol=len(labels), title="", bbox_to_anchor=(0.5, 0.92))
        g.fig.suptitle("Actual vs. Best-Fit Density", weight="bold", fontsize=16)
        g.fig.tight_layout(rect=[0, 0, 1, 0.9])

    fit_df.drop(columns=["rank", "params"], inplace=True)

    return fit_df
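
A hedged sketch of selecting among multivariate candidates (import paths and constructor arguments are illustrative; MVN_LoRa takes the additional rank argument mentioned above):

import numpy as np
# Illustrative imports; adjust module paths and arguments to your installation.
from xgboostlss.distributions.MVN import MVN
from xgboostlss.distributions.MVT import MVT
from xgboostlss.distributions.MVN_LoRa import MVN_LoRa

y = np.random.default_rng(1).normal(size=(300, 3))

candidates = [MVN(D=3), MVT(D=3), MVN_LoRa(D=3, rank=2)]
fit_df = candidates[0].dist_select(target=y, candidate_distributions=candidates,
                                   max_iter=50, plot=False)
print(fit_df)  # candidates ranked by negative log-likelihood (lower is better)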
draw_samples(dist_pred, n_samples=1000, seed=123)

Function that draws n_samples from a predicted distribution.

Arguments

dist_pred: torch.distributions.Distribution Predicted distribution. n_samples: int Number of samples to draw from the predicted response distribution. seed: int Manual seed.

Returns

pred_dist: pd.DataFrame DataFrame with n_samples drawn from predicted response distribution.

Source code in xgboostlss/distributions/multivariate_distribution_utils.py
def draw_samples(self,
                 dist_pred: torch.distributions.Distribution,
                 n_samples: int = 1000,
                 seed: int = 123
                 ) -> pd.DataFrame:
    """
    Function that draws n_samples from a predicted distribution.

    Arguments
    ---------
    dist_pred: torch.distributions.Distribution
        Predicted distribution.
    n_samples: int
        Number of samples to draw from the predicted response distribution.
    seed: int
        Manual seed.

    Returns
    -------
    pred_dist: pd.DataFrame
        DataFrame with n_samples drawn from predicted response distribution.

    """
    torch.manual_seed(seed)
    dist_samples = dist_pred.sample((n_samples,)).detach().numpy().T
    if self.discrete:
        dist_samples = dist_samples.astype(int)

    samples_list = []
    for i in range(self.n_targets):
        target_df = pd.DataFrame.from_dict({"target": [f"y{i + 1}" for _ in range(dist_samples.shape[1])]})
        df_samples = pd.DataFrame(dist_samples[i, :])
        df_samples.columns = [str("y_sample") + str(i) for i in range(n_samples)]
        samples_list.append(pd.concat([target_df, df_samples], axis=1))

    samples_df = pd.concat(samples_list, axis=0).reset_index(drop=True)

    return samples_df
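
A small sketch of the output layout, using a hand-built bivariate normal as the "predicted" distribution (the MVN helper instance only supplies n_targets and discrete; its import and constructor arguments are illustrative):

import torch
from xgboostlss.distributions.MVN import MVN  # illustrative import

dist = MVN(D=2)  # hypothetical constructor arguments
# "Predicted" bivariate normal for 4 observations: batch_shape=[4], event_shape=[2]
mvn_pred = torch.distributions.MultivariateNormal(
    loc=torch.zeros(4, 2),
    covariance_matrix=torch.eye(2).repeat(4, 1, 1),
)

samples_df = dist.draw_samples(dist_pred=mvn_pred, n_samples=100, seed=123)
print(samples_df.shape)  # (n_targets * n_obs, 1 + n_samples) = (8, 101), long format over targets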
get_params_loss(predt, target, start_values, requires_grad=False)

Function that returns the predicted parameters and the loss.

Arguments

predt: np.ndarray Predicted values. target: torch.Tensor Target values. start_values: List Starting values for each distributional parameter. requires_grad: bool Whether to add to the computational graph or not.

Returns

predt: torch.Tensor Predicted parameters. loss: torch.Tensor Loss value.

Source code in xgboostlss/distributions/multivariate_distribution_utils.py
def get_params_loss(self,
                    predt: np.ndarray,
                    target: torch.Tensor,
                    start_values: List[float],
                    requires_grad: bool = False,
                    ) -> Tuple[List[torch.Tensor], np.ndarray]:
    """
    Function that returns the predicted parameters and the loss.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    target: torch.Tensor
        Target values.
    start_values: List
        Starting values for each distributional parameter.
    requires_grad: bool
        Whether to add to the computational graph or not.

    Returns
    -------
    predt: torch.Tensor
        Predicted parameters.
    loss: torch.Tensor
        Loss value.
    """
    # Number of observations
    n_obs = target.shape[0]

    # Predicted Parameters
    predt = predt.reshape(-1, self.n_dist_param)

    # Replace NaNs and infinity values with unconditional start values
    nan_inf_mask = np.isnan(predt) | np.isinf(predt)
    predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])

    # Convert to torch.tensor
    predt = [
        torch.tensor(predt[:, i].reshape(-1, 1), requires_grad=requires_grad) for i in range(self.n_dist_param)
    ]

    # Predicted Parameters transformed to response scale
    predt_transformed = self.param_transform(predt, self.param_dict, self.n_targets, rank=self.rank, n_obs=n_obs)

    # Specify Distribution and Loss
    if self.distribution.__name__ == "Dirichlet":
        dist_kwargs = dict(zip(self.distribution_arg_names, [predt_transformed]))
    else:
        dist_kwargs = dict(zip(self.distribution_arg_names, predt_transformed))
    dist_fit = self.distribution(**dist_kwargs)
    loss = -torch.nansum(dist_fit.log_prob(target))

    return predt, loss
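
The NaN/Inf handling replaces an invalid raw prediction with the unconditional start value of its parameter column. A standalone illustration of that replacement step:

import numpy as np

start_values = [0.1, 0.2, 0.3]  # one unconditional start value per distributional parameter
predt = np.array([[1.0, np.nan, 3.0],
                  [np.inf, 5.0, 6.0]])

nan_inf_mask = np.isnan(predt) | np.isinf(predt)
predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])  # column index picks the start value
print(predt)
# [[1.  0.2 3. ]
#  [0.1 5.  6. ]]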
loss_fn_start_values(params, target)

Function that calculates the loss for a given set of distributional parameters. Only used for calculating the loss for the start values.

Parameter

params: torch.Tensor Distributional parameters. target: torch.Tensor Target values.

Returns

loss: torch.Tensor Loss value.

Source code in xgboostlss/distributions/multivariate_distribution_utils.py
def loss_fn_start_values(self,
                         params: torch.Tensor,
                         target: torch.Tensor) -> torch.Tensor:
    """
    Function that calculates the loss for a given set of distributional parameters. Only used for calculating
    the loss for the start values.

    Parameter
    ---------
    params: torch.Tensor
        Distributional parameters.
    target: torch.Tensor
        Target values.

    Returns
    -------
    loss: torch.Tensor
        Loss value.
    """
    # Replace NaNs and infinity values with 0.5
    params = [
        torch.where(torch.isnan(tensor) | torch.isinf(tensor), torch.tensor(0.5), tensor) for tensor in params
    ]

    # Transform parameters to response scale
    params = self.param_transform(params, self.param_dict, self.n_targets, rank=self.rank, n_obs=1)

    # Specify Distribution and Loss
    if self.distribution.__name__ == "Dirichlet":
        dist_kwargs = dict(zip(self.distribution_arg_names, [params]))
    else:
        dist_kwargs = dict(zip(self.distribution_arg_names, params))
    dist_fit = self.distribution(**dist_kwargs)
    loss = -torch.nansum(dist_fit.log_prob(target))

    return loss
metric_fn(predt, data)

Function that evaluates the predictions using the specified loss function.

Arguments

predt: np.ndarray Predicted values. data: xgb.DMatrix Data used for training.

Returns

name: str Name of the evaluation metric. loss: float Loss value.

Source code in xgboostlss/distributions/multivariate_distribution_utils.py
def metric_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[str, np.ndarray]:
    """
    Function that evaluates the predictions using the specified loss function.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    data: xgb.DMatrix
        Data used for training.

    Returns
    -------
    name: str
        Name of the evaluation metric.
    loss: float
        Loss value.
    """
    # Target
    target = torch.tensor(data.get_label().reshape(-1, self.n_dist_param))[:, :self.n_targets]

    # Start values (needed to replace NaNs in predt)
    start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

    # Calculate loss
    _, loss = self.get_params_loss(predt, target, start_values, requires_grad=False)

    return self.loss_fn, loss
objective_fn(predt, data)

Function to estimate gradients and hessians of distributional parameters.

Arguments

predt: np.ndarray Predicted values. data: xgb.DMatrix Data used for training.

Returns

grad: np.ndarray Gradient. hess: np.ndarray Hessian.

Source code in xgboostlss/distributions/multivariate_distribution_utils.py
def objective_fn(self, predt: np.ndarray, data: xgb.DMatrix) -> Tuple[np.ndarray, np.ndarray]:

    """
    Function to estimate gradients and hessians of distributional parameters.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    data: xgb.DMatrix
        Data used for training.

    Returns
    -------
    grad: np.ndarray
        Gradient.
    hess: np.ndarray
        Hessian.
    """
    # Target
    target = torch.tensor(data.get_label().reshape(-1, self.n_dist_param))[:, :self.n_targets]

    # Weights
    if data.get_weight().size == 0:
        # Use 1 as weight if no weights are specified
        weights = torch.ones_like(target[:, 0], dtype=target.dtype).numpy().reshape(-1, 1)
    else:
        weights = data.get_weight().reshape(-1, 1)

    # Start values (needed to replace NaNs in predt)
    start_values = data.get_base_margin().reshape(-1, self.n_dist_param)[0, :].tolist()

    # Calculate gradients and hessians
    predt, loss = self.get_params_loss(predt, target, start_values, requires_grad=True)
    grad, hess = self.compute_gradients_and_hessians(loss, predt, weights)

    return grad, hess
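
objective_fn and metric_fn have the callback signatures xgb.train expects for a custom objective and evaluation metric. The sketch below shows how a distribution instance could be wired in by hand; in practice the XGBoostLSS model wrapper performs this setup for you. The import path, constructor arguments, and the multi-output parameter shown here are assumptions, not verbatim library usage:

import numpy as np
import xgboost as xgb
from xgboostlss.distributions.MVN import MVN  # illustrative import

X = np.random.default_rng(0).normal(size=(500, 5))
y = np.random.default_rng(1).normal(size=(500, 3))

dist = MVN(D=3)  # hypothetical constructor arguments

# Expand the label to one column per distributional parameter and set the base margin
label = dist.target_append(y, dist.n_targets, dist.n_dist_param)
dtrain = xgb.DMatrix(X, label=label)
_, start_values = dist.calculate_start_values(target=label, max_iter=50)
dtrain.set_base_margin((np.ones((dtrain.num_row(), 1)) * start_values).flatten())

# base_score=0 is required (see predict_dist); the multi-output setup via num_target is an assumption
params = {"eta": 0.1, "max_depth": 3, "base_score": 0.0,
          "num_target": dist.n_dist_param, "disable_default_eval_metric": True}

booster = xgb.train(params, dtrain, num_boost_round=100,
                    obj=dist.objective_fn, custom_metric=dist.metric_fn,
                    evals=[(dtrain, "train")], verbose_eval=False)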
predict_dist(booster, start_values, data, pred_type='parameters', n_samples=1000, quantiles=[0.1, 0.5, 0.9], seed=123)

Function that predicts from the trained model.

Arguments

booster : xgb.Booster Trained model. start_values : np.ndarray Starting values for each distributional parameter. data : xgb.DMatrix Data to predict from. pred_type : str Type of prediction: - "samples" draws n_samples from the predicted distribution. - "quantiles" calculates the quantiles from the predicted distribution. - "parameters" returns the predicted distributional parameters. - "expectiles" returns the predicted expectiles. n_samples : int Number of samples to draw from the predicted distribution. quantiles : List[float] List of quantiles to calculate from the predicted distribution. seed : int Seed for random number generator used to draw samples from the predicted distribution.

Returns

pred : pd.DataFrame Predictions.

Source code in xgboostlss/distributions/multivariate_distribution_utils.py
def predict_dist(self,
                 booster: xgb.Booster,
                 start_values: np.ndarray,
                 data: xgb.DMatrix,
                 pred_type: str = "parameters",
                 n_samples: int = 1000,
                 quantiles: list = [0.1, 0.5, 0.9],
                 seed: int = 123
                 ) -> pd.DataFrame:
    """
    Function that predicts from the trained model.

    Arguments
    ---------
    booster : xgb.Booster
        Trained model.
    start_values : np.ndarray
        Starting values for each distributional parameter.
    data : xgb.DMatrix
        Data to predict from.
    pred_type : str
        Type of prediction:
        - "samples" draws n_samples from the predicted distribution.
        - "quantiles" calculates the quantiles from the predicted distribution.
        - "parameters" returns the predicted distributional parameters.
        - "expectiles" returns the predicted expectiles.
    n_samples : int
        Number of samples to draw from the predicted distribution.
    quantiles : List[float]
        List of quantiles to calculate from the predicted distribution.
    seed : int
        Seed for random number generator used to draw samples from the predicted distribution.

    Returns
    -------
    pred : pd.DataFrame
        Predictions.
    """
    # Set base_margin as starting point for each distributional parameter. Requires base_score=0 in parameters.
    base_margin_pred = (np.ones(shape=(data.num_row(), 1))) * start_values
    data.set_base_margin(base_margin_pred.flatten())

    # Predict from model
    n_obs = data.num_row()
    predt = np.array(booster.predict(data, output_margin=True)).reshape(-1, self.n_dist_param)
    predt = [torch.tensor(predt[:, i].reshape(-1, 1), dtype=torch.float32) for i in range(self.n_dist_param)]
    dist_params_predt = self.param_transform(predt, self.param_dict, self.n_targets, rank=self.rank, n_obs=n_obs)

    # Predicted Distributional Parameters
    if self.distribution.__name__ == "Dirichlet":
        dist_kwargs = dict(zip(self.distribution_arg_names, [dist_params_predt]))
    else:
        dist_kwargs = dict(zip(self.distribution_arg_names, dist_params_predt))
    dist_pred = self.distribution(**dist_kwargs)

    # Draw samples from predicted response distribution
    pred_samples_df = self.draw_samples(dist_pred=dist_pred, n_samples=n_samples, seed=seed)

    # Get predicted distributional parameters
    predt_params_df = self.get_dist_params(n_targets=self.n_targets, dist_pred=dist_pred)

    if pred_type == "parameters":
        return predt_params_df

    elif pred_type == "samples":
        return pred_samples_df

    elif pred_type == "quantiles":
        # Calculate quantiles from predicted response distribution
        targets = pred_samples_df["target"]
        pred_quant_df = pred_samples_df.drop(columns="target")
        pred_quant_df = pred_quant_df.quantile(quantiles, axis=1).T
        pred_quant_df.columns = [str("quant_") + str(quantiles[i]) for i in range(len(quantiles))]
        if self.discrete:
            pred_quant_df = pred_quant_df.astype(int)
        pred_quant_df = pd.concat([targets, pred_quant_df], axis=1)

        return pred_quant_df
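
Continuing the hedged training sketch from the objective_fn section, predictions can be requested on any of the supported scales. Note that, although "expectiles" is listed in the docstring, the code above only handles "parameters", "samples", and "quantiles" for this multivariate class:

import xgboost as xgb

dtest = xgb.DMatrix(X)  # reusing the illustrative features from the training sketch

pred_params = dist.predict_dist(booster, start_values, dtest, pred_type="parameters")
pred_quants = dist.predict_dist(booster, start_values, dtest,
                                pred_type="quantiles", quantiles=[0.05, 0.5, 0.95])
pred_samples = dist.predict_dist(booster, start_values, dtest,
                                 pred_type="samples", n_samples=500, seed=123)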
stabilize_derivative(input_der, type='MAD')

Function that stabilizes Gradients and Hessians.

As XGBoostLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important that these are comparable in magnitude for all distributional parameters. Due to imbalances regarding the ranges, the estimation might become unstable so that it does not converge (or converge very slowly) to the optimal solution. Another way to improve convergence might be to standardize the response variable. This is especially useful if the range of the response differs strongly from the range of the Gradients and Hessians. Both, the stabilization and the standardization of the response are not always advised but need to be carefully considered. Source: https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

Parameters

input_der : torch.Tensor Input derivative, either Gradient or Hessian. type: str Stabilization method. Can be either "None", "MAD" or "L2".

Returns

stab_der : torch.Tensor Stabilized Gradient or Hessian.

Source code in xgboostlss/distributions/multivariate_distribution_utils.py
def stabilize_derivative(self, input_der: torch.Tensor, type: str = "MAD") -> torch.Tensor:
    """
    Function that stabilizes Gradients and Hessians.

    As XGBoostLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important
    that these are comparable in magnitude for all distributional parameters. Due to imbalances regarding the ranges,
    the estimation might become unstable so that it does not converge (or converge very slowly) to the optimal solution.
    Another way to improve convergence might be to standardize the response variable. This is especially useful if the
    range of the response differs strongly from the range of the Gradients and Hessians. Both, the stabilization and
    the standardization of the response are not always advised but need to be carefully considered.
    Source: https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

    Parameters
    ----------
    input_der : torch.Tensor
        Input derivative, either Gradient or Hessian.
    type: str
        Stabilization method. Can be either "None", "MAD" or "L2".

    Returns
    -------
    stab_der : torch.Tensor
        Stabilized Gradient or Hessian.
    """

    if type == "MAD":
        input_der = torch.nan_to_num(input_der, nan=float(torch.nansum(input_der)))
        div = torch.nanmedian(torch.abs(input_der - torch.nanmedian(input_der)))
        div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
        stab_der = input_der / div

    if type == "L2":
        input_der = torch.nan_to_num(input_der, nan=float(torch.nansum(input_der)))
        div = torch.sqrt(torch.nansum(input_der.pow(2)))
        div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
        div = torch.where(div > torch.tensor(10000.0), torch.tensor(10000.0), div)
        stab_der = input_der / div

    if type == "None":
        stab_der = torch.nan_to_num(input_der, nan=float(torch.nansum(input_der)))

    return stab_der
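
A minimal illustration of MAD stabilization; the method uses no other instance state, so a bare Multivariate_DistributionClass() is enough:

import torch
from xgboostlss.distributions.multivariate_distribution_utils import Multivariate_DistributionClass

dist = Multivariate_DistributionClass()  # defaults suffice; only stabilize_derivative is used here
grad = torch.tensor([[0.5], [250.0], [-3.0], [float("nan")]])

# NaNs are filled first, then every entry is divided by the median absolute deviation
# (floored at 1e-4), bringing the gradient to a comparable magnitude across parameters.
stab_grad = dist.stabilize_derivative(grad, type="MAD")
print(stab_grad)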
target_append(target, n_targets, n_dist_param)

Function that appends target to the number of specified parameters.

Arguments

target: np.ndarray Target variables. n_targets: int Number of targets. n_dist_param: int Number of distribution parameters.

Returns

label: np.ndarray Array with appended targets.

Source code in xgboostlss/distributions/multivariate_distribution_utils.py
def target_append(self,
                  target: np.ndarray,
                  n_targets: int,
                  n_dist_param: int
                  ) -> np.ndarray:
    """
    Function that appends target to the number of specified parameters.

    Arguments
    ---------
    target: np.ndarray
        Target variables.
    n_targets: int
        Number of targets.
    n_dist_param: int
        Number of distribution parameters.

    Returns
    -------
    label: np.ndarray
        Array with appended targets.
    """
    label = target.reshape(-1, n_targets)
    n_obs = label.shape[0]
    n_fill = n_dist_param - n_targets
    np_fill = np.ones((n_obs, n_fill))
    label_append = np.concatenate([label, np_fill], axis=1).reshape(-1, n_dist_param)

    return label_append
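
Since the label passed to xgboost must have one column per distributional parameter, the extra columns are simply padded with ones. A short illustration (nine parameters corresponds, for example, to a trivariate Gaussian: three means plus six Cholesky entries):

import numpy as np
from xgboostlss.distributions.multivariate_distribution_utils import Multivariate_DistributionClass

dist = Multivariate_DistributionClass(n_targets=3, n_dist_param=9)
y = np.arange(12.0).reshape(4, 3)  # 4 observations, 3 targets

label = dist.target_append(y, dist.n_targets, dist.n_dist_param)
print(label.shape)  # (4, 9): the 3 targets followed by 6 columns of ones as placeholders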

zero_inflated

ZeroAdjustedBeta

Bases: ZeroInflatedDistribution

A Zero-Adjusted Beta distribution.

Parameter

concentration1: torch.Tensor 1st concentration parameter of the distribution (often referred to as alpha). concentration0: torch.Tensor 2nd concentration parameter of the distribution (often referred to as beta). gate: torch.Tensor Probability of zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py

Source code in xgboostlss/distributions/zero_inflated.py
class ZeroAdjustedBeta(ZeroInflatedDistribution):
    """
    A Zero-Adjusted Beta distribution.

    Parameter
    ---------
    concentration1: torch.Tensor
        1st concentration parameter of the distribution (often referred to as alpha).
    concentration0: torch.Tensor
        2nd concentration parameter of the distribution (often referred to as beta).
    gate: torch.Tensor
        Probability of zeros given via a Bernoulli distribution.

    Source
    ------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py
    """
    arg_constraints = {
        "concentration1": constraints.positive,
        "concentration0": constraints.positive,
        "gate": constraints.unit_interval,
    }
    support = constraints.unit_interval

    def __init__(self, concentration1, concentration0, gate=None, validate_args=None):
        base_dist = Beta(concentration1=concentration1, concentration0=concentration0, validate_args=False)
        base_dist._validate_args = validate_args

        super().__init__(base_dist, gate=gate, validate_args=validate_args)

    @property
    def concentration1(self):
        return self.base_dist.concentration1

    @property
    def concentration0(self):
        return self.base_dist.concentration0

ZeroAdjustedGamma

Bases: ZeroInflatedDistribution

A Zero-Adjusted Gamma distribution.

Parameter

concentration: torch.Tensor shape parameter of the distribution (often referred to as alpha) rate: torch.Tensor rate = 1 / scale of the distribution (often referred to as beta) gate: torch.Tensor Probability of zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py

Source code in xgboostlss/distributions/zero_inflated.py
class ZeroAdjustedGamma(ZeroInflatedDistribution):
    """
    A Zero-Adjusted Gamma distribution.

    Parameter
    ---------
    concentration: torch.Tensor
        shape parameter of the distribution (often referred to as alpha)
    rate: torch.Tensor
        rate = 1 / scale of the distribution (often referred to as beta)
    gate: torch.Tensor
        Probability of zeros given via a Bernoulli distribution.

    Source
    ------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py
    """
    arg_constraints = {
        "concentration": constraints.positive,
        "rate": constraints.positive,
        "gate": constraints.unit_interval,
    }
    support = constraints.nonnegative

    def __init__(self, concentration, rate, gate=None, validate_args=None):
        base_dist = Gamma(concentration=concentration, rate=rate, validate_args=False)
        base_dist._validate_args = validate_args

        super().__init__(base_dist, gate=gate, validate_args=validate_args)

    @property
    def concentration(self):
        return self.base_dist.concentration

    @property
    def rate(self):
        return self.base_dist.rate
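
As a usage sketch (parameter values are purely illustrative), the class behaves as a mixture of a point mass at zero, with probability gate, and a Gamma distribution on the positive reals:

import torch
from xgboostlss.distributions.zero_inflated import ZeroAdjustedGamma

# Illustrative parameters: Gamma(2, 1) with a 30% point mass at zero.
dist = ZeroAdjustedGamma(
    concentration=torch.tensor([2.0]),
    rate=torch.tensor([1.0]),
    gate=torch.tensor([0.3]),
)

print(dist.mean)                                # (1 - gate) * Gamma mean = 0.7 * 2.0 = 1.4
print(dist.log_prob(torch.tensor([0.0, 1.5])))  # log-mass at zero vs. log-density at 1.5
print(dist.sample((5,)))                        # zeros occur with probability ~0.3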

ZeroAdjustedLogNormal

Bases: ZeroInflatedDistribution

A Zero-Adjusted Log-Normal distribution.

Parameters

loc : torch.Tensor
    Mean of the log of the distribution.
scale : torch.Tensor
    Standard deviation of the log of the distribution.
gate : torch.Tensor
    Probability of zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py

Source code in xgboostlss/distributions/zero_inflated.py
class ZeroAdjustedLogNormal(ZeroInflatedDistribution):
    """
    A Zero-Adjusted Log-Normal distribution.

    Parameter
    ---------
    loc: torch.Tensor
        Mean of log of distribution.
    scale: torch.Tensor
        Standard deviation of log of the distribution.
    gate: torch.Tensor
        Probability of zeros given via a Bernoulli distribution.

    Source
    ------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py
    """
    arg_constraints = {
        "loc": constraints.real,
        "scale": constraints.positive,
        "gate": constraints.unit_interval,
    }
    support = constraints.nonnegative

    def __init__(self, loc, scale, gate=None, validate_args=None):
        base_dist = LogNormal(loc=loc, scale=scale, validate_args=False)
        base_dist._validate_args = validate_args

        super().__init__(base_dist, gate=gate, validate_args=validate_args)

    @property
    def loc(self):
        return self.base_dist.loc

    @property
    def scale(self):
        return self.base_dist.scale

ZeroInflatedDistribution

Bases: TorchDistribution

Generic Zero Inflated distribution.

This can be used directly or as a base class, e.g. for ZeroInflatedPoisson and ZeroInflatedNegativeBinomial.

Parameters

gate : torch.Tensor
    Probability of extra zeros given via a Bernoulli distribution.
base_dist : torch.distributions.Distribution
    The base distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L18

Source code in xgboostlss/distributions/zero_inflated.py
class ZeroInflatedDistribution(TorchDistribution):
    """
    Generic Zero Inflated distribution.

    This can be used directly or can be used as a base class as e.g. for
    :class:`ZeroInflatedPoisson` and :class:`ZeroInflatedNegativeBinomial`.

    Parameters
    ----------
    gate : torch.Tensor
        Probability of extra zeros given via a Bernoulli distribution.
    base_dist : torch.distributions.Distribution
        The base distribution.

    Source
    ------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L18
    """

    arg_constraints = {
        "gate": constraints.unit_interval,
        "gate_logits": constraints.real,
    }

    def __init__(self, base_dist, *, gate=None, gate_logits=None, validate_args=None):
        if (gate is None) == (gate_logits is None):
            raise ValueError(
                "Either `gate` or `gate_logits` must be specified, but not both."
            )
        if gate is not None:
            batch_shape = broadcast_shape(gate.shape, base_dist.batch_shape)
            self.gate = gate.expand(batch_shape)
        else:
            batch_shape = broadcast_shape(gate_logits.shape, base_dist.batch_shape)
            self.gate_logits = gate_logits.expand(batch_shape)
        if base_dist.event_shape:
            raise ValueError(
                "ZeroInflatedDistribution expected empty "
                "base_dist.event_shape but got {}".format(base_dist.event_shape)
            )

        self.base_dist = base_dist.expand(batch_shape)
        event_shape = torch.Size()

        super().__init__(batch_shape, event_shape, validate_args)

    @constraints.dependent_property
    def support(self):
        return self.base_dist.support

    @lazy_property
    def gate(self):
        return logits_to_probs(self.gate_logits)

    @lazy_property
    def gate_logits(self):
        return probs_to_logits(self.gate)

    def log_prob(self, value):
        if self._validate_args:
            self._validate_sample(value)

        zero_idx = (value == 0)
        support = self.support
        epsilon = abs(torch.finfo(value.dtype).eps)

        if hasattr(support, "lower_bound"):
            if is_identically_zero(getattr(support, "lower_bound", None)):
                value = value.clamp_min(epsilon)

        if hasattr(support, "upper_bound"):
            if is_identically_one(getattr(support, "upper_bound", None)) & (value.max() == 1.0):
                value = value.clamp_max(1 - epsilon)

        if "gate" in self.__dict__:
            gate, value = broadcast_all(self.gate, value)
            log_prob = (-gate).log1p() + self.base_dist.log_prob(value)
            log_prob = torch.where(zero_idx, (gate + log_prob.exp()).log(), log_prob)
        else:
            gate_logits, value = broadcast_all(self.gate_logits, value)
            log_prob_minus_log_gate = -gate_logits + self.base_dist.log_prob(value)
            log_gate = -softplus(-gate_logits)
            log_prob = log_prob_minus_log_gate + log_gate
            zero_log_prob = softplus(log_prob_minus_log_gate) + log_gate
            log_prob = torch.where(zero_idx, zero_log_prob, log_prob)
        return log_prob

    def sample(self, sample_shape=torch.Size()):
        shape = self._extended_shape(sample_shape)
        with torch.no_grad():
            mask = torch.bernoulli(self.gate.expand(shape)).bool()
            samples = self.base_dist.expand(shape).sample()
            samples = torch.where(mask, samples.new_zeros(()), samples)
        return samples

    @lazy_property
    def mean(self):
        return (1 - self.gate) * self.base_dist.mean

    @lazy_property
    def variance(self):
        return (1 - self.gate) * (
                self.base_dist.mean**2 + self.base_dist.variance
        ) - self.mean**2

    def expand(self, batch_shape, _instance=None):
        new = self._get_checked_instance(type(self), _instance)
        batch_shape = torch.Size(batch_shape)
        gate = self.gate.expand(batch_shape) if "gate" in self.__dict__ else None
        gate_logits = (
            self.gate_logits.expand(batch_shape)
            if "gate_logits" in self.__dict__
            else None
        )
        base_dist = self.base_dist.expand(batch_shape)
        ZeroInflatedDistribution.__init__(
            new, base_dist, gate=gate, gate_logits=gate_logits, validate_args=False
        )
        new._validate_args = self._validate_args
        return new
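
The gate and gate_logits parameterizations are interchangeable. A small sketch with a Poisson base distribution (values are illustrative; the shipped subclasses disable validation on the base distribution internally, which is mimicked here):

import torch
from torch.distributions import Poisson
from xgboostlss.distributions.zero_inflated import ZeroInflatedDistribution

rate, gate = torch.tensor([4.0]), torch.tensor([0.25])
base = Poisson(rate, validate_args=False)  # zeros get clamped to a tiny eps internally, so skip support checks

zi_gate = ZeroInflatedDistribution(base, gate=gate)
zi_logits = ZeroInflatedDistribution(base, gate_logits=torch.logit(gate))

value = torch.tensor([0.0, 3.0])
print(zi_gate.log_prob(value))    # both parameterizations give the same log-probabilities
print(zi_logits.log_prob(value))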

ZeroInflatedNegativeBinomial

Bases: ZeroInflatedDistribution

A Zero Inflated Negative Binomial distribution.

Parameters

total_count : torch.Tensor
    Non-negative number of negative Bernoulli trials.
probs : torch.Tensor
    Event probabilities of success in the half-open interval [0, 1).
logits : torch.Tensor
    Event log-odds of success (log(p/(1-p))).
gate : torch.Tensor
    Probability of extra zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150

Source code in xgboostlss/distributions/zero_inflated.py
class ZeroInflatedNegativeBinomial(ZeroInflatedDistribution):
    """
    A Zero Inflated Negative Binomial distribution.

    Parameter
    ---------
    total_count: torch.Tensor
        Non-negative number of negative Bernoulli trial.
    probs: torch.Tensor
        Event probabilities of success in the half open interval [0, 1).
    logits: torch.Tensor
        Event log-odds of success (log(p/(1-p))).
    gate: torch.Tensor
        Probability of extra zeros given via a Bernoulli distribution.

    Source
    ------
    - https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150
    """

    arg_constraints = {
        "total_count": constraints.greater_than_eq(0),
        "probs": constraints.half_open_interval(0.0, 1.0),
        "logits": constraints.real,
        "gate": constraints.unit_interval,
    }
    support = constraints.nonnegative_integer

    def __init__(self, total_count, probs=None, gate=None, validate_args=None):
        base_dist = NegativeBinomial(total_count=total_count, probs=probs, logits=None, validate_args=False)
        base_dist._validate_args = validate_args

        super().__init__(base_dist, gate=gate, validate_args=validate_args)

    @property
    def total_count(self):
        return self.base_dist.total_count

    @property
    def probs(self):
        return self.base_dist.probs

    @property
    def logits(self):
        return self.base_dist.logits

ZeroInflatedPoisson

Bases: ZeroInflatedDistribution

A Zero-Inflated Poisson distribution.

Parameters

rate : torch.Tensor
    The rate of the Poisson distribution.
gate : torch.Tensor
    Probability of extra zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L121

Source code in xgboostlss/distributions/zero_inflated.py
class ZeroInflatedPoisson(ZeroInflatedDistribution):
    """
    A Zero-Inflated Poisson distribution.

    Parameter
    ---------
    rate: torch.Tensor
        The rate of the Poisson distribution.
    gate: torch.Tensor
        Probability of extra zeros given via a Bernoulli distribution.

    Source
    ------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L121
    """
    arg_constraints = {
        "rate": constraints.positive,
        "gate": constraints.unit_interval,
    }
    support = constraints.nonnegative_integer

    def __init__(self, rate, gate=None, validate_args=None):
        base_dist = Poisson(rate=rate, validate_args=False)
        base_dist._validate_args = validate_args

        super().__init__(base_dist, gate=gate, validate_args=validate_args)

    @property
    def rate(self):
        return self.base_dist.rate
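
As a quick sanity check (numbers are illustrative), the probability of observing a zero is the gate probability plus the Poisson probability of zero scaled by one minus the gate:

import torch
from xgboostlss.distributions.zero_inflated import ZeroInflatedPoisson

rate, gate = torch.tensor([4.0]), torch.tensor([0.3])
zip_dist = ZeroInflatedPoisson(rate=rate, gate=gate)

# P(Y = 0) = gate + (1 - gate) * exp(-rate)
print(zip_dist.log_prob(torch.tensor([0.0])).exp())  # ~0.3128 (zero is clamped to a tiny eps internally)
print(gate + (1 - gate) * torch.exp(-rate))          # 0.3128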

model

XGBoostLSS

XGBoostLSS model class

Parameters

dist : Distribution
    DistributionClass object.
start_values : np.ndarray
    Starting values for each distributional parameter.

Source code in xgboostlss/model.py
class XGBoostLSS:
    """
    XGBoostLSS model class

    Parameters
    ----------
    dist : Distribution
        DistributionClass object.
    start_values : np.ndarray
        Starting values for each distributional parameter.
    """
    def __init__(self, dist):
        self.dist = dist             # Distribution object
        self.start_values = None     # Starting values for distributional parameters
        self.multivariate_label_expand = False
        self.multivariate_eval_label_expand = False

    def set_params_adj(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """
        Set parameters for distributional model.

        Arguments
        ---------
        params : Dict[str, Any]
            Parameters for model.

        Returns
        -------
        params : Dict[str, Any]
            Updated Parameters for model.
        """
        params_adj = {
            "objective": None,
            "base_score": 0,
            "num_target": self.dist.n_dist_param,
            "disable_default_eval_metric": True
        }
        params.update(params_adj)

        return params

    def adjust_labels(self, dmatrix: DMatrix) -> None:
        """
        Adjust labels for multivariate distributions.

        Arguments
        ---------
        dmatrix : DMatrix
            DMatrix object.

        Returns
        -------
        None
        """
        if not (self.dist.univariate or self.multivariate_label_expand):
            self.multivariate_label_expand = True
            label = self.dist.target_append(
                dmatrix.get_label(),
                self.dist.n_targets,
                self.dist.n_dist_param
            )
            dmatrix.set_label(label)

    def set_base_margin(self, dmatrix: DMatrix) -> None:
        """
        Set base margin for distributions.

        Arguments
        ---------
        dmatrix : DMatrix
            DMatrix object.

        Returns
        -------
        None
        """
        if self.start_values is None:
            _, self.start_values = self.dist.calculate_start_values(dmatrix.get_label())
        base_margin = np.ones(shape=(dmatrix.num_row(), 1)) * self.start_values
        dmatrix.set_base_margin(base_margin.flatten())

    def train(
            self,
            params: Dict[str, Any],
            dtrain: DMatrix,
            num_boost_round: int = 10,
            *,
            evals: Optional[Sequence[Tuple[DMatrix, str]]] = None,
            early_stopping_rounds: Optional[int] = None,
            evals_result: Optional[TrainingCallback.EvalsLog] = None,
            verbose_eval: Optional[Union[bool, int]] = True,
            xgb_model: Optional[Union[str, os.PathLike, Booster, bytearray]] = None,
            callbacks: Optional[Sequence[TrainingCallback]] = None,
    ) -> Booster:
            """
            Train a booster with given parameters.

            Arguments
            ---------
            params :
                Booster params.
            dtrain :
                Data to be trained.
            num_boost_round :
                Number of boosting iterations.
            evals :
                List of validation sets for which metrics will be evaluated during training.
                Validation metrics will help us track the performance of the model.
            early_stopping_rounds :
                Activates early stopping. Validation metric needs to improve at least once in
                every **early_stopping_rounds** round(s) to continue training.
                Requires at least one item in **evals**.
                The method returns the model from the last iteration (not the best one).  Use
                custom callback or model slicing if the best model is desired.
                If there's more than one item in **evals**, the last entry will be used for early
                stopping.
                If there's more than one metric in the **eval_metric** parameter given in
                **params**, the last metric will be used for early stopping.
                If early stopping occurs, the model will have two additional fields:
                ``bst.best_score``, ``bst.best_iteration``.
            evals_result :
                This dictionary stores the evaluation results of all the items in watchlist.
                Example: with a watchlist containing
                ``[(dtest,'eval'), (dtrain,'train')]`` and
                a parameter containing ``('eval_metric': 'logloss')``,
                the **evals_result** returns
                .. code-block:: python
                    {'train': {'logloss': ['0.48253', '0.35953']},
                     'eval': {'logloss': ['0.480385', '0.357756']}}
            verbose_eval :
                Requires at least one item in **evals**.
                If **verbose_eval** is True then the evaluation metric on the validation set is
                printed at each boosting stage.
                If **verbose_eval** is an integer then the evaluation metric on the validation set
                is printed at every given **verbose_eval** boosting stage. The last boosting stage
                / the boosting stage found by using **early_stopping_rounds** is also printed.
                Example: with ``verbose_eval=4`` and at least one item in **evals**, an evaluation metric
                is printed every 4 boosting stages, instead of every boosting stage.
            xgb_model :
                Xgb model to be loaded before training (allows training continuation).
            callbacks :
                List of callback functions that are applied at end of each iteration.
                It is possible to use predefined callbacks by using
                :ref:`Callback API <callback_api>`.
                .. note::
                   States in callback are not preserved during training, which means callback
                   objects can not be reused for multiple training sessions without
                   reinitialization or deepcopy.
                .. code-block:: python
                    for params in parameters_grid:
                        # be sure to (re)initialize the callbacks before each run
                        callbacks = [xgb.callback.LearningRateScheduler(custom_rates)]
                        xgboost.train(params, Xy, callbacks=callbacks)

            Returns
            -------
            Booster:
                The trained booster model.
            """
            self.set_params_adj(params)
            self.adjust_labels(dtrain)
            self.set_base_margin(dtrain)

            # Set base_margin for evals
            if evals is not None:
                evals = self.set_eval_margin(evals, self.start_values)

            self.booster = xgb.train(params,
                                     dtrain,
                                     num_boost_round=num_boost_round,
                                     evals=evals,
                                     obj=self.dist.objective_fn,
                                     custom_metric=self.dist.metric_fn,
                                     xgb_model=xgb_model,
                                     callbacks=callbacks,
                                     verbose_eval=verbose_eval,
                                     evals_result=evals_result,
                                     maximize=False,
                                     early_stopping_rounds=early_stopping_rounds)

    def cv(
        self,
        params: Dict[str, Any],
        dtrain: DMatrix,
        num_boost_round: int = 10,
        nfold: int = 3,
        stratified: bool = False,
        folds: XGBStratifiedKFold = None,
        early_stopping_rounds: Optional[int] = None,
        fpreproc: Optional[FPreProcCallable] = None,
        as_pandas: bool = True,
        verbose_eval: Optional[Union[int, bool]] = None,
        show_stdv: bool = True,
        seed: int = 0,
        callbacks: Optional[Sequence[TrainingCallback]] = None,
        shuffle: bool = True,
    ) -> Union[Dict[str, float], DataFrame]:
        # pylint: disable = invalid-name

        """
        Cross-validation with given parameters.

        Arguments
        ----------
        params : dict
            Booster params.
        dtrain : DMatrix
            Data to be trained.
        num_boost_round : int
            Number of boosting iterations.
        nfold : int
            Number of folds in CV.
        stratified : bool
            Perform stratified sampling.
        folds : a KFold or StratifiedKFold instance or list of fold indices
            Sklearn KFolds or StratifiedKFolds object.
            Alternatively may explicitly pass sample indices for each fold.
            For ``n`` folds, **folds** should be a length ``n`` list of tuples.
            Each tuple is ``(in,out)`` where ``in`` is a list of indices to be used
            as the training samples for the ``n`` th fold and ``out`` is a list of
            indices to be used as the testing samples for the ``n`` th fold.
        early_stopping_rounds: int
            Activates early stopping. Cross-Validation metric (average of validation
            metric computed over CV folds) needs to improve at least once in
            every **early_stopping_rounds** round(s) to continue training.
            The last entry in the evaluation history will represent the best iteration.
            If there's more than one metric in the **eval_metric** parameter given in
            **params**, the last metric will be used for early stopping.
        fpreproc : function
            Preprocessing function that takes (dtrain, dtest, param) and returns
            transformed versions of those.
        as_pandas : bool, default True
            Return pd.DataFrame when pandas is installed.
            If False or pandas is not installed, return np.ndarray
        verbose_eval : bool, int, or None, default None
            Whether to display the progress. If None, progress will be displayed
            when np.ndarray is returned. If True, progress will be displayed at
            boosting stage. If an integer is given, progress will be displayed
            at every given `verbose_eval` boosting stage.
        show_stdv : bool, default True
            Whether to display the standard deviation in progress.
            Results are not affected, and always contains std.
        seed : int
            Seed used to generate the folds (passed to numpy.random.seed).
        callbacks :
            List of callback functions that are applied at end of each iteration.
            It is possible to use predefined callbacks by using
            :ref:`Callback API <callback_api>`.
            .. note::
               States in callback are not preserved during training, which means callback
               objects can not be reused for multiple training sessions without
               reinitialization or deepcopy.
            .. code-block:: python
                for params in parameters_grid:
                    # be sure to (re)initialize the callbacks before each run
                    callbacks = [xgb.callback.LearningRateScheduler(custom_rates)]
                    xgboost.train(params, Xy, callbacks=callbacks)
        shuffle : bool
            Shuffle data before creating folds.

        Returns
        -------
        evaluation history : list(string)
        """
        self.set_params_adj(params)
        self.adjust_labels(dtrain)
        self.set_base_margin(dtrain)

        self.cv_booster = xgb.cv(params,
                                 dtrain,
                                 num_boost_round=num_boost_round,
                                 nfold=nfold,
                                 stratified=stratified,
                                 folds=folds,
                                 obj=self.dist.objective_fn,
                                 custom_metric=self.dist.metric_fn,
                                 maximize=False,
                                 early_stopping_rounds=early_stopping_rounds,
                                 fpreproc=fpreproc,
                                 as_pandas=as_pandas,
                                 verbose_eval=verbose_eval,
                                 show_stdv=show_stdv,
                                 seed=seed,
                                 callbacks=callbacks,
                                 shuffle=shuffle)

        return self.cv_booster

    def hyper_opt(
        self,
        hp_dict: Dict,
        dtrain: DMatrix,
        num_boost_round=500,
        nfold=10,
        early_stopping_rounds=20,
        max_minutes=10,
        n_trials=None,
        study_name=None,
        silence=False,
        seed=None,
        hp_seed=None
    ):
        """
        Function to tune hyperparameters using optuna.

        Arguments
        ----------
        hp_dict: dict
            Dictionary of hyperparameters to tune.
        dtrain: xgb.DMatrix
            Training data.
        num_boost_round: int
            Number of boosting iterations.
        nfold: int
            Number of folds in CV.
        early_stopping_rounds: int
            Activates early stopping. Cross-Validation metric (average of validation
            metric computed over CV folds) needs to improve at least once in
            every **early_stopping_rounds** round(s) to continue training.
            The last entry in the evaluation history will represent the best iteration.
            If there's more than one metric in the **eval_metric** parameter given in
            **params**, the last metric will be used for early stopping.
        max_minutes: int
            Time budget in minutes, i.e., stop study after the given number of minutes.
        n_trials: int
            The number of trials. If this argument is set to None, there is no limitation on the number of trials.
        study_name: str
            Name of the hyperparameter study.
        silence: bool
            Controls the verbosity of the trial, i.e., the user can silence the outputs of the trial.
        seed: int
            Seed used to generate the folds (passed to numpy.random.seed).
        hp_seed: int
            Seed for random number generator used in the Bayesian hyper-parameter search.

        Returns
        -------
        opt_params : dict
            Optimal hyper-parameters.
        """

        def objective(trial):

            hyper_params = {}

            for param_name, param_value in hp_dict.items():

                param_type = param_value[0]

                if param_type == "categorical" or param_type == "none":
                    hyper_params.update({param_name: trial.suggest_categorical(param_name, param_value[1])})

                elif param_type == "float":
                    param_constraints = param_value[1]
                    param_low = param_constraints["low"]
                    param_high = param_constraints["high"]
                    param_log = param_constraints["log"]
                    hyper_params.update(
                        {param_name: trial.suggest_float(param_name,
                                                         low=param_low,
                                                         high=param_high,
                                                         log=param_log
                                                         )
                         })

                elif param_type == "int":
                    param_constraints = param_value[1]
                    param_low = param_constraints["low"]
                    param_high = param_constraints["high"]
                    param_log = param_constraints["log"]
                    hyper_params.update(
                        {param_name: trial.suggest_int(param_name,
                                                       low=param_low,
                                                       high=param_high,
                                                       log=param_log
                                                       )
                         })

            # Add booster if not included in dictionary
            if "booster" not in hyper_params.keys():
                hyper_params.update({"booster": trial.suggest_categorical("booster", ["gbtree"])})

            # Add pruning
            pruning_callback = optuna.integration.XGBoostPruningCallback(trial, f"test-{self.dist.loss_fn}")

            xgblss_param_tuning = self.cv(params=hyper_params,
                                          dtrain=dtrain,
                                          num_boost_round=num_boost_round,
                                          nfold=nfold,
                                          early_stopping_rounds=early_stopping_rounds,
                                          callbacks=[pruning_callback],
                                          seed=seed,
                                          verbose_eval=False
                                          )

            # Add the optimal number of rounds
            opt_rounds = xgblss_param_tuning[f"test-{self.dist.loss_fn}-mean"].idxmin() + 1
            trial.set_user_attr("opt_round", int(opt_rounds))

            # Extract the best score
            best_score = np.min(xgblss_param_tuning[f"test-{self.dist.loss_fn}-mean"])
            # Replace -inf with 1e8 (to avoid -inf in the log)
            best_score = np.where(best_score == float('-inf'), float(1e8), best_score)

            return best_score

        if study_name is None:
            study_name = "XGBoostLSS Hyper-Parameter Optimization"

        if silence:
            optuna.logging.set_verbosity(optuna.logging.WARNING)

        if hp_seed is not None:
            sampler = TPESampler(seed=hp_seed)
        else:
            sampler = TPESampler()

        pruner = optuna.pruners.MedianPruner(n_startup_trials=10, n_warmup_steps=20)
        study = optuna.create_study(sampler=sampler, pruner=pruner, direction="minimize", study_name=study_name)
        study.optimize(objective, n_trials=n_trials, timeout=60 * max_minutes, show_progress_bar=True)

        print("\nHyper-Parameter Optimization successfully finished.")
        print("  Number of finished trials: ", len(study.trials))
        print("  Best trial:")
        opt_param = study.best_trial

        # Add optimal stopping round
        opt_param.params["opt_rounds"] = study.trials_dataframe()["user_attrs_opt_round"][
            study.trials_dataframe()["value"].idxmin()]
        opt_param.params["opt_rounds"] = int(opt_param.params["opt_rounds"])

        print("    Value: {}".format(opt_param.value))
        print("    Params: ")
        for key, value in opt_param.params.items():
            print("    {}: {}".format(key, value))

        return opt_param.params

    def predict(self,
                data: xgb.DMatrix,
                pred_type: str = "parameters",
                n_samples: int = 1000,
                quantiles: list = [0.1, 0.5, 0.9],
                seed: int = 123):
        """
        Function that predicts from the trained model.

        Arguments
        ---------
        data : xgb.DMatrix
            Data to predict from.
        pred_type : str
            Type of prediction:
            - "samples" draws n_samples from the predicted distribution.
            - "quantiles" calculates the quantiles from the predicted distribution.
            - "parameters" returns the predicted distributional parameters.
            - "expectiles" returns the predicted expectiles.
        n_samples : int
            Number of samples to draw from the predicted distribution.
        quantiles : List[float]
            List of quantiles to calculate from the predicted distribution.
        seed : int
            Seed for random number generator used to draw samples from the predicted distribution.

        Returns
        -------
        predt_df : pd.DataFrame
            Predictions.
        """

        # Predict
        predt_df = self.dist.predict_dist(booster=self.booster,
                                          start_values=self.start_values,
                                          data=data,
                                          pred_type=pred_type,
                                          n_samples=n_samples,
                                          quantiles=quantiles,
                                          seed=seed)

        return predt_df

    def plot(self,
             X: pd.DataFrame,
             feature: str = "x",
             parameter: str = "loc",
             max_display: int = 15,
             plot_type: str = "Partial_Dependence"):
        """
        XGBoostLSS SHAP plotting function.

        Arguments:
        ---------
        X: pd.DataFrame
            Train/Test Data
        feature: str
            Specifies which feature is to be plotted.
        parameter: str
            Specifies which distributional parameter is to be plotted.
        max_display: int
            Specifies the maximum number of features to be displayed.
        plot_type: str
            Specifies the type of plot:
                "Partial_Dependence" plots the partial dependence of the parameter on the feature.
                "Feature_Importance" plots the feature importance of the parameter.
        """
        shap.initjs()
        explainer = shap.TreeExplainer(self.booster)
        shap_values = explainer(X)

        param_pos = self.dist.distribution_arg_names.index(parameter)

        if plot_type == "Partial_Dependence":
            if self.dist.n_dist_param == 1:
                shap.plots.scatter(shap_values[:, feature], color=shap_values[:, feature])
            else:
                shap.plots.scatter(shap_values[:, feature][:, param_pos], color=shap_values[:, feature][:, param_pos])
        elif plot_type == "Feature_Importance":
            if self.dist.n_dist_param == 1:
                shap.plots.bar(shap_values, max_display=max_display if X.shape[1] > max_display else X.shape[1])
            else:
                shap.plots.bar(
                    shap_values[:, :, param_pos], max_display=max_display if X.shape[1] > max_display else X.shape[1]
                )

    def expectile_plot(self,
                       X: pd.DataFrame,
                       feature: str = "x",
                       expectile: str = "0.05",
                       plot_type: str = "Partial_Dependence"):
        """
        XGBoostLSS function for plotting expectile SHapley values.

        X: pd.DataFrame
            Train/Test Data
        feature: str
            Specifies which feature to use for plotting Partial_Dependence plot.
        expectile: str
            Specifies which expectile to plot.
        plot_type: str
            Specifies which SHapley-plot to visualize. Currently, "Partial_Dependence" and "Feature_Importance"
            are supported.
        """

        shap.initjs()
        explainer = shap.TreeExplainer(self.booster)
        shap_values = explainer(X)

        expect_pos = list(self.dist.param_dict.keys()).index(expectile)

        if plot_type == "Partial_Dependence":
            shap.plots.scatter(shap_values[:, feature][:, expect_pos], color=shap_values[:, feature][:, expect_pos])
        elif plot_type == "Feature_Importance":
            shap.plots.bar(shap_values[:, :, expect_pos], max_display=15 if X.shape[1] > 15 else X.shape[1])

    def set_eval_margin(self,
                        eval_set: list,
                        start_values: np.ndarray
                        ) -> list:

        """
        Function that sets the base margin for the evaluation set.

        Arguments
        ---------
        eval_set : list
            List of tuples containing the train and evaluation set.
        start_values : np.ndarray
            Array containing the start values for each distributional parameter.

        Returns
        -------
        eval_set : list
            List of tuples containing the train and evaluation set.
        """
        sets = [(item, label) for item, label in eval_set]

        eval_set1, label1 = sets[0]
        eval_set2, label2 = sets[1]

        # Adjust labels to number of distributional parameters
        if not (self.dist.univariate or self.multivariate_eval_label_expand):
            self.multivariate_eval_label_expand = True
            eval_set2_label = self.dist.target_append(eval_set2.get_label(), self.dist.n_targets, self.dist.n_dist_param)
            eval_set2.set_label(eval_set2_label)

        # Set base margins
        base_margin_set1 = (np.ones(shape=(eval_set1.num_row(), 1))) * start_values
        eval_set1.set_base_margin(base_margin_set1.flatten())
        base_margin_set2 = (np.ones(shape=(eval_set2.num_row(), 1))) * start_values
        eval_set2.set_base_margin(base_margin_set2.flatten())

        eval_set = [(eval_set1, label1), (eval_set2, label2)]

        return eval_set

    def save_model(self,
                   model_path: str
                   ) -> None:
        """
        Save the model to a file.

        Parameters
        ----------
        model_path : str
            The path to save the model.

        Returns
        -------
        None
        """
        with open(model_path, "wb") as f:
            pickle.dump(self, f)

    @staticmethod
    def load_model(model_path):
        """
        Load the model from a file.

        Parameters
        ----------
        model_path : str
            The path to the saved model.

        Returns
        -------
        The loaded model.
        """
        with open(model_path, "rb") as f:
            return pickle.load(f)
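
A minimal end-to-end sketch on simulated data (the Gaussian distribution class and its default constructor arguments are assumed from the distributions reference; hyperparameter values are illustrative):

import numpy as np
import xgboost as xgb
from xgboostlss.model import XGBoostLSS
from xgboostlss.distributions.Gaussian import Gaussian  # assumed import path

# Simulated heteroscedastic data: the scale of y depends on the second feature.
rng = np.random.default_rng(123)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=np.exp(0.5 * X[:, 1]))
dtrain = xgb.DMatrix(X, label=y)

xgblss = XGBoostLSS(Gaussian())
xgblss.train({"eta": 0.1, "max_depth": 2}, dtrain, num_boost_round=100)

# Distributional parameters and predictive quantiles for (here, the training) data
dtest = xgb.DMatrix(X)
pred_params = xgblss.predict(dtest, pred_type="parameters")
pred_quantiles = xgblss.predict(dtest, pred_type="quantiles", quantiles=[0.05, 0.5, 0.95])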

adjust_labels(dmatrix)

Adjust labels for multivariate distributions.

Arguments

dmatrix : DMatrix
    DMatrix object.

Returns

None

Source code in xgboostlss/model.py
def adjust_labels(self, dmatrix: DMatrix) -> None:
    """
    Adjust labels for multivariate distributions.

    Arguments
    ---------
    dmatrix : DMatrix
        DMatrix object.

    Returns
    -------
    None
    """
    if not (self.dist.univariate or self.multivariate_label_expand):
        self.multivariate_label_expand = True
        label = self.dist.target_append(
            dmatrix.get_label(),
            self.dist.n_targets,
            self.dist.n_dist_param
        )
        dmatrix.set_label(label)

cv(params, dtrain, num_boost_round=10, nfold=3, stratified=False, folds=None, early_stopping_rounds=None, fpreproc=None, as_pandas=True, verbose_eval=None, show_stdv=True, seed=0, callbacks=None, shuffle=True)

Cross-validation with given parameters.

Arguments

params : dict
    Booster params.
dtrain : DMatrix
    Data to be trained.
num_boost_round : int
    Number of boosting iterations.
nfold : int
    Number of folds in CV.
stratified : bool
    Perform stratified sampling.
folds : a KFold or StratifiedKFold instance or list of fold indices
    Sklearn KFolds or StratifiedKFolds object. Alternatively, may explicitly pass sample indices for each
    fold. For n folds, folds should be a length-n list of tuples. Each tuple is (in, out) where in is a list
    of indices to be used as the training samples for the n-th fold and out is a list of indices to be used
    as the testing samples for the n-th fold.
early_stopping_rounds : int
    Activates early stopping. The cross-validation metric (average of the validation metric computed over CV
    folds) needs to improve at least once in every early_stopping_rounds round(s) to continue training. The
    last entry in the evaluation history will represent the best iteration. If there's more than one metric
    in the eval_metric parameter given in params, the last metric will be used for early stopping.
fpreproc : function
    Preprocessing function that takes (dtrain, dtest, param) and returns transformed versions of those.
as_pandas : bool, default True
    Return pd.DataFrame when pandas is installed. If False or pandas is not installed, return np.ndarray.
verbose_eval : bool, int, or None, default None
    Whether to display the progress. If None, progress will be displayed when np.ndarray is returned. If
    True, progress will be displayed at every boosting stage. If an integer is given, progress will be
    displayed at every given verbose_eval boosting stage.
show_stdv : bool, default True
    Whether to display the standard deviation in progress. Results are not affected, and always contain std.
seed : int
    Seed used to generate the folds (passed to numpy.random.seed).
callbacks : list
    List of callback functions that are applied at the end of each iteration. It is possible to use
    predefined callbacks from the Callback API. Note that states in callbacks are not preserved during
    training, which means callback objects cannot be reused for multiple training sessions without
    reinitialization or deepcopy, e.g.

        for params in parameters_grid:
            # be sure to (re)initialize the callbacks before each run
            callbacks = [xgb.callback.LearningRateScheduler(custom_rates)]
            xgboost.train(params, Xy, callbacks=callbacks)

shuffle : bool
    Shuffle data before creating folds.

Returns

evaluation history : list(string)

Source code in xgboostlss/model.py
def cv(
    self,
    params: Dict[str, Any],
    dtrain: DMatrix,
    num_boost_round: int = 10,
    nfold: int = 3,
    stratified: bool = False,
    folds: XGBStratifiedKFold = None,
    early_stopping_rounds: Optional[int] = None,
    fpreproc: Optional[FPreProcCallable] = None,
    as_pandas: bool = True,
    verbose_eval: Optional[Union[int, bool]] = None,
    show_stdv: bool = True,
    seed: int = 0,
    callbacks: Optional[Sequence[TrainingCallback]] = None,
    shuffle: bool = True,
) -> Union[Dict[str, float], DataFrame]:
    # pylint: disable = invalid-name

    """
    Cross-validation with given parameters.

    Arguments
    ----------
    params : dict
        Booster params.
    dtrain : DMatrix
        Data to be trained.
    num_boost_round : int
        Number of boosting iterations.
    nfold : int
        Number of folds in CV.
    stratified : bool
        Perform stratified sampling.
    folds : a KFold or StratifiedKFold instance or list of fold indices
        Sklearn KFolds or StratifiedKFolds object.
        Alternatively may explicitly pass sample indices for each fold.
        For ``n`` folds, **folds** should be a length ``n`` list of tuples.
        Each tuple is ``(in,out)`` where ``in`` is a list of indices to be used
        as the training samples for the ``n`` th fold and ``out`` is a list of
        indices to be used as the testing samples for the ``n`` th fold.
    early_stopping_rounds: int
        Activates early stopping. Cross-Validation metric (average of validation
        metric computed over CV folds) needs to improve at least once in
        every **early_stopping_rounds** round(s) to continue training.
        The last entry in the evaluation history will represent the best iteration.
        If there's more than one metric in the **eval_metric** parameter given in
        **params**, the last metric will be used for early stopping.
    fpreproc : function
        Preprocessing function that takes (dtrain, dtest, param) and returns
        transformed versions of those.
    as_pandas : bool, default True
        Return pd.DataFrame when pandas is installed.
        If False or pandas is not installed, return np.ndarray
    verbose_eval : bool, int, or None, default None
        Whether to display the progress. If None, progress will be displayed
        when np.ndarray is returned. If True, progress will be displayed at
        boosting stage. If an integer is given, progress will be displayed
        at every given `verbose_eval` boosting stage.
    show_stdv : bool, default True
        Whether to display the standard deviation in progress.
        Results are not affected, and always contains std.
    seed : int
        Seed used to generate the folds (passed to numpy.random.seed).
    callbacks :
        List of callback functions that are applied at end of each iteration.
        It is possible to use predefined callbacks by using
        :ref:`Callback API <callback_api>`.
        .. note::
           States in callback are not preserved during training, which means callback
           objects can not be reused for multiple training sessions without
           reinitialization or deepcopy.
        .. code-block:: python
            for params in parameters_grid:
                # be sure to (re)initialize the callbacks before each run
                callbacks = [xgb.callback.LearningRateScheduler(custom_rates)]
                xgboost.train(params, Xy, callbacks=callbacks)
    shuffle : bool
        Shuffle data before creating folds.

    Returns
    -------
    evaluation history : list(string)
    """
    self.set_params_adj(params)
    self.adjust_labels(dtrain)
    self.set_base_margin(dtrain)

    self.cv_booster = xgb.cv(params,
                             dtrain,
                             num_boost_round=num_boost_round,
                             nfold=nfold,
                             stratified=stratified,
                             folds=folds,
                             obj=self.dist.objective_fn,
                             custom_metric=self.dist.metric_fn,
                             maximize=False,
                             early_stopping_rounds=early_stopping_rounds,
                             fpreproc=fpreproc,
                             as_pandas=as_pandas,
                             verbose_eval=verbose_eval,
                             show_stdv=show_stdv,
                             seed=seed,
                             callbacks=callbacks,
                             shuffle=shuffle)

    return self.cv_booster
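
Continuing the sketch shown after the class above, cross-validation uses the same distributional loss as the evaluation metric, and the returned DataFrame can be inspected for the optimal number of boosting rounds (a hedged usage sketch):

# xgblss and dtrain as constructed in the workflow sketch above
cv_res = xgblss.cv(
    {"eta": 0.1, "max_depth": 2},
    dtrain,
    num_boost_round=200,
    nfold=5,
    early_stopping_rounds=20,
)
print(cv_res.filter(like="test").tail())  # per-round mean/std of the held-out loss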

expectile_plot(X, feature='x', expectile='0.05', plot_type='Partial_Dependence')

XGBoostLSS function for plotting expectile SHapley values.

X : pd.DataFrame
    Train/Test Data
feature : str
    Specifies which feature to use for plotting the Partial_Dependence plot.
expectile : str
    Specifies which expectile to plot.
plot_type : str
    Specifies which SHapley-plot to visualize. Currently, "Partial_Dependence" and "Feature_Importance" are
    supported.

Source code in xgboostlss/model.py
def expectile_plot(self,
                   X: pd.DataFrame,
                   feature: str = "x",
                   expectile: str = "0.05",
                   plot_type: str = "Partial_Dependence"):
    """
    XGBoostLSS function for plotting expectile SHapley values.

    X: pd.DataFrame
        Train/Test Data
    feature: str
        Specifies which feature to use for plotting Partial_Dependence plot.
    expectile: str
        Specifies which expectile to plot.
    plot_type: str
        Specifies which SHapley-plot to visualize. Currently, "Partial_Dependence" and "Feature_Importance"
        are supported.
    """

    shap.initjs()
    explainer = shap.TreeExplainer(self.booster)
    shap_values = explainer(X)

    expect_pos = list(self.dist.param_dict.keys()).index(expectile)

    if plot_type == "Partial_Dependence":
        shap.plots.scatter(shap_values[:, feature][:, expect_pos], color=shap_values[:, feature][:, expect_pos])
    elif plot_type == "Feature_Importance":
        shap.plots.bar(shap_values[:, :, expect_pos], max_display=15 if X.shape[1] > 15 else X.shape[1])

hyper_opt(hp_dict, dtrain, num_boost_round=500, nfold=10, early_stopping_rounds=20, max_minutes=10, n_trials=None, study_name=None, silence=False, seed=None, hp_seed=None)

Function to tune hyperparameters using optuna.

Arguments

hp_dict : dict
    Dictionary of hyperparameters to tune.
dtrain : xgb.DMatrix
    Training data.
num_boost_round : int
    Number of boosting iterations.
nfold : int
    Number of folds in CV.
early_stopping_rounds : int
    Activates early stopping. The cross-validation metric (average of the validation metric computed over CV
    folds) needs to improve at least once in every early_stopping_rounds round(s) to continue training. The
    last entry in the evaluation history will represent the best iteration. If there's more than one metric
    in the eval_metric parameter given in params, the last metric will be used for early stopping.
max_minutes : int
    Time budget in minutes, i.e., stop the study after the given number of minutes.
n_trials : int
    The number of trials. If this argument is set to None, there is no limitation on the number of trials.
study_name : str
    Name of the hyperparameter study.
silence : bool
    Controls the verbosity of the trial, i.e., the user can silence the outputs of the trial.
seed : int
    Seed used to generate the folds (passed to numpy.random.seed).
hp_seed : int
    Seed for the random number generator used in the Bayesian hyperparameter search.

Returns

opt_params : dict
    Optimal hyper-parameters.

Source code in xgboostlss/model.py
def hyper_opt(
    self,
    hp_dict: Dict,
    dtrain: DMatrix,
    num_boost_round=500,
    nfold=10,
    early_stopping_rounds=20,
    max_minutes=10,
    n_trials=None,
    study_name=None,
    silence=False,
    seed=None,
    hp_seed=None
):
    """
    Function to tune hyperparameters using optuna.

    Arguments
    ----------
    hp_dict: dict
        Dictionary of hyperparameters to tune.
    dtrain: xgb.DMatrix
        Training data.
    num_boost_round: int
        Number of boosting iterations.
    nfold: int
        Number of folds in CV.
    early_stopping_rounds: int
        Activates early stopping. Cross-Validation metric (average of validation
        metric computed over CV folds) needs to improve at least once in
        every **early_stopping_rounds** round(s) to continue training.
        The last entry in the evaluation history will represent the best iteration.
        If there's more than one metric in the **eval_metric** parameter given in
        **params**, the last metric will be used for early stopping.
    max_minutes: int
        Time budget in minutes, i.e., stop study after the given number of minutes.
    n_trials: int
        The number of trials. If this argument is set to None, there is no limitation on the number of trials.
    study_name: str
        Name of the hyperparameter study.
    silence: bool
        Controls the verbosity of the trial, i.e., the user can silence the outputs of the trial.
    seed: int
        Seed used to generate the folds (passed to numpy.random.seed).
    hp_seed: int
        Seed for random number generator used in the Bayesian hyper-parameter search.

    Returns
    -------
    opt_params : dict
        Optimal hyper-parameters.
    """

    def objective(trial):

        hyper_params = {}

        for param_name, param_value in hp_dict.items():

            param_type = param_value[0]

            if param_type == "categorical" or param_type == "none":
                hyper_params.update({param_name: trial.suggest_categorical(param_name, param_value[1])})

            elif param_type == "float":
                param_constraints = param_value[1]
                param_low = param_constraints["low"]
                param_high = param_constraints["high"]
                param_log = param_constraints["log"]
                hyper_params.update(
                    {param_name: trial.suggest_float(param_name,
                                                     low=param_low,
                                                     high=param_high,
                                                     log=param_log
                                                     )
                     })

            elif param_type == "int":
                param_constraints = param_value[1]
                param_low = param_constraints["low"]
                param_high = param_constraints["high"]
                param_log = param_constraints["log"]
                hyper_params.update(
                    {param_name: trial.suggest_int(param_name,
                                                   low=param_low,
                                                   high=param_high,
                                                   log=param_log
                                                   )
                     })

        # Add booster if not included in dictionary
        if "booster" not in hyper_params.keys():
            hyper_params.update({"booster": trial.suggest_categorical("booster", ["gbtree"])})

        # Add pruning
        pruning_callback = optuna.integration.XGBoostPruningCallback(trial, f"test-{self.dist.loss_fn}")

        xgblss_param_tuning = self.cv(params=hyper_params,
                                      dtrain=dtrain,
                                      num_boost_round=num_boost_round,
                                      nfold=nfold,
                                      early_stopping_rounds=early_stopping_rounds,
                                      callbacks=[pruning_callback],
                                      seed=seed,
                                      verbose_eval=False
                                      )

        # Add the optimal number of rounds
        opt_rounds = xgblss_param_tuning[f"test-{self.dist.loss_fn}-mean"].idxmin() + 1
        trial.set_user_attr("opt_round", int(opt_rounds))

        # Extract the best score
        best_score = np.min(xgblss_param_tuning[f"test-{self.dist.loss_fn}-mean"])
        # Replace -inf with 1e8 (to avoid -inf in the log)
        best_score = np.where(best_score == float('-inf'), float(1e8), best_score)

        return best_score

    if study_name is None:
        study_name = "XGBoostLSS Hyper-Parameter Optimization"

    if silence:
        optuna.logging.set_verbosity(optuna.logging.WARNING)

    if hp_seed is not None:
        sampler = TPESampler(seed=hp_seed)
    else:
        sampler = TPESampler()

    pruner = optuna.pruners.MedianPruner(n_startup_trials=10, n_warmup_steps=20)
    study = optuna.create_study(sampler=sampler, pruner=pruner, direction="minimize", study_name=study_name)
    study.optimize(objective, n_trials=n_trials, timeout=60 * max_minutes, show_progress_bar=True)

    print("\nHyper-Parameter Optimization successfully finished.")
    print("  Number of finished trials: ", len(study.trials))
    print("  Best trial:")
    opt_param = study.best_trial

    # Add optimal stopping round
    opt_param.params["opt_rounds"] = study.trials_dataframe()["user_attrs_opt_round"][
        study.trials_dataframe()["value"].idxmin()]
    opt_param.params["opt_rounds"] = int(opt_param.params["opt_rounds"])

    print("    Value: {}".format(opt_param.value))
    print("    Params: ")
    for key, value in opt_param.params.items():
        print("    {}: {}".format(key, value))

    return opt_param.params
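
A minimal usage sketch (not part of the library source): it assumes an XGBoostLSS model built on the Gaussian distribution and an xgb.DMatrix named dtrain already exist; the search-space format mirrors how hp_dict is parsed in the objective above.

import xgboost as xgb
from xgboostlss.model import XGBoostLSS
from xgboostlss.distributions.Gaussian import Gaussian

# dtrain = xgb.DMatrix(X_train, label=y_train)   # assumed to exist
xgblss = XGBoostLSS(Gaussian())

# Each entry is [type, constraints]: "float"/"int" take {"low", "high", "log"},
# "categorical"/"none" take a list of choices.
hp_dict = {
    "eta":       ["float", {"low": 1e-5, "high": 1.0, "log": True}],
    "max_depth": ["int",   {"low": 1,    "high": 10,  "log": False}],
    "booster":   ["categorical", ["gbtree"]],
}

opt_params = xgblss.hyper_opt(
    hp_dict,
    dtrain,
    num_boost_round=100,       # maximum boosting rounds per trial
    nfold=5,                   # cross-validation folds
    early_stopping_rounds=20,
    max_minutes=5,             # time budget of the Optuna study
    n_trials=30,               # maximum number of trials
    silence=True,              # suppress per-trial output
    seed=123,                  # seed for the CV folds
    hp_seed=123,               # seed for the TPESampler
)
# opt_params also carries "opt_rounds", the boosting round count of the best trial.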

load_model(model_path) staticmethod

Load the model from a file.

Parameters

model_path : str The path to the saved model.

Returns

The loaded model.

Source code in xgboostlss/model.py, lines 641-656
@staticmethod
def load_model(model_path):
    """
    Load the model from a file.

    Parameters
    ----------
    model_path : str
        The path to the saved model.

    Returns
    -------
    The loaded model.
    """
    with open(model_path, "rb") as f:
        return pickle.load(f)

plot(X, feature='x', parameter='loc', max_display=15, plot_type='Partial_Dependence')

XGBoostLSS SHAP plotting function.

Arguments:

X: pd.DataFrame
    Train/Test Data
feature: str
    Specifies which feature is to be plotted.
parameter: str
    Specifies which distributional parameter is to be plotted.
max_display: int
    Specifies the maximum number of features to be displayed.
plot_type: str
    Specifies the type of plot:
        "Partial_Dependence" plots the partial dependence of the parameter on the feature.
        "Feature_Importance" plots the feature importance of the parameter.

Source code in xgboostlss/model.py, lines 509-550
def plot(self,
         X: pd.DataFrame,
         feature: str = "x",
         parameter: str = "loc",
         max_display: int = 15,
         plot_type: str = "Partial_Dependence"):
    """
    XGBoostLSS SHAP plotting function.

    Arguments:
    ---------
    X: pd.DataFrame
        Train/Test Data
    feature: str
        Specifies which feature is to be plotted.
    parameter: str
        Specifies which distributional parameter is to be plotted.
    max_display: int
        Specifies the maximum number of features to be displayed.
    plot_type: str
        Specifies the type of plot:
            "Partial_Dependence" plots the partial dependence of the parameter on the feature.
            "Feature_Importance" plots the feature importance of the parameter.
    """
    shap.initjs()
    explainer = shap.TreeExplainer(self.booster)
    shap_values = explainer(X)

    param_pos = self.dist.distribution_arg_names.index(parameter)

    if plot_type == "Partial_Dependence":
        if self.dist.n_dist_param == 1:
            shap.plots.scatter(shap_values[:, feature], color=shap_values[:, feature])
        else:
            shap.plots.scatter(shap_values[:, feature][:, param_pos], color=shap_values[:, feature][:, param_pos])
    elif plot_type == "Feature_Importance":
        if self.dist.n_dist_param == 1:
            shap.plots.bar(shap_values, max_display=max_display if X.shape[1] > max_display else X.shape[1])
        else:
            shap.plots.bar(
                shap_values[:, :, param_pos], max_display=max_display if X.shape[1] > max_display else X.shape[1]
            )
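
A short, hedged usage sketch (not from the source): it assumes a trained model xgblss, a feature DataFrame X_test containing a column named "x", and a two-parameter distribution whose parameters are named "loc" and "scale".

# SHAP partial dependence of the "scale" parameter on feature "x"
xgblss.plot(X_test, feature="x", parameter="scale", plot_type="Partial_Dependence")

# SHAP feature importances for the "loc" parameter
xgblss.plot(X_test, parameter="loc", plot_type="Feature_Importance", max_display=10)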

predict(data, pred_type='parameters', n_samples=1000, quantiles=[0.1, 0.5, 0.9], seed=123)

Function that predicts from the trained model.

Arguments

data : xgb.DMatrix
    Data to predict from.
pred_type : str
    Type of prediction:
    - "samples" draws n_samples from the predicted distribution.
    - "quantiles" calculates the quantiles from the predicted distribution.
    - "parameters" returns the predicted distributional parameters.
    - "expectiles" returns the predicted expectiles.
n_samples : int
    Number of samples to draw from the predicted distribution.
quantiles : List[float]
    List of quantiles to calculate from the predicted distribution.
seed : int
    Seed for random number generator used to draw samples from the predicted distribution.

Returns

predt_df : pd.DataFrame Predictions.

Source code in xgboostlss/model.py, lines 466-507
def predict(self,
            data: xgb.DMatrix,
            pred_type: str = "parameters",
            n_samples: int = 1000,
            quantiles: list = [0.1, 0.5, 0.9],
            seed: int = 123):
    """
    Function that predicts from the trained model.

    Arguments
    ---------
    data : xgb.DMatrix
        Data to predict from.
    pred_type : str
        Type of prediction:
        - "samples" draws n_samples from the predicted distribution.
        - "quantiles" calculates the quantiles from the predicted distribution.
        - "parameters" returns the predicted distributional parameters.
        - "expectiles" returns the predicted expectiles.
    n_samples : int
        Number of samples to draw from the predicted distribution.
    quantiles : List[float]
        List of quantiles to calculate from the predicted distribution.
    seed : int
        Seed for random number generator used to draw samples from the predicted distribution.

    Returns
    -------
    predt_df : pd.DataFrame
        Predictions.
    """

    # Predict
    predt_df = self.dist.predict_dist(booster=self.booster,
                                      start_values=self.start_values,
                                      data=data,
                                      pred_type=pred_type,
                                      n_samples=n_samples,
                                      quantiles=quantiles,
                                      seed=seed)

    return predt_df
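
A hedged usage sketch: dtest is an assumed xgb.DMatrix built from the test features, and xgblss an already trained model. Each call returns a pd.DataFrame of predictions.

pred_params    = xgblss.predict(dtest, pred_type="parameters")
pred_quantiles = xgblss.predict(dtest, pred_type="quantiles", quantiles=[0.05, 0.5, 0.95])
pred_samples   = xgblss.predict(dtest, pred_type="samples", n_samples=1000, seed=123)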

save_model(model_path)

Save the model to a file.

Parameters

model_path : str The path to save the model.

Returns

None

Source code in xgboostlss/model.py, lines 623-639
def save_model(self,
               model_path: str
               ) -> None:
    """
    Save the model to a file.

    Parameters
    ----------
    model_path : str
        The path to save the model.

    Returns
    -------
    None
    """
    with open(model_path, "wb") as f:
        pickle.dump(self, f)
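
Since save_model and load_model are a plain pickle round trip, persisting and restoring a fitted model looks like the following sketch (the file name is illustrative):

xgblss.save_model("xgblss_model.pkl")                      # pickles the full XGBoostLSS object
xgblss_loaded = XGBoostLSS.load_model("xgblss_model.pkl")  # staticmethod, returns the unpickled model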

set_base_margin(dmatrix)

Set base margin for distributions.

Arguments

dmatrix : DMatrix DMatrix object.

Returns

None

Source code in xgboostlss/model.py, lines 91-107
def set_base_margin(self, dmatrix: DMatrix) -> None:
    """
    Set base margin for distributions.

    Arguments
    ---------
    dmatrix : DMatrix
        DMatrix object.

    Returns
    -------
    None
    """
    if self.start_values is None:
        _, self.start_values = self.dist.calculate_start_values(dmatrix.get_label())
    base_margin = np.ones(shape=(dmatrix.num_row(), 1)) * self.start_values
    dmatrix.set_base_margin(base_margin.flatten())
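
The method broadcasts one start value per distributional parameter to every row of the DMatrix before flattening. A minimal NumPy illustration of that shape handling (standalone, not library code):

import numpy as np

start_values = np.array([0.2, -1.3])               # e.g. one start value per distributional parameter
n_rows = 4
base_margin = np.ones((n_rows, 1)) * start_values  # shape (4, 2): start values repeated for each row
print(base_margin.flatten())                       # row-major layout passed to DMatrix.set_base_margin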

set_eval_margin(eval_set, start_values)

Function that sets the base margin for the evaluation set.

Arguments

eval_set : list
    List of tuples containing the train and evaluation set.
start_values : np.ndarray
    Array containing the start values for each distributional parameter.

Returns

eval_set : list List of tuples containing the train and evaluation set.

Source code in xgboostlss/model.py, lines 582-621
def set_eval_margin(self,
                    eval_set: list,
                    start_values: np.ndarray
                    ) -> list:

    """
    Function that sets the base margin for the evaluation set.

    Arguments
    ---------
    eval_set : list
        List of tuples containing the train and evaluation set.
    start_values : np.ndarray
        Array containing the start values for each distributional parameter.

    Returns
    -------
    eval_set : list
        List of tuples containing the train and evaluation set.
    """
    sets = [(item, label) for item, label in eval_set]

    eval_set1, label1 = sets[0]
    eval_set2, label2 = sets[1]

    # Adjust labels to number of distributional parameters
    if not (self.dist.univariate or self.multivariate_eval_label_expand):
        self.multivariate_eval_label_expand = True
        eval_set2_label = self.dist.target_append(eval_set2.get_label(), self.dist.n_targets, self.dist.n_dist_param)
        eval_set2.set_label(eval_set2_label)

    # Set base margins
    base_margin_set1 = (np.ones(shape=(eval_set1.num_row(), 1))) * start_values
    eval_set1.set_base_margin(base_margin_set1.flatten())
    base_margin_set2 = (np.ones(shape=(eval_set2.num_row(), 1))) * start_values
    eval_set2.set_base_margin(base_margin_set2.flatten())

    eval_set = [(eval_set1, label1), (eval_set2, label2)]

    return eval_set

set_params_adj(params)

Set parameters for distributional model.

Arguments

params : Dict[str, Any] Parameters for model.

Returns

params : Dict[str, Any] Updated Parameters for model.

Source code in xgboostlss/model.py, lines 45-67
def set_params_adj(self, params: Dict[str, Any]) -> Dict[str, Any]:
    """
    Set parameters for distributional model.

    Arguments
    ---------
    params : Dict[str, Any]
        Parameters for model.

    Returns
    -------
    params : Dict[str, Any]
        Updated Parameters for model.
    """
    params_adj = {
        "objective": None,
        "base_score": 0,
        "num_target": self.dist.n_dist_param,
        "disable_default_eval_metric": True
    }
    params.update(params_adj)

    return params
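
A minimal illustration (not library code) of the overrides: for a two-parameter distribution such as the Gaussian (n_dist_param = 2 is assumed here), a user-supplied parameter dict is extended as follows.

params = {"eta": 0.1, "max_depth": 3}
params.update({
    "objective": None,                     # the custom objective is passed to xgb.train separately
    "base_score": 0,                       # start values are injected via the base margin instead
    "num_target": 2,                       # one boosted output per distributional parameter
    "disable_default_eval_metric": True,   # the distribution's own metric_fn is used
})
print(params)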

train(params, dtrain, num_boost_round=10, *, evals=None, early_stopping_rounds=None, evals_result=None, verbose_eval=True, xgb_model=None, callbacks=None)

Train a booster with given parameters.

Arguments

params :
    Booster params.
dtrain :
    Data to be trained.
num_boost_round :
    Number of boosting iterations.
evals :
    List of validation sets for which metrics will be evaluated during training. Validation
    metrics will help us track the performance of the model.
early_stopping_rounds :
    Activates early stopping. The validation metric needs to improve at least once in every
    early_stopping_rounds round(s) to continue training. Requires at least one item in evals.
    The method returns the model from the last iteration (not the best one); use a custom
    callback or model slicing if the best model is desired. If there is more than one item in
    evals, the last entry will be used for early stopping. If there is more than one metric in
    the eval_metric parameter given in params, the last metric will be used for early stopping.
    If early stopping occurs, the model will have two additional fields: bst.best_score and
    bst.best_iteration.
evals_result :
    This dictionary stores the evaluation results of all the items in the watchlist. Example:
    with a watchlist containing [(dtest, 'eval'), (dtrain, 'train')] and a parameter containing
    ('eval_metric': 'logloss'), evals_result returns
        {'train': {'logloss': ['0.48253', '0.35953']},
         'eval': {'logloss': ['0.480385', '0.357756']}}
verbose_eval :
    Requires at least one item in evals. If verbose_eval is True, the evaluation metric on the
    validation set is printed at each boosting stage. If verbose_eval is an integer, the
    evaluation metric on the validation set is printed at every given verbose_eval boosting
    stage; the last boosting stage / the stage found by early_stopping_rounds is also printed.
    Example: with verbose_eval=4 and at least one item in evals, an evaluation metric is
    printed every 4 boosting stages instead of every boosting stage.
xgb_model :
    Xgb model to be loaded before training (allows training continuation).
callbacks :
    List of callback functions that are applied at the end of each iteration. It is possible to
    use predefined callbacks via the Callback API. Note that states in callbacks are not
    preserved during training, which means callback objects cannot be reused for multiple
    training sessions without reinitialization or deepcopy. Example:
        for params in parameters_grid:
            # be sure to (re)initialize the callbacks before each run
            callbacks = [xgb.callback.LearningRateScheduler(custom_rates)]
            xgboost.train(params, Xy, callbacks=callbacks)

Returns

Booster: The trained booster model.

Source code in xgboostlss/model.py, lines 109-206
def train(
        self,
        params: Dict[str, Any],
        dtrain: DMatrix,
        num_boost_round: int = 10,
        *,
        evals: Optional[Sequence[Tuple[DMatrix, str]]] = None,
        early_stopping_rounds: Optional[int] = None,
        evals_result: Optional[TrainingCallback.EvalsLog] = None,
        verbose_eval: Optional[Union[bool, int]] = True,
        xgb_model: Optional[Union[str, os.PathLike, Booster, bytearray]] = None,
        callbacks: Optional[Sequence[TrainingCallback]] = None,
) -> Booster:
        """
        Train a booster with given parameters.

        Arguments
        ---------
        params :
            Booster params.
        dtrain :
            Data to be trained.
        num_boost_round :
            Number of boosting iterations.
        evals :
            List of validation sets for which metrics will be evaluated during training.
            Validation metrics will help us track the performance of the model.
        early_stopping_rounds :
            Activates early stopping. Validation metric needs to improve at least once in
            every **early_stopping_rounds** round(s) to continue training.
            Requires at least one item in **evals**.
            The method returns the model from the last iteration (not the best one).  Use
            custom callback or model slicing if the best model is desired.
            If there's more than one item in **evals**, the last entry will be used for early
            stopping.
            If there's more than one metric in the **eval_metric** parameter given in
            **params**, the last metric will be used for early stopping.
            If early stopping occurs, the model will have two additional fields:
            ``bst.best_score``, ``bst.best_iteration``.
        evals_result :
            This dictionary stores the evaluation results of all the items in watchlist.
            Example: with a watchlist containing
            ``[(dtest,'eval'), (dtrain,'train')]`` and
            a parameter containing ``('eval_metric': 'logloss')``,
            the **evals_result** returns
            .. code-block:: python
                {'train': {'logloss': ['0.48253', '0.35953']},
                 'eval': {'logloss': ['0.480385', '0.357756']}}
        verbose_eval :
            Requires at least one item in **evals**.
            If **verbose_eval** is True then the evaluation metric on the validation set is
            printed at each boosting stage.
            If **verbose_eval** is an integer then the evaluation metric on the validation set
            is printed at every given **verbose_eval** boosting stage. The last boosting stage
            / the boosting stage found by using **early_stopping_rounds** is also printed.
            Example: with ``verbose_eval=4`` and at least one item in **evals**, an evaluation metric
            is printed every 4 boosting stages, instead of every boosting stage.
        xgb_model :
            Xgb model to be loaded before training (allows training continuation).
        callbacks :
            List of callback functions that are applied at end of each iteration.
            It is possible to use predefined callbacks by using
            :ref:`Callback API <callback_api>`.
            .. note::
               States in callback are not preserved during training, which means callback
               objects can not be reused for multiple training sessions without
               reinitialization or deepcopy.
            .. code-block:: python
                for params in parameters_grid:
                    # be sure to (re)initialize the callbacks before each run
                    callbacks = [xgb.callback.LearningRateScheduler(custom_rates)]
                    xgboost.train(params, Xy, callbacks=callbacks)

        Returns
        -------
        Booster:
            The trained booster model.
        """
        self.set_params_adj(params)
        self.adjust_labels(dtrain)
        self.set_base_margin(dtrain)

        # Set base_margin for evals
        if evals is not None:
            evals = self.set_eval_margin(evals, self.start_values)

        self.booster = xgb.train(params,
                                 dtrain,
                                 num_boost_round=num_boost_round,
                                 evals=evals,
                                 obj=self.dist.objective_fn,
                                 custom_metric=self.dist.metric_fn,
                                 xgb_model=xgb_model,
                                 callbacks=callbacks,
                                 verbose_eval=verbose_eval,
                                 evals_result=evals_result,
                                 maximize=False,
                                 early_stopping_rounds=early_stopping_rounds)
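
A hedged end-to-end sketch of train(): X_train, y_train, X_test and y_test are assumed to exist, and the Gaussian distribution is used for illustration. Note that set_eval_margin above expects both a train and an evaluation entry in evals.

import xgboost as xgb
from xgboostlss.model import XGBoostLSS
from xgboostlss.distributions.Gaussian import Gaussian

dtrain = xgb.DMatrix(X_train, label=y_train)
deval  = xgb.DMatrix(X_test,  label=y_test)

xgblss = XGBoostLSS(Gaussian())
xgblss.train(
    {"eta": 0.1, "max_depth": 3},
    dtrain,
    num_boost_round=100,
    evals=[(dtrain, "train"), (deval, "test")],   # both sets get their base margin set
    early_stopping_rounds=20,
    verbose_eval=10,
)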

utils

exp_fn(predt)

Exponential function used to ensure predt is strictly positive.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in xgboostlss/utils.py, lines 47-63
def exp_fn(predt: torch.tensor) -> torch.tensor:
    """
    Exponential function used to ensure predt is strictly positive.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = torch.exp(nan_to_num(predt)) + torch.tensor(1e-06, dtype=predt.dtype)

    return predt

exp_fn_df(predt)

Exponential function used for Student-T distribution.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in xgboostlss/utils.py, lines 66-82
def exp_fn_df(predt: torch.tensor) -> torch.tensor:
    """
    Exponential function used for Student-T distribution.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = torch.exp(nan_to_num(predt)) + torch.tensor(1e-06, dtype=predt.dtype)

    return predt + torch.tensor(2.0, dtype=predt.dtype)

gumbel_softmax_fn(predt, tau=1.0)

Gumbel-softmax function used to ensure predt sums to one.

The Gumbel-softmax distribution is a continuous distribution over the simplex, which can be thought of as a "soft" version of a categorical distribution. It is a way to draw samples from a categorical distribution in a differentiable way, which is useful in gradient-based optimization problems. To sample from a Gumbel-softmax distribution, one uses the Gumbel-max trick: add Gumbel noise to the logits and apply the softmax. Formally, given a vector z, the Gumbel-softmax function s(z, tau)_i for component i at temperature tau is defined as

    s(z, \tau)_i = \frac{e^{(z_i + g_i)/\tau}}{\sum_{j=1}^{M} e^{(z_j + g_j)/\tau}}

where g_i is a sample from the Gumbel(0, 1) distribution. The temperature parameter tau controls the sharpness of the output distribution: as tau approaches 0, the mixing probabilities become more discrete, and as tau approaches infinity, they become more uniform. For more information we refer to

Jang, E., Gu, S. and Poole, B. "Categorical Reparameterization with Gumbel-Softmax", ICLR, 2017.
Arguments

predt: torch.tensor
    Predicted values.
tau: float, non-negative scalar temperature
    Temperature parameter for the Gumbel-softmax distribution. As tau -> 0, the output becomes
    more discrete, and as tau -> inf, the output becomes more uniform.

Returns

predt: torch.tensor Predicted values.

Source code in xgboostlss/utils.py, lines 241-279
def gumbel_softmax_fn(predt: torch.tensor,
                      tau: float = 1.0
                      ) -> torch.tensor:
    """
    Gumbel-softmax function used to ensure predt is adding to one.

    The Gumbel-softmax distribution is a continuous distribution over the simplex, which can be thought of as a "soft"
    version of a categorical distribution. It’s a way to draw samples from a categorical distribution in a
    differentiable way. The motivation behind using the Gumbel-Softmax is to make the discrete sampling process of
    categorical variables differentiable, which is useful in gradient-based optimization problems. To sample from a
    Gumbel-Softmax distribution, one would use the Gumbel-max trick: add a Gumbel noise to logits and apply the softmax.
    Formally, given a vector z, the Gumbel-softmax function s(z,tau)_i for a component i at temperature tau is
    defined as:

        s(z,tau)_i = frac{e^{(z_i + g_i) / tau}}{sum_{j=1}^M e^{(z_j + g_j) / tau}}

    where g_i is a sample from the Gumbel(0, 1) distribution. The parameter tau (temperature) controls the sharpness
    of the output distribution. As tau approaches 0, the mixing probabilities become more discrete, and as tau
    approaches infty, the mixing probabilities become more uniform. For more information we refer to

        Jang, E., Gu, Shixiang and Poole, B. "Categorical Reparameterization with Gumbel-Softmax", ICLR, 2017.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.
    tau: float, non-negative scalar temperature.
        Temperature parameter for the Gumbel-softmax distribution. As tau -> 0, the output becomes more discrete, and as
        tau -> inf, the output becomes more uniform.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    torch.manual_seed(123)
    predt = gumbel_softmax(nan_to_num(predt), tau=tau, dim=1) + torch.tensor(0, dtype=predt.dtype)

    return predt
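
Because the transform is applied along dim=1, each row of the returned tensor sums to one. A quick sanity check (assuming the function is importable from xgboostlss.utils):

import torch
from xgboostlss.utils import gumbel_softmax_fn

predt = torch.randn(3, 4)               # 3 observations, 4 mixture components
out = gumbel_softmax_fn(predt, tau=1.0)
print(out.sum(dim=1))                   # each row sums to (approximately) one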

identity_fn(predt)

Identity mapping of predt.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in xgboostlss/utils.py, lines 28-44
def identity_fn(predt: torch.tensor) -> torch.tensor:
    """
    Identity mapping of predt.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = nan_to_num(predt) + torch.tensor(0, dtype=predt.dtype)

    return predt

nan_to_num(predt)

Replace nan, inf and -inf with the mean of predt.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in xgboostlss/utils.py, lines 5-25
def nan_to_num(predt: torch.tensor) -> torch.tensor:
    """
    Replace nan, inf and -inf with the mean of predt.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = torch.nan_to_num(predt,
                             nan=float(torch.nanmean(predt)),
                             posinf=float(torch.nanmean(predt)),
                             neginf=float(torch.nanmean(predt))
                             )

    return predt
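
A small numeric check of the replacement behaviour (assuming the function is importable from xgboostlss.utils):

import torch
from xgboostlss.utils import nan_to_num

predt = torch.tensor([1.0, float("nan"), 3.0])
print(nan_to_num(predt))                # tensor([1., 2., 3.]): the nan is replaced by nanmean(predt)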

relu_fn(predt)

Function used to ensure predt is scaled to max(0, predt).

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in xgboostlss/utils.py, lines 183-199
def relu_fn(predt: torch.tensor) -> torch.tensor:
    """
    Function used to ensure predt are scaled to max(0, predt).

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = torch.relu(nan_to_num(predt)) + torch.tensor(1e-06, dtype=predt.dtype)

    return predt

relu_fn_df(predt)

Function used to ensure predt is scaled to max(0, predt).

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in xgboostlss/utils.py, lines 202-218
def relu_fn_df(predt: torch.tensor) -> torch.tensor:
    """
    Function used to ensure predt are scaled to max(0, predt).

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = torch.relu(nan_to_num(predt)) + torch.tensor(1e-06, dtype=predt.dtype)

    return predt + torch.tensor(2.0, dtype=predt.dtype)

sigmoid_fn(predt)

Function used to ensure predt is scaled to (0, 1).

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in xgboostlss/utils.py, lines 163-180
def sigmoid_fn(predt: torch.tensor) -> torch.tensor:
    """
    Function used to ensure predt are scaled to (0,1).

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = torch.sigmoid(nan_to_num(predt)) + torch.tensor(1e-06, dtype=predt.dtype)
    predt = torch.clamp(predt, 1e-03, 1-1e-03)

    return predt

softmax_fn(predt)

Softmax function used to ensure predt sums to one.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in xgboostlss/utils.py, lines 221-238
def softmax_fn(predt: torch.tensor) -> torch.tensor:
    """
    Softmax function used to ensure predt is adding to one.


    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = softmax(nan_to_num(predt), dim=1) + torch.tensor(0, dtype=predt.dtype)

    return predt

softplus_fn(predt)

Softplus function used to ensure predt is strictly positive.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in xgboostlss/utils.py, lines 85-101
def softplus_fn(predt: torch.tensor) -> torch.tensor:
    """
    Softplus function used to ensure predt is strictly positive.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = softplus(nan_to_num(predt)) + torch.tensor(1e-06, dtype=predt.dtype)

    return predt

softplus_fn_df(predt)

Softplus function used for Student-T distribution.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in xgboostlss/utils.py, lines 104-120
def softplus_fn_df(predt: torch.tensor) -> torch.tensor:
    """
    Softplus function used for Student-T distribution.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = softplus(nan_to_num(predt)) + torch.tensor(1e-06, dtype=predt.dtype)

    return predt + torch.tensor(2.0, dtype=predt.dtype)

squareplus_fn(predt)

Square-Plus function used to ensure predt is strictly positive.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in xgboostlss/utils.py, lines 123-140
def squareplus_fn(predt: torch.tensor) -> torch.tensor:
    """
    Square-Plus function used to ensure predt is strictly positive.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    b = torch.tensor(4., dtype=predt.dtype)
    predt = 0.5 * (predt + torch.sqrt(predt ** 2 + b)) + torch.tensor(1e-06, dtype=predt.dtype)

    return predt

squareplus_fn_df(predt)

Square-Plus function used to ensure predt is strictly positive.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in xgboostlss/utils.py, lines 143-160
def squareplus_fn_df(predt: torch.tensor) -> torch.tensor:
    """
    Square-Plus function used to ensure predt is strictly positive.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    b = torch.tensor(4., dtype=predt.dtype)
    predt = 0.5 * (predt + torch.sqrt(predt ** 2 + b)) + torch.tensor(1e-06, dtype=predt.dtype)

    return predt + torch.tensor(2.0, dtype=predt.dtype)
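
The *_fn_df variants above differ from their base counterparts only by the final shift of 2, which presumably keeps the Student-T degrees-of-freedom parameter above 2. A quick comparison (assuming the utils module is importable):

import torch
from xgboostlss.utils import softplus_fn, softplus_fn_df

predt = torch.tensor([-2.0, 0.0, 2.0])
print(softplus_fn(predt))               # strictly positive values
print(softplus_fn_df(predt))            # the same values shifted by 2 (used for the Student-T df)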