API references

LightGBMLSS - An extension of LightGBM to probabilistic forecasting

datasets

LightGBMLSS - An extension of LightGBM to probabilistic forecasting

data_loader

load_simulated_gaussian_data()

Returns train/test dataframe of a simulated example.

Contains the following columns

y              int64: response
x              int64: x-feature
X1:X10         int64: random noise features

Source code in lightgbmlss/datasets/data_loader.py
def load_simulated_gaussian_data():
    """
    Returns train/test dataframe of a simulated example.

    Contains the following columns:
        y              int64: response
        x              int64: x-feature
        X1:X10         int64: random noise features

    """
    train_path = pkg_resources.resource_stream(__name__, "gaussian_train_sim.csv")
    train_df = pd.read_csv(train_path)

    test_path = pkg_resources.resource_stream(__name__, "gaussian_test_sim.csv")
    test_df = pd.read_csv(test_path)

    return train_df, test_df
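
A minimal usage sketch (the import path follows the source file shown above; column names are taken from the docstring):

    from lightgbmlss.datasets.data_loader import load_simulated_gaussian_data

    # Load the simulated Gaussian example and split into features and response
    train, test = load_simulated_gaussian_data()
    X_train, y_train = train.drop(columns="y"), train["y"]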

load_simulated_studentT_data()

Returns train/test dataframe of a simulated example.

Contains the following columns

y              int64: response
x              int64: x-feature
X1:X10         int64: random noise features

Source code in lightgbmlss/datasets/data_loader.py
def load_simulated_studentT_data():
    """
    Returns train/test dataframe of a simulated example.

    Contains the following columns:
        y              int64: response
        x              int64: x-feature
        X1:X10         int64: random noise features

    """
    train_path = pkg_resources.resource_stream(__name__, "studentT_train_sim.csv")
    train_df = pd.read_csv(train_path)

    test_path = pkg_resources.resource_stream(__name__, "studentT_test_sim.csv")
    test_df = pd.read_csv(test_path)

    return train_df, test_df
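
The Student-t loader works the same way; a short sketch, assuming the same column layout as above:

    from lightgbmlss.datasets.data_loader import load_simulated_studentT_data

    # Heavier-tailed simulated example with the same y / x / X1:X10 columns
    train, test = load_simulated_studentT_data()
    print(train.shape, test.shape)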

distributions

LightGBMLSS - An extension of LightGBM to probabilistic forecasting

Beta

Beta

Bases: DistributionClass

Beta distribution class.

Distributional Parameters

concentration1: torch.Tensor
    1st concentration parameter of the distribution (often referred to as alpha).
concentration0: torch.Tensor
    2nd concentration parameter of the distribution (often referred to as beta).

Source

https://pytorch.org/docs/stable/distributions.html#beta

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in lightgbmlss/distributions/Beta.py
class Beta(DistributionClass):
    """
    Beta distribution class.

    Distributional Parameters
    -------------------------
    concentration1: torch.Tensor
        1st concentration parameter of the distribution (often referred to as alpha).
    concentration0: torch.Tensor
        2nd concentration parameter of the distribution (often referred to as beta).

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#beta

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = Beta_Torch
        param_dict = {"concentration1": response_fn, "concentration0": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )

Cauchy

Cauchy

Bases: DistributionClass

Cauchy distribution class.

Distributional Parameters

loc: torch.Tensor
    Mode or median of the distribution.
scale: torch.Tensor
    Half width at half maximum.

Source

https://pytorch.org/docs/stable/distributions.html#cauchy

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in lightgbmlss/distributions/Cauchy.py
class Cauchy(DistributionClass):
    """
    Cauchy distribution class.

    Distributional Parameters
    -------------------------
    loc: torch.Tensor
        Mode or median of the distribution.
    scale: torch.Tensor
        Half width at half maximum.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#cauchy

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = Cauchy_Torch
        param_dict = {"loc": identity_fn, "scale": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
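
Instantiation follows the same pattern as the other location-scale distributions; a short sketch with the defaults spelled out:

    from lightgbmlss.distributions.Cauchy import Cauchy

    # loc uses the identity response, scale is kept positive via the exponential
    dist = Cauchy(stabilization="None", response_fn="exp", loss_fn="nll")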

Expectile

Expectile

Bases: DistributionClass

Expectile distribution class.

Distributional Parameters

expectile: List
    List of specified expectiles.

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
expectiles: List
    List of expectiles in increasing order.
penalize_crossing: bool
    Whether to include a penalty term to discourage crossing of expectiles.

Source code in lightgbmlss/distributions/Expectile.py
class Expectile(DistributionClass):
    """
    Expectile distribution class.

    Distributional Parameters
    -------------------------
    expectile: List
        List of specified expectiles.

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    expectiles: List
        List of expectiles in increasing order.
    penalize_crossing: bool
        Whether to include a penalty term to discourage crossing of expectiles.
    """
    def __init__(self,
                 stabilization: str = "None",
                 expectiles: List = [0.1, 0.5, 0.9],
                 penalize_crossing: bool = False,
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if not isinstance(expectiles, list):
            raise ValueError("Expectiles must be a list.")
        if not all([0 < expectile < 1 for expectile in expectiles]):
            raise ValueError("Expectiles must be between 0 and 1.")
        if not isinstance(penalize_crossing, bool):
            raise ValueError("penalize_crossing must be a boolean. Please choose from True or False.")

        # Set the parameters specific to the distribution
        distribution = Expectile_Torch
        torch.distributions.Distribution.set_default_validate_args(False)
        expectiles.sort()
        param_dict = {}
        for expectile in expectiles:
            key = f"expectile_{expectile}"
            param_dict[key] = identity_fn

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn="nll",
                         tau=torch.tensor(expectiles),
                         penalize_crossing=penalize_crossing
                         )
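
A brief sketch of a non-default configuration (the expectile levels are illustrative):

    from lightgbmlss.distributions.Expectile import Expectile

    # Three expectiles in increasing order, with the crossing penalty enabled
    dist = Expectile(stabilization="None", expectiles=[0.05, 0.5, 0.95], penalize_crossing=True)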

Expectile_Torch

Bases: Distribution

PyTorch implementation of expectiles.

Arguments

expectiles : List[torch.Tensor]
    List of expectiles.
penalize_crossing : bool
    Whether to include a penalty term to discourage crossing of expectiles.

Source code in lightgbmlss/distributions/Expectile.py
class Expectile_Torch(Distribution):
    """
    PyTorch implementation of expectiles.

    Arguments
    ---------
    expectiles : List[torch.Tensor]
        List of expectiles.
    penalize_crossing : bool
        Whether to include a penalty term to discourage crossing of expectiles.
    """
    def __init__(self,
                 expectiles: List[torch.Tensor],
                 penalize_crossing: bool = False,
                 ):
        super(Expectile_Torch).__init__()
        self.expectiles = expectiles
        self.penalize_crossing = penalize_crossing
        self.__class__.__name__ = "Expectile"

    def log_prob(self, value: torch.Tensor, tau: List[torch.Tensor]) -> torch.Tensor:
        """
        Returns the log of the probability density function evaluated at `value`.

        Arguments
        ---------
        value : torch.Tensor
            Response for which log probability is to be calculated.
        tau : List[torch.Tensor]
            List of asymmetry parameters.

        Returns
        -------
        torch.Tensor
            Log probability of `value`.
        """
        value = value.reshape(-1, 1)
        loss = torch.tensor(0.0, dtype=torch.float32)
        penalty = torch.tensor(0.0, dtype=torch.float32)

        # Calculate loss
        predt_expectiles = []
        for expectile, tau_value in zip(self.expectiles, tau):
            weight = torch.where(value - expectile >= 0, tau_value, 1 - tau_value)
            loss += torch.nansum(weight * (value - expectile) ** 2)
            predt_expectiles.append(expectile.reshape(-1, 1))

        # Penalty term to discourage crossing of expectiles
        if self.penalize_crossing:
            predt_expectiles = torch.cat(predt_expectiles, dim=1)
            penalty = torch.mean(
                (~torch.all(torch.diff(predt_expectiles, dim=1) > 0, dim=1)).float()
            )

        loss = (loss * (1 + penalty)) / len(self.expectiles)

        return -loss

log_prob(value, tau)

Returns the log of the probability density function evaluated at value.

Arguments

value : torch.Tensor
    Response for which log probability is to be calculated.
tau : List[torch.Tensor]
    List of asymmetry parameters.

Returns

torch.Tensor
    Log probability of `value`.

Source code in lightgbmlss/distributions/Expectile.py
def log_prob(self, value: torch.Tensor, tau: List[torch.Tensor]) -> torch.Tensor:
    """
    Returns the log of the probability density function evaluated at `value`.

    Arguments
    ---------
    value : torch.Tensor
        Response for which log probability is to be calculated.
    tau : List[torch.Tensor]
        List of asymmetry parameters.

    Returns
    -------
    torch.Tensor
        Log probability of `value`.
    """
    value = value.reshape(-1, 1)
    loss = torch.tensor(0.0, dtype=torch.float32)
    penalty = torch.tensor(0.0, dtype=torch.float32)

    # Calculate loss
    predt_expectiles = []
    for expectile, tau_value in zip(self.expectiles, tau):
        weight = torch.where(value - expectile >= 0, tau_value, 1 - tau_value)
        loss += torch.nansum(weight * (value - expectile) ** 2)
        predt_expectiles.append(expectile.reshape(-1, 1))

    # Penalty term to discourage crossing of expectiles
    if self.penalize_crossing:
        predt_expectiles = torch.cat(predt_expectiles, dim=1)
        penalty = torch.mean(
            (~torch.all(torch.diff(predt_expectiles, dim=1) > 0, dim=1)).float()
        )

    loss = (loss * (1 + penalty)) / len(self.expectiles)

    return -loss

expectile_norm(tau=0.5, m=0, sd=1)

Calculates expectiles from Normal distribution for given tau values. For more details and distributions see https://rdrr.io/cran/expectreg/man/enorm.html

Arguments


tau : np.ndarray
    Vector of expectiles from the respective distribution.
m : np.ndarray
    Mean of the Normal distribution.
sd : np.ndarray
    Standard deviation of the Normal distribution.

Returns


np.ndarray

Source code in lightgbmlss/distributions/Expectile.py
def expectile_norm(tau: np.ndarray = 0.5,
                   m: np.ndarray = 0,
                   sd: np.ndarray = 1):
    """
    Calculates expectiles from Normal distribution for given tau values.
    For more details and distributions see https://rdrr.io/cran/expectreg/man/enorm.html

    Arguments
    _________
    tau : np.ndarray
        Vector of expectiles from the respective distribution.
    m : np.ndarray
        Mean of the Normal distribution.
    sd : np.ndarray
        Standard deviation of the Normal distribution.

    Returns
    _______
    np.ndarray
    """
    tau[tau > 1 or tau < 0] = np.nan
    zz = 0 * tau
    lower = np.array(-10, dtype="float")
    lower = np.repeat(lower[np.newaxis, ...], len(tau), axis=0)
    upper = np.array(10, dtype="float")
    upper = np.repeat(upper[np.newaxis, ...], len(tau), axis=0)
    diff = 1
    index = 0
    while (diff > 1e-10) and (index < 1000):
        root = expectile_pnorm(zz) - tau
        root[np.isnan(root)] = 0
        lower[root < 0] = zz[root < 0]
        upper[root > 0] = zz[root > 0]
        zz = (upper + lower) / 2
        diff = np.nanmax(np.abs(root))
        index = index + 1
    zz[np.isnan(tau)] = np.nan

    return zz * sd + m

expectile_pnorm(tau=0.5, m=0, sd=1)

Normal Expectile Distribution Function. For more details and distributions see https://rdrr.io/cran/expectreg/man/enorm.html

Arguments


tau : np.ndarray
    Vector of expectiles from the respective distribution.
m : np.ndarray
    Mean of the Normal distribution.
sd : np.ndarray
    Standard deviation of the Normal distribution.

Returns


tau : np.ndarray
    Expectiles from the Normal distribution.

Source code in lightgbmlss/distributions/Expectile.py
def expectile_pnorm(tau: np.ndarray = 0.5,
                    m: np.ndarray = 0,
                    sd: np.ndarray = 1
                    ):
    """
    Normal Expectile Distribution Function.
    For more details and distributions see https://rdrr.io/cran/expectreg/man/enorm.html

    Arguments
    _________
    tau : np.ndarray
        Vector of expectiles from the respective distribution.
    m : np.ndarray
        Mean of the Normal distribution.
    sd : np.ndarray
        Standard deviation of the Normal distribution.

    Returns
    _______
    tau : np.ndarray
        Expectiles from the Normal distribution.
    """
    z = (tau - m) / sd
    p = norm.cdf(z, loc=m, scale=sd)
    d = norm.pdf(z, loc=m, scale=sd)
    u = -d - z * p
    tau = u / (2 * u + z)

    return tau
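
A small numeric sketch of expectile_pnorm (inputs are illustrative; numpy and scipy are already required by the module):

    import numpy as np
    from lightgbmlss.distributions.Expectile import expectile_pnorm

    # Expectile levels of a standard Normal evaluated at a few points
    levels = expectile_pnorm(tau=np.array([-1.0, 0.0, 1.0]), m=0, sd=1)
    print(levels)  # values in (0, 1), increasing in the input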

Gamma

Gamma

Bases: DistributionClass

Gamma distribution class.

Distributional Parameters

concentration: torch.Tensor
    shape parameter of the distribution (often referred to as alpha)
rate: torch.Tensor
    rate = 1 / scale of the distribution (often referred to as beta)

Source

https://pytorch.org/docs/stable/distributions.html#gamma

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in lightgbmlss/distributions/Gamma.py
class Gamma(DistributionClass):
    """
    Gamma distribution class.

     Distributional Parameters
    --------------------------
    concentration: torch.Tensor
        shape parameter of the distribution (often referred to as alpha)
    rate: torch.Tensor
        rate = 1 / scale of the distribution (often referred to as beta)

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#gamma

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = Gamma_Torch
        param_dict = {"concentration": response_fn, "rate": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
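
A short sketch using the CRPS loss (as noted above, the Hessian is then set to 1):

    from lightgbmlss.distributions.Gamma import Gamma

    # Positive response; concentration and rate are both mapped via exp
    dist = Gamma(stabilization="None", response_fn="exp", loss_fn="crps")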

Gaussian

Gaussian

Bases: DistributionClass

Gaussian distribution class.

Distributional Parameters

loc: torch.Tensor
    Mean of the distribution (often referred to as mu).
scale: torch.Tensor
    Standard deviation of the distribution (often referred to as sigma).

Source

https://pytorch.org/docs/stable/distributions.html#normal

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in lightgbmlss/distributions/Gaussian.py
class Gaussian(DistributionClass):
    """
    Gaussian distribution class.

    Distributional Parameters
    -------------------------
    loc: torch.Tensor
        Mean of the distribution (often referred to as mu).
    scale: torch.Tensor
        Standard deviation of the distribution (often referred to as sigma).

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#normal

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = Gaussian_Torch
        param_dict = {"loc": identity_fn, "scale": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
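
A minimal sketch; the resulting object is what gets handed to the LightGBMLSS model class (the model API itself is not documented in this section):

    from lightgbmlss.distributions.Gaussian import Gaussian

    # loc is unconstrained (identity), scale is kept positive via exp
    dist = Gaussian(stabilization="None", response_fn="exp", loss_fn="nll")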

Gumbel

Gumbel

Bases: DistributionClass

Gumbel distribution class.

Distributional Parameters

loc: torch.Tensor
    Location parameter of the distribution.
scale: torch.Tensor
    Scale parameter of the distribution.

Source

https://pytorch.org/docs/stable/distributions.html#gumbel

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in lightgbmlss/distributions/Gumbel.py
class Gumbel(DistributionClass):
    """
    Gumbel distribution class.

    Distributional Parameters
    -------------------------
    loc: torch.Tensor
        Location parameter of the distribution.
    scale: torch.Tensor
        Scale parameter of the distribution.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#gumbel

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = Gumbel_Torch
        param_dict = {"loc": identity_fn, "scale": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
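
Instantiation sketch with MAD stabilization (values are illustrative):

    from lightgbmlss.distributions.Gumbel import Gumbel

    # Location via identity, scale kept positive via softplus
    dist = Gumbel(stabilization="MAD", response_fn="softplus", loss_fn="nll")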

Laplace

Laplace

Bases: DistributionClass

Laplace distribution class.

Distributional Parameters

loc: torch.Tensor
    Mean of the distribution.
scale: torch.Tensor
    Scale of the distribution.

Source

https://pytorch.org/docs/stable/distributions.html#laplace

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in lightgbmlss/distributions/Laplace.py
class Laplace(DistributionClass):
    """
    Laplace distribution class.

    Distributional Parameters
    -------------------------
    loc: torch.Tensor
        Mean of the distribution.
    scale: torch.Tensor
        Scale of the distribution.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#laplace

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = Laplace_Torch
        param_dict = {"loc": identity_fn, "scale": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
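
Instantiation sketch (the CRPS option again implies a Hessian of 1, see the note above):

    from lightgbmlss.distributions.Laplace import Laplace

    # Heavier-tailed alternative to the Gaussian
    dist = Laplace(stabilization="None", response_fn="exp", loss_fn="crps")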

LogNormal

LogNormal

Bases: DistributionClass

LogNormal distribution class.

Distributional Parameters

loc: torch.Tensor
    Mean of log of distribution.
scale: torch.Tensor
    Standard deviation of log of the distribution.

Source

https://pytorch.org/docs/stable/distributions.html#lognormal

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in lightgbmlss/distributions/LogNormal.py
class LogNormal(DistributionClass):
    """
    LogNormal distribution class.

    Distributional Parameters
    -------------------------
    loc: torch.Tensor
        Mean of log of distribution.
    scale: torch.Tensor
        Standard deviation of log of the distribution.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#lognormal

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = LogNormal_Torch
        param_dict = {"loc": identity_fn, "scale": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
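
Instantiation sketch for a strictly positive response:

    from lightgbmlss.distributions.LogNormal import LogNormal

    # L2 stabilization of gradients and Hessians
    dist = LogNormal(stabilization="L2", response_fn="exp", loss_fn="nll")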

Mixture

Mixture

Bases: MixtureDistributionClass

Mixture-Density distribution class.

Implements a mixture-density distribution for univariate targets, where all components are from different parameterizations of the same distribution-type. A mixture-density distribution is a concept used to model a complex distribution that arises from combining multiple simpler distributions. The Mixture-Density distribution is parameterized by a categorical selecting distribution (over M components) and M-component distributions. For more information on the Mixture-Density distribution, see:

Bishop, C. M. (1994). Mixture density networks. Technical Report NCRG/4288, Aston University, Birmingham, UK.

Distributional Parameters

Inherits the distributional parameters from the component distributions.

Source

https://pytorch.org/docs/stable/distributions.html#mixturesamefamily

Parameters

component_distribution: torch.distributions.Distribution
    Distribution class for the components of the mixture distribution. Has to be one of the available univariate distributions of the package.
M: int
    Number of components in the mixture distribution.
hessian_mode: str
    Mode for computing the Hessian. Must be one of the following:

    - "individual": Each parameter is treated as a separate tensor. As a result, the Hessian corresponds to the
    second-order derivative with respect to that specific parameter only. The resulting Hessians capture the
    curvature of the loss w.r.t. each individual parameter. This is usually more runtime intensive, but can
    be more accurate.

    - "grouped": Each tensor contains all parameters for a specific parameter-type, e.g., for a Gaussian-Mixture
    with M=2, loc=[loc_1, loc_2], scale=[scale_1, scale_2], and mix_prob=[mix_prob_1, mix_prob_2]. When
    computing the Hessian, the derivatives for all parameters in the respective tensor are calculated jointly.
    The resulting Hessians capture the curvature of the loss w.r.t. the entire parameter-type. This is usually
    less runtime intensive, but can be less accurate.

tau: float, non-negative scalar temperature.
    The Gumbel-softmax distribution is a continuous distribution over the simplex, which can be thought of as a "soft" version of a categorical distribution. It’s a way to draw samples from a categorical distribution in a differentiable way. The motivation behind using the Gumbel-Softmax is to make the discrete sampling process of categorical variables differentiable, which is useful in gradient-based optimization problems. To sample from a Gumbel-Softmax distribution, one would use the Gumbel-max trick: add a Gumbel noise to logits and apply the softmax. Formally, given a vector z, the Gumbel-softmax function s(z,tau)_i for a component i at temperature tau is defined as:

    s(z,tau)_i = frac{e^{(z_i + g_i) / tau}}{sum_{j=1}^M e^{(z_j + g_j) / tau}}

where g_i is a sample from the Gumbel(0, 1) distribution. The parameter tau (temperature) controls the sharpness
of the output distribution. As tau approaches 0, the mixing probabilities become more discrete, and as tau
approaches infty, the mixing probabilities become more uniform. For more information we refer to

    Jang, E., Gu, Shixiang and Poole, B. "Categorical Reparameterization with Gumbel-Softmax", ICLR, 2017.
Source code in lightgbmlss/distributions/Mixture.py
class Mixture(MixtureDistributionClass):
    """
    Mixture-Density distribution class.

    Implements a mixture-density distribution for univariate targets, where all components are from different
    parameterizations of the same distribution-type. A mixture-density distribution is a concept used to model a
    complex distribution that arises from combining multiple simpler distributions. The Mixture-Density distribution
    is parameterized by a categorical selecting distribution (over M components) and M-component distributions. For more
    information on the Mixture-Density distribution, see:

        Bishop, C. M. (1994). Mixture density networks. Technical Report NCRG/4288, Aston University, Birmingham, UK.


    Distributional Parameters
    -------------------------
    Inherits the distributional parameters from the component distributions.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#mixturesamefamily

    Parameters
    -------------------------
    component_distribution: torch.distributions.Distribution
        Distribution class for the components of the mixture distribution. Has to be one of the available
        univariate distributions of the package.
    M: int
        Number of components in the mixture distribution.
    hessian_mode: str
        Mode for computing the Hessian. Must be one of the following:

            - "individual": Each parameter is treated as a separate tensor. As a result, the Hessian corresponds to the
            second-order derivative with respect to that specific parameter only. The resulting Hessians capture the
            curvature of the loss w.r.t. each individual parameter. This is usually more runtime intensive, but can
            be more accurate.

            - "grouped": Each tensor contains all parameters for a specific parameter-type, e.g., for a Gaussian-Mixture
            with M=2, loc=[loc_1, loc_2], scale=[scale_1, scale_2], and mix_prob=[mix_prob_1, mix_prob_2]. When
            computing the Hessian, the derivatives for all parameters in the respective tensor are calculated jointly.
            The resulting Hessians capture the curvature of the loss w.r.t. the entire parameter-type. This is usually
            less runtime intensive, but can be less accurate.
    tau: float, non-negative scalar temperature.
        The Gumbel-softmax distribution is a continuous distribution over the simplex, which can be thought of as a "soft"
        version of a categorical distribution. It’s a way to draw samples from a categorical distribution in a
        differentiable way. The motivation behind using the Gumbel-Softmax is to make the discrete sampling process of
        categorical variables differentiable, which is useful in gradient-based optimization problems. To sample from a
        Gumbel-Softmax distribution, one would use the Gumbel-max trick: add a Gumbel noise to logits and apply the softmax.
        Formally, given a vector z, the Gumbel-softmax function s(z,tau)_i for a component i at temperature tau is
        defined as:

            s(z,tau)_i = frac{e^{(z_i + g_i) / tau}}{sum_{j=1}^M e^{(z_j + g_j) / tau}}

        where g_i is a sample from the Gumbel(0, 1) distribution. The parameter tau (temperature) controls the sharpness
        of the output distribution. As tau approaches 0, the mixing probabilities become more discrete, and as tau
        approaches infty, the mixing probabilities become more uniform. For more information we refer to

            Jang, E., Gu, Shixiang and Poole, B. "Categorical Reparameterization with Gumbel-Softmax", ICLR, 2017.
    """
    def __init__(self,
                 component_distribution: torch.distributions.Distribution,
                 M: int = 2,
                 hessian_mode: str = "individual",
                 tau: float = 1.0
                 ):

        # Input Checks
        mixt_dist = get_component_distributions()
        if str(component_distribution.__class__).split(".")[-2] not in mixt_dist:
            raise ValueError(f"component_distribution must be one of the following: {mixt_dist}.")
        if not isinstance(M, int):
            raise ValueError("M must be an integer.")
        if M < 2:
            raise ValueError("M must be greater than 1.")
        if component_distribution.loss_fn != "nll":
            raise ValueError("Loss for component_distribution must be 'nll'.")
        if not isinstance(hessian_mode, str):
            raise ValueError("hessian_mode must be a string.")
        if hessian_mode not in ["individual", "grouped"]:
            raise ValueError("hessian_mode must be either 'individual' or 'grouped'.")
        if not isinstance(tau, float):
            raise ValueError("tau must be a float.")
        if tau <= 0:
            raise ValueError("tau must be greater than 0.")

        # Set the parameters specific to the distribution
        param_dict = component_distribution.param_dict
        preset_gumbel_fn = partial(gumbel_softmax_fn, tau=tau)
        param_dict.update({"mix_prob": preset_gumbel_fn})
        distribution_arg_names = [f"{key}_{i}" for key in param_dict for i in range(1, M + 1)]
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=component_distribution,
                         M=M,
                         temperature=tau,
                         hessian_mode=hessian_mode,
                         univariate=True,
                         discrete=component_distribution.discrete,
                         n_dist_param=len(distribution_arg_names),
                         stabilization=component_distribution.stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=distribution_arg_names,
                         loss_fn=component_distribution.loss_fn
                         )
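
A short sketch of a two-component Gaussian mixture (this assumes Gaussian is among the component distributions returned by get_component_distributions; the component must use the "nll" loss):

    from lightgbmlss.distributions.Gaussian import Gaussian
    from lightgbmlss.distributions.Mixture import Mixture

    # M=2 components; Hessians computed per parameter-type; unit softmax temperature
    dist = Mixture(Gaussian(response_fn="softplus", loss_fn="nll"), M=2, hessian_mode="grouped", tau=1.0)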

NegativeBinomial

NegativeBinomial

Bases: DistributionClass

NegativeBinomial distribution class.

Distributional Parameters

total_count: torch.Tensor
    Non-negative number of negative Bernoulli trials to stop.
probs: torch.Tensor
    Event probabilities of success in the half open interval [0, 1).
logits: torch.Tensor
    Event log-odds for probabilities of success.

Source

https://pytorch.org/docs/stable/distributions.html#negativebinomial

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn_total_count: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
response_fn_probs: str
    Response function for transforming the distributional parameters to the correct support. Options are "sigmoid" (sigmoid).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in lightgbmlss/distributions/NegativeBinomial.py
class NegativeBinomial(DistributionClass):
    """
    NegativeBinomial distribution class.

    Distributional Parameters
    -------------------------
    total_count: torch.Tensor
        Non-negative number of negative Bernoulli trials to stop.
    probs: torch.Tensor
        Event probabilities of success in the half open interval [0, 1).
    logits: torch.Tensor
        Event log-odds for probabilities of success.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#negativebinomial

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn_total_count: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
    response_fn_probs: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "sigmoid" (sigmoid).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn_total_count: str = "relu",
                 response_fn_probs: str = "sigmoid",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select 'nll'.")

        #  Specify Response Functions for total_count
        response_functions_total_count = {"exp": exp_fn, "softplus": softplus_fn, "relu": relu_fn}
        if response_fn_total_count in response_functions_total_count:
            response_fn_total_count = response_functions_total_count[response_fn_total_count]
        else:
            raise ValueError(
                "Invalid response function for total_count. Please choose from 'exp', 'softplus' or 'relu'.")

        #  Specify Response Functions for probs
        response_functions_probs = {"sigmoid": sigmoid_fn}
        if response_fn_probs in response_functions_probs:
            response_fn_probs = response_functions_probs[response_fn_probs]
        else:
            raise ValueError(
                "Invalid response function for probs. Please select 'sigmoid'.")

        # Set the parameters specific to the distribution
        distribution = NegativeBinomial_Torch
        param_dict = {"total_count": response_fn_total_count, "probs": response_fn_probs}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=True,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
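
Instantiation sketch for a count response:

    from lightgbmlss.distributions.NegativeBinomial import NegativeBinomial

    # total_count kept non-negative via relu, probs mapped into (0, 1) via sigmoid
    dist = NegativeBinomial(stabilization="None", response_fn_total_count="relu", response_fn_probs="sigmoid", loss_fn="nll")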

Poisson

Poisson

Bases: DistributionClass

Poisson distribution class.

Distributional Parameters

rate: torch.Tensor
    Rate parameter of the distribution (often referred to as lambda).

Source

https://pytorch.org/docs/stable/distributions.html#poisson

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in lightgbmlss/distributions/Poisson.py
class Poisson(DistributionClass):
    """
    Poisson distribution class.

    Distributional Parameters
    -------------------------
    rate: torch.Tensor
        Rate parameter of the distribution (often referred to as lambda).

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#poisson

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "relu",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select 'nll'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn, "relu": relu_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function for total_count. Please choose from 'exp', 'softplus' or 'relu'.")

        # Set the parameters specific to the distribution
        distribution = Poisson_Torch
        param_dict = {"rate": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=True,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
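
Instantiation sketch, using softplus instead of the default relu response:

    from lightgbmlss.distributions.Poisson import Poisson

    # Rate parameter kept positive via softplus
    dist = Poisson(stabilization="None", response_fn="softplus", loss_fn="nll")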

SplineFlow

SplineFlow

Bases: NormalizingFlowClass

Spline Flow class.

The spline flow is a normalizing flow based on element-wise rational spline bijections of linear and quadratic order (Durkan et al., 2019; Dolatabadi et al., 2020). Rational splines are functions that are comprised of segments that are the ratio of two polynomials. Rational splines offer an excellent combination of functional flexibility whilst maintaining a numerically stable inverse.

For more details, see:
    - Durkan, C., Bekasov, A., Murray, I. and Papamakarios, G. Neural Spline Flows. NeurIPS 2019.
    - Dolatabadi, H. M., Erfani, S. and Leckie, C., Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.

Source

https://docs.pyro.ai/en/stable/distributions.html#pyro.distributions.transforms.Spline

Arguments

target_support: str
    The target support. Options are
        - "real": [-inf, inf]
        - "positive": [0, inf]
        - "positive_integer": [0, 1, 2, 3, ...]
        - "unit_interval": [0, 1]
count_bins: int
    The number of segments comprising the spline.
bound: float
    The quantity "K" determining the bounding box, [-K,K] x [-K,K], of the spline. By adjusting the "K" value, you can control the size of the bounding box and consequently control the range of inputs that the spline transform operates on. Larger values of "K" will result in a wider valid range for the spline transformation, while smaller values will restrict the valid range to a smaller region. Should be chosen based on the range of the data.
order: str
    The order of the spline. Options are "linear" or "quadratic".
stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD" or "L2".
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in lightgbmlss/distributions/SplineFlow.py
class SplineFlow(NormalizingFlowClass):
    """
    Spline Flow class.

    The spline flow is a normalizing flow based on element-wise rational spline bijections of linear and quadratic
    order (Durkan et al., 2019; Dolatabadi et al., 2020). Rational splines are functions that are comprised of segments
    that are the ratio of two polynomials. Rational splines offer an excellent combination of functional flexibility
    whilst maintaining a numerically stable inverse.

    For more details, see:
    - Durkan, C., Bekasov, A., Murray, I. and Papamakarios, G. Neural Spline Flows. NeurIPS 2019.
    - Dolatabadi, H. M., Erfani, S. and Leckie, C., Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.


    Source
    ---------
    https://docs.pyro.ai/en/stable/distributions.html#pyro.distributions.transforms.Spline


    Arguments
    ---------
    target_support: str
        The target support. Options are
            - "real": [-inf, inf]
            - "positive": [0, inf]
            - "positive_integer": [0, 1, 2, 3, ...]
            - "unit_interval": [0, 1]
    count_bins: int
        The number of segments comprising the spline.
    bound: float
        The quantity "K" determining the bounding box, [-K,K] x [-K,K] of the spline. By adjusting the
        "K" value, you can control the size of the bounding box and consequently control the range of inputs that
        the spline transform operates on. Larger values of "K" will result in a wider valid range for the spline
        transformation, while smaller values will restrict the valid range to a smaller region. Should be chosen
        based on the range of the data.
    order: str
        The order of the spline. Options are "linear" or "quadratic".
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD" or "L2".
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 target_support: str = "real",
                 count_bins: int = 8,
                 bound: float = 3.0,
                 order: str = "linear",
                 stabilization: str = "None",
                 loss_fn: str = "nll"
                 ):

        # Specify Target Transform
        if not isinstance(target_support, str):
            raise ValueError("target_support must be a string.")

        transforms = {
            "real": (identity_transform, False),
            "positive": (SoftplusTransform(), False),
            "positive_integer": (SoftplusTransform(), True),
            "unit_interval": (SigmoidTransform(), False)
        }

        if target_support in transforms:
            target_transform, discrete = transforms[target_support]
        else:
            raise ValueError(
                "Invalid target_support. Options are 'real', 'positive', 'positive_integer', or 'unit_interval'.")

        # Check if count_bins is valid
        if not isinstance(count_bins, int):
            raise ValueError("count_bins must be an integer.")
        if count_bins <= 0:
            raise ValueError("count_bins must be a positive integer > 0.")

        # Check if bound is float
        if not isinstance(bound, float):
            raise ValueError("bound must be a float.")

        # Number of parameters
        if not isinstance(order, str):
            raise ValueError("order must be a string.")

        order_params = {
            "quadratic": 2 * count_bins + (count_bins - 1),
            "linear": 3 * count_bins + (count_bins - 1)
        }

        if order in order_params:
            n_params = order_params[order]
        else:
            raise ValueError("Invalid order specification. Options are 'linear' or 'quadratic'.")

        # Check if stabilization method is valid.
        if not isinstance(stabilization, str):
            raise ValueError("stabilization must be a string.")
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Options are 'None', 'MAD' or 'L2'.")

        # Check if loss function is valid.
        if not isinstance(loss_fn, str):
            raise ValueError("loss_fn must be a string.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss_fn. Options are 'nll' or 'crps'.")

        # Specify parameter dictionary
        param_dict = {f"param_{i + 1}": identity_fn for i in range(n_params)}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Normalizing Flow Class
        super().__init__(base_dist=Normal,                     # Base distribution, currently only Normal is supported.
                         flow_transform=Spline,
                         count_bins=count_bins,
                         bound=bound,
                         order=order,
                         n_dist_param=n_params,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         target_transform=target_transform,
                         discrete=discrete,
                         univariate=True,
                         stabilization=stabilization,
                         loss_fn=loss_fn
                         )
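
As a minimal, illustrative sketch (import path from the file location above; all argument values are examples, not recommendations), a flow configured for a strictly positive response:

from lightgbmlss.distributions.SplineFlow import SplineFlow

# A quadratic spline with 8 bins on the bounding box [-5, 5], using a softplus
# transform to map the flow output onto the positive real line.
spline_flow = SplineFlow(target_support="positive",
                         count_bins=8,
                         bound=5.0,
                         order="quadratic",
                         stabilization="L2",
                         loss_fn="nll")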

StudentT

StudentT

Bases: DistributionClass

Student-T Distribution Class

Distributional Parameters

df: torch.Tensor
    Degrees of freedom.
loc: torch.Tensor
    Mean of the distribution.
scale: torch.Tensor
    Scale of the distribution.

Source

https://pytorch.org/docs/stable/distributions.html#studentt

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in lightgbmlss/distributions/StudentT.py
class StudentT(DistributionClass):
    """
    Student-T Distribution Class

    Distributional Parameters
    -------------------------
    df: torch.Tensor
        Degrees of freedom.
    loc: torch.Tensor
        Mean of the distribution.
    scale: torch.Tensor
        Scale of the distribution.

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#studentt

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {
            "exp": (exp_fn, exp_fn_df),
            "softplus": (softplus_fn, softplus_fn_df)
        }
        if response_fn in response_functions:
            response_fn, response_fn_df = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = StudentT_Torch
        param_dict = {"df": response_fn_df, "loc": identity_fn, "scale": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
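
A minimal, illustrative instantiation with the CRPS loss (import path from the file location above; recall from the note above that "crps" fixes the Hessian at 1):

from lightgbmlss.distributions.StudentT import StudentT

# df, loc and scale are modelled; "softplus" maps df and scale to the positive real line,
# while loc remains on the identity scale.
studentt_dist = StudentT(stabilization="None", response_fn="softplus", loss_fn="crps")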

Weibull

Weibull

Bases: DistributionClass

Weibull distribution class.

Distributional Parameters

scale: torch.Tensor
    Scale parameter of distribution (lambda).
concentration: torch.Tensor
    Concentration parameter of distribution (k/shape).

Source

https://pytorch.org/docs/stable/distributions.html#weibull

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.

Source code in lightgbmlss/distributions/Weibull.py
class Weibull(DistributionClass):
    """
    Weibull distribution class.

    Distributional Parameters
    -------------------------
    scale: torch.Tensor
        Scale parameter of distribution (lambda).
    concentration: torch.Tensor
        Concentration parameter of distribution (k/shape).

    Source
    -------------------------
    https://pytorch.org/docs/stable/distributions.html#weibull

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll", "crps"]:
            raise ValueError("Invalid loss function. Please choose from 'nll' or 'crps'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = Weibull_Torch
        param_dict = {"scale": response_fn, "concentration": response_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
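
A minimal, illustrative instantiation (import path from the file location above; argument values are examples only):

from lightgbmlss.distributions.Weibull import Weibull

# Both scale (lambda) and concentration (k) are mapped to the positive real line via "exp".
weibull_dist = Weibull(stabilization="MAD", response_fn="exp", loss_fn="nll")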

ZABeta

ZABeta

Bases: DistributionClass

Zero-Adjusted Beta distribution class.

The zero-adjusted Beta distribution is similar to the Beta distribution but allows zeros as y values.

Distributional Parameters

concentration1: torch.Tensor
    1st concentration parameter of the distribution (often referred to as alpha).
concentration0: torch.Tensor
    2nd concentration parameter of the distribution (often referred to as beta).
gate: torch.Tensor
    Probability of zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in lightgbmlss/distributions/ZABeta.py
class ZABeta(DistributionClass):
    """
    Zero-Adjusted Beta distribution class.

    The zero-adjusted Beta distribution is similar to the Beta distribution but allows zeros as y values.

    Distributional Parameters
    -------------------------
    concentration1: torch.Tensor
        1st concentration parameter of the distribution (often referred to as alpha).
    concentration0: torch.Tensor
        2nd concentration parameter of the distribution (often referred to as beta).
    gate: torch.Tensor
        Probability of zeros given via a Bernoulli distribution.

    Source
    -------------------------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select 'nll'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = ZeroAdjustedBeta_Torch
        param_dict = {"concentration1": response_fn, "concentration0": response_fn, "gate": sigmoid_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
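
A minimal, illustrative instantiation; note that only the "nll" loss is available for this class:

from lightgbmlss.distributions.ZABeta import ZABeta

# Suitable for responses in [0, 1) that contain exact zeros; the gate parameter models P(y = 0).
zabeta_dist = ZABeta(stabilization="None", response_fn="softplus", loss_fn="nll")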

ZAGamma

ZAGamma

Bases: DistributionClass

Zero-Adjusted Gamma distribution class.

The zero-adjusted Gamma distribution is similar to the Gamma distribution but allows zeros as y values.

Distributional Parameters

concentration: torch.Tensor
    Shape parameter of the distribution (often referred to as alpha).
rate: torch.Tensor
    Rate parameter, equal to 1 / scale of the distribution (often referred to as beta).
gate: torch.Tensor
    Probability of zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in lightgbmlss/distributions/ZAGamma.py
class ZAGamma(DistributionClass):
    """
    Zero-Adjusted Gamma distribution class.

    The zero-adjusted Gamma distribution is similar to the Gamma distribution but allows zeros as y values.

    Distributional Parameters
    -------------------------
    concentration: torch.Tensor
        shape parameter of the distribution (often referred to as alpha)
    rate: torch.Tensor
        rate = 1 / scale of the distribution (often referred to as beta)
    gate: torch.Tensor
        Probability of zeros given via a Bernoulli distribution.

    Source
    -------------------------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select 'nll'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = ZeroAdjustedGamma_Torch
        param_dict = {"concentration": response_fn, "rate": response_fn, "gate": sigmoid_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
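
A minimal, illustrative instantiation (argument values are examples only):

from lightgbmlss.distributions.ZAGamma import ZAGamma

# Suitable for non-negative responses with exact zeros; concentration and rate use the
# chosen response function, while the gate is always mapped via sigmoid.
zagamma_dist = ZAGamma(stabilization="None", response_fn="exp", loss_fn="nll")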

ZALN

ZALN

Bases: DistributionClass

Zero-Adjusted LogNormal distribution class.

The zero-adjusted Log-Normal distribution is similar to the Log-Normal distribution but allows zeros as y values.

Distributional Parameters

loc: torch.Tensor
    Mean of log of distribution.
scale: torch.Tensor
    Standard deviation of log of the distribution.
gate: torch.Tensor
    Probability of zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential) or "softplus" (softplus).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in lightgbmlss/distributions/ZALN.py
class ZALN(DistributionClass):
    """
    Zero-Adjusted LogNormal distribution class.

    The zero-adjusted Log-Normal distribution is similar to the Log-Normal distribution but allows zeros as y values.

    Distributional Parameters
    -------------------------
    loc: torch.Tensor
        Mean of log of distribution.
    scale: torch.Tensor
        Standard deviation of log of the distribution.
    gate: torch.Tensor
        Probability of zeros given via a Bernoulli distribution.

    Source
    -------------------------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential) or "softplus" (softplus).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "exp",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select 'nll'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function. Please choose from 'exp' or 'softplus'.")

        # Set the parameters specific to the distribution
        distribution = ZeroAdjustedLogNormal_Torch
        param_dict = {"loc": identity_fn, "scale": response_fn,  "gate": sigmoid_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=False,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
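
A minimal, illustrative instantiation (argument values are examples only):

from lightgbmlss.distributions.ZALN import ZALN

# loc is unconstrained (identity), scale is mapped to the positive real line, gate via sigmoid.
zaln_dist = ZALN(stabilization="MAD", response_fn="softplus", loss_fn="nll")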

ZINB

ZINB

Bases: DistributionClass

Zero-Inflated Negative Binomial distribution class.

Distributional Parameters

total_count: torch.Tensor
    Non-negative number of negative Bernoulli trials to stop.
probs: torch.Tensor
    Event probabilities of success in the half open interval [0, 1).
gate: torch.Tensor
    Probability of extra zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn_total_count: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
response_fn_probs: str
    Response function for transforming the distributional parameters to the correct support. Options are "sigmoid" (sigmoid).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in lightgbmlss/distributions/ZINB.py
class ZINB(DistributionClass):
    """
    Zero-Inflated Negative Binomial distribution class.

    Distributional Parameters
    -------------------------
    total_count: torch.Tensor
        Non-negative number of negative Bernoulli trials to stop.
    probs: torch.Tensor
        Event probabilities of success in the half open interval [0, 1).
    gate: torch.Tensor
        Probability of extra zeros given via a Bernoulli distribution.

    Source
    -------------------------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn_total_count: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
    response_fn_probs: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "sigmoid" (sigmoid).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn_total_count: str = "relu",
                 response_fn_probs: str = "sigmoid",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select 'nll'.")

        #  Specify Response Functions for total_count
        response_functions_total_count = {"exp": exp_fn, "softplus": softplus_fn, "relu": relu_fn}
        if response_fn_total_count in response_functions_total_count:
            response_fn_total_count = response_functions_total_count[response_fn_total_count]
        else:
            raise ValueError(
                "Invalid response function for total_count. Please choose from 'exp', 'softplus' or 'relu'.")

        #  Specify Response Functions for probs
        response_functions_probs = {"sigmoid": sigmoid_fn}
        if response_fn_probs in response_functions_probs:
            response_fn_probs = response_functions_probs[response_fn_probs]
        else:
            raise ValueError(
                "Invalid response function for probs. Please select 'sigmoid'.")

        # Set the parameters specific to the distribution
        distribution = ZeroInflatedNegativeBinomial_Torch
        param_dict = {"total_count": response_fn_total_count, "probs": response_fn_probs, "gate": sigmoid_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=True,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
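
A minimal, illustrative sketch showing the two separate response-function arguments of this class:

from lightgbmlss.distributions.ZINB import ZINB

# total_count uses its own response function; probs is always mapped via sigmoid.
zinb_dist = ZINB(stabilization="None",
                 response_fn_total_count="softplus",
                 response_fn_probs="sigmoid",
                 loss_fn="nll")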

ZIPoisson

ZIPoisson

Bases: DistributionClass

Zero-Inflated Poisson distribution class.

Distributional Parameters

rate: torch.Tensor
    Rate parameter of the distribution (often referred to as lambda).
gate: torch.Tensor
    Probability of extra zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L121

Parameters

stabilization: str
    Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
response_fn: str
    Response function for transforming the distributional parameters to the correct support. Options are "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood).

Source code in lightgbmlss/distributions/ZIPoisson.py
class ZIPoisson(DistributionClass):
    """
    Zero-Inflated Poisson distribution class.

    Distributional Parameters
    -------------------------
    rate: torch.Tensor
        Rate parameter of the distribution (often referred to as lambda).
    gate: torch.Tensor
        Probability of extra zeros given via a Bernoulli distribution.

    Source
    -------------------------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L121

    Parameters
    -------------------------
    stabilization: str
        Stabilization method for the Gradient and Hessian. Options are "None", "MAD", "L2".
    response_fn: str
        Response function for transforming the distributional parameters to the correct support. Options are
        "exp" (exponential), "softplus" (softplus) or "relu" (rectified linear unit).
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 stabilization: str = "None",
                 response_fn: str = "relu",
                 loss_fn: str = "nll"
                 ):

        # Input Checks
        if stabilization not in ["None", "MAD", "L2"]:
            raise ValueError("Invalid stabilization method. Please choose from 'None', 'MAD' or 'L2'.")
        if loss_fn not in ["nll"]:
            raise ValueError("Invalid loss function. Please select 'nll'.")

        # Specify Response Functions
        response_functions = {"exp": exp_fn, "softplus": softplus_fn, "relu": relu_fn}
        if response_fn in response_functions:
            response_fn = response_functions[response_fn]
        else:
            raise ValueError(
                "Invalid response function for total_count. Please choose from 'exp', 'softplus' or 'relu'.")

        # Set the parameters specific to the distribution
        distribution = ZeroInflatedPoisson_Torch
        param_dict = {"rate": response_fn, "gate": sigmoid_fn}
        torch.distributions.Distribution.set_default_validate_args(False)

        # Specify Distribution Class
        super().__init__(distribution=distribution,
                         univariate=True,
                         discrete=True,
                         n_dist_param=len(param_dict),
                         stabilization=stabilization,
                         param_dict=param_dict,
                         distribution_arg_names=list(param_dict.keys()),
                         loss_fn=loss_fn
                         )
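
A minimal, illustrative instantiation (argument values are examples only):

from lightgbmlss.distributions.ZIPoisson import ZIPoisson

# rate (lambda) uses the chosen response function; the gate for extra zeros is mapped via sigmoid.
zip_dist = ZIPoisson(stabilization="L2", response_fn="exp", loss_fn="nll")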

distribution_utils

DistributionClass

Generic class that contains general functions for univariate distributions.

Arguments

distribution: torch.distributions.Distribution
    PyTorch Distribution class.
univariate: bool
    Whether the distribution is univariate or multivariate.
discrete: bool
    Whether the support of the distribution is discrete or continuous.
n_dist_param: int
    Number of distributional parameters.
stabilization: str
    Stabilization method.
param_dict: Dict[str, Any]
    Dictionary that maps distributional parameters to their response scale.
distribution_arg_names: List
    List of distributional parameter names.
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score). Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable. Hence, using the CRPS disregards any variation in the curvature of the loss function.
tau: List
    List of expectiles. Only used for the Expectile distribution.
penalize_crossing: bool
    Whether to include a penalty term to discourage crossing of expectiles. Only used for the Expectile distribution.

Source code in lightgbmlss/distributions/distribution_utils.py
class DistributionClass:
    """
    Generic class that contains general functions for univariate distributions.

    Arguments
    ---------
    distribution: torch.distributions.Distribution
        PyTorch Distribution class.
    univariate: bool
        Whether the distribution is univariate or multivariate.
    discrete: bool
        Whether the support of the distribution is discrete or continuous.
    n_dist_param: int
        Number of distributional parameters.
    stabilization: str
        Stabilization method.
    param_dict: Dict[str, Any]
        Dictionary that maps distributional parameters to their response scale.
    distribution_arg_names: List
        List of distributional parameter names.
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    tau: List
        List of expectiles. Only used for the Expectile distribution.
    penalize_crossing: bool
        Whether to include a penalty term to discourage crossing of expectiles. Only used for Expectile distribution.
    """
    def __init__(self,
                 distribution: torch.distributions.Distribution = None,
                 univariate: bool = True,
                 discrete: bool = False,
                 n_dist_param: int = None,
                 stabilization: str = "None",
                 param_dict: Dict[str, Any] = None,
                 distribution_arg_names: List = None,
                 loss_fn: str = "nll",
                 tau: Optional[List[torch.Tensor]] = None,
                 penalize_crossing: bool = False,
                 ):

        self.distribution = distribution
        self.univariate = univariate
        self.discrete = discrete
        self.n_dist_param = n_dist_param
        self.stabilization = stabilization
        self.param_dict = param_dict
        self.distribution_arg_names = distribution_arg_names
        self.loss_fn = loss_fn
        self.tau = tau
        self.penalize_crossing = penalize_crossing

    def objective_fn(self, predt: np.ndarray, data: lgb.Dataset) -> Tuple[np.ndarray, np.ndarray]:

        """
        Function to estimate gradients and hessians of distributional parameters.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        data: lgb.DMatrix
            Data used for training.

        Returns
        -------
        grad: np.ndarray
            Gradient.
        hess: np.ndarray
            Hessian.
        """
        # Target
        target = torch.tensor(data.get_label().reshape(-1, 1))

        # Weights
        if data.weight is None:
            # Use 1 as weight if no weights are specified
            weights = torch.ones_like(target, dtype=target.dtype).numpy()
        else:
            weights = data.get_weight().reshape(-1, 1)

        # Start values (needed to replace NaNs in predt)
        start_values = data.get_init_score().reshape(-1, self.n_dist_param)[0, :].tolist()

        # Calculate gradients and hessians
        predt, loss = self.get_params_loss(predt, target, start_values, requires_grad=True)
        grad, hess = self.compute_gradients_and_hessians(loss, predt, weights)

        return grad, hess

    def metric_fn(self, predt: np.ndarray, data: lgb.Dataset) -> Tuple[str, float, bool]:
        """
        Function that evaluates the predictions using the negative log-likelihood.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        data: lgb.Dataset
            Data used for training.

        Returns
        -------
        name: str
            Name of the evaluation metric.
        nll: float
            Negative log-likelihood.
        is_higher_better: bool
            Whether a higher value of the metric is better or not.
        """
        # Target
        target = torch.tensor(data.get_label().reshape(-1, 1))
        n_obs = target.shape[0]

        # Start values (needed to replace NaNs in predt)
        start_values = data.get_init_score().reshape(-1, self.n_dist_param)[0, :].tolist()

        # Calculate loss
        is_higher_better = False
        _, loss = self.get_params_loss(predt, target, start_values, requires_grad=False)

        return self.loss_fn, loss / n_obs, is_higher_better

    def loss_fn_start_values(self,
                             params: torch.Tensor,
                             target: torch.Tensor) -> torch.Tensor:
        """
        Function that calculates the loss for a given set of distributional parameters. Only used for calculating
        the loss for the start values.

        Parameter
        ---------
        params: torch.Tensor
            Distributional parameters.
        target: torch.Tensor
            Target values.

        Returns
        -------
        loss: torch.Tensor
            Loss value.
        """
        # Transform parameters to response scale
        params = [
            response_fn(params[i].reshape(-1, 1)) for i, response_fn in enumerate(self.param_dict.values())
        ]

        # Replace NaNs and infinity values with 0.5
        nan_inf_idx = torch.isnan(torch.stack(params)) | torch.isinf(torch.stack(params))
        params = torch.where(nan_inf_idx, torch.tensor(0.5), torch.stack(params))

        # Specify Distribution and Loss
        if self.tau is None:
            dist = self.distribution(*params)
            loss = -torch.nansum(dist.log_prob(target))
        else:
            dist = self.distribution(params, self.penalize_crossing)
            loss = -torch.nansum(dist.log_prob(target, self.tau))

        return loss

    def calculate_start_values(self,
                               target: np.ndarray,
                               max_iter: int = 50
                               ) -> Tuple[float, np.ndarray]:
        """
        Function that calculates the starting values for each distributional parameter.

        Arguments
        ---------
        target: np.ndarray
            Data from which starting values are calculated.
        max_iter: int
            Maximum number of iterations.

        Returns
        -------
        loss: float
            Loss value.
        start_values: np.ndarray
            Starting values for each distributional parameter.
        """
        # Convert target to torch.tensor
        target = torch.tensor(target).reshape(-1, 1)

        # Initialize parameters
        params = [torch.tensor(0.5, requires_grad=True) for _ in range(self.n_dist_param)]

        # Specify optimizer
        optimizer = LBFGS(params, lr=0.1, max_iter=np.min([int(max_iter / 4), 20]), line_search_fn="strong_wolfe")

        # Define learning rate scheduler
        lr_scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=10)

        # Define closure
        def closure():
            optimizer.zero_grad()
            loss = self.loss_fn_start_values(params, target)
            loss.backward()
            return loss

        # Optimize parameters
        loss_vals = []
        for epoch in range(max_iter):
            loss = optimizer.step(closure)
            lr_scheduler.step(loss)
            loss_vals.append(loss.item())

        # Get final loss
        loss = np.array(loss_vals[-1])

        # Get start values
        start_values = np.array([params[i].detach() for i in range(self.n_dist_param)])

        # Replace any remaining NaNs or infinity values with 0.5
        start_values = np.nan_to_num(start_values, nan=0.5, posinf=0.5, neginf=0.5)

        return loss, start_values

    def get_params_loss(self,
                        predt: np.ndarray,
                        target: torch.Tensor,
                        start_values: List[float],
                        requires_grad: bool = False,
                        ) -> Tuple[List[torch.Tensor], np.ndarray]:
        """
        Function that returns the predicted parameters and the loss.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        target: torch.Tensor
            Target values.
        start_values: List
            Starting values for each distributional parameter.
        requires_grad: bool
            Whether to add to the computational graph or not.

        Returns
        -------
        predt: torch.Tensor
            Predicted parameters.
        loss: torch.Tensor
            Loss value.
        """
        # Predicted Parameters
        predt = predt.reshape(-1, self.n_dist_param, order="F")

        # Replace NaNs and infinity values with unconditional start values
        nan_inf_mask = np.isnan(predt) | np.isinf(predt)
        predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])

        # Convert to torch.tensor
        predt = [
            torch.tensor(predt[:, i].reshape(-1, 1), requires_grad=requires_grad) for i in range(self.n_dist_param)
        ]

        # Predicted Parameters transformed to response scale
        predt_transformed = [
            response_fn(predt[i].reshape(-1, 1)) for i, response_fn in enumerate(self.param_dict.values())
        ]

        # Specify Distribution and Loss
        if self.tau is None:
            dist_kwargs = dict(zip(self.distribution_arg_names, predt_transformed))
            dist_fit = self.distribution(**dist_kwargs)
            if self.loss_fn == "nll":
                loss = -torch.nansum(dist_fit.log_prob(target))
            elif self.loss_fn == "crps":
                torch.manual_seed(123)
                dist_samples = dist_fit.rsample((30,)).squeeze(-1)
                loss = torch.nansum(self.crps_score(target, dist_samples))
            else:
                raise ValueError("Invalid loss function. Please select 'nll' or 'crps'.")
        else:
            dist_fit = self.distribution(predt_transformed, self.penalize_crossing)
            loss = -torch.nansum(dist_fit.log_prob(target, self.tau))

        return predt, loss

    def draw_samples(self,
                     predt_params: pd.DataFrame,
                     n_samples: int = 1000,
                     seed: int = 123
                     ) -> pd.DataFrame:
        """
        Function that draws n_samples from a predicted distribution.

        Arguments
        ---------
        predt_params: pd.DataFrame
            pd.DataFrame with predicted distributional parameters.
        n_samples: int
            Number of samples to draw from predicted response distribution.
        seed: int
            Manual seed.

        Returns
        -------
        pred_dist: pd.DataFrame
            DataFrame with n_samples drawn from predicted response distribution.

        """
        torch.manual_seed(seed)

        if self.tau is None:
            pred_params = torch.tensor(predt_params.values)
            dist_kwargs = {arg_name: param for arg_name, param in zip(self.distribution_arg_names, pred_params.T)}
            dist_pred = self.distribution(**dist_kwargs)
            dist_samples = dist_pred.sample((n_samples,)).squeeze().detach().numpy().T
            dist_samples = pd.DataFrame(dist_samples)
            dist_samples.columns = [str("y_sample") + str(i) for i in range(dist_samples.shape[1])]
        else:
            dist_samples = None

        if self.discrete:
            dist_samples = dist_samples.astype(int)

        return dist_samples

    def predict_dist(self,
                     booster: lgb.Booster,
                     data: pd.DataFrame,
                     start_values: np.ndarray,
                     pred_type: str = "parameters",
                     n_samples: int = 1000,
                     quantiles: list = [0.1, 0.5, 0.9],
                     seed: int = 123
                     ) -> pd.DataFrame:
        """
        Function that predicts from the trained model.

        Arguments
        ---------
        booster : lgb.Booster
            Trained model.
        data : pd.DataFrame
            Data to predict from.
        start_values : np.ndarray
            Starting values for each distributional parameter.
        pred_type : str
            Type of prediction:
            - "samples" draws n_samples from the predicted distribution.
            - "quantiles" calculates the quantiles from the predicted distribution.
            - "parameters" returns the predicted distributional parameters.
            - "expectiles" returns the predicted expectiles.
        n_samples : int
            Number of samples to draw from the predicted distribution.
        quantiles : List[float]
            List of quantiles to calculate from the predicted distribution.
        seed : int
            Seed for random number generator used to draw samples from the predicted distribution.

        Returns
        -------
        pred : pd.DataFrame
            Predictions.
        """

        predt = torch.tensor(
            booster.predict(data, raw_score=True),
            dtype=torch.float32
        ).reshape(-1, self.n_dist_param)

        # Set init_score as starting point for each distributional parameter.
        init_score_pred = torch.tensor(
            np.ones(shape=(data.shape[0], 1))*start_values,
            dtype=torch.float32
        )

        # The predictions don't include the init_score specified in creating the train data.
        # Hence, it needs to be added manually with the corresponding transform for each distributional parameter.
        dist_params_predt = np.concatenate(
            [
                response_fun(
                    predt[:, i].reshape(-1, 1) + init_score_pred[:, i].reshape(-1, 1)).numpy()
                for i, (dist_param, response_fun) in enumerate(self.param_dict.items())
            ],
            axis=1,
        )
        dist_params_predt = pd.DataFrame(dist_params_predt)
        dist_params_predt.columns = self.param_dict.keys()

        # Draw samples from predicted response distribution
        pred_samples_df = self.draw_samples(predt_params=dist_params_predt,
                                            n_samples=n_samples,
                                            seed=seed)

        if pred_type == "parameters":
            return dist_params_predt

        elif pred_type == "expectiles":
            return dist_params_predt

        elif pred_type == "samples":
            return pred_samples_df

        elif pred_type == "quantiles":
            # Calculate quantiles from predicted response distribution
            pred_quant_df = pred_samples_df.quantile(quantiles, axis=1).T
            pred_quant_df.columns = [str("quant_") + str(quantiles[i]) for i in range(len(quantiles))]
            if self.discrete:
                pred_quant_df = pred_quant_df.astype(int)
            return pred_quant_df

    def compute_gradients_and_hessians(self,
                                       loss: torch.tensor,
                                       predt: torch.tensor,
                                       weights: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:

        """
        Calculates gradients and hessians.

        Output gradients and hessians have shape (n_samples*n_outputs, 1).

        Arguments:
        ---------
        loss: torch.Tensor
            Loss.
        predt: torch.Tensor
            List of predicted parameters.
        weights: np.ndarray
            Weights.

        Returns:
        -------
        grad: torch.Tensor
            Gradients.
        hess: torch.Tensor
            Hessians.
        """
        if self.loss_fn == "nll":
            # Gradient and Hessian
            grad = autograd(loss, inputs=predt, create_graph=True)
            hess = [autograd(grad[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grad))]
        elif self.loss_fn == "crps":
            # Gradient and Hessian
            grad = autograd(loss, inputs=predt, create_graph=True)
            hess = [torch.ones_like(grad[i]) for i in range(len(grad))]

            # # Approximation of Hessian
            # step_size = 1e-6
            # predt_upper = [
            #     response_fn(predt[i] + step_size).reshape(-1, 1) for i, response_fn in
            #     enumerate(self.param_dict.values())
            # ]
            # dist_kwargs_upper = dict(zip(self.distribution_arg_names, predt_upper))
            # dist_fit_upper = self.distribution(**dist_kwargs_upper)
            # dist_samples_upper = dist_fit_upper.rsample((30,)).squeeze(-1)
            # loss_upper = torch.nansum(self.crps_score(self.target, dist_samples_upper))
            #
            # predt_lower = [
            #     response_fn(predt[i] - step_size).reshape(-1, 1) for i, response_fn in
            #     enumerate(self.param_dict.values())
            # ]
            # dist_kwargs_lower = dict(zip(self.distribution_arg_names, predt_lower))
            # dist_fit_lower = self.distribution(**dist_kwargs_lower)
            # dist_samples_lower = dist_fit_lower.rsample((30,)).squeeze(-1)
            # loss_lower = torch.nansum(self.crps_score(self.target, dist_samples_lower))
            #
            # grad_upper = autograd(loss_upper, inputs=predt_upper)
            # grad_lower = autograd(loss_lower, inputs=predt_lower)
            # hess = [(grad_upper[i] - grad_lower[i]) / (2 * step_size) for i in range(len(grad))]

        # Stabilization of Derivatives
        if self.stabilization != "None":
            grad = [self.stabilize_derivative(grad[i], type=self.stabilization) for i in range(len(grad))]
            hess = [self.stabilize_derivative(hess[i], type=self.stabilization) for i in range(len(hess))]

        # Reshape
        grad = torch.cat(grad, axis=1).detach().numpy()
        hess = torch.cat(hess, axis=1).detach().numpy()

        # Weighting
        grad *= weights
        hess *= weights

        # Reshape
        grad = grad.ravel(order="F")
        hess = hess.ravel(order="F")

        return grad, hess

    def stabilize_derivative(self, input_der: torch.Tensor, type: str = "MAD") -> torch.Tensor:
        """
        Function that stabilizes Gradients and Hessians.

        As LightGBMLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important
        that these are comparable in magnitude for all distributional parameters. Due to imbalances regarding the ranges,
        the estimation might become unstable so that it does not converge (or converge very slowly) to the optimal solution.
        Another way to improve convergence might be to standardize the response variable. This is especially useful if the
        range of the response differs strongly from the range of the Gradients and Hessians. Both, the stabilization and
        the standardization of the response are not always advised but need to be carefully considered.
        Source: https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

        Parameters
        ----------
        input_der : torch.Tensor
            Input derivative, either Gradient or Hessian.
        type: str
            Stabilization method. Can be either "None", "MAD" or "L2".

        Returns
        -------
        stab_der : torch.Tensor
            Stabilized Gradient or Hessian.
        """

        if type == "MAD":
            input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
            div = torch.nanmedian(torch.abs(input_der - torch.nanmedian(input_der)))
            div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
            stab_der = input_der / div

        if type == "L2":
            input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
            div = torch.sqrt(torch.nanmean(input_der.pow(2)))
            div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
            div = torch.where(div > torch.tensor(10000.0), torch.tensor(10000.0), div)
            stab_der = input_der / div

        if type == "None":
            stab_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))

        return stab_der

    def crps_score(self, y: torch.tensor, yhat_dist: torch.tensor) -> torch.tensor:
        """
        Function that calculates the Continuous Ranked Probability Score (CRPS) for a given set of predicted samples.

        Parameters
        ----------
        y: torch.Tensor
            Response variable of shape (n_observations,1).
        yhat_dist: torch.Tensor
            Predicted samples of shape (n_samples, n_observations).

        Returns
        -------
        crps: torch.Tensor
            CRPS score.

        References
        ----------
        Gneiting, Tilmann & Raftery, Adrian. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation.
        Journal of the American Statistical Association. 102. 359-378.

        Source
        ------
        https://github.com/elephaint/pgbm/blob/main/pgbm/torch/pgbm_dist.py#L549
        """
        # Get the number of observations
        n_samples = yhat_dist.shape[0]

        # Sort the forecasts in ascending order
        yhat_dist_sorted, _ = torch.sort(yhat_dist, 0)

        # Create temporary tensors
        y_cdf = torch.zeros_like(y)
        yhat_cdf = torch.zeros_like(y)
        yhat_prev = torch.zeros_like(y)
        crps = torch.zeros_like(y)

        # Loop over the predicted samples generated per observation
        for yhat in yhat_dist_sorted:
            yhat = yhat.reshape(-1, 1)
            flag = (y_cdf == 0) * (y < yhat)
            crps += flag * ((y - yhat_prev) * yhat_cdf ** 2)
            crps += flag * ((yhat - y) * (yhat_cdf - 1) ** 2)
            crps += (~flag) * ((yhat - yhat_prev) * (yhat_cdf - y_cdf) ** 2)
            y_cdf += flag
            yhat_cdf += 1 / n_samples
            yhat_prev = yhat

        # In case y_cdf == 0 after the loop
        flag = (y_cdf == 0)
        crps += flag * (y - yhat)

        return crps

    def dist_select(self,
                    target: np.ndarray,
                    candidate_distributions: List,
                    max_iter: int = 100,
                    plot: bool = False,
                    figure_size: tuple = (10, 5),
                    ) -> pd.DataFrame:
        """
        Function that selects the most suitable distribution among the candidate_distributions for the target variable,
        based on the NegLogLikelihood (lower is better).

        Parameters
        ----------
        target: np.ndarray
            Response variable.
        candidate_distributions: List
            List of candidate distributions.
        max_iter: int
            Maximum number of iterations for the optimization.
        plot: bool
            If True, a density plot of the actual and fitted distribution is created.
        figure_size: tuple
            Figure size of the density plot.

        Returns
        -------
        fit_df: pd.DataFrame
            Dataframe with the loss values of the fitted candidate distributions.
        """
        dist_list = []
        total_iterations = len(candidate_distributions)
        with tqdm(total=total_iterations, desc="Fitting candidate distributions") as pbar:
            for i in range(len(candidate_distributions)):
                dist_name = candidate_distributions[i].__name__.split(".")[2]
                pbar.set_description(f"Fitting {dist_name} distribution")
                dist_sel = getattr(candidate_distributions[i], dist_name)()
                try:
                    loss, params = dist_sel.calculate_start_values(target=target.reshape(-1, 1), max_iter=max_iter)
                    fit_df = pd.DataFrame.from_dict(
                        {self.loss_fn: loss.reshape(-1,),
                         "distribution": str(dist_name),
                         "params": [params]
                         }
                    )
                except Exception as e:
                    warnings.warn(f"Error fitting {dist_name} distribution: {str(e)}")
                    fit_df = pd.DataFrame(
                        {self.loss_fn: np.nan,
                         "distribution": str(dist_name),
                         "params": [np.nan] * self.n_dist_param
                         }
                    )
                dist_list.append(fit_df)
                pbar.update(1)
            pbar.set_description(f"Fitting of candidate distributions completed")
            fit_df = pd.concat(dist_list).sort_values(by=self.loss_fn, ascending=True)
            fit_df["rank"] = fit_df[self.loss_fn].rank().astype(int)
            fit_df.set_index(fit_df["rank"], inplace=True)

        if plot:
            # Select best distribution
            best_dist = fit_df[fit_df["rank"] == 1].reset_index(drop=True)
            for dist in candidate_distributions:
                if dist.__name__.split(".")[2] == best_dist["distribution"].values[0]:
                    best_dist_sel = dist
                    break
            best_dist_sel = getattr(best_dist_sel, best_dist["distribution"].values[0])()
            params = torch.tensor(best_dist["params"][0]).reshape(-1, best_dist_sel.n_dist_param)

            # Transform parameters to the response scale and draw samples
            fitted_params = np.concatenate(
                [
                    response_fun(params[:, i].reshape(-1, 1)).numpy()
                    for i, (dist_param, response_fun) in enumerate(best_dist_sel.param_dict.items())
                ],
                axis=1,
            )
            fitted_params = pd.DataFrame(fitted_params, columns=best_dist_sel.param_dict.keys())
            n_samples = np.max([10000, target.shape[0]])
            n_samples = np.where(n_samples > 500000, 100000, n_samples)
            dist_samples = best_dist_sel.draw_samples(fitted_params,
                                                      n_samples=n_samples,
                                                      seed=123).values

            # Plot actual and fitted distribution
            plt.figure(figsize=figure_size)
            sns.kdeplot(target.reshape(-1, ), label="Actual")
            sns.kdeplot(dist_samples.reshape(-1, ), label=f"Best-Fit: {best_dist['distribution'].values[0]}")
            plt.legend()
            plt.title("Actual vs. Best-Fit Density", fontweight="bold", fontsize=16)
            plt.show()

        fit_df.drop(columns=["rank", "params"], inplace=True)

        return fit_df
calculate_start_values(target, max_iter=50)

Function that calculates the starting values for each distributional parameter.

Arguments

target: np.ndarray
    Data from which starting values are calculated.
max_iter: int
    Maximum number of iterations.

Returns

loss: float
    Loss value.
start_values: np.ndarray
    Starting values for each distributional parameter.

Source code in lightgbmlss/distributions/distribution_utils.py
def calculate_start_values(self,
                           target: np.ndarray,
                           max_iter: int = 50
                           ) -> Tuple[float, np.ndarray]:
    """
    Function that calculates the starting values for each distributional parameter.

    Arguments
    ---------
    target: np.ndarray
        Data from which starting values are calculated.
    max_iter: int
        Maximum number of iterations.

    Returns
    -------
    loss: float
        Loss value.
    start_values: np.ndarray
        Starting values for each distributional parameter.
    """
    # Convert target to torch.tensor
    target = torch.tensor(target).reshape(-1, 1)

    # Initialize parameters
    params = [torch.tensor(0.5, requires_grad=True) for _ in range(self.n_dist_param)]

    # Specify optimizer
    optimizer = LBFGS(params, lr=0.1, max_iter=np.min([int(max_iter / 4), 20]), line_search_fn="strong_wolfe")

    # Define learning rate scheduler
    lr_scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=10)

    # Define closure
    def closure():
        optimizer.zero_grad()
        loss = self.loss_fn_start_values(params, target)
        loss.backward()
        return loss

    # Optimize parameters
    loss_vals = []
    for epoch in range(max_iter):
        loss = optimizer.step(closure)
        lr_scheduler.step(loss)
        loss_vals.append(loss.item())

    # Get final loss
    loss = np.array(loss_vals[-1])

    # Get start values
    start_values = np.array([params[i].detach() for i in range(self.n_dist_param)])

    # Replace any remaining NaNs or infinity values with 0.5
    start_values = np.nan_to_num(start_values, nan=0.5, posinf=0.5, neginf=0.5)

    return loss, start_values
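A hedged usage sketch: the Gaussian module and its constructor are assumed to mirror the distribution classes documented elsewhere on this page; adapt the import to the distribution you actually use.

import numpy as np
from lightgbmlss.distributions.Gaussian import Gaussian  # assumed two-parameter distribution class

# Unconditional fit of the distributional parameters to a toy response.
y_train = np.random.default_rng(123).normal(loc=10.0, scale=2.0, size=500)
dist = Gaussian()  # defaults assumed: stabilization="None", response_fn="exp", loss_fn="nll"
loss, start_values = dist.calculate_start_values(target=y_train.reshape(-1, 1), max_iter=50)
# loss: final negative log-likelihood; start_values: one (untransformed) value per distributional parameter.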
compute_gradients_and_hessians(loss, predt, weights)

Calculates gradients and hessians.

Output gradients and hessians have shape (n_samples*n_outputs, 1).

Arguments:

loss: torch.Tensor
    Loss.
predt: torch.Tensor
    List of predicted parameters.
weights: np.ndarray
    Weights.

Returns:

grad: torch.Tensor
    Gradients.
hess: torch.Tensor
    Hessians.

Source code in lightgbmlss/distributions/distribution_utils.py
def compute_gradients_and_hessians(self,
                                   loss: torch.tensor,
                                   predt: torch.tensor,
                                   weights: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:

    """
    Calculates gradients and hessians.

    Output gradients and hessians have shape (n_samples*n_outputs, 1).

    Arguments:
    ---------
    loss: torch.Tensor
        Loss.
    predt: torch.Tensor
        List of predicted parameters.
    weights: np.ndarray
        Weights.

    Returns:
    -------
    grad: torch.Tensor
        Gradients.
    hess: torch.Tensor
        Hessians.
    """
    if self.loss_fn == "nll":
        # Gradient and Hessian
        grad = autograd(loss, inputs=predt, create_graph=True)
        hess = [autograd(grad[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grad))]
    elif self.loss_fn == "crps":
        # Gradient and Hessian
        grad = autograd(loss, inputs=predt, create_graph=True)
        hess = [torch.ones_like(grad[i]) for i in range(len(grad))]

        # # Approximation of Hessian
        # step_size = 1e-6
        # predt_upper = [
        #     response_fn(predt[i] + step_size).reshape(-1, 1) for i, response_fn in
        #     enumerate(self.param_dict.values())
        # ]
        # dist_kwargs_upper = dict(zip(self.distribution_arg_names, predt_upper))
        # dist_fit_upper = self.distribution(**dist_kwargs_upper)
        # dist_samples_upper = dist_fit_upper.rsample((30,)).squeeze(-1)
        # loss_upper = torch.nansum(self.crps_score(self.target, dist_samples_upper))
        #
        # predt_lower = [
        #     response_fn(predt[i] - step_size).reshape(-1, 1) for i, response_fn in
        #     enumerate(self.param_dict.values())
        # ]
        # dist_kwargs_lower = dict(zip(self.distribution_arg_names, predt_lower))
        # dist_fit_lower = self.distribution(**dist_kwargs_lower)
        # dist_samples_lower = dist_fit_lower.rsample((30,)).squeeze(-1)
        # loss_lower = torch.nansum(self.crps_score(self.target, dist_samples_lower))
        #
        # grad_upper = autograd(loss_upper, inputs=predt_upper)
        # grad_lower = autograd(loss_lower, inputs=predt_lower)
        # hess = [(grad_upper[i] - grad_lower[i]) / (2 * step_size) for i in range(len(grad))]

    # Stabilization of Derivatives
    if self.stabilization != "None":
        grad = [self.stabilize_derivative(grad[i], type=self.stabilization) for i in range(len(grad))]
        hess = [self.stabilize_derivative(hess[i], type=self.stabilization) for i in range(len(hess))]

    # Reshape
    grad = torch.cat(grad, axis=1).detach().numpy()
    hess = torch.cat(hess, axis=1).detach().numpy()

    # Weighting
    grad *= weights
    hess *= weights

    # Reshape
    grad = grad.ravel(order="F")
    hess = hess.ravel(order="F")

    return grad, hess
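The order="F" flattening mirrors the column-major layout used when exchanging arrays with LightGBM (see the corresponding order="F" reshape in get_params_loss): all gradients of the first distributional parameter come first, then all gradients of the second, and so on. A small, self-contained illustration:

import numpy as np

# Toy gradient matrix: 3 observations (rows), 2 distributional parameters (columns).
grad = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])
print(grad.ravel(order="F"))  # [ 1.  2.  3. 10. 20. 30.] -> parameter blocks, the layout returned here
print(grad.ravel(order="C"))  # [ 1. 10.  2. 20.  3. 30.] -> interleaved, not what this function returns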
crps_score(y, yhat_dist)

Function that calculates the Continuous Ranked Probability Score (CRPS) for a given set of predicted samples.

Parameters

y: torch.Tensor
    Response variable of shape (n_observations, 1).
yhat_dist: torch.Tensor
    Predicted samples of shape (n_samples, n_observations).

Returns

crps: torch.Tensor CRPS score.

References

Gneiting, Tilmann & Raftery, Adrian. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association. 102. 359-378.

Source

https://github.com/elephaint/pgbm/blob/main/pgbm/torch/pgbm_dist.py#L549

Source code in lightgbmlss/distributions/distribution_utils.py
def crps_score(self, y: torch.tensor, yhat_dist: torch.tensor) -> torch.tensor:
    """
    Function that calculates the Continuous Ranked Probability Score (CRPS) for a given set of predicted samples.

    Parameters
    ----------
    y: torch.Tensor
        Response variable of shape (n_observations,1).
    yhat_dist: torch.Tensor
        Predicted samples of shape (n_samples, n_observations).

    Returns
    -------
    crps: torch.Tensor
        CRPS score.

    References
    ----------
    Gneiting, Tilmann & Raftery, Adrian. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation.
    Journal of the American Statistical Association. 102. 359-378.

    Source
    ------
    https://github.com/elephaint/pgbm/blob/main/pgbm/torch/pgbm_dist.py#L549
    """
    # Get the number of observations
    n_samples = yhat_dist.shape[0]

    # Sort the forecasts in ascending order
    yhat_dist_sorted, _ = torch.sort(yhat_dist, 0)

    # Create temporary tensors
    y_cdf = torch.zeros_like(y)
    yhat_cdf = torch.zeros_like(y)
    yhat_prev = torch.zeros_like(y)
    crps = torch.zeros_like(y)

    # Loop over the predicted samples generated per observation
    for yhat in yhat_dist_sorted:
        yhat = yhat.reshape(-1, 1)
        flag = (y_cdf == 0) * (y < yhat)
        crps += flag * ((y - yhat_prev) * yhat_cdf ** 2)
        crps += flag * ((yhat - y) * (yhat_cdf - 1) ** 2)
        crps += (~flag) * ((yhat - yhat_prev) * (yhat_cdf - y_cdf) ** 2)
        y_cdf += flag
        yhat_cdf += 1 / n_samples
        yhat_prev = yhat

    # In case y_cdf == 0 after the loop
    flag = (y_cdf == 0)
    crps += flag * (y - yhat)

    return crps
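A hedged toy call; `dist` can be any instantiated distribution class (for example the Gaussian instance from the earlier sketch), since crps_score only operates on the tensors passed in:

import torch

y = torch.tensor([[0.2], [1.5]])       # (n_observations, 1)
yhat_dist = torch.randn(100, 2)        # (n_samples, n_observations) predicted samples
crps = dist.crps_score(y, yhat_dist)   # (n_observations, 1); lower is better
print(crps.mean())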
dist_select(target, candidate_distributions, max_iter=100, plot=False, figure_size=(10, 5))

Function that selects the most suitable distribution among the candidate_distributions for the target variable, based on the NegLogLikelihood (lower is better).

Parameters

target: np.ndarray
    Response variable.
candidate_distributions: List
    List of candidate distributions.
max_iter: int
    Maximum number of iterations for the optimization.
plot: bool
    If True, a density plot of the actual and fitted distribution is created.
figure_size: tuple
    Figure size of the density plot.

Returns

fit_df: pd.DataFrame Dataframe with the loss values of the fitted candidate distributions.

Source code in lightgbmlss/distributions/distribution_utils.py
def dist_select(self,
                target: np.ndarray,
                candidate_distributions: List,
                max_iter: int = 100,
                plot: bool = False,
                figure_size: tuple = (10, 5),
                ) -> pd.DataFrame:
    """
    Function that selects the most suitable distribution among the candidate_distributions for the target variable,
    based on the NegLogLikelihood (lower is better).

    Parameters
    ----------
    target: np.ndarray
        Response variable.
    candidate_distributions: List
        List of candidate distributions.
    max_iter: int
        Maximum number of iterations for the optimization.
    plot: bool
        If True, a density plot of the actual and fitted distribution is created.
    figure_size: tuple
        Figure size of the density plot.

    Returns
    -------
    fit_df: pd.DataFrame
        Dataframe with the loss values of the fitted candidate distributions.
    """
    dist_list = []
    total_iterations = len(candidate_distributions)
    with tqdm(total=total_iterations, desc="Fitting candidate distributions") as pbar:
        for i in range(len(candidate_distributions)):
            dist_name = candidate_distributions[i].__name__.split(".")[2]
            pbar.set_description(f"Fitting {dist_name} distribution")
            dist_sel = getattr(candidate_distributions[i], dist_name)()
            try:
                loss, params = dist_sel.calculate_start_values(target=target.reshape(-1, 1), max_iter=max_iter)
                fit_df = pd.DataFrame.from_dict(
                    {self.loss_fn: loss.reshape(-1,),
                     "distribution": str(dist_name),
                     "params": [params]
                     }
                )
            except Exception as e:
                warnings.warn(f"Error fitting {dist_name} distribution: {str(e)}")
                fit_df = pd.DataFrame(
                    {self.loss_fn: np.nan,
                     "distribution": str(dist_name),
                     "params": [np.nan] * self.n_dist_param
                     }
                )
            dist_list.append(fit_df)
            pbar.update(1)
        pbar.set_description(f"Fitting of candidate distributions completed")
        fit_df = pd.concat(dist_list).sort_values(by=self.loss_fn, ascending=True)
        fit_df["rank"] = fit_df[self.loss_fn].rank().astype(int)
        fit_df.set_index(fit_df["rank"], inplace=True)

    if plot:
        # Select best distribution
        best_dist = fit_df[fit_df["rank"] == 1].reset_index(drop=True)
        for dist in candidate_distributions:
            if dist.__name__.split(".")[2] == best_dist["distribution"].values[0]:
                best_dist_sel = dist
                break
        best_dist_sel = getattr(best_dist_sel, best_dist["distribution"].values[0])()
        params = torch.tensor(best_dist["params"][0]).reshape(-1, best_dist_sel.n_dist_param)

        # Transform parameters to the response scale and draw samples
        fitted_params = np.concatenate(
            [
                response_fun(params[:, i].reshape(-1, 1)).numpy()
                for i, (dist_param, response_fun) in enumerate(best_dist_sel.param_dict.items())
            ],
            axis=1,
        )
        fitted_params = pd.DataFrame(fitted_params, columns=best_dist_sel.param_dict.keys())
        n_samples = np.max([10000, target.shape[0]])
        n_samples = np.where(n_samples > 500000, 100000, n_samples)
        dist_samples = best_dist_sel.draw_samples(fitted_params,
                                                  n_samples=n_samples,
                                                  seed=123).values

        # Plot actual and fitted distribution
        plt.figure(figsize=figure_size)
        sns.kdeplot(target.reshape(-1, ), label="Actual")
        sns.kdeplot(dist_samples.reshape(-1, ), label=f"Best-Fit: {best_dist['distribution'].values[0]}")
        plt.legend()
        plt.title("Actual vs. Best-Fit Density", fontweight="bold", fontsize=16)
        plt.show()

    fit_df.drop(columns=["rank", "params"], inplace=True)

    return fit_df
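A hedged usage sketch: the Gaussian, StudentT and Gamma modules are assumed to be available under lightgbmlss.distributions. Note that the candidates are passed as modules, not instances, because dist_select instantiates each candidate itself:

import numpy as np
from lightgbmlss.distributions import Gaussian, StudentT, Gamma  # assumed candidate modules

y_train = np.random.default_rng(7).gamma(shape=2.0, scale=3.0, size=1000)
dist_class = Gaussian.Gaussian()  # any instance works; dist_select only reads its loss_fn setting
fit_df = dist_class.dist_select(target=y_train,
                                candidate_distributions=[Gaussian, StudentT, Gamma],
                                max_iter=50,
                                plot=False)  # set plot=True for the actual-vs-best-fit density plot
print(fit_df)  # candidates ranked by negative log-likelihood, best first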
draw_samples(predt_params, n_samples=1000, seed=123)

Function that draws n_samples from a predicted distribution.

Arguments

predt_params: pd.DataFrame
    pd.DataFrame with predicted distributional parameters.
n_samples: int
    Number of samples to draw from the predicted response distribution.
seed: int
    Manual seed.

Returns

pred_dist: pd.DataFrame DataFrame with n_samples drawn from predicted response distribution.

Source code in lightgbmlss/distributions/distribution_utils.py
def draw_samples(self,
                 predt_params: pd.DataFrame,
                 n_samples: int = 1000,
                 seed: int = 123
                 ) -> pd.DataFrame:
    """
    Function that draws n_samples from a predicted distribution.

    Arguments
    ---------
    predt_params: pd.DataFrame
        pd.DataFrame with predicted distributional parameters.
    n_samples: int
            Number of samples to draw from the predicted response distribution.
    seed: int
        Manual seed.

    Returns
    -------
    pred_dist: pd.DataFrame
        DataFrame with n_samples drawn from predicted response distribution.

    """
    torch.manual_seed(seed)

    if self.tau is None:
        pred_params = torch.tensor(predt_params.values)
        dist_kwargs = {arg_name: param for arg_name, param in zip(self.distribution_arg_names, pred_params.T)}
        dist_pred = self.distribution(**dist_kwargs)
        dist_samples = dist_pred.sample((n_samples,)).squeeze().detach().numpy().T
        dist_samples = pd.DataFrame(dist_samples)
        dist_samples.columns = [str("y_sample") + str(i) for i in range(dist_samples.shape[1])]
    else:
        dist_samples = None

    if self.discrete:
        dist_samples = dist_samples.astype(int)

    return dist_samples
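A hedged sketch; the columns of predt_params must hold the distribution's parameters on the response scale, in the order of the distribution's parameter names (loc/scale for a Gaussian is an assumption here):

import pandas as pd

# Hypothetical predicted parameters (response scale) for three observations.
predt_params = pd.DataFrame({"loc": [10.0, 12.5, 9.1], "scale": [2.0, 1.5, 3.0]})
samples = dist.draw_samples(predt_params=predt_params, n_samples=1000, seed=123)
print(samples.shape)  # (3, 1000): one row per observation, one column per drawn sample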
get_params_loss(predt, target, start_values, requires_grad=False)

Function that returns the predicted parameters and the loss.

Arguments

predt: np.ndarray
    Predicted values.
target: torch.Tensor
    Target values.
start_values: List
    Starting values for each distributional parameter.
requires_grad: bool
    Whether to add to the computational graph or not.

Returns

predt: torch.Tensor
    Predicted parameters.
loss: torch.Tensor
    Loss value.

Source code in lightgbmlss/distributions/distribution_utils.py
def get_params_loss(self,
                    predt: np.ndarray,
                    target: torch.Tensor,
                    start_values: List[float],
                    requires_grad: bool = False,
                    ) -> Tuple[List[torch.Tensor], np.ndarray]:
    """
    Function that returns the predicted parameters and the loss.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    target: torch.Tensor
        Target values.
    start_values: List
        Starting values for each distributional parameter.
    requires_grad: bool
        Whether to add to the computational graph or not.

    Returns
    -------
    predt: torch.Tensor
        Predicted parameters.
    loss: torch.Tensor
        Loss value.
    """
    # Predicted Parameters
    predt = predt.reshape(-1, self.n_dist_param, order="F")

    # Replace NaNs and infinity values with unconditional start values
    nan_inf_mask = np.isnan(predt) | np.isinf(predt)
    predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])

    # Convert to torch.tensor
    predt = [
        torch.tensor(predt[:, i].reshape(-1, 1), requires_grad=requires_grad) for i in range(self.n_dist_param)
    ]

    # Predicted Parameters transformed to response scale
    predt_transformed = [
        response_fn(predt[i].reshape(-1, 1)) for i, response_fn in enumerate(self.param_dict.values())
    ]

    # Specify Distribution and Loss
    if self.tau is None:
        dist_kwargs = dict(zip(self.distribution_arg_names, predt_transformed))
        dist_fit = self.distribution(**dist_kwargs)
        if self.loss_fn == "nll":
            loss = -torch.nansum(dist_fit.log_prob(target))
        elif self.loss_fn == "crps":
            torch.manual_seed(123)
            dist_samples = dist_fit.rsample((30,)).squeeze(-1)
            loss = torch.nansum(self.crps_score(target, dist_samples))
        else:
            raise ValueError("Invalid loss function. Please select 'nll' or 'crps'.")
    else:
        dist_fit = self.distribution(predt_transformed, self.penalize_crossing)
        loss = -torch.nansum(dist_fit.log_prob(target, self.tau))

    return predt, loss
loss_fn_start_values(params, target)

Function that calculates the loss for a given set of distributional parameters. Only used for calculating the loss for the start values.

Parameters

params: torch.Tensor
    Distributional parameters.
target: torch.Tensor
    Target values.

Returns

loss: torch.Tensor Loss value.

Source code in lightgbmlss/distributions/distribution_utils.py
def loss_fn_start_values(self,
                         params: torch.Tensor,
                         target: torch.Tensor) -> torch.Tensor:
    """
    Function that calculates the loss for a given set of distributional parameters. Only used for calculating
    the loss for the start values.

    Parameter
    ---------
    params: torch.Tensor
        Distributional parameters.
    target: torch.Tensor
        Target values.

    Returns
    -------
    loss: torch.Tensor
        Loss value.
    """
    # Transform parameters to response scale
    params = [
        response_fn(params[i].reshape(-1, 1)) for i, response_fn in enumerate(self.param_dict.values())
    ]

    # Replace NaNs and infinity values with 0.5
    nan_inf_idx = torch.isnan(torch.stack(params)) | torch.isinf(torch.stack(params))
    params = torch.where(nan_inf_idx, torch.tensor(0.5), torch.stack(params))

    # Specify Distribution and Loss
    if self.tau is None:
        dist = self.distribution(*params)
        loss = -torch.nansum(dist.log_prob(target))
    else:
        dist = self.distribution(params, self.penalize_crossing)
        loss = -torch.nansum(dist.log_prob(target, self.tau))

    return loss
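A hedged sketch of a single evaluation, mirroring what calculate_start_values does inside its LBFGS closure (`dist` and `y_train` as in the earlier sketches):

import torch

params = [torch.tensor(0.5, requires_grad=True) for _ in range(dist.n_dist_param)]
target = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
loss = dist.loss_fn_start_values(params, target)  # unconditional negative log-likelihood
loss.backward()                                    # gradients w.r.t. the candidate start values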
metric_fn(predt, data)

Function that evaluates the predictions using the negative log-likelihood.

Arguments

predt: np.ndarray
    Predicted values.
data: lgb.Dataset
    Data used for training.

Returns

name: str
    Name of the evaluation metric.
nll: float
    Negative log-likelihood.
is_higher_better: bool
    Whether a higher value of the metric is better or not.

Source code in lightgbmlss/distributions/distribution_utils.py
def metric_fn(self, predt: np.ndarray, data: lgb.Dataset) -> Tuple[str, float, bool]:
    """
    Function that evaluates the predictions using the negative log-likelihood.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    data: lgb.Dataset
        Data used for training.

    Returns
    -------
    name: str
        Name of the evaluation metric.
    nll: float
        Negative log-likelihood.
    is_higher_better: bool
        Whether a higher value of the metric is better or not.
    """
    # Target
    target = torch.tensor(data.get_label().reshape(-1, 1))
    n_obs = target.shape[0]

    # Start values (needed to replace NaNs in predt)
    start_values = data.get_init_score().reshape(-1, self.n_dist_param)[0, :].tolist()

    # Calculate loss
    is_higher_better = False
    _, loss = self.get_params_loss(predt, target, start_values, requires_grad=False)

    return self.loss_fn, loss / n_obs, is_higher_better
objective_fn(predt, data)

Function to estimate gradients and hessians of distributional parameters.

Arguments

predt: np.ndarray
    Predicted values.
data: lgb.Dataset
    Data used for training.

Returns

grad: np.ndarray
    Gradient.
hess: np.ndarray
    Hessian.

Source code in lightgbmlss/distributions/distribution_utils.py
def objective_fn(self, predt: np.ndarray, data: lgb.Dataset) -> Tuple[np.ndarray, np.ndarray]:

    """
    Function to estimate gradients and hessians of distributional parameters.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    data: lgb.Dataset
        Data used for training.

    Returns
    -------
    grad: np.ndarray
        Gradient.
    hess: np.ndarray
        Hessian.
    """
    # Target
    target = torch.tensor(data.get_label().reshape(-1, 1))

    # Weights
    if data.weight is None:
        # Use 1 as weight if no weights are specified
        weights = torch.ones_like(target, dtype=target.dtype).numpy()
    else:
        weights = data.get_weight().reshape(-1, 1)

    # Start values (needed to replace NaNs in predt)
    start_values = data.get_init_score().reshape(-1, self.n_dist_param)[0, :].tolist()

    # Calculate gradients and hessians
    predt, loss = self.get_params_loss(predt, target, start_values, requires_grad=True)
    grad, hess = self.compute_gradients_and_hessians(loss, predt, weights)

    return grad, hess
predict_dist(booster, data, start_values, pred_type='parameters', n_samples=1000, quantiles=[0.1, 0.5, 0.9], seed=123)

Function that predicts from the trained model.

Arguments

booster : lgb.Booster
    Trained model.
data : pd.DataFrame
    Data to predict from.
start_values : np.ndarray
    Starting values for each distributional parameter.
pred_type : str
    Type of prediction:
    - "samples" draws n_samples from the predicted distribution.
    - "quantiles" calculates the quantiles from the predicted distribution.
    - "parameters" returns the predicted distributional parameters.
    - "expectiles" returns the predicted expectiles.
n_samples : int
    Number of samples to draw from the predicted distribution.
quantiles : List[float]
    List of quantiles to calculate from the predicted distribution.
seed : int
    Seed for the random number generator used to draw samples from the predicted distribution.

Returns

pred : pd.DataFrame Predictions.

Source code in lightgbmlss/distributions/distribution_utils.py
def predict_dist(self,
                 booster: lgb.Booster,
                 data: pd.DataFrame,
                 start_values: np.ndarray,
                 pred_type: str = "parameters",
                 n_samples: int = 1000,
                 quantiles: list = [0.1, 0.5, 0.9],
                 seed: int = 123
                 ) -> pd.DataFrame:
    """
    Function that predicts from the trained model.

    Arguments
    ---------
    booster : lgb.Booster
        Trained model.
    data : pd.DataFrame
        Data to predict from.
    start_values : np.ndarray.
        Starting values for each distributional parameter.
    pred_type : str
        Type of prediction:
        - "samples" draws n_samples from the predicted distribution.
        - "quantiles" calculates the quantiles from the predicted distribution.
        - "parameters" returns the predicted distributional parameters.
        - "expectiles" returns the predicted expectiles.
    n_samples : int
        Number of samples to draw from the predicted distribution.
    quantiles : List[float]
        List of quantiles to calculate from the predicted distribution.
    seed : int
        Seed for random number generator used to draw samples from the predicted distribution.

    Returns
    -------
    pred : pd.DataFrame
        Predictions.
    """

    predt = torch.tensor(
        booster.predict(data, raw_score=True),
        dtype=torch.float32
    ).reshape(-1, self.n_dist_param)

    # Set init_score as starting point for each distributional parameter.
    init_score_pred = torch.tensor(
        np.ones(shape=(data.shape[0], 1))*start_values,
        dtype=torch.float32
    )

    # The predictions don't include the init_score specified in creating the train data.
    # Hence, it needs to be added manually with the corresponding transform for each distributional parameter.
    dist_params_predt = np.concatenate(
        [
            response_fun(
                predt[:, i].reshape(-1, 1) + init_score_pred[:, i].reshape(-1, 1)).numpy()
            for i, (dist_param, response_fun) in enumerate(self.param_dict.items())
        ],
        axis=1,
    )
    dist_params_predt = pd.DataFrame(dist_params_predt)
    dist_params_predt.columns = self.param_dict.keys()

    # Draw samples from predicted response distribution
    pred_samples_df = self.draw_samples(predt_params=dist_params_predt,
                                        n_samples=n_samples,
                                        seed=seed)

    if pred_type == "parameters":
        return dist_params_predt

    elif pred_type == "expectiles":
        return dist_params_predt

    elif pred_type == "samples":
        return pred_samples_df

    elif pred_type == "quantiles":
        # Calculate quantiles from predicted response distribution
        pred_quant_df = pred_samples_df.quantile(quantiles, axis=1).T
        pred_quant_df.columns = [str("quant_") + str(quantiles[i]) for i in range(len(quantiles))]
        if self.discrete:
            pred_quant_df = pred_quant_df.astype(int)
        return pred_quant_df
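A hedged sketch, assuming `booster`, `X_test` and `start_values` come from a LightGBMLSS model trained with this distribution:

# Distributional parameters on the response scale.
pred_params = dist.predict_dist(booster=booster, data=X_test, start_values=start_values,
                                pred_type="parameters")

# Quantiles derived from n_samples draws per observation.
pred_quantiles = dist.predict_dist(booster=booster, data=X_test, start_values=start_values,
                                   pred_type="quantiles", n_samples=1000,
                                   quantiles=[0.05, 0.5, 0.95], seed=123)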
stabilize_derivative(input_der, type='MAD')

Function that stabilizes Gradients and Hessians.

As LightGBMLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important that these are comparable in magnitude for all distributional parameters. Due to imbalances regarding the ranges, the estimation might become unstable so that it does not converge (or converge very slowly) to the optimal solution. Another way to improve convergence might be to standardize the response variable. This is especially useful if the range of the response differs strongly from the range of the Gradients and Hessians. Both, the stabilization and the standardization of the response are not always advised but need to be carefully considered. Source: https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

Parameters

input_der : torch.Tensor
    Input derivative, either Gradient or Hessian.
type: str
    Stabilization method. Can be either "None", "MAD" or "L2".

Returns

stab_der : torch.Tensor Stabilized Gradient or Hessian.

Source code in lightgbmlss/distributions/distribution_utils.py
def stabilize_derivative(self, input_der: torch.Tensor, type: str = "MAD") -> torch.Tensor:
    """
    Function that stabilizes Gradients and Hessians.

    As LightGBMLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important
    that these are comparable in magnitude for all distributional parameters. Due to imbalances regarding the ranges,
    the estimation might become unstable so that it does not converge (or converge very slowly) to the optimal solution.
    Another way to improve convergence might be to standardize the response variable. This is especially useful if the
    range of the response differs strongly from the range of the Gradients and Hessians. Both, the stabilization and
    the standardization of the response are not always advised but need to be carefully considered.
    Source: https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

    Parameters
    ----------
    input_der : torch.Tensor
        Input derivative, either Gradient or Hessian.
    type: str
        Stabilization method. Can be either "None", "MAD" or "L2".

    Returns
    -------
    stab_der : torch.Tensor
        Stabilized Gradient or Hessian.
    """

    if type == "MAD":
        input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
        div = torch.nanmedian(torch.abs(input_der - torch.nanmedian(input_der)))
        div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
        stab_der = input_der / div

    if type == "L2":
        input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
        div = torch.sqrt(torch.nanmean(input_der.pow(2)))
        div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
        div = torch.where(div > torch.tensor(10000.0), torch.tensor(10000.0), div)
        stab_der = input_der / div

    if type == "None":
        stab_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))

    return stab_der
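A small, self-contained illustration of the MAD option (`dist` as in the earlier sketches): NaNs are imputed with the mean before the derivative is divided by its floored median absolute deviation:

import torch

grad = torch.tensor([[0.1], [5.0], [float("nan")], [-3.0]])
stab = dist.stabilize_derivative(grad, type="MAD")
print(stab)  # NaN replaced by the nanmean, then all values divided by max(MAD, 1e-4)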

flow_utils

NormalizingFlowClass

Generic class that contains general functions for normalizing flows.

Arguments

base_dist: torch.distributions.Distribution
    PyTorch Distribution class. Currently only Normal is supported.
flow_transform: Transform
    Specify the normalizing flow transform.
count_bins: Optional[int]
    The number of segments comprising the spline. Only used if flow_transform is Spline.
bound: Optional[float]
    The quantity "K" determining the bounding box, [-K,K] x [-K,K] of the spline. By adjusting the "K" value,
    you can control the size of the bounding box and consequently control the range of inputs that the spline
    transform operates on. Larger values of "K" will result in a wider valid range for the spline transformation,
    while smaller values will restrict the valid range to a smaller region. Should be chosen based on the range
    of the data. Only used if flow_transform is Spline.
order: Optional[str]
    The order of the spline. Options are "linear" or "quadratic". Only used if flow_transform is Spline.
n_dist_param: int
    Number of parameters.
param_dict: Dict[str, Any]
    Dictionary that maps parameters to their response scale.
distribution_arg_names: List
    List of distributional parameter names.
target_transform: Transform
    Specify the target transform.
discrete: bool
    Whether the target is discrete or not.
univariate: bool
    Whether the distribution is univariate or multivariate.
stabilization: str
    Stabilization method. Options are "None", "MAD" or "L2".
loss_fn: str
    Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
    Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
    Hence, using the CRPS disregards any variation in the curvature of the loss function.
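For orientation, a hedged sketch of how these settings are typically exposed to users through the SplineFlow class; the exact module path and argument names (in particular target_support) are assumptions and should be checked against the SplineFlow documentation:

from lightgbmlss.distributions.SplineFlow import SplineFlow  # assumed module/class name

# A flow on the real line with 8 spline segments on [-3, 3] and quadratic splines.
flow = SplineFlow(target_support="real", count_bins=8, bound=3.0,
                  order="quadratic", stabilization="None", loss_fn="nll")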

Source code in lightgbmlss/distributions/flow_utils.py
class NormalizingFlowClass:
    """
    Generic class that contains general functions for normalizing flows.

    Arguments
    ---------
    base_dist: torch.distributions.Distribution
        PyTorch Distribution class. Currently only Normal is supported.
    flow_transform: Transform
        Specify the normalizing flow transform.
    count_bins: Optional[int]
        The number of segments comprising the spline. Only used if flow_transform is Spline.
    bound: Optional[float]
        The quantity "K" determining the bounding box, [-K,K] x [-K,K] of the spline. By adjusting the
        "K" value, you can control the size of the bounding box and consequently control the range of inputs that
        the spline transform operates on. Larger values of "K" will result in a wider valid range for the spline
        transformation, while smaller values will restrict the valid range to a smaller region. Should be chosen
        based on the range of the data. Only used if flow_transform is Spline.
    order: Optional[str]
        The order of the spline. Options are "linear" or "quadratic". Only used if flow_transform is Spline.
    n_dist_param: int
        Number of parameters.
    param_dict: Dict[str, Any]
        Dictionary that maps parameters to their response scale.
    distribution_arg_names: List
        List of distributional parameter names.
    target_transform: Transform
        Specify the target transform.
    discrete: bool
        Whether the target is discrete or not.
    univariate: bool
        Whether the distribution is univariate or multivariate.
    stabilization: str
        Stabilization method. Options are "None", "MAD" or "L2".
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood) or "crps" (continuous ranked probability score).
        Note that if "crps" is used, the Hessian is set to 1, as the current CRPS version is not twice differentiable.
        Hence, using the CRPS disregards any variation in the curvature of the loss function.
    """
    def __init__(self,
                 base_dist: torch.distributions.Distribution = None,
                 flow_transform: Transform = None,
                 count_bins: Optional[int] = 8,
                 bound: Optional[float] = 3.0,
                 order: Optional[str] = "quadratic",
                 n_dist_param: int = None,
                 param_dict: Dict[str, Any] = None,
                 distribution_arg_names: List = None,
                 target_transform: Transform = None,
                 discrete: bool = False,
                 univariate: bool = True,
                 stabilization: str = "None",
                 loss_fn: str = "nll",
                 ):

        self.base_dist = base_dist
        self.flow_transform = flow_transform
        self.count_bins = count_bins
        self.bound = bound
        self.order = order
        self.n_dist_param = n_dist_param
        self.param_dict = param_dict
        self.distribution_arg_names = distribution_arg_names
        self.target_transform = target_transform
        self.discrete = discrete
        self.univariate = univariate
        self.stabilization = stabilization
        self.loss_fn = loss_fn

    def objective_fn(self, predt: np.ndarray, data: lgb.Dataset) -> Tuple[np.ndarray, np.ndarray]:

        """
        Function to estimate gradients and hessians of normalizing flow parameters.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        data: lgb.Dataset
            Data used for training.

        Returns
        -------
        grad: np.ndarray
            Gradient.
        hess: np.ndarray
            Hessian.
        """
        # Target
        target = torch.tensor(data.get_label().reshape(-1, 1))

        # Weights
        if data.weight is None:
            # Use 1 as weight if no weights are specified
            weights = torch.ones_like(target, dtype=target.dtype).numpy()
        else:
            weights = data.get_weight().reshape(-1, 1)

        # Start values (needed to replace NaNs in predt)
        start_values = data.get_init_score().reshape(-1, self.n_dist_param)[0, :].tolist()

        # Calculate gradients and hessians
        predt, loss = self.get_params_loss(predt, target, start_values)
        grad, hess = self.compute_gradients_and_hessians(loss, predt, weights)

        return grad, hess

    def metric_fn(self, predt: np.ndarray, data: lgb.Dataset) -> Tuple[str, float, bool]:
        """
        Function that evaluates the predictions using the specified loss function.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        data: lgb.Dataset
            Data used for training.

        Returns
        -------
        name: str
            Name of the evaluation metric.
        loss: float
            Loss value.
        is_higher_better: bool
            Whether a higher value of the metric is better or not.
        """
        # Target
        target = torch.tensor(data.get_label().reshape(-1, 1))
        n_obs = target.shape[0]

        # Start values (needed to replace NaNs in predt)
        start_values = data.get_init_score().reshape(-1, self.n_dist_param)[0, :].tolist()

        # Calculate loss
        is_higher_better = False
        _, loss = self.get_params_loss(predt, target, start_values)

        return self.loss_fn, loss.detach() / n_obs, is_higher_better

    def calculate_start_values(self,
                               target: np.ndarray,
                               max_iter: int = 50
                               ) -> Tuple[float, np.ndarray]:
        """
        Function that calculates starting values for each parameter.

        Arguments
        ---------
        target: np.ndarray
            Data from which starting values are calculated.
        max_iter: int
            Maximum number of iterations.

        Returns
        -------
        loss: float
            Loss value.
        start_values: np.ndarray
            Starting values for each parameter.
        """
        # Convert target to torch.tensor
        target = torch.tensor(target).reshape(-1, 1)

        # Create Normalizing Flow
        flow_dist = self.create_spline_flow(input_dim=1)

        # Specify optimizer
        optimizer = LBFGS(flow_dist.transforms[0].parameters(),
                          lr=0.3,
                          max_iter=np.min([int(max_iter/4), 50]),
                          line_search_fn="strong_wolfe")

        # Define learning rate scheduler
        lr_scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=5)

        # Define closure
        def closure():
            optimizer.zero_grad()
            loss = -torch.nansum(flow_dist.log_prob(target))
            loss.backward()
            flow_dist.clear_cache()
            return loss

        # Optimize parameters
        loss_vals = []
        tolerance = 1e-5           # Tolerance level for loss change
        patience = 5               # Patience level for loss change
        best_loss = float("inf")
        epochs_without_change = 0

        for epoch in range(max_iter):
            optimizer.zero_grad()
            loss = optimizer.step(closure)
            lr_scheduler.step(loss)
            loss_vals.append(loss.item())

            # Stopping criterion (no improvement in loss)
            if loss.item() < best_loss - tolerance:
                best_loss = loss.item()
                epochs_without_change = 0
            else:
                epochs_without_change += 1

            if epochs_without_change >= patience:
                break

        # Get final loss
        loss = np.array(loss_vals[-1])

        # Get start values
        start_values = list(flow_dist.transforms[0].parameters())
        start_values = torch.cat([param.view(-1) for param in start_values]).detach().numpy()

        # Replace any remaining NaNs or infinity values with 0.5
        start_values = np.nan_to_num(start_values, nan=0.5, posinf=0.5, neginf=0.5)

        return loss, start_values

    def get_params_loss(self,
                        predt: np.ndarray,
                        target: torch.Tensor,
                        start_values: List[float],
                        requires_grad: bool = False,
                        ) -> Tuple[List[torch.Tensor], np.ndarray]:
        """
        Function that returns the predicted parameters and the loss.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        target: torch.Tensor
            Target values.
        start_values: List
            Starting values for each parameter.

        Returns
        -------
        predt: torch.Tensor
            Predicted parameters.
        loss: torch.Tensor
            Loss value.
        """
        # Reshape Target
        target = target.view(-1)

        # Predicted Parameters
        predt = predt.reshape(-1, self.n_dist_param, order="F")

        # Replace NaNs and infinity values with unconditional start values
        nan_inf_mask = np.isnan(predt) | np.isinf(predt)
        predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])

        # Convert to torch.tensor
        predt = torch.tensor(predt, dtype=torch.float32)

        # Specify Normalizing Flow
        flow_dist = self.create_spline_flow(target.shape[0])

        # Replace parameters with estimated ones
        params, flow_dist = self.replace_parameters(predt, flow_dist)

        # Calculate loss
        if self.loss_fn == "nll":
            loss = -torch.nansum(flow_dist.log_prob(target))
        elif self.loss_fn == "crps":
            torch.manual_seed(123)
            dist_samples = flow_dist.rsample((30,)).squeeze(-1)
            loss = torch.nansum(self.crps_score(target, dist_samples))
        else:
            raise ValueError("Invalid loss function. Please select 'nll' or 'crps'.")

        return params, loss

    def create_spline_flow(self,
                           input_dim: int = None,
                           ) -> Transform:

        """
        Function that constructs a Normalizing Flow.

        Arguments
        ---------
        input_dim: int
            Input dimension.

        Returns
        -------
        spline_flow: Transform
            Normalizing Flow.
        """

        # Create flow distribution (currently only Normal)
        loc, scale = torch.zeros(input_dim), torch.ones(input_dim)
        flow_dist = self.base_dist(loc, scale)

        # Create Spline Transform
        torch.manual_seed(123)
        spline_transform = self.flow_transform(input_dim,
                                               count_bins=self.count_bins,
                                               bound=self.bound,
                                               order=self.order)

        # Create Normalizing Flow
        spline_flow = TransformedDistribution(flow_dist, [spline_transform, self.target_transform])

        return spline_flow

    def replace_parameters(self,
                           params: torch.Tensor,
                           flow_dist: Transform,
                           ) -> Tuple[List, Transform]:
        """
        Replace parameters with estimated ones.

        Arguments
        ---------
        params: torch.Tensor
            Estimated parameters.
        flow_dist: Transform
            Normalizing Flow.

        Returns
        -------
        params_list: List
            List of estimated parameters.
        flow_dist: Transform
            Normalizing Flow with estimated parameters.
        """

        # Split parameters into list
        if self.order == "quadratic":
            params_list = torch.split(
                params, [self.count_bins, self.count_bins, self.count_bins - 1],
                dim=1)
        elif self.order == "linear":
            params_list = torch.split(
                params, [self.count_bins, self.count_bins, self.count_bins - 1, self.count_bins],
                dim=1)

        # Replace parameters
        for param, new_value in zip(flow_dist.transforms[0].parameters(), params_list):
            param.data = new_value

        # Get parameters (including require_grad=True)
        params_list = list(flow_dist.transforms[0].parameters())

        return params_list, flow_dist

    def draw_samples(self,
                     predt_params: pd.DataFrame,
                     n_samples: int = 1000,
                     seed: int = 123
                     ) -> pd.DataFrame:
        """
        Function that draws n_samples from a predicted distribution.

        Arguments
        ---------
        predt_params: pd.DataFrame
            pd.DataFrame with predicted distributional parameters.
        n_samples: int
            Number of samples to draw from the predicted response distribution.
        seed: int
            Manual seed.

        Returns
        -------
        pred_dist: pd.DataFrame
            DataFrame with n_samples drawn from predicted response distribution.

        """

        torch.manual_seed(seed)

        # Specify Normalizing Flow
        pred_params = torch.tensor(predt_params.values)
        flow_dist_pred = self.create_spline_flow(pred_params.shape[0])

        # Replace parameters with estimated ones
        _, flow_dist_pred = self.replace_parameters(pred_params, flow_dist_pred)

        # Draw samples
        flow_samples = pd.DataFrame(flow_dist_pred.sample((n_samples,)).squeeze().detach().numpy().T)
        flow_samples.columns = [str("y_sample") + str(i) for i in range(flow_samples.shape[1])]

        if self.discrete:
            flow_samples = flow_samples.astype(int)

        return flow_samples

    def predict_dist(self,
                     booster: lgb.Booster,
                     data: pd.DataFrame,
                     start_values: np.ndarray,
                     pred_type: str = "parameters",
                     n_samples: int = 1000,
                     quantiles: list = [0.1, 0.5, 0.9],
                     seed: int = 123
                     ) -> pd.DataFrame:
        """
        Function that predicts from the trained model.

        Arguments
        ---------
        booster : lgb.Booster
            Trained model.
        start_values : np.ndarray
            Starting values for each distributional parameter.
        data : pd.DataFrame
            Data to predict from.
        pred_type : str
            Type of prediction:
            - "samples" draws n_samples from the predicted distribution.
            - "quantiles" calculates the quantiles from the predicted distribution.
            - "parameters" returns the predicted distributional parameters.
            - "expectiles" returns the predicted expectiles.
        n_samples : int
            Number of samples to draw from the predicted distribution.
        quantiles : List[float]
            List of quantiles to calculate from the predicted distribution.
        seed : int
            Seed for random number generator used to draw samples from the predicted distribution.

        Returns
        -------
        pred : pd.DataFrame
            Predictions.
        """
        # Predict raw scores
        predt = torch.tensor(
            booster.predict(data, raw_score=True),
            dtype=torch.float32
        ).reshape(-1, self.n_dist_param)

        # Set init_score as starting point for each distributional parameter.
        init_score_pred = torch.tensor(
            np.ones(shape=(data.shape[0], 1)) * start_values,
            dtype=torch.float32
        )

        # The predictions don't include the init_score specified when creating the training data, so it has to be
        # added back manually.
        dist_params_predt = pd.DataFrame(
            np.concatenate(
                [predt[:, i].reshape(-1, 1) + init_score_pred[:, i].reshape(-1, 1) for i in range(self.n_dist_param)],
                axis=1
            )
        )
        dist_params_predt.columns = self.param_dict.keys()

        # Draw samples from predicted response distribution
        pred_samples_df = self.draw_samples(predt_params=dist_params_predt,
                                            n_samples=n_samples,
                                            seed=seed)

        if pred_type == "parameters":
            return dist_params_predt

        elif pred_type == "samples":
            return pred_samples_df

        elif pred_type == "quantiles":
            # Calculate quantiles from predicted response distribution
            pred_quant_df = pred_samples_df.quantile(quantiles, axis=1).T
            pred_quant_df.columns = [str("quant_") + str(quantiles[i]) for i in range(len(quantiles))]
            if self.discrete:
                pred_quant_df = pred_quant_df.astype(int)
            return pred_quant_df

    def compute_gradients_and_hessians(self,
                                       loss: torch.tensor,
                                       predt: torch.tensor,
                                       weights: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:

        """
        Calculates gradients and hessians.

        Output gradients and hessians have shape (n_samples*n_outputs, 1).

        Arguments:
        ---------
        loss: torch.Tensor
            Loss.
        predt: torch.Tensor
            List of predicted parameters.
        weights: np.ndarray
            Weights.

        Returns:
        -------
        grad: torch.Tensor
            Gradients.
        hess: torch.Tensor
            Hessians.
        """
        if self.loss_fn == "nll":
            # Gradient and Hessian
            grad = autograd(loss, inputs=predt, create_graph=True)
            hess = [autograd(grad[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grad))]
        elif self.loss_fn == "crps":
            # Gradient and Hessian
            grad = autograd(loss, inputs=predt, create_graph=True)
            hess = [torch.ones_like(grad[i]) for i in range(len(grad))]

        # Stabilization of Derivatives
        if self.stabilization != "None":
            grad = [self.stabilize_derivative(grad[i], type=self.stabilization) for i in range(len(grad))]
            hess = [self.stabilize_derivative(hess[i], type=self.stabilization) for i in range(len(hess))]

        # Reshape
        grad = torch.cat(grad, axis=1).detach().numpy()
        hess = torch.cat(hess, axis=1).detach().numpy()

        # Weighting
        grad *= weights
        hess *= weights

        # Reshape
        grad = grad.ravel(order="F")
        hess = hess.ravel(order="F")

        return grad, hess

    def stabilize_derivative(self, input_der: torch.Tensor, type: str = "MAD") -> torch.Tensor:
        """
        Function that stabilizes Gradients and Hessians.

        Since parameters are estimated by optimizing Gradients and Hessians, it is important that these are comparable
        in magnitude for all parameters. Due to imbalances regarding the ranges, the estimation might become unstable
        so that it does not converge (or converge very slowly) to the optimal solution. Another way to improve
        convergence might be to standardize the response variable. This is especially useful if the range of the
        response differs strongly from the range of the Gradients and Hessians. Both, the stabilization and the
        standardization of the response are not always advised but need to be carefully considered.

        Source
        ---------
        https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

        Arguments
        ---------
        input_der : torch.Tensor
            Input derivative, either Gradient or Hessian.
        type: str
            Stabilization method. Can be either "None", "MAD" or "L2".

        Returns
        ---------
        stab_der : torch.Tensor
            Stabilized Gradient or Hessian.
        """

        if type == "MAD":
            input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
            div = torch.nanmedian(torch.abs(input_der - torch.nanmedian(input_der)))
            div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
            stab_der = input_der / div

        if type == "L2":
            input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
            div = torch.sqrt(torch.nanmean(input_der.pow(2)))
            div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
            div = torch.where(div > torch.tensor(10000.0), torch.tensor(10000.0), div)
            stab_der = input_der / div

        if type == "None":
            stab_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))

        return stab_der

    def crps_score(self, y: torch.tensor, yhat_dist: torch.tensor) -> torch.tensor:
        """
        Function that calculates the Continuous Ranked Probability Score (CRPS) for a given set of predicted samples.

        Arguments
        ---------
        y: torch.Tensor
            Response variable of shape (n_observations,1).
        yhat_dist: torch.Tensor
            Predicted samples of shape (n_samples, n_observations).

        Returns
        ---------
        crps: torch.Tensor
            CRPS score.

        References
        ---------
        Gneiting, Tilmann & Raftery, Adrian. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation.
        Journal of the American Statistical Association. 102. 359-378.

        Source
        ---------
        https://github.com/elephaint/pgbm/blob/main/pgbm/torch/pgbm_dist.py#L549
        """
        # Get the number of predicted samples
        n_samples = yhat_dist.shape[0]

        # Sort the forecasts in ascending order
        yhat_dist_sorted, _ = torch.sort(yhat_dist, 0)

        # Create temporary tensors
        y_cdf = torch.zeros_like(y)
        yhat_cdf = torch.zeros_like(y)
        yhat_prev = torch.zeros_like(y)
        crps = torch.zeros_like(y)

        # Loop over the predicted samples generated per observation
        for yhat in yhat_dist_sorted:
            yhat = yhat.reshape(-1, 1)
            flag = (y_cdf == 0) * (y < yhat)
            crps += flag * ((y - yhat_prev) * yhat_cdf ** 2)
            crps += flag * ((yhat - y) * (yhat_cdf - 1) ** 2)
            crps += (~flag) * ((yhat - yhat_prev) * (yhat_cdf - y_cdf) ** 2)
            y_cdf += flag
            yhat_cdf += 1 / n_samples
            yhat_prev = yhat

        # In case y_cdf == 0 after the loop
        flag = (y_cdf == 0)
        crps += flag * (y - yhat)

        return crps

    def flow_select(self,
                    target: np.ndarray,
                    candidate_flows: List,
                    max_iter: int = 100,
                    plot: bool = False,
                    figure_size: tuple = (10, 5),
                    ) -> pd.DataFrame:
        """
        Function that selects the most suitable normalizing flow specification among the candidate_flows for the
        target variable, based on the NegLogLikelihood (lower is better).

        Parameters
        ----------
        target: np.ndarray
            Response variable.
        candidate_flows: List
            List of candidate normalizing flow specifications.
        max_iter: int
            Maximum number of iterations for the optimization.
        plot: bool
            If True, a density plot of the actual and fitted distribution is created.
        figure_size: tuple
            Figure size of the density plot.

        Returns
        -------
        fit_df: pd.DataFrame
            Dataframe with the loss values of the fitted normalizing flow.
        """
        flow_list = []
        total_iterations = len(candidate_flows)

        with tqdm(total=total_iterations, desc="Fitting candidate normalizing flows") as pbar:
            for flow in candidate_flows:
                flow_name = str(flow.__class__).split(".")[-1].split("'>")[0]
                flow_spec = f"(count_bins: {flow.count_bins}, order: {flow.order})"
                flow_name = flow_name + flow_spec
                pbar.set_description(f"Fitting {flow_name}")
                flow_sel = flow
                try:
                    loss, params = flow_sel.calculate_start_values(target=target, max_iter=max_iter)
                    fit_df = pd.DataFrame.from_dict(
                        {flow_sel.loss_fn: loss.reshape(-1, ),
                         "NormFlow": str(flow_name),
                         "params": [params]
                         }
                    )
                except Exception as e:
                    warnings.warn(f"Error fitting {flow_sel} NormFlow: {str(e)}")
                    fit_df = pd.DataFrame(
                        {flow_sel.loss_fn: np.nan,
                         "NormFlow": str(flow_sel),
                         "params": [np.nan] * flow_sel.n_dist_param
                         }
                    )
                flow_list.append(fit_df)
                pbar.update(1)
            pbar.set_description(f"Fitting of candidate normalizing flows completed")
            fit_df = pd.concat(flow_list).sort_values(by=flow_sel.loss_fn, ascending=True)
            fit_df["rank"] = fit_df[flow_sel.loss_fn].rank().astype(int)
            fit_df.set_index(fit_df["rank"], inplace=True)

        if plot:
            # Select normalizing flow with the lowest loss
            best_flow = fit_df[fit_df["rank"] == 1].reset_index(drop=True)
            for flow in candidate_flows:
                flow_name = str(flow.__class__).split(".")[-1].split("'>")[0]
                flow_spec = f"(count_bins: {flow.count_bins}, order: {flow.order})"
                flow_name = flow_name + flow_spec
                if flow_name == best_flow["NormFlow"].values[0]:
                    best_flow_sel = flow
                    break

            # Draw samples from distribution
            flow_params = torch.tensor(best_flow["params"][0]).reshape(1, -1)
            flow_dist_sel = best_flow_sel.create_spline_flow(input_dim=1)
            _, flow_dist_sel = best_flow_sel.replace_parameters(flow_params, flow_dist_sel)
            n_samples = np.max([10000, target.shape[0]])
            n_samples = np.where(n_samples > 500000, 100000, n_samples)
            flow_samples = pd.DataFrame(flow_dist_sel.sample((n_samples,)).squeeze().detach().numpy().T).values

            # Plot actual and fitted distribution
            plt.figure(figsize=figure_size)
            sns.kdeplot(target.reshape(-1, ), label="Actual")
            sns.kdeplot(flow_samples.reshape(-1, ), label=f"Best-Fit: {best_flow['NormFlow'].values[0]}")
            plt.legend()
            plt.title("Actual vs. Best-Fit Density", fontweight="bold", fontsize=16)
            plt.show()

        fit_df.drop(columns=["rank", "params"], inplace=True)

        return fit_df
calculate_start_values(target, max_iter=50)

Function that calculates starting values for each parameter.

Arguments

target: np.ndarray Data from which starting values are calculated. max_iter: int Maximum number of iterations.

Returns

loss: float Loss value. start_values: np.ndarray Starting values for each parameter.

Source code in lightgbmlss/distributions/flow_utils.py (lines 157-234)
def calculate_start_values(self,
                           target: np.ndarray,
                           max_iter: int = 50
                           ) -> Tuple[float, np.ndarray]:
    """
    Function that calculates starting values for each parameter.

    Arguments
    ---------
    target: np.ndarray
        Data from which starting values are calculated.
    max_iter: int
        Maximum number of iterations.

    Returns
    -------
    loss: float
        Loss value.
    start_values: np.ndarray
        Starting values for each parameter.
    """
    # Convert target to torch.tensor
    target = torch.tensor(target).reshape(-1, 1)

    # Create Normalizing Flow
    flow_dist = self.create_spline_flow(input_dim=1)

    # Specify optimizer
    optimizer = LBFGS(flow_dist.transforms[0].parameters(),
                      lr=0.3,
                      max_iter=np.min([int(max_iter/4), 50]),
                      line_search_fn="strong_wolfe")

    # Define learning rate scheduler
    lr_scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=5)

    # Define closure
    def closure():
        optimizer.zero_grad()
        loss = -torch.nansum(flow_dist.log_prob(target))
        loss.backward()
        flow_dist.clear_cache()
        return loss

    # Optimize parameters
    loss_vals = []
    tolerance = 1e-5           # Tolerance level for loss change
    patience = 5               # Patience level for loss change
    best_loss = float("inf")
    epochs_without_change = 0

    for epoch in range(max_iter):
        optimizer.zero_grad()
        loss = optimizer.step(closure)
        lr_scheduler.step(loss)
        loss_vals.append(loss.item())

        # Stopping criterion (no improvement in loss)
        if loss.item() < best_loss - tolerance:
            best_loss = loss.item()
            epochs_without_change = 0
        else:
            epochs_without_change += 1

        if epochs_without_change >= patience:
            break

    # Get final loss
    loss = np.array(loss_vals[-1])

    # Get start values
    start_values = list(flow_dist.transforms[0].parameters())
    start_values = torch.cat([param.view(-1) for param in start_values]).detach().numpy()

    # Replace any remaining NaNs or infinity values with 0.5
    start_values = np.nan_to_num(start_values, nan=0.5, posinf=0.5, neginf=0.5)

    return loss, start_values
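
The optimization above follows the standard closure-based LBFGS pattern with patience-based early stopping. As a purely illustrative, self-contained sketch of that pattern (plain PyTorch, not library code), fitting a single location parameter to toy data:

import torch
from torch.optim import LBFGS

# Toy data and a single trainable parameter
target = torch.randn(500) + 3.0
loc = torch.tensor(0.0, requires_grad=True)
optimizer = LBFGS([loc], lr=0.3, max_iter=10, line_search_fn="strong_wolfe")

def closure():
    optimizer.zero_grad()
    loss = ((target - loc) ** 2).mean()
    loss.backward()
    return loss

best_loss, patience, epochs_without_change = float("inf"), 5, 0
for epoch in range(50):
    loss = optimizer.step(closure)
    # Stop early once the loss no longer improves by more than the tolerance
    if loss.item() < best_loss - 1e-5:
        best_loss, epochs_without_change = loss.item(), 0
    else:
        epochs_without_change += 1
    if epochs_without_change >= patience:
        break

print(loc.item())  # approximately 3.0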
compute_gradients_and_hessians(loss, predt, weights)

Calculates gradients and hessians.

Output gradients and hessians have shape (n_samples*n_outputs, 1).

Arguments:

loss: torch.Tensor Loss. predt: torch.Tensor List of predicted parameters. weights: np.ndarray Weights.

Returns:

grad: torch.Tensor Gradients. hess: torch.Tensor Hessians.

Source code in lightgbmlss/distributions/flow_utils.py (lines 488-540)
def compute_gradients_and_hessians(self,
                                   loss: torch.tensor,
                                   predt: torch.tensor,
                                   weights: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:

    """
    Calculates gradients and hessians.

    Output gradients and hessians have shape (n_samples*n_outputs, 1).

    Arguments:
    ---------
    loss: torch.Tensor
        Loss.
    predt: torch.Tensor
        List of predicted parameters.
    weights: np.ndarray
        Weights.

    Returns:
    -------
    grad: torch.Tensor
        Gradients.
    hess: torch.Tensor
        Hessians.
    """
    if self.loss_fn == "nll":
        # Gradient and Hessian
        grad = autograd(loss, inputs=predt, create_graph=True)
        hess = [autograd(grad[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grad))]
    elif self.loss_fn == "crps":
        # Gradient and Hessian
        grad = autograd(loss, inputs=predt, create_graph=True)
        hess = [torch.ones_like(grad[i]) for i in range(len(grad))]

    # Stabilization of Derivatives
    if self.stabilization != "None":
        grad = [self.stabilize_derivative(grad[i], type=self.stabilization) for i in range(len(grad))]
        hess = [self.stabilize_derivative(hess[i], type=self.stabilization) for i in range(len(hess))]

    # Reshape
    grad = torch.cat(grad, axis=1).detach().numpy()
    hess = torch.cat(hess, axis=1).detach().numpy()

    # Weighting
    grad *= weights
    hess *= weights

    # Reshape
    grad = grad.ravel(order="F")
    hess = hess.ravel(order="F")

    return grad, hess
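
The final ravel with order="F" produces the (n_samples*n_outputs,) layout mentioned in the docstring: all observations for the first flow parameter come first, then all observations for the second, and so on. A minimal NumPy sketch (not library code) of the difference between column-major and row-major flattening:

import numpy as np

grad = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])  # 3 observations, 2 parameters

print(grad.ravel(order="F"))  # [ 1.  2.  3. 10. 20. 30.] -> grouped by parameter
print(grad.ravel(order="C"))  # [ 1. 10.  2. 20.  3. 30.] -> grouped by observation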
create_spline_flow(input_dim=None)

Function that constructs a Normalizing Flow.

Arguments

input_dim: int Input dimension.

Returns

spline_flow: Transform Normalizing Flow.

Source code in lightgbmlss/distributions/flow_utils.py (lines 292-324)
def create_spline_flow(self,
                       input_dim: int = None,
                       ) -> Transform:

    """
    Function that constructs a Normalizing Flow.

    Arguments
    ---------
    input_dim: int
        Input dimension.

    Returns
    -------
    spline_flow: Transform
        Normalizing Flow.
    """

    # Create flow distribution (currently only Normal)
    loc, scale = torch.zeros(input_dim), torch.ones(input_dim)
    flow_dist = self.base_dist(loc, scale)

    # Create Spline Transform
    torch.manual_seed(123)
    spline_transform = self.flow_transform(input_dim,
                                           count_bins=self.count_bins,
                                           bound=self.bound,
                                           order=self.order)

    # Create Normalizing Flow
    spline_flow = TransformedDistribution(flow_dist, [spline_transform, self.target_transform])

    return spline_flow
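
In this construction, base_dist is a Normal distribution (currently the only option) and flow_transform is a spline transform parameterised by count_bins, bound and order, so the returned object is a TransformedDistribution of a spline applied to a standard Normal. A standalone sketch of the same construction, assuming Pyro's Spline transform as the spline implementation and omitting the additional target_transform:

import torch
from torch.distributions import Normal, TransformedDistribution
from pyro.distributions.transforms import Spline  # assumed spline implementation

input_dim = 1
base = Normal(torch.zeros(input_dim), torch.ones(input_dim))
spline = Spline(input_dim, count_bins=8, bound=3.0, order="linear")
flow = TransformedDistribution(base, [spline])

samples = flow.sample((5,))  # 5 draws from the (untrained) flow
print(samples.shape)         # torch.Size([5, 1])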
crps_score(y, yhat_dist)

Function that calculates the Continuous Ranked Probability Score (CRPS) for a given set of predicted samples.

Arguments

y: torch.Tensor Response variable of shape (n_observations,1). yhat_dist: torch.Tensor Predicted samples of shape (n_samples, n_observations).

Returns

crps: torch.Tensor CRPS score.

References

Gneiting, Tilmann & Raftery, Adrian. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association. 102. 359-378.

Source

https://github.com/elephaint/pgbm/blob/main/pgbm/torch/pgbm_dist.py#L549

Source code in lightgbmlss/distributions/flow_utils.py (lines 588-640)
def crps_score(self, y: torch.tensor, yhat_dist: torch.tensor) -> torch.tensor:
    """
    Function that calculates the Continuous Ranked Probability Score (CRPS) for a given set of predicted samples.

    Arguments
    ---------
    y: torch.Tensor
        Response variable of shape (n_observations,1).
    yhat_dist: torch.Tensor
        Predicted samples of shape (n_samples, n_observations).

    Returns
    ---------
    crps: torch.Tensor
        CRPS score.

    References
    ---------
    Gneiting, Tilmann & Raftery, Adrian. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation.
    Journal of the American Statistical Association. 102. 359-378.

    Source
    ---------
    https://github.com/elephaint/pgbm/blob/main/pgbm/torch/pgbm_dist.py#L549
    """
    # Get the number of predicted samples
    n_samples = yhat_dist.shape[0]

    # Sort the forecasts in ascending order
    yhat_dist_sorted, _ = torch.sort(yhat_dist, 0)

    # Create temporary tensors
    y_cdf = torch.zeros_like(y)
    yhat_cdf = torch.zeros_like(y)
    yhat_prev = torch.zeros_like(y)
    crps = torch.zeros_like(y)

    # Loop over the predicted samples generated per observation
    for yhat in yhat_dist_sorted:
        yhat = yhat.reshape(-1, 1)
        flag = (y_cdf == 0) * (y < yhat)
        crps += flag * ((y - yhat_prev) * yhat_cdf ** 2)
        crps += flag * ((yhat - y) * (yhat_cdf - 1) ** 2)
        crps += (~flag) * ((yhat - yhat_prev) * (yhat_cdf - y_cdf) ** 2)
        y_cdf += flag
        yhat_cdf += 1 / n_samples
        yhat_prev = yhat

    # In case y_cdf == 0 after the loop
    flag = (y_cdf == 0)
    crps += flag * (y - yhat)

    return crps
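
As a cross-check on the estimator above, the CRPS can also be approximated directly from samples via the identity CRPS(F, y) = E|X - y| - 0.5 * E|X - X'| from Gneiting & Raftery (2007). A short standalone sketch (not library code) of that alternative estimator:

import torch

torch.manual_seed(0)
y = torch.tensor([[0.0], [1.0]])   # (n_observations, 1)
yhat_dist = torch.randn(500, 2)    # (n_samples, n_observations)

term1 = (yhat_dist - y.T).abs().mean(dim=0)                                        # E|X - y| per observation
term2 = (yhat_dist.unsqueeze(0) - yhat_dist.unsqueeze(1)).abs().mean(dim=(0, 1))   # E|X - X'|
crps_alt = term1 - 0.5 * term2
print(crps_alt)  # one CRPS value per observation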
draw_samples(predt_params, n_samples=1000, seed=123)

Function that draws n_samples from a predicted distribution.

Arguments

predt_params: pd.DataFrame pd.DataFrame with predicted distributional parameters. n_samples: int Number of samples to draw from the predicted response distribution. seed: int Manual seed.

Returns

pred_dist: pd.DataFrame DataFrame with n_samples drawn from predicted response distribution.

Source code in lightgbmlss/distributions/flow_utils.py (lines 367-407)
def draw_samples(self,
                 predt_params: pd.DataFrame,
                 n_samples: int = 1000,
                 seed: int = 123
                 ) -> pd.DataFrame:
    """
    Function that draws n_samples from a predicted distribution.

    Arguments
    ---------
    predt_params: pd.DataFrame
        pd.DataFrame with predicted distributional parameters.
    n_samples: int
        Number of samples to draw from the predicted response distribution.
    seed: int
        Manual seed.

    Returns
    -------
    pred_dist: pd.DataFrame
        DataFrame with n_samples drawn from predicted response distribution.

    """

    torch.manual_seed(seed)

    # Specify Normalizing Flow
    pred_params = torch.tensor(predt_params.values)
    flow_dist_pred = self.create_spline_flow(pred_params.shape[0])

    # Replace parameters with estimated ones
    _, flow_dist_pred = self.replace_parameters(pred_params, flow_dist_pred)

    # Draw samples
    flow_samples = pd.DataFrame(flow_dist_pred.sample((n_samples,)).squeeze().detach().numpy().T)
    flow_samples.columns = [str("y_sample") + str(i) for i in range(flow_samples.shape[1])]

    if self.discrete:
        flow_samples = flow_samples.astype(int)

    return flow_samples
flow_select(target, candidate_flows, max_iter=100, plot=False, figure_size=(10, 5))

Function that selects the most suitable normalizing flow specification among the candidate_flows for the target variable, based on the NegLogLikelihood (lower is better).

Parameters

target: np.ndarray Response variable. candidate_flows: List List of candidate normalizing flow specifications. max_iter: int Maximum number of iterations for the optimization. plot: bool If True, a density plot of the actual and fitted distribution is created. figure_size: tuple Figure size of the density plot.

Returns

fit_df: pd.DataFrame Dataframe with the loss values of the fitted normalizing flow.

Source code in lightgbmlss/distributions/flow_utils.py (lines 642-733)
def flow_select(self,
                target: np.ndarray,
                candidate_flows: List,
                max_iter: int = 100,
                plot: bool = False,
                figure_size: tuple = (10, 5),
                ) -> pd.DataFrame:
    """
    Function that selects the most suitable normalizing flow specification among the candidate_flows for the
    target variable, based on the NegLogLikelihood (lower is better).

    Parameters
    ----------
    target: np.ndarray
        Response variable.
    candidate_flows: List
        List of candidate normalizing flow specifications.
    max_iter: int
        Maximum number of iterations for the optimization.
    plot: bool
        If True, a density plot of the actual and fitted distribution is created.
    figure_size: tuple
        Figure size of the density plot.

    Returns
    -------
    fit_df: pd.DataFrame
        Dataframe with the loss values of the fitted normalizing flow.
    """
    flow_list = []
    total_iterations = len(candidate_flows)

    with tqdm(total=total_iterations, desc="Fitting candidate normalizing flows") as pbar:
        for flow in candidate_flows:
            flow_name = str(flow.__class__).split(".")[-1].split("'>")[0]
            flow_spec = f"(count_bins: {flow.count_bins}, order: {flow.order})"
            flow_name = flow_name + flow_spec
            pbar.set_description(f"Fitting {flow_name}")
            flow_sel = flow
            try:
                loss, params = flow_sel.calculate_start_values(target=target, max_iter=max_iter)
                fit_df = pd.DataFrame.from_dict(
                    {flow_sel.loss_fn: loss.reshape(-1, ),
                     "NormFlow": str(flow_name),
                     "params": [params]
                     }
                )
            except Exception as e:
                warnings.warn(f"Error fitting {flow_sel} NormFlow: {str(e)}")
                fit_df = pd.DataFrame(
                    {flow_sel.loss_fn: np.nan,
                     "NormFlow": str(flow_sel),
                     "params": [np.nan] * flow_sel.n_dist_param
                     }
                )
            flow_list.append(fit_df)
            pbar.update(1)
        pbar.set_description(f"Fitting of candidate normalizing flows completed")
        fit_df = pd.concat(flow_list).sort_values(by=flow_sel.loss_fn, ascending=True)
        fit_df["rank"] = fit_df[flow_sel.loss_fn].rank().astype(int)
        fit_df.set_index(fit_df["rank"], inplace=True)

    if plot:
        # Select normalizing flow with the lowest loss
        best_flow = fit_df[fit_df["rank"] == 1].reset_index(drop=True)
        for flow in candidate_flows:
            flow_name = str(flow.__class__).split(".")[-1].split("'>")[0]
            flow_spec = f"(count_bins: {flow.count_bins}, order: {flow.order})"
            flow_name = flow_name + flow_spec
            if flow_name == best_flow["NormFlow"].values[0]:
                best_flow_sel = flow
                break

        # Draw samples from distribution
        flow_params = torch.tensor(best_flow["params"][0]).reshape(1, -1)
        flow_dist_sel = best_flow_sel.create_spline_flow(input_dim=1)
        _, flow_dist_sel = best_flow_sel.replace_parameters(flow_params, flow_dist_sel)
        n_samples = np.max([10000, target.shape[0]])
        n_samples = np.where(n_samples > 500000, 100000, n_samples)
        flow_samples = pd.DataFrame(flow_dist_sel.sample((n_samples,)).squeeze().detach().numpy().T).values

        # Plot actual and fitted distribution
        plt.figure(figsize=figure_size)
        sns.kdeplot(target.reshape(-1, ), label="Actual")
        sns.kdeplot(flow_samples.reshape(-1, ), label=f"Best-Fit: {best_flow['NormFlow'].values[0]}")
        plt.legend()
        plt.title("Actual vs. Best-Fit Density", fontweight="bold", fontsize=16)
        plt.show()

    fit_df.drop(columns=["rank", "params"], inplace=True)

    return fit_df
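
A hedged usage sketch of flow_select: the candidate flows are instances of a spline-flow class such as lightgbmlss.distributions.SplineFlow.SplineFlow, and the exact constructor arguments (count_bins, order, ...) are assumptions here that should be checked against the SplineFlow reference:

import numpy as np
from lightgbmlss.distributions.SplineFlow import SplineFlow  # assumed import path

target = np.random.default_rng(123).normal(loc=10.0, scale=2.0, size=1000)

# Candidate specifications; constructor arguments mirror the attributes referenced above (assumed)
candidate_flows = [
    SplineFlow(count_bins=4, order="linear"),
    SplineFlow(count_bins=8, order="linear"),
    SplineFlow(count_bins=8, order="quadratic"),
]

fit_df = candidate_flows[0].flow_select(target=target,
                                        candidate_flows=candidate_flows,
                                        max_iter=50,
                                        plot=False)
print(fit_df)  # candidates ranked by negative log-likelihood (lower is better)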
get_params_loss(predt, target, start_values, requires_grad=False)

Function that returns the predicted parameters and the loss.

Arguments

predt: np.ndarray Predicted values. target: torch.Tensor Target values. start_values: List Starting values for each parameter.

Returns

predt: torch.Tensor Predicted parameters. loss: torch.Tensor Loss value.

Source code in lightgbmlss/distributions/flow_utils.py (lines 236-290)
def get_params_loss(self,
                    predt: np.ndarray,
                    target: torch.Tensor,
                    start_values: List[float],
                    requires_grad: bool = False,
                    ) -> Tuple[List[torch.Tensor], np.ndarray]:
    """
    Function that returns the predicted parameters and the loss.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    target: torch.Tensor
        Target values.
    start_values: List
        Starting values for each parameter.

    Returns
    -------
    predt: torch.Tensor
        Predicted parameters.
    loss: torch.Tensor
        Loss value.
    """
    # Reshape Target
    target = target.view(-1)

    # Predicted Parameters
    predt = predt.reshape(-1, self.n_dist_param, order="F")

    # Replace NaNs and infinity values with unconditional start values
    nan_inf_mask = np.isnan(predt) | np.isinf(predt)
    predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])

    # Convert to torch.tensor
    predt = torch.tensor(predt, dtype=torch.float32)

    # Specify Normalizing Flow
    flow_dist = self.create_spline_flow(target.shape[0])

    # Replace parameters with estimated ones
    params, flow_dist = self.replace_parameters(predt, flow_dist)

    # Calculate loss
    if self.loss_fn == "nll":
        loss = -torch.nansum(flow_dist.log_prob(target))
    elif self.loss_fn == "crps":
        torch.manual_seed(123)
        dist_samples = flow_dist.rsample((30,)).squeeze(-1)
        loss = torch.nansum(self.crps_score(target, dist_samples))
    else:
        raise ValueError("Invalid loss function. Please select 'nll' or 'crps'.")

    return params, loss
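
The NaN/Inf handling above works column-wise: np.where(nan_inf_mask)[1] returns the column (parameter) index of each invalid entry, and np.take picks the corresponding unconditional start value. A standalone NumPy sketch (not library code):

import numpy as np

predt = np.array([[0.1, np.nan, 0.3],
                  [np.inf, 0.2, 0.4]])
start_values = np.array([10.0, 20.0, 30.0])  # one start value per parameter (column)

nan_inf_mask = np.isnan(predt) | np.isinf(predt)
predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])
print(predt)
# [[ 0.1 20.   0.3]
#  [10.   0.2  0.4]]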
metric_fn(predt, data)

Function that evaluates the predictions using the specified loss function.

Arguments

predt: np.ndarray Predicted values. data: lgb.Dataset Data used for training.

Returns

name: str Name of the evaluation metric. loss: float Loss value.

Source code in lightgbmlss/distributions/flow_utils.py (lines 126-155)
def metric_fn(self, predt: np.ndarray, data: lgb.Dataset) -> Tuple[str, float, bool]:
    """
    Function that evaluates the predictions using the specified loss function.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    data: lgb.Dataset
        Data used for training.

    Returns
    -------
    name: str
        Name of the evaluation metric.
    loss: float
        Loss value.
    """
    # Target
    target = torch.tensor(data.get_label().reshape(-1, 1))
    n_obs = target.shape[0]

    # Start values (needed to replace NaNs in predt)
    start_values = data.get_init_score().reshape(-1, self.n_dist_param)[0, :].tolist()

    # Calculate loss
    is_higher_better = False
    _, loss = self.get_params_loss(predt, target, start_values)

    return self.loss_fn, loss.detach() / n_obs, is_higher_better
objective_fn(predt, data)

Function to estimate gradients and hessians of normalizing flow parameters.

Arguments

predt: np.ndarray Predicted values. data: lgb.Dataset Data used for training.

Returns

grad: np.ndarray Gradient. hess: np.ndarray Hessian.

Source code in lightgbmlss/distributions/flow_utils.py (lines 88-124)
def objective_fn(self, predt: np.ndarray, data: lgb.Dataset) -> Tuple[np.ndarray, np.ndarray]:

    """
    Function to estimate gradients and hessians of normalizing flow parameters.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    data: lgb.Dataset
        Data used for training.

    Returns
    -------
    grad: np.ndarray
        Gradient.
    hess: np.ndarray
        Hessian.
    """
    # Target
    target = torch.tensor(data.get_label().reshape(-1, 1))

    # Weights
    if data.weight is None:
        # Use 1 as weight if no weights are specified
        weights = torch.ones_like(target, dtype=target.dtype).numpy()
    else:
        weights = data.get_weight().reshape(-1, 1)

    # Start values (needed to replace NaNs in predt)
    start_values = data.get_init_score().reshape(-1, self.n_dist_param)[0, :].tolist()

    # Calculate gradients and hessians
    predt, loss = self.get_params_loss(predt, target, start_values)
    grad, hess = self.compute_gradients_and_hessians(loss, predt, weights)

    return grad, hess
predict_dist(booster, data, start_values, pred_type='parameters', n_samples=1000, quantiles=[0.1, 0.5, 0.9], seed=123)

Function that predicts from the trained model.

Arguments

booster : lgb.Booster Trained model. start_values : np.ndarray Starting values for each distributional parameter. data : pd.DataFrame Data to predict from. pred_type : str Type of prediction: - "samples" draws n_samples from the predicted distribution. - "quantiles" calculates the quantiles from the predicted distribution. - "parameters" returns the predicted distributional parameters. - "expectiles" returns the predicted expectiles. n_samples : int Number of samples to draw from the predicted distribution. quantiles : List[float] List of quantiles to calculate from the predicted distribution. seed : int Seed for random number generator used to draw samples from the predicted distribution.

Returns

pred : pd.DataFrame Predictions.

Source code in lightgbmlss/distributions/flow_utils.py (lines 409-486)
def predict_dist(self,
                 booster: lgb.Booster,
                 data: pd.DataFrame,
                 start_values: np.ndarray,
                 pred_type: str = "parameters",
                 n_samples: int = 1000,
                 quantiles: list = [0.1, 0.5, 0.9],
                 seed: int = 123
                 ) -> pd.DataFrame:
    """
    Function that predicts from the trained model.

    Arguments
    ---------
    booster : lgb.Booster
        Trained model.
    start_values : np.ndarray
        Starting values for each distributional parameter.
    data : pd.DataFrame
        Data to predict from.
    pred_type : str
        Type of prediction:
        - "samples" draws n_samples from the predicted distribution.
        - "quantiles" calculates the quantiles from the predicted distribution.
        - "parameters" returns the predicted distributional parameters.
        - "expectiles" returns the predicted expectiles.
    n_samples : int
        Number of samples to draw from the predicted distribution.
    quantiles : List[float]
        List of quantiles to calculate from the predicted distribution.
    seed : int
        Seed for random number generator used to draw samples from the predicted distribution.

    Returns
    -------
    pred : pd.DataFrame
        Predictions.
    """
    # Predict raw scores
    predt = torch.tensor(
        booster.predict(data, raw_score=True),
        dtype=torch.float32
    ).reshape(-1, self.n_dist_param)

    # Set init_score as starting point for each distributional parameter.
    init_score_pred = torch.tensor(
        np.ones(shape=(data.shape[0], 1)) * start_values,
        dtype=torch.float32
    )

    # The predictions don't include the init_score specified when creating the training data, so it has to be
    # added back manually.
    dist_params_predt = pd.DataFrame(
        np.concatenate(
            [predt[:, i].reshape(-1, 1) + init_score_pred[:, i].reshape(-1, 1) for i in range(self.n_dist_param)],
            axis=1
        )
    )
    dist_params_predt.columns = self.param_dict.keys()

    # Draw samples from predicted response distribution
    pred_samples_df = self.draw_samples(predt_params=dist_params_predt,
                                        n_samples=n_samples,
                                        seed=seed)

    if pred_type == "parameters":
        return dist_params_predt

    elif pred_type == "samples":
        return pred_samples_df

    elif pred_type == "quantiles":
        # Calculate quantiles from predicted response distribution
        pred_quant_df = pred_samples_df.quantile(quantiles, axis=1).T
        pred_quant_df.columns = [str("quant_") + str(quantiles[i]) for i in range(len(quantiles))]
        if self.discrete:
            pred_quant_df = pred_quant_df.astype(int)
        return pred_quant_df
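
The "quantiles" branch condenses the (n_observations, n_samples) DataFrame of draws into one column per requested quantile. An illustrative standalone sketch (not library code) of that step:

import numpy as np
import pandas as pd

rng = np.random.default_rng(123)
pred_samples_df = pd.DataFrame(rng.normal(size=(4, 1000)))  # 4 observations, 1000 draws each

quantiles = [0.1, 0.5, 0.9]
pred_quant_df = pred_samples_df.quantile(quantiles, axis=1).T
pred_quant_df.columns = [f"quant_{q}" for q in quantiles]
print(pred_quant_df)  # one row per observation, one column per quantile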
replace_parameters(params, flow_dist)

Replace parameters with estimated ones.

Arguments

params: torch.Tensor Estimated parameters. flow_dist: Transform Normalizing Flow.

Returns

params_list: List List of estimated parameters. flow_dist: Transform Normalizing Flow with estimated parameters.

Source code in lightgbmlss/distributions/flow_utils.py (lines 326-365)
def replace_parameters(self,
                       params: torch.Tensor,
                       flow_dist: Transform,
                       ) -> Tuple[List, Transform]:
    """
    Replace parameters with estimated ones.

    Arguments
    ---------
    params: torch.Tensor
        Estimated parameters.
    flow_dist: Transform
        Normalizing Flow.

    Returns
    -------
    params_list: List
        List of estimated parameters.
    flow_dist: Transform
        Normalizing Flow with estimated parameters.
    """

    # Split parameters into list
    if self.order == "quadratic":
        params_list = torch.split(
            params, [self.count_bins, self.count_bins, self.count_bins - 1],
            dim=1)
    elif self.order == "linear":
        params_list = torch.split(
            params, [self.count_bins, self.count_bins, self.count_bins - 1, self.count_bins],
            dim=1)

    # Replace parameters
    for param, new_value in zip(flow_dist.transforms[0].parameters(), params_list):
        param.data = new_value

    # Get parameters (including require_grad=True)
    params_list = list(flow_dist.transforms[0].parameters())

    return params_list, flow_dist
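
The split sizes above correspond to the parameter groups of the spline: for a "linear" spline with count_bins bins, each row of the predicted parameters is cut into chunks of size count_bins, count_bins, count_bins - 1 and count_bins (roughly: bin widths, bin heights, knot derivatives and the additional lambda parameters of the linear order). A standalone sketch (not library code) of the mechanics:

import torch

count_bins = 4
predt = torch.arange(2 * (4 + 4 + 3 + 4), dtype=torch.float32).reshape(2, -1)  # 2 observations

chunks = torch.split(predt, [count_bins, count_bins, count_bins - 1, count_bins], dim=1)
print([c.shape for c in chunks])  # [torch.Size([2, 4]), torch.Size([2, 4]), torch.Size([2, 3]), torch.Size([2, 4])]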
stabilize_derivative(input_der, type='MAD')

Function that stabilizes Gradients and Hessians.

Since parameters are estimated by optimizing Gradients and Hessians, it is important that these are comparable in magnitude for all parameters. Due to imbalances regarding the ranges, the estimation might become unstable so that it does not converge (or converge very slowly) to the optimal solution. Another way to improve convergence might be to standardize the response variable. This is especially useful if the range of the response differs strongly from the range of the Gradients and Hessians. Both, the stabilization and the standardization of the response are not always advised but need to be carefully considered.

Source

https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

Arguments

input_der : torch.Tensor Input derivative, either Gradient or Hessian. type: str Stabilization method. Can be either "None", "MAD" or "L2".

Returns

stab_der : torch.Tensor Stabilized Gradient or Hessian.

Source code in lightgbmlss/distributions/flow_utils.py (lines 542-586)
def stabilize_derivative(self, input_der: torch.Tensor, type: str = "MAD") -> torch.Tensor:
    """
    Function that stabilizes Gradients and Hessians.

    Since parameters are estimated by optimizing Gradients and Hessians, it is important that these are comparable
    in magnitude for all parameters. Due to imbalances regarding the ranges, the estimation might become unstable
    so that it does not converge (or converge very slowly) to the optimal solution. Another way to improve
    convergence might be to standardize the response variable. This is especially useful if the range of the
    response differs strongly from the range of the Gradients and Hessians. Both, the stabilization and the
    standardization of the response are not always advised but need to be carefully considered.

    Source
    ---------
    https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

    Arguments
    ---------
    input_der : torch.Tensor
        Input derivative, either Gradient or Hessian.
    type: str
        Stabilization method. Can be either "None", "MAD" or "L2".

    Returns
    ---------
    stab_der : torch.Tensor
        Stabilized Gradient or Hessian.
    """

    if type == "MAD":
        input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
        div = torch.nanmedian(torch.abs(input_der - torch.nanmedian(input_der)))
        div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
        stab_der = input_der / div

    if type == "L2":
        input_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))
        div = torch.sqrt(torch.nanmean(input_der.pow(2)))
        div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
        div = torch.where(div > torch.tensor(10000.0), torch.tensor(10000.0), div)
        stab_der = input_der / div

    if type == "None":
        stab_der = torch.nan_to_num(input_der, nan=float(torch.nanmean(input_der)))

    return stab_der
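
To see what the "MAD" option does in practice, here is a toy illustration (not library code): gradients living on very different scales are each divided by their median absolute deviation, so both parameters end up with derivatives of comparable magnitude.

import torch

def mad_stabilize(g):
    div = torch.nanmedian(torch.abs(g - torch.nanmedian(g)))
    div = torch.clamp(div, min=1e-04)
    return g / div

grad_param1 = torch.tensor([0.001, -0.002, 0.003, -0.001])
grad_param2 = torch.tensor([150.0, -320.0, 410.0, -90.0])

print(mad_stabilize(grad_param1))  # rescaled to comparable magnitude
print(mad_stabilize(grad_param2))  # rescaled to comparable magnitude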

mixture_distribution_utils

MixtureDistributionClass

Generic class that contains general functions for mixed-density distributions.

Arguments

distribution: torch.distributions.Distribution PyTorch Distribution class. M: int Number of components in the mixture distribution. temperature: float Temperature for the Gumbel-Softmax distribution. hessian_mode: str Mode for computing the Hessian. Must be one of the following:

    - "individual": Each parameter is treated as a separate tensor. As a result, when the Hessian is calculated
    for each gradient element, this corresponds to the second derivative with respect to that specific tensor
    element only. This means the resulting Hessians capture the curvature of the loss w.r.t. each individual
    parameter. This is usually more runtime intensive, but can also be more accurate.

    - "grouped": Each parameter is a tensor containing all values for a specific parameter type,
    e.g., loc, scale, or mixture probabilities for a Gaussian Mixture. When computing the Hessian for each
    gradient element, the Hessian matrix for all the values in the respective tensor are calculated together.
    The resulting Hessians capture the curvature of the loss w.r.t. the entire parameter type tensor. This is
    usually less runtime intensive, but can be less accurate.

univariate: bool Whether the distribution is univariate or multivariate. discrete: bool Whether the support of the distribution is discrete or continuous. n_dist_param: int Number of distributional parameters. stabilization: str Stabilization method. param_dict: Dict[str, Any] Dictionary that maps distributional parameters to their response scale. distribution_arg_names: List List of distributional parameter names. loss_fn: str Loss function. Options are "nll" (negative log-likelihood).
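
The mixture itself is built from standard PyTorch distributions: a Categorical over the M mixing weights combined with the component distribution through MixtureSameFamily (see create_mixture_distribution below). A standalone sketch (not library code) for a two-component Gaussian mixture:

import torch
from torch.distributions import Categorical, MixtureSameFamily, Normal

M = 2
probs = torch.tensor([[0.3, 0.7]])    # mixing weights for one observation
locs = torch.tensor([[-2.0, 2.0]])    # component means
scales = torch.tensor([[0.5, 1.0]])   # component standard deviations

mixture = MixtureSameFamily(Categorical(probs=probs), Normal(locs, scales))
print(mixture.log_prob(torch.tensor([0.0])))  # log-density of y = 0 under the mixture
print(mixture.sample((5,)))                   # 5 draws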

Source code in lightgbmlss/distributions/mixture_distribution_utils.py (lines 47-707)
class MixtureDistributionClass:
    """
    Generic class that contains general functions for mixed-density distributions.

    Arguments
    ---------
    distribution: torch.distributions.Distribution
        PyTorch Distribution class.
    M: int
        Number of components in the mixture distribution.
    temperature: float
        Temperature for the Gumbel-Softmax distribution.
    hessian_mode: str
        Mode for computing the Hessian. Must be one of the following:

            - "individual": Each parameter is treated as a separate tensor. As a result, when the Hessian is calculated
            for each gradient element, this corresponds to the second derivative with respect to that specific tensor
            element only. This means the resulting Hessians capture the curvature of the loss w.r.t. each individual
            parameter. This is usually more runtime intensive, but can also be more accurate.

            - "grouped": Each parameter is a tensor containing all values for a specific parameter type,
            e.g., loc, scale, or mixture probabilities for a Gaussian Mixture. When computing the Hessian for each
            gradient element, the Hessian matrix for all the values in the respective tensor are calculated together.
            The resulting Hessians capture the curvature of the loss w.r.t. the entire parameter type tensor. This is
            usually less runtime intensive, but can be less accurate.
    univariate: bool
        Whether the distribution is univariate or multivariate.
    discrete: bool
        Whether the support of the distribution is discrete or continuous.
    n_dist_param: int
        Number of distributional parameters.
    stabilization: str
        Stabilization method.
    param_dict: Dict[str, Any]
        Dictionary that maps distributional parameters to their response scale.
    distribution_arg_names: List
        List of distributional parameter names.
    loss_fn: str
        Loss function. Options are "nll" (negative log-likelihood).
    """
    def __init__(self,
                 distribution: torch.distributions.Distribution = None,
                 M: int = 2,
                 temperature: float = 1.0,
                 hessian_mode: str = "individual",
                 univariate: bool = True,
                 discrete: bool = False,
                 n_dist_param: int = None,
                 stabilization: str = "None",
                 param_dict: Dict[str, Any] = None,
                 distribution_arg_names: List = None,
                 loss_fn: str = "nll",
                 ):

        self.distribution = distribution
        self.M = M
        self.temperature = temperature
        self.hessian_mode = hessian_mode
        self.univariate = univariate
        self.discrete = discrete
        self.n_dist_param = n_dist_param
        self.stabilization = stabilization
        self.param_dict = param_dict
        self.distribution_arg_names = distribution_arg_names
        self.loss_fn = loss_fn

    def objective_fn(self, predt: np.ndarray, data: lgb.Dataset) -> Tuple[np.ndarray, np.ndarray]:

        """
        Function to estimate gradients and hessians of distributional parameters.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        data: lgb.Dataset
            Data used for training.

        Returns
        -------
        grad: np.ndarray
            Gradient.
        hess: np.ndarray
            Hessian.
        """
        # Target
        target = torch.tensor(data.get_label().reshape(-1, 1), dtype=torch.float32)

        # Weights
        if data.weight is None:
            # Use 1 as weight if no weights are specified
            weights = np.ones_like(target, dtype="float32")
        else:
            weights = data.get_weight().reshape(-1, 1)

        # Start values (needed to replace NaNs in predt)
        start_values = data.get_init_score().reshape(-1, self.n_dist_param)[0, :].tolist()

        # Calculate gradients and hessians
        predt, loss = self.get_params_loss(predt, target.flatten(), start_values, requires_grad=True)
        grad, hess = self.compute_gradients_and_hessians(loss, predt, weights)

        return grad, hess

    def metric_fn(self, predt: np.ndarray, data: lgb.Dataset) -> Tuple[str, float, bool]:
        """
        Function that evaluates the predictions using the negative log-likelihood.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        data: lgb.Dataset
            Data used for training.

        Returns
        -------
        name: str
            Name of the evaluation metric.
        nll: float
            Negative log-likelihood.
        is_higher_better: bool
            Whether a higher value of the metric is better or not.
        """
        # Target
        target = torch.tensor(data.get_label().reshape(-1, 1), dtype=torch.float32)
        n_obs = target.shape[0]

        # Start values (needed to replace NaNs in predt)
        start_values = data.get_init_score().reshape(-1, self.n_dist_param)[0, :].tolist()

        # Calculate loss
        is_higher_better = False
        _, loss = self.get_params_loss(predt, target.flatten(), start_values, requires_grad=False)

        return self.loss_fn, loss / n_obs, is_higher_better

    def create_mixture_distribution(self,
                                    params: List[torch.Tensor],
                                    ) -> torch.distributions.Distribution:
        """
        Function that creates a mixture distribution.

        Arguments
        ---------
        params: torch.Tensor
            Distributional parameters.

        Returns
        -------
        dist: torch.distributions.Distribution
            Mixture distribution.
        """

        # Create Mixture Distribution
        mixture_cat = Categorical(probs=params[-1])
        mixture_comp = self.distribution.distribution(*params[:-1])
        mixture_dist = MixtureSameFamily(mixture_cat, mixture_comp)

        return mixture_dist

    def loss_fn_start_values(self,
                             params: torch.Tensor,
                             target: torch.Tensor) -> torch.Tensor:
        """
        Function that calculates the loss for a given set of distributional parameters. Only used for calculating
        the loss for the start values.

        Parameter
        ---------
        params: torch.Tensor
            Distributional parameters.
        target: torch.Tensor
            Target values.

        Returns
        -------
        loss: torch.Tensor
            Loss value.
        """
        # Replace NaNs and infinity values with 0.5
        nan_inf_idx = torch.isnan(torch.stack(params)) | torch.isinf(torch.stack(params))
        params = torch.where(nan_inf_idx, torch.tensor(0.5), torch.stack(params)).reshape(1, -1)
        params = torch.split(params, self.M, dim=1)

        # Transform parameters to response scale
        params = [response_fn(params[i]) for i, response_fn in enumerate(self.param_dict.values())]

        # Specify Distribution and Loss
        dist = self.create_mixture_distribution(params)
        loss = -torch.nansum(dist.log_prob(target))

        return loss

    def calculate_start_values(self,
                               target: np.ndarray,
                               max_iter: int = 50
                               ) -> Tuple[float, np.ndarray]:
        """
        Function that calculates the starting values for each distributional parameter.

        Arguments
        ---------
        target: np.ndarray
            Data from which starting values are calculated.
        max_iter: int
            Maximum number of iterations.

        Returns
        -------
        loss: float
            Loss value.
        start_values: np.ndarray
            Starting values for each distributional parameter.
        """
        # Convert target to torch.tensor
        target = torch.tensor(target, dtype=torch.float32).flatten()

        # Initialize parameters
        params = [torch.tensor(0.5, requires_grad=True) for _ in range(self.n_dist_param)]

        # Specify optimizer
        optimizer = LBFGS(params, lr=0.1, max_iter=np.min([int(max_iter / 4), 20]), line_search_fn="strong_wolfe")

        # Define learning rate scheduler
        lr_scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=10)

        # Define closure
        def closure():
            optimizer.zero_grad()
            loss = self.loss_fn_start_values(params, target)
            loss.backward()
            return loss

        # Optimize parameters
        loss_vals = []
        tolerance = 1e-5
        patience = 5
        best_loss = float("inf")
        epochs_without_change = 0

        for epoch in range(max_iter):
            optimizer.zero_grad()
            loss = optimizer.step(closure)
            lr_scheduler.step(loss)
            loss_vals.append(loss.item())

            # Stopping criterion (no improvement in loss)
            if loss.item() < best_loss - tolerance:
                best_loss = loss.item()
                epochs_without_change = 0
            else:
                epochs_without_change += 1

            if epochs_without_change >= patience:
                break

        # Get final loss
        loss = np.array(loss_vals[-1])

        # Get start values
        start_values = np.array([params[i].detach() for i in range(self.n_dist_param)])

        # Replace any remaining NaNs or infinity values with 0.5
        start_values = np.nan_to_num(start_values, nan=0.5, posinf=0.5, neginf=0.5)

        return loss, start_values

    def get_params_loss(self,
                        predt: np.ndarray,
                        target: torch.Tensor,
                        start_values: List[float],
                        requires_grad: bool = False,
                        ) -> Tuple[List[torch.Tensor], np.ndarray]:
        """
        Function that returns the predicted parameters and the loss.

        Arguments
        ---------
        predt: np.ndarray
            Predicted values.
        target: torch.Tensor
            Target values.
        start_values: List
            Starting values for each distributional parameter.
        requires_grad: bool
            Whether to add to the computational graph or not.

        Returns
        -------
        predt: torch.Tensor
            Predicted parameters.
        loss: torch.Tensor
            Loss value.
        """
        # Predicted Parameters
        predt = predt.reshape(-1, self.n_dist_param, order="F")

        # Replace NaNs and infinity values with unconditional start values
        nan_inf_mask = np.isnan(predt) | np.isinf(predt)
        predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])

        if self.hessian_mode == "grouped":
            # Convert to torch.Tensor: splits the parameters into tensors for each parameter-type
            predt = torch.split(torch.tensor(predt, requires_grad=requires_grad), self.M, dim=1)
            # Transform parameters to response scale
            predt_transformed = [response_fn(predt[i]) for i, response_fn in enumerate(self.param_dict.values())]

        else:
            # Convert to torch.Tensor: splits the parameters into tensors for each parameter individually
            predt = torch.split(torch.tensor(predt, requires_grad=requires_grad), 1, dim=1)
            # Transform parameters to response scale
            keys = list(self.param_dict.keys())
            max_index = len(self.param_dict) * self.M
            index_ranges = []
            for i in range(0, max_index, self.M):
                if i + self.M >= max_index:
                    index_ranges.append((i, None))
                    break
                index_ranges.append((i, i + self.M))

            predt_transformed = [
                self.param_dict[key](torch.cat(predt[start:end], dim=1))
                for key, (start, end) in zip(keys, index_ranges)
            ]

        # Specify Distribution and Loss
        dist_fit = self.create_mixture_distribution(predt_transformed)
        loss = -torch.nansum(dist_fit.log_prob(target))

        return predt, loss

    def draw_samples(self,
                     predt_params: pd.DataFrame,
                     n_samples: int = 1000,
                     seed: int = 123
                     ) -> pd.DataFrame:
        """
        Function that draws n_samples from a predicted distribution.

        Arguments
        ---------
        predt_params: pd.DataFrame
            pd.DataFrame with predicted distributional parameters.
        n_samples: int
            Number of samples to draw from the predicted response distribution.
        seed: int
            Manual seed.

        Returns
        -------
        pred_dist: pd.DataFrame
            DataFrame with n_samples drawn from predicted response distribution.

        """
        torch.manual_seed(seed)

        pred_params = torch.tensor(predt_params.values).reshape(-1, self.n_dist_param)
        pred_params = torch.split(pred_params, self.M, dim=1)
        dist_pred = self.create_mixture_distribution(pred_params)
        dist_samples = dist_pred.sample((n_samples,)).squeeze().detach().numpy().T
        dist_samples = pd.DataFrame(dist_samples)
        dist_samples.columns = [str("y_sample") + str(i) for i in range(dist_samples.shape[1])]

        if self.discrete:
            dist_samples = dist_samples.astype(int)

        return dist_samples

    def predict_dist(self,
                     booster: lgb.Booster,
                     data: pd.DataFrame,
                     start_values: np.ndarray,
                     pred_type: str = "parameters",
                     n_samples: int = 1000,
                     quantiles: list = [0.1, 0.5, 0.9],
                     seed: int = 123
                     ) -> pd.DataFrame:
        """
        Function that predicts from the trained model.

        Arguments
        ---------
        booster : lgb.Booster
            Trained model.
        data : pd.DataFrame
            Data to predict from.
        start_values : np.ndarray.
            Starting values for each distributional parameter.
        pred_type : str
            Type of prediction:
            - "samples" draws n_samples from the predicted distribution.
            - "quantiles" calculates the quantiles from the predicted distribution.
            - "parameters" returns the predicted distributional parameters.
        n_samples : int
            Number of samples to draw from the predicted distribution.
        quantiles : List[float]
            List of quantiles to calculate from the predicted distribution.
        seed : int
            Seed for random number generator used to draw samples from the predicted distribution.

        Returns
        -------
        pred : pd.DataFrame
            Predictions.
        """
        predt = torch.tensor(
            booster.predict(data, raw_score=True),
            dtype=torch.float32
        ).reshape(-1, self.n_dist_param)

        # Set init_score as starting point for each distributional parameter.
        init_score_pred = torch.tensor(
            np.ones(shape=(data.shape[0], 1))*start_values,
            dtype=torch.float32
        )

        # The predictions don't include the init_score specified in creating the train data.
        # Hence, it needs to be added manually with the corresponding transform for each distributional parameter.
        dist_params_predt = torch.split(
            torch.cat(
                [
                    predt[:, i].reshape(-1, 1) + init_score_pred[:, i].reshape(-1, 1)
                    for i in range(self.n_dist_param)
                ], axis=1
            ), self.M, dim=1
        )

        dist_params_predt = np.concatenate(
            [
                response_fn(dist_params_predt[i]).numpy()
                for i, response_fn in enumerate(self.param_dict.values())
            ],
            axis=1,
        )

        dist_params_predt = pd.DataFrame(dist_params_predt)
        dist_params_predt.columns = self.distribution_arg_names

        # Draw samples from predicted response distribution
        pred_samples_df = self.draw_samples(predt_params=dist_params_predt,
                                            n_samples=n_samples,
                                            seed=seed)

        if pred_type == "parameters":
            return dist_params_predt

        elif pred_type == "samples":
            return pred_samples_df

        elif pred_type == "quantiles":
            # Calculate quantiles from predicted response distribution
            pred_quant_df = pred_samples_df.quantile(quantiles, axis=1).T
            pred_quant_df.columns = [str("quant_") + str(quantiles[i]) for i in range(len(quantiles))]
            if self.discrete:
                pred_quant_df = pred_quant_df.astype(int)
            return pred_quant_df

    def compute_gradients_and_hessians(self,
                                       loss: torch.tensor,
                                       predt: torch.tensor,
                                       weights: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:

        """
        Calculates gradients and hessians.

        Output gradients and hessians have shape (n_samples*n_outputs, 1).

        Arguments:
        ---------
        loss: torch.Tensor
            Loss.
        predt: torch.Tensor
            List of predicted parameters.
        weights: np.ndarray
            Weights.

        Returns:
        -------
        grad: torch.Tensor
            Gradients.
        hess: torch.Tensor
            Hessians.
        """
        # Gradient and Hessian
        grad = autograd(loss, inputs=predt, create_graph=True)
        hess = [autograd(grad[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grad))]

        # Stabilization of Derivatives
        if self.stabilization != "None":
            grad = [self.stabilize_derivative(grad[i], type=self.stabilization) for i in range(len(grad))]
            hess = [self.stabilize_derivative(hess[i], type=self.stabilization) for i in range(len(hess))]

        # Reshape
        grad = torch.cat(grad, axis=1).detach().squeeze(-1).numpy()
        hess = torch.cat(hess, axis=1).detach().squeeze(-1).numpy()

        # Weighting
        grad *= weights
        hess *= weights

        # Reshape
        grad = grad.ravel(order="F")
        hess = hess.ravel(order="F")

        return grad, hess

    def stabilize_derivative(self, input_der: torch.Tensor, type: str = "MAD") -> torch.Tensor:
        """
        Function that stabilizes Gradients and Hessians.

        As LightGBMLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important
        that these are comparable in magnitude for all distributional parameters. Due to imbalances regarding the ranges,
        the estimation might become unstable so that it does not converge (or converge very slowly) to the optimal solution.
        Another way to improve convergence might be to standardize the response variable. This is especially useful if the
        range of the response differs strongly from the range of the Gradients and Hessians. Both, the stabilization and
        the standardization of the response are not always advised but need to be carefully considered.
        Source: https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

        Parameters
        ----------
        input_der : torch.Tensor
            Input derivative, either Gradient or Hessian.
        type: str
            Stabilization method. Can be either "None", "MAD" or "L2".

        Returns
        -------
        stab_der : torch.Tensor
            Stabilized Gradient or Hessian.
        """

        if type == "MAD":
            input_der = torch.nan_to_num(input_der,
                                         nan=float(torch.nanmean(input_der)),
                                         posinf=float(torch.nanmean(input_der)),
                                         neginf=float(torch.nanmean(input_der))
                                         )
            div = torch.nanmedian(torch.abs(input_der - torch.nanmedian(input_der)))
            div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
            stab_der = input_der / div

        if type == "L2":
            input_der = torch.nan_to_num(input_der,
                                         nan=float(torch.nanmean(input_der)),
                                         posinf=float(torch.nanmean(input_der)),
                                         neginf=float(torch.nanmean(input_der))
                                         )
            div = torch.sqrt(torch.nanmean(input_der.pow(2)))
            div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
            div = torch.where(div > torch.tensor(10000.0), torch.tensor(10000.0), div)
            stab_der = input_der / div

        if type == "None":
            stab_der = torch.nan_to_num(input_der,
                                        nan=float(torch.nanmean(input_der)),
                                        posinf=float(torch.nanmean(input_der)),
                                        neginf=float(torch.nanmean(input_der))
                                        )

        return stab_der

    def dist_select(self,
                    target: np.ndarray,
                    candidate_distributions: List,
                    max_iter: int = 100,
                    plot: bool = False,
                    figure_size: tuple = (8, 5),
                    ) -> pd.DataFrame:
        """
        Function that selects the most suitable distribution among the candidate_distributions for the target variable,
        based on the NegLogLikelihood (lower is better).

        Parameters
        ----------
        target: np.ndarray
            Response variable.
        candidate_distributions: List
            List of candidate distributions.
        max_iter: int
            Maximum number of iterations for the optimization.
        plot: bool
            If True, a density plot of the actual and fitted distribution is created.
        figure_size: tuple
            Figure size of the density plot.

        Returns
        -------
        fit_df: pd.DataFrame
            Dataframe with the loss values of the fitted candidate distributions.
        """
        dist_list = []
        total_iterations = len(candidate_distributions)
        with tqdm(total=total_iterations, desc="Fitting candidate distributions") as pbar:
            for i in range(len(candidate_distributions)):
                dist_name = candidate_distributions[i].distribution.__class__.__name__
                n_mix = candidate_distributions[i].M
                tau = candidate_distributions[i].temperature
                dist_name = f"Mixture({dist_name}, tau={tau}, M={n_mix})"
                pbar.set_description(f"Fitting {dist_name} distribution")
                try:
                    loss, params = candidate_distributions[i].calculate_start_values(target=target, max_iter=max_iter)
                    fit_df = pd.DataFrame.from_dict(
                        {candidate_distributions[i].loss_fn: loss.reshape(-1, ),
                         "distribution": str(dist_name),
                         "params": [params],
                         "dist_pos": i,
                         "M": candidate_distributions[i].M
                         }
                    )
                except Exception as e:
                    warnings.warn(f"Error fitting {dist_name} distribution: {str(e)}")
                    fit_df = pd.DataFrame(
                        {candidate_distributions[i].loss_fn: np.nan,
                         "distribution": str(dist_name),
                         "params": [np.nan] * self.n_dist_param,
                         "dist_pos": i,
                         "M": candidate_distributions[i].M
                         }
                    )
                dist_list.append(fit_df)
                pbar.update(1)
            pbar.set_description(f"Fitting of candidate distributions completed")
            fit_df = pd.concat(dist_list).sort_values(by=candidate_distributions[i].loss_fn, ascending=True)
            fit_df["rank"] = fit_df[candidate_distributions[i].loss_fn].rank().astype(int)
            fit_df.set_index(fit_df["rank"], inplace=True)

        if plot:
            # Select best distribution
            best_dist = fit_df[fit_df["rank"] == fit_df["rank"].min()].reset_index(drop=True).iloc[[0]]
            best_dist_pos = int(best_dist["dist_pos"].values[0])
            best_dist_sel = candidate_distributions[best_dist_pos]
            params = torch.tensor(best_dist["params"][0]).reshape(1, -1)
            params = torch.split(params, best_dist_sel.M, dim=1)

            fitted_params = np.concatenate(
                [
                    response_fun(params[i]).numpy()
                    for i, (dist_param, response_fun) in enumerate(best_dist_sel.param_dict.items())
                ],
                axis=1,
            )

            fitted_params = pd.DataFrame(fitted_params, columns=best_dist_sel.distribution_arg_names)
            n_samples = np.max([10000, target.shape[0]])
            n_samples = np.where(n_samples > 500000, 100000, n_samples)
            dist_samples = best_dist_sel.draw_samples(fitted_params,
                                                      n_samples=n_samples,
                                                      seed=123).values

            # Plot actual and fitted distribution
            plt.figure(figsize=figure_size)
            sns.kdeplot(target.reshape(-1,), label="Actual")
            sns.kdeplot(dist_samples.reshape(-1,), label=f"Best-Fit: {best_dist['distribution'].values[0]}")
            plt.legend()
            plt.title("Actual vs. Best-Fit Density", fontweight="bold", fontsize=16)
            plt.show()

        fit_df.drop(columns=["rank", "params", "dist_pos", "M"], inplace=True)

        return fit_df
calculate_start_values(target, max_iter=50)

Function that calculates the starting values for each distributional parameter.

Arguments

target: np.ndarray
    Data from which starting values are calculated.
max_iter: int
    Maximum number of iterations.

Returns

loss: float
    Loss value.
start_values: np.ndarray
    Starting values for each distributional parameter.

Source code in lightgbmlss/distributions/mixture_distribution_utils.py, lines 241-313
def calculate_start_values(self,
                           target: np.ndarray,
                           max_iter: int = 50
                           ) -> Tuple[float, np.ndarray]:
    """
    Function that calculates the starting values for each distributional parameter.

    Arguments
    ---------
    target: np.ndarray
        Data from which starting values are calculated.
    max_iter: int
        Maximum number of iterations.

    Returns
    -------
    loss: float
        Loss value.
    start_values: np.ndarray
        Starting values for each distributional parameter.
    """
    # Convert target to torch.tensor
    target = torch.tensor(target, dtype=torch.float32).flatten()

    # Initialize parameters
    params = [torch.tensor(0.5, requires_grad=True) for _ in range(self.n_dist_param)]

    # Specify optimizer
    optimizer = LBFGS(params, lr=0.1, max_iter=np.min([int(max_iter / 4), 20]), line_search_fn="strong_wolfe")

    # Define learning rate scheduler
    lr_scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=10)

    # Define closure
    def closure():
        optimizer.zero_grad()
        loss = self.loss_fn_start_values(params, target)
        loss.backward()
        return loss

    # Optimize parameters
    loss_vals = []
    tolerance = 1e-5
    patience = 5
    best_loss = float("inf")
    epochs_without_change = 0

    for epoch in range(max_iter):
        optimizer.zero_grad()
        loss = optimizer.step(closure)
        lr_scheduler.step(loss)
        loss_vals.append(loss.item())

        # Stopping criterion (no improvement in loss)
        if loss.item() < best_loss - tolerance:
            best_loss = loss.item()
            epochs_without_change = 0
        else:
            epochs_without_change += 1

        if epochs_without_change >= patience:
            break

    # Get final loss
    loss = np.array(loss_vals[-1])

    # Get start values
    start_values = np.array([params[i].detach() for i in range(self.n_dist_param)])

    # Replace any remaining NaNs or infinity values with 0.5
    start_values = np.nan_to_num(start_values, nan=0.5, posinf=0.5, neginf=0.5)

    return loss, start_values
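
For orientation, a minimal usage sketch of fitting unconditional start values on a toy response. The Mixture and Gaussian wrappers, their import paths and the M argument are assumptions based on the attributes referenced on this page, not code shown here.

# Hypothetical sketch: unconditional start values for a two-component Gaussian mixture.
# The Mixture/Gaussian import paths and constructor arguments are assumptions.
import numpy as np
from lightgbmlss.distributions.Mixture import Mixture
from lightgbmlss.distributions.Gaussian import Gaussian

y_train = np.random.default_rng(1).normal(loc=2.0, scale=1.5, size=500)   # toy response

mix_dist = Mixture(Gaussian(), M=2)
loss, start_values = mix_dist.calculate_start_values(target=y_train, max_iter=50)
print(loss, start_values)   # final NLL and one start value per distributional parameter
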
compute_gradients_and_hessians(loss, predt, weights)

Calculates gradients and hessians.

Output gradients and hessians have shape (n_samples*n_outputs, 1).

Arguments:

loss: torch.Tensor
    Loss.
predt: torch.Tensor
    List of predicted parameters.
weights: np.ndarray
    Weights.

Returns:

grad: torch.Tensor
    Gradients.
hess: torch.Tensor
    Hessians.

Source code in lightgbmlss/distributions/mixture_distribution_utils.py, lines 505-552
def compute_gradients_and_hessians(self,
                                   loss: torch.tensor,
                                   predt: torch.tensor,
                                   weights: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:

    """
    Calculates gradients and hessians.

    Output gradients and hessians have shape (n_samples*n_outputs, 1).

    Arguments:
    ---------
    loss: torch.Tensor
        Loss.
    predt: torch.Tensor
        List of predicted parameters.
    weights: np.ndarray
        Weights.

    Returns:
    -------
    grad: torch.Tensor
        Gradients.
    hess: torch.Tensor
        Hessians.
    """
    # Gradient and Hessian
    grad = autograd(loss, inputs=predt, create_graph=True)
    hess = [autograd(grad[i].nansum(), inputs=predt[i], retain_graph=True)[0] for i in range(len(grad))]

    # Stabilization of Derivatives
    if self.stabilization != "None":
        grad = [self.stabilize_derivative(grad[i], type=self.stabilization) for i in range(len(grad))]
        hess = [self.stabilize_derivative(hess[i], type=self.stabilization) for i in range(len(hess))]

    # Reshape
    grad = torch.cat(grad, axis=1).detach().squeeze(-1).numpy()
    hess = torch.cat(hess, axis=1).detach().squeeze(-1).numpy()

    # Weighting
    grad *= weights
    hess *= weights

    # Reshape
    grad = grad.ravel(order="F")
    hess = hess.ravel(order="F")

    return grad, hess
create_mixture_distribution(params)

Function that creates a mixture distribution.

Arguments

params: torch.Tensor
    Distributional parameters.

Returns

dist: torch.distributions.Distribution
    Mixture distribution.

Source code in lightgbmlss/distributions/mixture_distribution_utils.py, lines 184-206
def create_mixture_distribution(self,
                                params: List[torch.Tensor],
                                ) -> torch.distributions.Distribution:
    """
    Function that creates a mixture distribution.

    Arguments
    ---------
    params: torch.Tensor
        Distributional parameters.

    Returns
    -------
    dist: torch.distributions.Distribution
        Mixture distribution.
    """

    # Create Mixture Distribution
    mixture_cat = Categorical(probs=params[-1])
    mixture_comp = self.distribution.distribution(*params[:-1])
    mixture_dist = MixtureSameFamily(mixture_cat, mixture_comp)

    return mixture_dist
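
The mixture is assembled from a Categorical over the last parameter block (the mixing weights) and the component distribution built from the remaining blocks. A plain-PyTorch sketch of the same layout for a two-component Gaussian mixture; the parameter values are made up for illustration:

# Same construction as above, written out with torch.distributions directly:
# params = [loc (n x M), scale (n x M), mixing probs (n x M)], probs last.
import torch
from torch.distributions import Categorical, Normal, MixtureSameFamily

loc   = torch.tensor([[0.0, 5.0]])
scale = torch.tensor([[1.0, 2.0]])
probs = torch.tensor([[0.3, 0.7]])

mix = MixtureSameFamily(Categorical(probs=probs), Normal(loc, scale))
print(mix.log_prob(torch.tensor([4.0])))   # log-density under the two-component mixture
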
dist_select(target, candidate_distributions, max_iter=100, plot=False, figure_size=(8, 5))

Function that selects the most suitable distribution among the candidate_distributions for the target variable, based on the NegLogLikelihood (lower is better).

Parameters

target: np.ndarray
    Response variable.
candidate_distributions: List
    List of candidate distributions.
max_iter: int
    Maximum number of iterations for the optimization.
plot: bool
    If True, a density plot of the actual and fitted distribution is created.
figure_size: tuple
    Figure size of the density plot.

Returns

fit_df: pd.DataFrame
    Dataframe with the loss values of the fitted candidate distributions.

Source code in lightgbmlss/distributions/mixture_distribution_utils.py, lines 609-707
def dist_select(self,
                target: np.ndarray,
                candidate_distributions: List,
                max_iter: int = 100,
                plot: bool = False,
                figure_size: tuple = (8, 5),
                ) -> pd.DataFrame:
    """
    Function that selects the most suitable distribution among the candidate_distributions for the target variable,
    based on the NegLogLikelihood (lower is better).

    Parameters
    ----------
    target: np.ndarray
        Response variable.
    candidate_distributions: List
        List of candidate distributions.
    max_iter: int
        Maximum number of iterations for the optimization.
    plot: bool
        If True, a density plot of the actual and fitted distribution is created.
    figure_size: tuple
        Figure size of the density plot.

    Returns
    -------
    fit_df: pd.DataFrame
        Dataframe with the loss values of the fitted candidate distributions.
    """
    dist_list = []
    total_iterations = len(candidate_distributions)
    with tqdm(total=total_iterations, desc="Fitting candidate distributions") as pbar:
        for i in range(len(candidate_distributions)):
            dist_name = candidate_distributions[i].distribution.__class__.__name__
            n_mix = candidate_distributions[i].M
            tau = candidate_distributions[i].temperature
            dist_name = f"Mixture({dist_name}, tau={tau}, M={n_mix})"
            pbar.set_description(f"Fitting {dist_name} distribution")
            try:
                loss, params = candidate_distributions[i].calculate_start_values(target=target, max_iter=max_iter)
                fit_df = pd.DataFrame.from_dict(
                    {candidate_distributions[i].loss_fn: loss.reshape(-1, ),
                     "distribution": str(dist_name),
                     "params": [params],
                     "dist_pos": i,
                     "M": candidate_distributions[i].M
                     }
                )
            except Exception as e:
                warnings.warn(f"Error fitting {dist_name} distribution: {str(e)}")
                fit_df = pd.DataFrame(
                    {candidate_distributions[i].loss_fn: np.nan,
                     "distribution": str(dist_name),
                     "params": [np.nan] * self.n_dist_param,
                     "dist_pos": i,
                     "M": candidate_distributions[i].M
                     }
                )
            dist_list.append(fit_df)
            pbar.update(1)
        pbar.set_description(f"Fitting of candidate distributions completed")
        fit_df = pd.concat(dist_list).sort_values(by=candidate_distributions[i].loss_fn, ascending=True)
        fit_df["rank"] = fit_df[candidate_distributions[i].loss_fn].rank().astype(int)
        fit_df.set_index(fit_df["rank"], inplace=True)

    if plot:
        # Select best distribution
        best_dist = fit_df[fit_df["rank"] == fit_df["rank"].min()].reset_index(drop=True).iloc[[0]]
        best_dist_pos = int(best_dist["dist_pos"].values[0])
        best_dist_sel = candidate_distributions[best_dist_pos]
        params = torch.tensor(best_dist["params"][0]).reshape(1, -1)
        params = torch.split(params, best_dist_sel.M, dim=1)

        fitted_params = np.concatenate(
            [
                response_fun(params[i]).numpy()
                for i, (dist_param, response_fun) in enumerate(best_dist_sel.param_dict.items())
            ],
            axis=1,
        )

        fitted_params = pd.DataFrame(fitted_params, columns=best_dist_sel.distribution_arg_names)
        n_samples = np.max([10000, target.shape[0]])
        n_samples = np.where(n_samples > 500000, 100000, n_samples)
        dist_samples = best_dist_sel.draw_samples(fitted_params,
                                                  n_samples=n_samples,
                                                  seed=123).values

        # Plot actual and fitted distribution
        plt.figure(figsize=figure_size)
        sns.kdeplot(target.reshape(-1,), label="Actual")
        sns.kdeplot(dist_samples.reshape(-1,), label=f"Best-Fit: {best_dist['distribution'].values[0]}")
        plt.legend()
        plt.title("Actual vs. Best-Fit Density", fontweight="bold", fontsize=16)
        plt.show()

    fit_df.drop(columns=["rank", "params", "dist_pos", "M"], inplace=True)

    return fit_df
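
A hedged sketch of comparing candidate mixtures with dist_select; the Mixture, Gaussian and StudentT wrappers and their import paths are assumptions, and the toy data is only for illustration:

# Hypothetical usage sketch: rank candidate mixture specifications by NLL.
import numpy as np
from lightgbmlss.distributions.Mixture import Mixture
from lightgbmlss.distributions.Gaussian import Gaussian
from lightgbmlss.distributions.StudentT import StudentT

y_train = np.random.default_rng(7).standard_t(df=5, size=1000)

candidates = [
    Mixture(Gaussian(), M=2),
    Mixture(Gaussian(), M=3),
    Mixture(StudentT(), M=2),
]

fit_df = candidates[0].dist_select(
    target=y_train,
    candidate_distributions=candidates,
    max_iter=100,
    plot=False,
)
print(fit_df)   # candidates sorted by negative log-likelihood, lower is better
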
draw_samples(predt_params, n_samples=1000, seed=123)

Function that draws n_samples from a predicted distribution.

Arguments

predt_params: pd.DataFrame
    pd.DataFrame with predicted distributional parameters.
n_samples: int
    Number of samples to draw from the predicted response distribution.
seed: int
    Manual seed.

Returns

pred_dist: pd.DataFrame
    DataFrame with n_samples drawn from predicted response distribution.

Source code in lightgbmlss/distributions/mixture_distribution_utils.py, lines 379-414
def draw_samples(self,
                 predt_params: pd.DataFrame,
                 n_samples: int = 1000,
                 seed: int = 123
                 ) -> pd.DataFrame:
    """
    Function that draws n_samples from a predicted distribution.

    Arguments
    ---------
    predt_params: pd.DataFrame
        pd.DataFrame with predicted distributional parameters.
    n_samples: int
        Number of samples to draw from the predicted response distribution.
    seed: int
        Manual seed.

    Returns
    -------
    pred_dist: pd.DataFrame
        DataFrame with n_samples drawn from predicted response distribution.

    """
    torch.manual_seed(seed)

    pred_params = torch.tensor(predt_params.values).reshape(-1, self.n_dist_param)
    pred_params = torch.split(pred_params, self.M, dim=1)
    dist_pred = self.create_mixture_distribution(pred_params)
    dist_samples = dist_pred.sample((n_samples,)).squeeze().detach().numpy().T
    dist_samples = pd.DataFrame(dist_samples)
    dist_samples.columns = [str("y_sample") + str(i) for i in range(dist_samples.shape[1])]

    if self.discrete:
        dist_samples = dist_samples.astype(int)

    return dist_samples
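
A small, hedged sketch of sampling from already-transformed mixture parameters; the Mixture and Gaussian wrappers are assumptions, and the column order follows the split used above (one block of M columns per parameter type, mixing probabilities last):

# Hypothetical sketch: draw samples for two observations of a two-component
# Gaussian mixture. The parameter values are made up for illustration.
import pandas as pd
from lightgbmlss.distributions.Mixture import Mixture
from lightgbmlss.distributions.Gaussian import Gaussian

mix_dist = Mixture(Gaussian(), M=2)
pred_params = pd.DataFrame(
    [[0.0, 5.0, 1.0, 2.0, 0.3, 0.7],     # loc block, scale block, mixing-prob block
     [1.0, 3.0, 0.5, 1.0, 0.5, 0.5]],
    columns=mix_dist.distribution_arg_names,
)
samples = mix_dist.draw_samples(predt_params=pred_params, n_samples=5, seed=123)
print(samples)    # two rows (one per observation) with columns y_sample0 ... y_sample4
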
get_params_loss(predt, target, start_values, requires_grad=False)

Function that returns the predicted parameters and the loss.

Arguments

predt: np.ndarray
    Predicted values.
target: torch.Tensor
    Target values.
start_values: List
    Starting values for each distributional parameter.
requires_grad: bool
    Whether to add to the computational graph or not.

Returns

predt: torch.Tensor
    Predicted parameters.
loss: torch.Tensor
    Loss value.

Source code in lightgbmlss/distributions/mixture_distribution_utils.py, lines 315-377
def get_params_loss(self,
                    predt: np.ndarray,
                    target: torch.Tensor,
                    start_values: List[float],
                    requires_grad: bool = False,
                    ) -> Tuple[List[torch.Tensor], np.ndarray]:
    """
    Function that returns the predicted parameters and the loss.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    target: torch.Tensor
        Target values.
    start_values: List
        Starting values for each distributional parameter.
    requires_grad: bool
        Whether to add to the computational graph or not.

    Returns
    -------
    predt: torch.Tensor
        Predicted parameters.
    loss: torch.Tensor
        Loss value.
    """
    # Predicted Parameters
    predt = predt.reshape(-1, self.n_dist_param, order="F")

    # Replace NaNs and infinity values with unconditional start values
    nan_inf_mask = np.isnan(predt) | np.isinf(predt)
    predt[nan_inf_mask] = np.take(start_values, np.where(nan_inf_mask)[1])

    if self.hessian_mode == "grouped":
        # Convert to torch.Tensor: splits the parameters into tensors for each parameter-type
        predt = torch.split(torch.tensor(predt, requires_grad=requires_grad), self.M, dim=1)
        # Transform parameters to response scale
        predt_transformed = [response_fn(predt[i]) for i, response_fn in enumerate(self.param_dict.values())]

    else:
        # Convert to torch.Tensor: splits the parameters into tensors for each parameter individually
        predt = torch.split(torch.tensor(predt, requires_grad=requires_grad), 1, dim=1)
        # Transform parameters to response scale
        keys = list(self.param_dict.keys())
        max_index = len(self.param_dict) * self.M
        index_ranges = []
        for i in range(0, max_index, self.M):
            if i + self.M >= max_index:
                index_ranges.append((i, None))
                break
            index_ranges.append((i, i + self.M))

        predt_transformed = [
            self.param_dict[key](torch.cat(predt[start:end], dim=1))
            for key, (start, end) in zip(keys, index_ranges)
        ]

    # Specify Distribution and Loss
    dist_fit = self.create_mixture_distribution(predt_transformed)
    loss = -torch.nansum(dist_fit.log_prob(target))

    return predt, loss
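
The two branches above differ only in how the raw predictions are split before the response transform: with hessian_mode="grouped" each parameter type keeps its M columns together, otherwise every single parameter becomes its own tensor, which changes the granularity at which the Hessian is computed downstream. A plain-PyTorch illustration of the two splits:

# Illustration of the split used above for M = 2 components and three parameter
# types (loc, scale, mixing probs), i.e. n_dist_param = 6 and two observations.
import torch

predt = torch.arange(12.0).reshape(2, 6)      # raw scores, shape (n_obs, n_dist_param)

grouped    = torch.split(predt, 2, dim=1)     # 3 tensors of shape (2, 2), one per parameter type
individual = torch.split(predt, 1, dim=1)     # 6 tensors of shape (2, 1), one per single parameter
print([t.shape for t in grouped])
print([t.shape for t in individual])
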
loss_fn_start_values(params, target)

Function that calculates the loss for a given set of distributional parameters. Only used for calculating the loss for the start values.

Parameter

params: torch.Tensor
    Distributional parameters.
target: torch.Tensor
    Target values.

Returns

loss: torch.Tensor
    Loss value.

Source code in lightgbmlss/distributions/mixture_distribution_utils.py, lines 208-239
def loss_fn_start_values(self,
                         params: torch.Tensor,
                         target: torch.Tensor) -> torch.Tensor:
    """
    Function that calculates the loss for a given set of distributional parameters. Only used for calculating
    the loss for the start values.

    Parameter
    ---------
    params: torch.Tensor
        Distributional parameters.
    target: torch.Tensor
        Target values.

    Returns
    -------
    loss: torch.Tensor
        Loss value.
    """
    # Replace NaNs and infinity values with 0.5
    nan_inf_idx = torch.isnan(torch.stack(params)) | torch.isinf(torch.stack(params))
    params = torch.where(nan_inf_idx, torch.tensor(0.5), torch.stack(params)).reshape(1, -1)
    params = torch.split(params, self.M, dim=1)

    # Transform parameters to response scale
    params = [response_fn(params[i]) for i, response_fn in enumerate(self.param_dict.values())]

    # Specify Distribution and Loss
    dist = self.create_mixture_distribution(params)
    loss = -torch.nansum(dist.log_prob(target))

    return loss
metric_fn(predt, data)

Function that evaluates the predictions using the negative log-likelihood.

Arguments

predt: np.ndarray
    Predicted values.
data: lgb.Dataset
    Data used for training.

Returns

name: str
    Name of the evaluation metric.
nll: float
    Negative log-likelihood.
is_higher_better: bool
    Whether a higher value of the metric is better or not.

Source code in lightgbmlss/distributions/mixture_distribution_utils.py, lines 151-182
def metric_fn(self, predt: np.ndarray, data: lgb.Dataset) -> Tuple[str, float, bool]:
    """
    Function that evaluates the predictions using the negative log-likelihood.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    data: lgb.Dataset
        Data used for training.

    Returns
    -------
    name: str
        Name of the evaluation metric.
    nll: float
        Negative log-likelihood.
    is_higher_better: bool
        Whether a higher value of the metric is better or not.
    """
    # Target
    target = torch.tensor(data.get_label().reshape(-1, 1), dtype=torch.float32)
    n_obs = target.shape[0]

    # Start values (needed to replace NaNs in predt)
    start_values = data.get_init_score().reshape(-1, self.n_dist_param)[0, :].tolist()

    # Calculate loss
    is_higher_better = False
    _, loss = self.get_params_loss(predt, target.flatten(), start_values, requires_grad=False)

    return self.loss_fn, loss / n_obs, is_higher_better
objective_fn(predt, data)

Function to estimate gradients and hessians of distributional parameters.

Arguments

predt: np.ndarray
    Predicted values.
data: lgb.Dataset
    Data used for training.

Returns

grad: np.ndarray
    Gradient.
hess: np.ndarray
    Hessian.

Source code in lightgbmlss/distributions/mixture_distribution_utils.py, lines 113-149
def objective_fn(self, predt: np.ndarray, data: lgb.Dataset) -> Tuple[np.ndarray, np.ndarray]:

    """
    Function to estimate gradients and hessians of distributional parameters.

    Arguments
    ---------
    predt: np.ndarray
        Predicted values.
    data: lgb.Dataset
        Data used for training.

    Returns
    -------
    grad: np.ndarray
        Gradient.
    hess: np.ndarray
        Hessian.
    """
    # Target
    target = torch.tensor(data.get_label().reshape(-1, 1), dtype=torch.float32)

    # Weights
    if data.weight is None:
        # Use 1 as weight if no weights are specified
        weights = np.ones_like(target, dtype="float32")
    else:
        weights = data.get_weight().reshape(-1, 1)

    # Start values (needed to replace NaNs in predt)
    start_values = data.get_init_score().reshape(-1, self.n_dist_param)[0, :].tolist()

    # Calculate gradients and hessians
    predt, loss = self.get_params_loss(predt, target.flatten(), start_values, requires_grad=True)
    grad, hess = self.compute_gradients_and_hessians(loss, predt, weights)

    return grad, hess
predict_dist(booster, data, start_values, pred_type='parameters', n_samples=1000, quantiles=[0.1, 0.5, 0.9], seed=123)

Function that predicts from the trained model.

Arguments

booster : lgb.Booster
    Trained model.
data : pd.DataFrame
    Data to predict from.
start_values : np.ndarray
    Starting values for each distributional parameter.
pred_type : str
    Type of prediction:
    - "samples" draws n_samples from the predicted distribution.
    - "quantiles" calculates the quantiles from the predicted distribution.
    - "parameters" returns the predicted distributional parameters.
n_samples : int
    Number of samples to draw from the predicted distribution.
quantiles : List[float]
    List of quantiles to calculate from the predicted distribution.
seed : int
    Seed for random number generator used to draw samples from the predicted distribution.

Returns

pred : pd.DataFrame
    Predictions.

Source code in lightgbmlss/distributions/mixture_distribution_utils.py, lines 416-503
def predict_dist(self,
                 booster: lgb.Booster,
                 data: pd.DataFrame,
                 start_values: np.ndarray,
                 pred_type: str = "parameters",
                 n_samples: int = 1000,
                 quantiles: list = [0.1, 0.5, 0.9],
                 seed: int = 123
                 ) -> pd.DataFrame:
    """
    Function that predicts from the trained model.

    Arguments
    ---------
    booster : lgb.Booster
        Trained model.
    data : pd.DataFrame
        Data to predict from.
    start_values : np.ndarray.
        Starting values for each distributional parameter.
    pred_type : str
        Type of prediction:
        - "samples" draws n_samples from the predicted distribution.
        - "quantiles" calculates the quantiles from the predicted distribution.
        - "parameters" returns the predicted distributional parameters.
    n_samples : int
        Number of samples to draw from the predicted distribution.
    quantiles : List[float]
        List of quantiles to calculate from the predicted distribution.
    seed : int
        Seed for random number generator used to draw samples from the predicted distribution.

    Returns
    -------
    pred : pd.DataFrame
        Predictions.
    """
    predt = torch.tensor(
        booster.predict(data, raw_score=True),
        dtype=torch.float32
    ).reshape(-1, self.n_dist_param)

    # Set init_score as starting point for each distributional parameter.
    init_score_pred = torch.tensor(
        np.ones(shape=(data.shape[0], 1))*start_values,
        dtype=torch.float32
    )

    # The predictions don't include the init_score specified in creating the train data.
    # Hence, it needs to be added manually with the corresponding transform for each distributional parameter.
    dist_params_predt = torch.split(
        torch.cat(
            [
                predt[:, i].reshape(-1, 1) + init_score_pred[:, i].reshape(-1, 1)
                for i in range(self.n_dist_param)
            ], axis=1
        ), self.M, dim=1
    )

    dist_params_predt = np.concatenate(
        [
            response_fn(dist_params_predt[i]).numpy()
            for i, response_fn in enumerate(self.param_dict.values())
        ],
        axis=1,
    )

    dist_params_predt = pd.DataFrame(dist_params_predt)
    dist_params_predt.columns = self.distribution_arg_names

    # Draw samples from predicted response distribution
    pred_samples_df = self.draw_samples(predt_params=dist_params_predt,
                                        n_samples=n_samples,
                                        seed=seed)

    if pred_type == "parameters":
        return dist_params_predt

    elif pred_type == "samples":
        return pred_samples_df

    elif pred_type == "quantiles":
        # Calculate quantiles from predicted response distribution
        pred_quant_df = pred_samples_df.quantile(quantiles, axis=1).T
        pred_quant_df.columns = [str("quant_") + str(quantiles[i]) for i in range(len(quantiles))]
        if self.discrete:
            pred_quant_df = pred_quant_df.astype(int)
        return pred_quant_df
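
For context, a hedged end-to-end sketch of how predictions are typically obtained through the high-level model object rather than by calling predict_dist directly. The LightGBMLSS class, its train/predict signatures and the Mixture/Gaussian wrappers are assumptions about the surrounding library API, not code shown on this page:

# Hypothetical end-to-end sketch; all lightgbmlss imports and signatures below
# are assumptions, and the toy data is for illustration only.
import numpy as np
import lightgbm as lgb
from lightgbmlss.model import LightGBMLSS
from lightgbmlss.distributions.Mixture import Mixture
from lightgbmlss.distributions.Gaussian import Gaussian

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 5)), rng.normal(size=500)
X_test = rng.normal(size=(100, 5))

lgblss = LightGBMLSS(Mixture(Gaussian(), M=2))
lgblss.train({"learning_rate": 0.05}, lgb.Dataset(X_train, label=y_train), num_boost_round=50)

# pred_type selects which branch of predict_dist is returned
pred_params    = lgblss.predict(X_test, pred_type="parameters")
pred_quantiles = lgblss.predict(X_test, pred_type="quantiles", quantiles=[0.05, 0.5, 0.95])
pred_samples   = lgblss.predict(X_test, pred_type="samples", n_samples=1000, seed=123)
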
stabilize_derivative(input_der, type='MAD')

Function that stabilizes Gradients and Hessians.

As LightGBMLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important that these are comparable in magnitude for all distributional parameters. Due to imbalances in their ranges, the estimation might become unstable and fail to converge (or converge only very slowly) to the optimal solution. Another way to improve convergence is to standardize the response variable, which is especially useful if the range of the response differs strongly from the range of the Gradients and Hessians. Neither the stabilization nor the standardization of the response is always advisable; both need to be considered carefully. Source: https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

Parameters

input_der : torch.Tensor
    Input derivative, either Gradient or Hessian.
type: str
    Stabilization method. Can be either "None", "MAD" or "L2".

Returns

stab_der : torch.Tensor
    Stabilized Gradient or Hessian.

Source code in lightgbmlss/distributions/mixture_distribution_utils.py, lines 554-607
def stabilize_derivative(self, input_der: torch.Tensor, type: str = "MAD") -> torch.Tensor:
    """
    Function that stabilizes Gradients and Hessians.

    As LightGBMLSS updates the parameter estimates by optimizing Gradients and Hessians, it is important
    that these are comparable in magnitude for all distributional parameters. Due to imbalances regarding the ranges,
    the estimation might become unstable so that it does not converge (or converge very slowly) to the optimal solution.
    Another way to improve convergence might be to standardize the response variable. This is especially useful if the
    range of the response differs strongly from the range of the Gradients and Hessians. Both, the stabilization and
    the standardization of the response are not always advised but need to be carefully considered.
    Source: https://github.com/boost-R/gamboostLSS/blob/7792951d2984f289ed7e530befa42a2a4cb04d1d/R/helpers.R#L173

    Parameters
    ----------
    input_der : torch.Tensor
        Input derivative, either Gradient or Hessian.
    type: str
        Stabilization method. Can be either "None", "MAD" or "L2".

    Returns
    -------
    stab_der : torch.Tensor
        Stabilized Gradient or Hessian.
    """

    if type == "MAD":
        input_der = torch.nan_to_num(input_der,
                                     nan=float(torch.nanmean(input_der)),
                                     posinf=float(torch.nanmean(input_der)),
                                     neginf=float(torch.nanmean(input_der))
                                     )
        div = torch.nanmedian(torch.abs(input_der - torch.nanmedian(input_der)))
        div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
        stab_der = input_der / div

    if type == "L2":
        input_der = torch.nan_to_num(input_der,
                                     nan=float(torch.nanmean(input_der)),
                                     posinf=float(torch.nanmean(input_der)),
                                     neginf=float(torch.nanmean(input_der))
                                     )
        div = torch.sqrt(torch.nanmean(input_der.pow(2)))
        div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
        div = torch.where(div > torch.tensor(10000.0), torch.tensor(10000.0), div)
        stab_der = input_der / div

    if type == "None":
        stab_der = torch.nan_to_num(input_der,
                                    nan=float(torch.nanmean(input_der)),
                                    posinf=float(torch.nanmean(input_der)),
                                    neginf=float(torch.nanmean(input_der))
                                    )

    return stab_der
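
A worked toy example of the "MAD" branch, using only the operations shown above: each derivative vector is divided by its median absolute deviation (floored at 1e-04), which brings derivatives of differently scaled parameters onto a comparable magnitude:

# Toy illustration of MAD stabilization, mirroring the "MAD" branch above.
import torch

grad = torch.tensor([[0.5], [-120.0], [3.0], [7.5]])

div = torch.nanmedian(torch.abs(grad - torch.nanmedian(grad)))      # median absolute deviation
div = torch.where(div < torch.tensor(1e-04), torch.tensor(1e-04), div)
print(grad / div)   # same signs, magnitudes rescaled to a comparable range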

get_component_distributions()

Function that returns component distributions for creating a mixing distribution.

Arguments

None

Returns

distns: List
    List of all available distributions.

Source code in lightgbmlss/distributions/mixture_distribution_utils.py, lines 19-44
def get_component_distributions():
    """
    Function that returns component distributions for creating a mixing distribution.

    Arguments
    ---------
    None

    Returns
    -------
    distns: List
        List of all available distributions.
    """
    # Get all distribution names
    mixture_distns = [dist for dist in dir(distributions) if dist[0].isupper()]

    # Remove specific distributions
    distns_remove = [
        "Expectile",
        "Mixture",
        "SplineFlow"
    ]

    mixture_distns = [item for item in mixture_distns if item not in distns_remove]

    return mixture_distns
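
Usage is a single call; the import path follows the source location given above:

# List the distributions that can be used as components of a mixture.
from lightgbmlss.distributions.mixture_distribution_utils import get_component_distributions

print(get_component_distributions())
# A list of distribution names, e.g. ['Beta', 'Gaussian', 'StudentT', ...],
# with Expectile, Mixture and SplineFlow removed.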

zero_inflated

ZeroAdjustedBeta

Bases: ZeroInflatedDistribution

A Zero-Adjusted Beta distribution.

Parameter

concentration1: torch.Tensor
    1st concentration parameter of the distribution (often referred to as alpha).
concentration0: torch.Tensor
    2nd concentration parameter of the distribution (often referred to as beta).
gate: torch.Tensor
    Probability of zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py

Source code in lightgbmlss/distributions/zero_inflated.py, lines 299-335
class ZeroAdjustedBeta(ZeroInflatedDistribution):
    """
    A Zero-Adjusted Beta distribution.

    Parameter
    ---------
    concentration1: torch.Tensor
        1st concentration parameter of the distribution (often referred to as alpha).
    concentration0: torch.Tensor
        2nd concentration parameter of the distribution (often referred to as beta).
    gate: torch.Tensor
        Probability of zeros given via a Bernoulli distribution.

    Source
    ------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py
    """
    arg_constraints = {
        "concentration1": constraints.positive,
        "concentration0": constraints.positive,
        "gate": constraints.unit_interval,
    }
    support = constraints.unit_interval

    def __init__(self, concentration1, concentration0, gate=None, validate_args=None):
        base_dist = Beta(concentration1=concentration1, concentration0=concentration0, validate_args=False)
        base_dist._validate_args = validate_args

        super().__init__(base_dist, gate=gate, validate_args=validate_args)

    @property
    def concentration1(self):
        return self.base_dist.concentration1

    @property
    def concentration0(self):
        return self.base_dist.concentration0

ZeroAdjustedGamma

Bases: ZeroInflatedDistribution

A Zero-Adjusted Gamma distribution.

Parameter

concentration: torch.Tensor
    Shape parameter of the distribution (often referred to as alpha).
rate: torch.Tensor
    Rate = 1 / scale of the distribution (often referred to as beta).
gate: torch.Tensor
    Probability of zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py

Source code in lightgbmlss/distributions/zero_inflated.py, lines 221-257
class ZeroAdjustedGamma(ZeroInflatedDistribution):
    """
    A Zero-Adjusted Gamma distribution.

    Parameter
    ---------
    concentration: torch.Tensor
        shape parameter of the distribution (often referred to as alpha)
    rate: torch.Tensor
        rate = 1 / scale of the distribution (often referred to as beta)
    gate: torch.Tensor
        Probability of zeros given via a Bernoulli distribution.

    Source
    ------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py
    """
    arg_constraints = {
        "concentration": constraints.positive,
        "rate": constraints.positive,
        "gate": constraints.unit_interval,
    }
    support = constraints.nonnegative

    def __init__(self, concentration, rate, gate=None, validate_args=None):
        base_dist = Gamma(concentration=concentration, rate=rate, validate_args=False)
        base_dist._validate_args = validate_args

        super().__init__(base_dist, gate=gate, validate_args=validate_args)

    @property
    def concentration(self):
        return self.base_dist.concentration

    @property
    def rate(self):
        return self.base_dist.rate

ZeroAdjustedLogNormal

Bases: ZeroInflatedDistribution

A Zero-Adjusted Log-Normal distribution.

Parameter

loc: torch.Tensor
    Mean of log of distribution.
scale: torch.Tensor
    Standard deviation of log of the distribution.
gate: torch.Tensor
    Probability of zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py

Source code in lightgbmlss/distributions/zero_inflated.py, lines 260-296
class ZeroAdjustedLogNormal(ZeroInflatedDistribution):
    """
    A Zero-Adjusted Log-Normal distribution.

    Parameter
    ---------
    loc: torch.Tensor
        Mean of log of distribution.
    scale: torch.Tensor
        Standard deviation of log of the distribution.
    gate: torch.Tensor
        Probability of zeros given via a Bernoulli distribution.

    Source
    ------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py
    """
    arg_constraints = {
        "loc": constraints.real,
        "scale": constraints.positive,
        "gate": constraints.unit_interval,
    }
    support = constraints.nonnegative

    def __init__(self, loc, scale, gate=None, validate_args=None):
        base_dist = LogNormal(loc=loc, scale=scale, validate_args=False)
        base_dist._validate_args = validate_args

        super().__init__(base_dist, gate=gate, validate_args=validate_args)

    @property
    def loc(self):
        return self.base_dist.loc

    @property
    def scale(self):
        return self.base_dist.scale

ZeroInflatedDistribution

Bases: TorchDistribution

Generic Zero Inflated distribution.

This can be used directly or can be used as a base class as e.g. for :class:ZeroInflatedPoisson and :class:ZeroInflatedNegativeBinomial.

Parameters

gate : torch.Tensor
    Probability of extra zeros given via a Bernoulli distribution.
base_dist : torch.distributions.Distribution
    The base distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L18

Source code in lightgbmlss/distributions/zero_inflated.py, lines 20-139
class ZeroInflatedDistribution(TorchDistribution):
    """
    Generic Zero Inflated distribution.

    This can be used directly or can be used as a base class as e.g. for
    :class:`ZeroInflatedPoisson` and :class:`ZeroInflatedNegativeBinomial`.

    Parameters
    ----------
    gate : torch.Tensor
        Probability of extra zeros given via a Bernoulli distribution.
    base_dist : torch.distributions.Distribution
        The base distribution.

    Source
    ------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L18
    """

    arg_constraints = {
        "gate": constraints.unit_interval,
        "gate_logits": constraints.real,
    }

    def __init__(self, base_dist, *, gate=None, gate_logits=None, validate_args=None):
        if (gate is None) == (gate_logits is None):
            raise ValueError(
                "Either `gate` or `gate_logits` must be specified, but not both."
            )
        if gate is not None:
            batch_shape = broadcast_shape(gate.shape, base_dist.batch_shape)
            self.gate = gate.expand(batch_shape)
        else:
            batch_shape = broadcast_shape(gate_logits.shape, base_dist.batch_shape)
            self.gate_logits = gate_logits.expand(batch_shape)
        if base_dist.event_shape:
            raise ValueError(
                "ZeroInflatedDistribution expected empty "
                "base_dist.event_shape but got {}".format(base_dist.event_shape)
            )

        self.base_dist = base_dist.expand(batch_shape)
        event_shape = torch.Size()

        super().__init__(batch_shape, event_shape, validate_args)

    @constraints.dependent_property
    def support(self):
        return self.base_dist.support

    @lazy_property
    def gate(self):
        return logits_to_probs(self.gate_logits)

    @lazy_property
    def gate_logits(self):
        return probs_to_logits(self.gate)

    def log_prob(self, value):
        if self._validate_args:
            self._validate_sample(value)

        zero_idx = (value == 0)
        support = self.support
        epsilon = abs(torch.finfo(value.dtype).eps)

        if hasattr(support, "lower_bound"):
            if is_identically_zero(getattr(support, "lower_bound", None)):
                value = value.clamp_min(epsilon)

        if hasattr(support, "upper_bound"):
            if is_identically_one(getattr(support, "upper_bound", None)) & (value.max() == 1.0):
                value = value.clamp_max(1 - epsilon)

        if "gate" in self.__dict__:
            gate, value = broadcast_all(self.gate, value)
            log_prob = (-gate).log1p() + self.base_dist.log_prob(value)
            log_prob = torch.where(zero_idx, (gate + log_prob.exp()).log(), log_prob)
        else:
            gate_logits, value = broadcast_all(self.gate_logits, value)
            log_prob_minus_log_gate = -gate_logits + self.base_dist.log_prob(value)
            log_gate = -softplus(-gate_logits)
            log_prob = log_prob_minus_log_gate + log_gate
            zero_log_prob = softplus(log_prob_minus_log_gate) + log_gate
            log_prob = torch.where(zero_idx, zero_log_prob, log_prob)
        return log_prob

    def sample(self, sample_shape=torch.Size()):
        shape = self._extended_shape(sample_shape)
        with torch.no_grad():
            mask = torch.bernoulli(self.gate.expand(shape)).bool()
            samples = self.base_dist.expand(shape).sample()
            samples = torch.where(mask, samples.new_zeros(()), samples)
        return samples

    @lazy_property
    def mean(self):
        return (1 - self.gate) * self.base_dist.mean

    @lazy_property
    def variance(self):
        return (1 - self.gate) * (
                self.base_dist.mean**2 + self.base_dist.variance
        ) - self.mean**2

    def expand(self, batch_shape, _instance=None):
        new = self._get_checked_instance(type(self), _instance)
        batch_shape = torch.Size(batch_shape)
        gate = self.gate.expand(batch_shape) if "gate" in self.__dict__ else None
        gate_logits = (
            self.gate_logits.expand(batch_shape)
            if "gate_logits" in self.__dict__
            else None
        )
        base_dist = self.base_dist.expand(batch_shape)
        ZeroInflatedDistribution.__init__(
            new, base_dist, gate=gate, gate_logits=gate_logits, validate_args=False
        )
        new._validate_args = self._validate_args
        return new
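
Used directly, the class wraps any base distribution with an empty event_shape; the zero-inflation probability can be supplied either as gate (probability scale) or as gate_logits (logit scale), and both parameterizations give the same log-probabilities. A small hypothetical sketch with a Poisson base, not taken from the library's own code:

import torch
from torch.distributions import Poisson
from lightgbmlss.distributions.zero_inflated import ZeroInflatedDistribution

base = Poisson(rate=torch.tensor([2.0, 5.0]))
gate = torch.tensor([0.3, 0.1])

zi_probs = ZeroInflatedDistribution(base, gate=gate)
zi_logits = ZeroInflatedDistribution(base, gate_logits=torch.logit(gate))

value = torch.tensor([0.0, 4.0])
print(zi_probs.log_prob(value))    # mixes gate with the base distribution's mass at zero
print(zi_logits.log_prob(value))   # numerically the same, parameterized on the logit scale
print(zi_probs.mean, zi_probs.variance)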

ZeroInflatedNegativeBinomial

Bases: ZeroInflatedDistribution

A Zero Inflated Negative Binomial distribution.

Parameter

total_count: torch.Tensor
    Non-negative number of negative Bernoulli trials.
probs: torch.Tensor
    Event probabilities of success in the half-open interval [0, 1).
logits: torch.Tensor
    Event log-odds of success (log(p/(1-p))).
gate: torch.Tensor
    Probability of extra zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150

Source code in lightgbmlss/distributions/zero_inflated.py
class ZeroInflatedNegativeBinomial(ZeroInflatedDistribution):
    """
    A Zero Inflated Negative Binomial distribution.

    Parameter
    ---------
    total_count: torch.Tensor
        Non-negative number of negative Bernoulli trials.
    probs: torch.Tensor
        Event probabilities of success in the half open interval [0, 1).
    logits: torch.Tensor
        Event log-odds of success (log(p/(1-p))).
    gate: torch.Tensor
        Probability of extra zeros given via a Bernoulli distribution.

    Source
    ------
    - https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150
    """

    arg_constraints = {
        "total_count": constraints.greater_than_eq(0),
        "probs": constraints.half_open_interval(0.0, 1.0),
        "logits": constraints.real,
        "gate": constraints.unit_interval,
    }
    support = constraints.nonnegative_integer

    def __init__(self, total_count, probs=None, gate=None, validate_args=None):
        base_dist = NegativeBinomial(total_count=total_count, probs=probs, logits=None, validate_args=False)
        base_dist._validate_args = validate_args

        super().__init__(base_dist, gate=gate, validate_args=validate_args)

    @property
    def total_count(self):
        return self.base_dist.total_count

    @property
    def probs(self):
        return self.base_dist.probs

    @property
    def logits(self):
        return self.base_dist.logits

ZeroInflatedPoisson

Bases: ZeroInflatedDistribution

A Zero-Inflated Poisson distribution.

Parameter

rate: torch.Tensor
    The rate of the Poisson distribution.
gate: torch.Tensor
    Probability of extra zeros given via a Bernoulli distribution.

Source

https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L121

Source code in lightgbmlss/distributions/zero_inflated.py
class ZeroInflatedPoisson(ZeroInflatedDistribution):
    """
    A Zero-Inflated Poisson distribution.

    Parameter
    ---------
    rate: torch.Tensor
        The rate of the Poisson distribution.
    gate: torch.Tensor
        Probability of extra zeros given via a Bernoulli distribution.

    Source
    ------
    https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L121
    """
    arg_constraints = {
        "rate": constraints.positive,
        "gate": constraints.unit_interval,
    }
    support = constraints.nonnegative_integer

    def __init__(self, rate, gate=None, validate_args=None):
        base_dist = Poisson(rate=rate, validate_args=False)
        base_dist._validate_args = validate_args

        super().__init__(base_dist, gate=gate, validate_args=validate_args)

    @property
    def rate(self):
        return self.base_dist.rate
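
For illustration, the mass at zero combines the structural zeros (gate) with the Poisson zeros, while positive counts are down-weighted by (1 - gate):

import torch
from lightgbmlss.distributions.zero_inflated import ZeroInflatedPoisson

zip_dist = ZeroInflatedPoisson(rate=torch.tensor([3.0]), gate=torch.tensor([0.25]))

p_zero = zip_dist.log_prob(torch.tensor([0.0])).exp()  # ~= 0.25 + 0.75 * exp(-3)
p_two = zip_dist.log_prob(torch.tensor([2.0])).exp()   # ~= 0.75 * Poisson(3).pmf(2)
print(p_zero, p_two, zip_dist.mean)                    # mean = (1 - gate) * rate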

model

LightGBMLSS

LightGBMLSS model class

Parameters

dist : Distribution
    DistributionClass object.
start_values : np.ndarray
    Starting values for each distributional parameter.

Source code in lightgbmlss/model.py
class LightGBMLSS:
    """
    LightGBMLSS model class

    Parameters
    ----------
    dist : Distribution
        DistributionClass object.
     start_values : np.ndarray
        Starting values for each distributional parameter.
    """
    def __init__(self, dist: DistributionClass):
        self.dist = dist             # Distribution object
        self.start_values = None     # Starting values for distributional parameters

    def set_params(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """
        Set parameters for distributional model.

        Arguments
        ---------
        params : Dict[str, Any]
            Parameters for model.

        Returns
        -------
        params : Dict[str, Any]
            Updated Parameters for model.
        """
        params_adj = {"num_class": self.dist.n_dist_param,
                      "metric": "None",
                      "objective": self.dist.objective_fn,
                      "random_seed": 123,
                      "verbose": -1
                      }
        params.update(params_adj)

        return params

    def set_init_score(self, dmatrix: Dataset) -> None:
        """
        Set init_score for distributions.

        Arguments
        ---------
        dmatrix : Dataset
            Dataset to set base margin for.

        Returns
        -------
        None
        """
        if self.start_values is None:
            _, self.start_values = self.dist.calculate_start_values(dmatrix.get_label())
        init_score = (np.ones(shape=(dmatrix.get_label().shape[0], 1))) * self.start_values
        dmatrix.set_init_score(init_score.ravel(order="F"))

    def train(self,
              params: Dict[str, Any],
              train_set: Dataset,
              num_boost_round: int = 100,
              valid_sets: Optional[List[Dataset]] = None,
              valid_names: Optional[List[str]] = None,
              init_model: Optional[Union[str, Path, Booster]] = None,
              feature_name: _LGBM_FeatureNameConfiguration = 'auto',
              categorical_feature: _LGBM_CategoricalFeatureConfiguration = 'auto',
              keep_training_booster: bool = False,
              callbacks: Optional[List[Callable]] = None
              ) -> Booster:
        """Function to perform the training of a LightGBMLSS model with given parameters.

        Parameters
        ----------
        params : dict
            Parameters for training. Values passed through ``params`` take precedence over those
            supplied via arguments.
        train_set : Dataset
            Data to be trained on.
        num_boost_round : int, optional (default=100)
            Number of boosting iterations.
        valid_sets : list of Dataset, or None, optional (default=None)
            List of data to be evaluated on during training.
        valid_names : list of str, or None, optional (default=None)
            Names of ``valid_sets``.
        init_model : str, pathlib.Path, Booster or None, optional (default=None)
            Filename of LightGBM model or Booster instance used for continue training.
        feature_name : list of str, or 'auto', optional (default="auto")
            Feature names.
            If 'auto' and data is pandas DataFrame, data columns names are used.
        categorical_feature : list of str or int, or 'auto', optional (default="auto")
            Categorical features.
            If list of int, interpreted as indices.
            If list of str, interpreted as feature names (need to specify ``feature_name`` as well).
            If 'auto' and data is pandas DataFrame, pandas unordered categorical columns are used.
            All values in categorical features will be cast to int32 and thus should be less than int32 max value (2147483647).
            Large values could be memory consuming. Consider using consecutive integers starting from zero.
            All negative values in categorical features will be treated as missing values.
            The output cannot be monotonically constrained with respect to a categorical feature.
            Floating point numbers in categorical features will be rounded towards 0.
        keep_training_booster : bool, optional (default=False)
            Whether the returned Booster will be used to keep training.
            If False, the returned value will be converted into _InnerPredictor before returning.
            This means you won't be able to use ``eval``, ``eval_train`` or ``eval_valid`` methods of the returned Booster.
            When your model is very large and cause the memory error,
            you can try to set this param to ``True`` to avoid the model conversion performed during the internal call of ``model_to_string``.
            You can still use _InnerPredictor as ``init_model`` for future continue training.
        callbacks : list of callable, or None, optional (default=None)
            List of callback functions that are applied at each iteration.
            See Callbacks in Python API for more information.

        Returns
        -------
        booster : Booster
            The trained Booster model.
        """
        self.set_params(params)
        self.set_init_score(train_set)

        if valid_sets is not None:
            valid_sets = self.set_valid_margin(valid_sets)

        self.booster = lgb.train(params,
                                 train_set,
                                 num_boost_round=num_boost_round,
                                 feval=self.dist.metric_fn,
                                 valid_sets=valid_sets,
                                 valid_names=valid_names,
                                 init_model=init_model,
                                 feature_name=feature_name,
                                 categorical_feature=categorical_feature,
                                 keep_training_booster=keep_training_booster,
                                 callbacks=callbacks)

    def cv(self,
           params: Dict[str, Any],
           train_set: Dataset,
           num_boost_round: int = 100,
           folds: Optional[Union[Iterable[Tuple[np.ndarray, np.ndarray]], _LGBMBaseCrossValidator]] = None,
           nfold: int = 5,
           stratified: bool = True,
           shuffle: bool = True,
           init_model: Optional[Union[str, Path, Booster]] = None,
           feature_name: _LGBM_FeatureNameConfiguration = 'auto',
           categorical_feature: _LGBM_CategoricalFeatureConfiguration = 'auto',
           fpreproc: Optional[_LGBM_PreprocFunction] = None,
           seed: int = 123,
           callbacks: Optional[List[Callable]] = None,
           eval_train_metric: bool = False,
           return_cvbooster: bool = False
           ) -> Dict[str, Union[List[float], CVBooster]]:
        """Function to cross-validate a LightGBMLSS model with given parameters.

        Parameters
        ----------
        params : dict
            Parameters for training. Values passed through ``params`` take precedence over those
            supplied via arguments.
        train_set : Dataset
            Data to be trained on.
        num_boost_round : int, optional (default=100)
            Number of boosting iterations.
        folds : generator or iterator of (train_idx, test_idx) tuples, scikit-learn splitter object or None, optional (default=None)
            If generator or iterator, it should yield the train and test indices for each fold.
            If object, it should be one of the scikit-learn splitter classes
            (https://scikit-learn.org/stable/modules/classes.html#splitter-classes)
            and have ``split`` method.
            This argument has highest priority over other data split arguments.
        nfold : int, optional (default=5)
            Number of folds in CV.
        stratified : bool, optional (default=True)
            Whether to perform stratified sampling.
        shuffle : bool, optional (default=True)
            Whether to shuffle before splitting data.
        init_model : str, pathlib.Path, Booster or None, optional (default=None)
            Filename of LightGBM model or Booster instance used for continue training.
        feature_name : list of str, or 'auto', optional (default="auto")
            Feature names.
            If 'auto' and data is pandas DataFrame, data columns names are used.
        categorical_feature : list of str or int, or 'auto', optional (default="auto")
            Categorical features.
            If list of int, interpreted as indices.
            If list of str, interpreted as feature names (need to specify ``feature_name`` as well).
            If 'auto' and data is pandas DataFrame, pandas unordered categorical columns are used.
            All values in categorical features will be cast to int32 and thus should be less than int32 max value (2147483647).
            Large values could be memory consuming. Consider using consecutive integers starting from zero.
            All negative values in categorical features will be treated as missing values.
            The output cannot be monotonically constrained with respect to a categorical feature.
            Floating point numbers in categorical features will be rounded towards 0.
        fpreproc : callable or None, optional (default=None)
            Preprocessing function that takes (dtrain, dtest, params)
            and returns transformed versions of those.
        seed : int, optional (default=123)
            Seed used to generate the folds (passed to numpy.random.seed).
        callbacks : list of callable, or None, optional (default=None)
            List of callback functions that are applied at each iteration.
            See Callbacks in Python API for more information.
        eval_train_metric : bool, optional (default=False)
            Whether to display the train metric in progress.
            The score of the metric is calculated again after each training step, so there is some impact on performance.
        return_cvbooster : bool, optional (default=False)
            Whether to return Booster models trained on each fold through ``CVBooster``.

        Returns
        -------
        eval_hist : dict
            Evaluation history.
            The dictionary has the following format:
            {'metric1-mean': [values], 'metric1-stdv': [values],
            'metric2-mean': [values], 'metric2-stdv': [values],
            ...}.
            If ``return_cvbooster=True``, also returns trained boosters wrapped in a ``CVBooster`` object via ``cvbooster`` key.
        """
        self.set_params(params)
        self.set_init_score(train_set)

        self.bstLSS_cv = lgb.cv(params,
                                train_set,
                                feval=self.dist.metric_fn,
                                num_boost_round=num_boost_round,
                                folds=folds,
                                nfold=nfold,
                                stratified=False,
                                shuffle=False,
                                metrics=None,
                                init_model=init_model,
                                feature_name=feature_name,
                                categorical_feature=categorical_feature,
                                fpreproc=fpreproc,
                                seed=seed,
                                callbacks=callbacks,
                                eval_train_metric=eval_train_metric,
                                return_cvbooster=return_cvbooster)

        return self.bstLSS_cv

    def hyper_opt(
            self,
            hp_dict: Dict,
            train_set: lgb.Dataset,
            num_boost_round=500,
            nfold=10,
            early_stopping_rounds=20,
            max_minutes=10,
            n_trials=None,
            study_name=None,
            silence=False,
            seed=None,
            hp_seed=None
    ):
        """
        Function to tune hyperparameters using optuna.

        Arguments
        ----------
        hp_dict: dict
            Dictionary of hyperparameters to tune.
        train_set: lgb.Dataset
            Training data.
        num_boost_round: int
            Number of boosting iterations.
        nfold: int
            Number of folds in CV.
        early_stopping_rounds: int
            Activates early stopping. Cross-Validation metric (average of validation
            metric computed over CV folds) needs to improve at least once in
            every **early_stopping_rounds** round(s) to continue training.
            The last entry in the evaluation history will represent the best iteration.
            If there's more than one metric in the **eval_metric** parameter given in
            **params**, the last metric will be used for early stopping.
        max_minutes: int
            Time budget in minutes, i.e., stop study after the given number of minutes.
        n_trials: int
            The number of trials. If this argument is set to None, there is no limitation on the number of trials.
        study_name: str
            Name of the hyperparameter study.
        silence: bool
            Controls the verbosity of the trial, i.e., the user can silence the outputs of the trial.
        seed: int
            Seed used to generate the folds (passed to numpy.random.seed).
        hp_seed: int
            Seed for random number generator used in the Bayesian hyper-parameter search.

        Returns
        -------
        opt_params : dict
            Optimal hyper-parameters.
        """

        def objective(trial):

            hyper_params = {}

            for param_name, param_value in hp_dict.items():

                param_type = param_value[0]

                if param_type == "categorical" or param_type == "none":
                    hyper_params.update({param_name: trial.suggest_categorical(param_name, param_value[1])})

                elif param_type == "float":
                    param_constraints = param_value[1]
                    param_low = param_constraints["low"]
                    param_high = param_constraints["high"]
                    param_log = param_constraints["log"]
                    hyper_params.update(
                        {param_name: trial.suggest_float(param_name,
                                                         low=param_low,
                                                         high=param_high,
                                                         log=param_log
                                                         )
                         })

                elif param_type == "int":
                    param_constraints = param_value[1]
                    param_low = param_constraints["low"]
                    param_high = param_constraints["high"]
                    param_log = param_constraints["log"]
                    hyper_params.update(
                        {param_name: trial.suggest_int(param_name,
                                                       low=param_low,
                                                       high=param_high,
                                                       log=param_log
                                                       )
                         })

            # Add booster if not included in dictionary
            if "boosting" not in hyper_params.keys():
                hyper_params.update({"boosting": trial.suggest_categorical("boosting", ["gbdt"])})

            # Add pruning and early stopping
            pruning_callback = LightGBMPruningCallback(trial, self.dist.loss_fn)
            early_stopping_callback = lgb.early_stopping(stopping_rounds=early_stopping_rounds, verbose=False)

            lgblss_param_tuning = self.cv(hyper_params,
                                          train_set,
                                          num_boost_round=num_boost_round,
                                          nfold=nfold,
                                          callbacks=[pruning_callback, early_stopping_callback],
                                          seed=seed,
                                          )

            # Extract the optimal number of boosting rounds
            opt_rounds = np.argmin(np.array(lgblss_param_tuning[f"valid {self.dist.loss_fn}-mean"])) + 1
            trial.set_user_attr("opt_round", int(opt_rounds))

            # Extract the best score
            best_score = np.min(np.array(lgblss_param_tuning[f"valid {self.dist.loss_fn}-mean"]))

            return best_score

        if study_name is None:
            study_name = "LightGBMLSS Hyper-Parameter Optimization"

        if silence:
            optuna.logging.set_verbosity(optuna.logging.WARNING)

        if hp_seed is not None:
            sampler = TPESampler(seed=hp_seed)
        else:
            sampler = TPESampler()

        pruner = optuna.pruners.MedianPruner(n_startup_trials=10, n_warmup_steps=20)
        study = optuna.create_study(sampler=sampler, pruner=pruner, direction="minimize", study_name=study_name)
        study.optimize(objective, n_trials=n_trials, timeout=60 * max_minutes, show_progress_bar=True)

        print("\nHyper-Parameter Optimization successfully finished.")
        print("  Number of finished trials: ", len(study.trials))
        print("  Best trial:")
        opt_param = study.best_trial

        # Add optimal stopping round
        opt_param.params["opt_rounds"] = study.trials_dataframe()["user_attrs_opt_round"][
            study.trials_dataframe()["value"].idxmin()]
        opt_param.params["opt_rounds"] = int(opt_param.params["opt_rounds"])

        print("    Value: {}".format(opt_param.value))
        print("    Params: ")
        for key, value in opt_param.params.items():
            print("    {}: {}".format(key, value))

        return opt_param.params

    def predict(self,
                data: pd.DataFrame,
                pred_type: str = "parameters",
                n_samples: int = 1000,
                quantiles: list = [0.1, 0.5, 0.9],
                seed: int = 123):
        """
        Function that predicts from the trained model.

        Arguments
        ---------
        data : pd.DataFrame
            Data to predict from.
        pred_type : str
            Type of prediction:
            - "samples" draws n_samples from the predicted distribution.
            - "quantiles" calculates the quantiles from the predicted distribution.
            - "parameters" returns the predicted distributional parameters.
            - "expectiles" returns the predicted expectiles.
        n_samples : int
            Number of samples to draw from the predicted distribution.
        quantiles : List[float]
            List of quantiles to calculate from the predicted distribution.
        seed : int
            Seed for random number generator used to draw samples from the predicted distribution.

        Returns
        -------
        predt_df : pd.DataFrame
            Predictions.
        """

        # Predict
        predt_df = self.dist.predict_dist(booster=self.booster,
                                          data=data,
                                          start_values=self.start_values,
                                          pred_type=pred_type,
                                          n_samples=n_samples,
                                          quantiles=quantiles,
                                          seed=seed)

        return predt_df

    def plot(self,
             X: pd.DataFrame,
             feature: str = "x",
             parameter: str = "loc",
             max_display: int = 15,
             plot_type: str = "Partial_Dependence"):
        """
        LightGBMLSS SHap plotting function.

        Arguments:
        ---------
        X: pd.DataFrame
            Train/Test Data
        feature: str
            Specifies which feature is to be plotted.
        parameter: str
            Specifies which distributional parameter is to be plotted.
        max_display: int
            Specifies the maximum number of features to be displayed.
        plot_type: str
            Specifies the type of plot:
                "Partial_Dependence" plots the partial dependence of the parameter on the feature.
                "Feature_Importance" plots the feature importance of the parameter.
        """
        shap.initjs()
        explainer = shap.TreeExplainer(self.booster)
        shap_values = explainer(X)

        param_pos = self.dist.distribution_arg_names.index(parameter)

        if plot_type == "Partial_Dependence":
            if self.dist.n_dist_param == 1:
                shap.plots.scatter(shap_values[:, feature], color=shap_values[:, feature])
            else:
                shap.plots.scatter(shap_values[:, feature][:, param_pos], color=shap_values[:, feature][:, param_pos])
        elif plot_type == "Feature_Importance":
            if self.dist.n_dist_param == 1:
                shap.plots.bar(shap_values, max_display=max_display if X.shape[1] > max_display else X.shape[1])
            else:
                shap.plots.bar(
                    shap_values[:, :, param_pos], max_display=max_display if X.shape[1] > max_display else X.shape[1]
                )

    def expectile_plot(self,
                       X: pd.DataFrame,
                       feature: str = "x",
                       expectile: str = "0.05",
                       plot_type: str = "Partial_Dependence"):
        """
        LightGBMLSS function for plotting expectile SHapley values.

        X: pd.DataFrame
            Train/Test Data
        feature: str
            Specifies which feature to use for plotting Partial_Dependence plot.
        expectile: str
            Specifies which expectile to plot.
        plot_type: str
            Specifies which SHapley-plot to visualize. Currently, "Partial_Dependence" and "Feature_Importance"
            are supported.
        """

        shap.initjs()
        explainer = shap.TreeExplainer(self.booster)
        shap_values = explainer(X)

        expect_pos = list(self.dist.param_dict.keys()).index(expectile)

        if plot_type == "Partial_Dependence":
            shap.plots.scatter(shap_values[:, feature][:, expect_pos], color=shap_values[:, feature][:, expect_pos])
        elif plot_type == "Feature_Importance":
            shap.plots.bar(shap_values[:, :, expect_pos], max_display=15 if X.shape[1] > 15 else X.shape[1])

    def set_valid_margin(self,
                         valid_sets: list,
                         ) -> list:
        """
        Function that sets the base margin for the validation set.

        Arguments
        ---------
        valid_sets : list
            List of tuples containing the evaluation set(s).

        Returns
        -------
        valid_sets : list
            List of tuples containing the evaluation set(s).
        """
        for valid_set in valid_sets:
            self.set_init_score(valid_set)

        return valid_sets

    def save_model(self,
                   model_path: str
                   ) -> None:
        """
        Save the model to a file.

        Parameters
        ----------
        model_path : str
            The path to save the model.

        Returns
        -------
        None
        """
        with open(model_path, "wb") as f:
            pickle.dump(self, f)

    @staticmethod
    def load_model(model_path: str):
        """
        Load the model from a file.

        Parameters
        ----------
        model_path : str
            The path to the saved model.

        Returns
        -------
        The loaded model.
        """
        with open(model_path, "rb") as f:
            return pickle.load(f)
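
A minimal end-to-end sketch of the class, assuming a Gaussian response distribution (the Gaussian class and its constructor arguments are not shown on this page and are used here only for illustration) together with the bundled simulated data:

import lightgbm as lgb
from lightgbmlss.model import LightGBMLSS
from lightgbmlss.distributions.Gaussian import Gaussian          # assumed distribution class
from lightgbmlss.datasets.data_loader import load_simulated_gaussian_data

train, test = load_simulated_gaussian_data()
X_train, y_train = train.drop(columns="y"), train["y"]
X_test = test.drop(columns="y")

dtrain = lgb.Dataset(X_train, label=y_train)

# One LightGBMLSS model estimates all distributional parameters jointly
lgblss = LightGBMLSS(Gaussian(stabilization="None", response_fn="exp", loss_fn="nll"))
lgblss.train({"learning_rate": 0.05, "max_depth": 3}, dtrain, num_boost_round=100)

# Predicted parameters and quantiles for the test set
params_hat = lgblss.predict(X_test, pred_type="parameters")
quants_hat = lgblss.predict(X_test, pred_type="quantiles", quantiles=[0.05, 0.95])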

cv(params, train_set, num_boost_round=100, folds=None, nfold=5, stratified=True, shuffle=True, init_model=None, feature_name='auto', categorical_feature='auto', fpreproc=None, seed=123, callbacks=None, eval_train_metric=False, return_cvbooster=False)

Function to cross-validate a LightGBMLSS model with given parameters.

Parameters

params : dict
    Parameters for training. Values passed through params take precedence over those supplied via arguments.
train_set : Dataset
    Data to be trained on.
num_boost_round : int, optional (default=100)
    Number of boosting iterations.
folds : generator or iterator of (train_idx, test_idx) tuples, scikit-learn splitter object or None, optional (default=None)
    If generator or iterator, it should yield the train and test indices for each fold.
    If object, it should be one of the scikit-learn splitter classes
    (https://scikit-learn.org/stable/modules/classes.html#splitter-classes) and have a split method.
    This argument has highest priority over other data split arguments.
nfold : int, optional (default=5)
    Number of folds in CV.
stratified : bool, optional (default=True)
    Whether to perform stratified sampling.
shuffle : bool, optional (default=True)
    Whether to shuffle before splitting data.
init_model : str, pathlib.Path, Booster or None, optional (default=None)
    Filename of LightGBM model or Booster instance used for continue training.
feature_name : list of str, or 'auto', optional (default="auto")
    Feature names. If 'auto' and data is pandas DataFrame, data columns names are used.
categorical_feature : list of str or int, or 'auto', optional (default="auto")
    Categorical features. If list of int, interpreted as indices. If list of str, interpreted as feature names
    (need to specify feature_name as well). If 'auto' and data is pandas DataFrame, pandas unordered categorical
    columns are used. All values in categorical features will be cast to int32 and thus should be less than the
    int32 max value (2147483647). Large values could be memory consuming. Consider using consecutive integers
    starting from zero. All negative values in categorical features will be treated as missing values. The output
    cannot be monotonically constrained with respect to a categorical feature. Floating point numbers in
    categorical features will be rounded towards 0.
fpreproc : callable or None, optional (default=None)
    Preprocessing function that takes (dtrain, dtest, params) and returns transformed versions of those.
seed : int, optional (default=123)
    Seed used to generate the folds (passed to numpy.random.seed).
callbacks : list of callable, or None, optional (default=None)
    List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
eval_train_metric : bool, optional (default=False)
    Whether to display the train metric in progress. The score of the metric is calculated again after each
    training step, so there is some impact on performance.
return_cvbooster : bool, optional (default=False)
    Whether to return Booster models trained on each fold through CVBooster.

Returns

eval_hist : dict
    Evaluation history. The dictionary has the following format:
    {'metric1-mean': [values], 'metric1-stdv': [values],
    'metric2-mean': [values], 'metric2-stdv': [values], ...}.
    If return_cvbooster=True, also returns trained boosters wrapped in a CVBooster object via the cvbooster key.

Source code in lightgbmlss/model.py
def cv(self,
       params: Dict[str, Any],
       train_set: Dataset,
       num_boost_round: int = 100,
       folds: Optional[Union[Iterable[Tuple[np.ndarray, np.ndarray]], _LGBMBaseCrossValidator]] = None,
       nfold: int = 5,
       stratified: bool = True,
       shuffle: bool = True,
       init_model: Optional[Union[str, Path, Booster]] = None,
       feature_name: _LGBM_FeatureNameConfiguration = 'auto',
       categorical_feature: _LGBM_CategoricalFeatureConfiguration = 'auto',
       fpreproc: Optional[_LGBM_PreprocFunction] = None,
       seed: int = 123,
       callbacks: Optional[List[Callable]] = None,
       eval_train_metric: bool = False,
       return_cvbooster: bool = False
       ) -> Dict[str, Union[List[float], CVBooster]]:
    """Function to cross-validate a LightGBMLSS model with given parameters.

    Parameters
    ----------
    params : dict
        Parameters for training. Values passed through ``params`` take precedence over those
        supplied via arguments.
    train_set : Dataset
        Data to be trained on.
    num_boost_round : int, optional (default=100)
        Number of boosting iterations.
    folds : generator or iterator of (train_idx, test_idx) tuples, scikit-learn splitter object or None, optional (default=None)
        If generator or iterator, it should yield the train and test indices for each fold.
        If object, it should be one of the scikit-learn splitter classes
        (https://scikit-learn.org/stable/modules/classes.html#splitter-classes)
        and have ``split`` method.
        This argument has highest priority over other data split arguments.
    nfold : int, optional (default=5)
        Number of folds in CV.
    stratified : bool, optional (default=True)
        Whether to perform stratified sampling.
    shuffle : bool, optional (default=True)
        Whether to shuffle before splitting data.
    init_model : str, pathlib.Path, Booster or None, optional (default=None)
        Filename of LightGBM model or Booster instance used for continue training.
    feature_name : list of str, or 'auto', optional (default="auto")
        Feature names.
        If 'auto' and data is pandas DataFrame, data columns names are used.
    categorical_feature : list of str or int, or 'auto', optional (default="auto")
        Categorical features.
        If list of int, interpreted as indices.
        If list of str, interpreted as feature names (need to specify ``feature_name`` as well).
        If 'auto' and data is pandas DataFrame, pandas unordered categorical columns are used.
        All values in categorical features will be cast to int32 and thus should be less than int32 max value (2147483647).
        Large values could be memory consuming. Consider using consecutive integers starting from zero.
        All negative values in categorical features will be treated as missing values.
        The output cannot be monotonically constrained with respect to a categorical feature.
        Floating point numbers in categorical features will be rounded towards 0.
    fpreproc : callable or None, optional (default=None)
        Preprocessing function that takes (dtrain, dtest, params)
        and returns transformed versions of those.
    seed : int, optional (default=123)
        Seed used to generate the folds (passed to numpy.random.seed).
    callbacks : list of callable, or None, optional (default=None)
        List of callback functions that are applied at each iteration.
        See Callbacks in Python API for more information.
    eval_train_metric : bool, optional (default=False)
        Whether to display the train metric in progress.
        The score of the metric is calculated again after each training step, so there is some impact on performance.
    return_cvbooster : bool, optional (default=False)
        Whether to return Booster models trained on each fold through ``CVBooster``.

    Returns
    -------
    eval_hist : dict
        Evaluation history.
        The dictionary has the following format:
        {'metric1-mean': [values], 'metric1-stdv': [values],
        'metric2-mean': [values], 'metric2-stdv': [values],
        ...}.
        If ``return_cvbooster=True``, also returns trained boosters wrapped in a ``CVBooster`` object via ``cvbooster`` key.
    """
    self.set_params(params)
    self.set_init_score(train_set)

    self.bstLSS_cv = lgb.cv(params,
                            train_set,
                            feval=self.dist.metric_fn,
                            num_boost_round=num_boost_round,
                            folds=folds,
                            nfold=nfold,
                            stratified=False,
                            shuffle=False,
                            metrics=None,
                            init_model=init_model,
                            feature_name=feature_name,
                            categorical_feature=categorical_feature,
                            fpreproc=fpreproc,
                            seed=seed,
                            callbacks=callbacks,
                            eval_train_metric=eval_train_metric,
                            return_cvbooster=return_cvbooster)

    return self.bstLSS_cv
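
A short sketch, reusing the lgblss model and dtrain Dataset from the class-level example above; the evaluation-history key follows the pattern used in hyper_opt below (e.g. "valid nll-mean" for the negative log-likelihood):

import numpy as np

cv_hist = lgblss.cv({"learning_rate": 0.05, "max_depth": 3},
                    dtrain,
                    num_boost_round=200,
                    nfold=5)

# Boosting round with the best mean validation loss
best_round = int(np.argmin(np.array(cv_hist[f"valid {lgblss.dist.loss_fn}-mean"]))) + 1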

expectile_plot(X, feature='x', expectile='0.05', plot_type='Partial_Dependence')

LightGBMLSS function for plotting expectile SHAP (Shapley) values.

Arguments

X: pd.DataFrame
    Train/Test Data
feature: str
    Specifies which feature to use for plotting the Partial_Dependence plot.
expectile: str
    Specifies which expectile to plot.
plot_type: str
    Specifies which SHAP plot to visualize. Currently, "Partial_Dependence" and "Feature_Importance" are supported.

Source code in lightgbmlss/model.py
def expectile_plot(self,
                   X: pd.DataFrame,
                   feature: str = "x",
                   expectile: str = "0.05",
                   plot_type: str = "Partial_Dependence"):
    """
    LightGBMLSS function for plotting expectile SHapley values.

    X: pd.DataFrame
        Train/Test Data
    feature: str
        Specifies which feature to use for plotting Partial_Dependence plot.
    expectile: str
        Specifies which expectile to plot.
    plot_type: str
        Specifies which SHapley-plot to visualize. Currently, "Partial_Dependence" and "Feature_Importance"
        are supported.
    """

    shap.initjs()
    explainer = shap.TreeExplainer(self.booster)
    shap_values = explainer(X)

    expect_pos = list(self.dist.param_dict.keys()).index(expectile)

    if plot_type == "Partial_Dependence":
        shap.plots.scatter(shap_values[:, feature][:, expect_pos], color=shap_values[:, feature][:, expect_pos])
    elif plot_type == "Feature_Importance":
        shap.plots.bar(shap_values[:, :, expect_pos], max_display=15 if X.shape[1] > 15 else X.shape[1])
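
Expectile plots only apply to a model fitted with an expectile-based distribution; the expectile argument must match one of the keys of dist.param_dict. A hypothetical call, assuming such a model lgblss_expectile has already been trained on the data from the class-level example:

# `lgblss_expectile` is assumed to be a LightGBMLSS model fitted with an expectile-based distribution
first_expectile = list(lgblss_expectile.dist.param_dict.keys())[0]

lgblss_expectile.expectile_plot(X_test,
                                feature="x",
                                expectile=first_expectile,
                                plot_type="Partial_Dependence")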

hyper_opt(hp_dict, train_set, num_boost_round=500, nfold=10, early_stopping_rounds=20, max_minutes=10, n_trials=None, study_name=None, silence=False, seed=None, hp_seed=None)

Function to tune hyperparameters using optuna.

Arguments

hp_dict: dict
    Dictionary of hyperparameters to tune.
train_set: lgb.Dataset
    Training data.
num_boost_round: int
    Number of boosting iterations.
nfold: int
    Number of folds in CV.
early_stopping_rounds: int
    Activates early stopping. The cross-validation metric (average of the validation metric computed over CV folds)
    needs to improve at least once every early_stopping_rounds round(s) to continue training. The last entry in the
    evaluation history will represent the best iteration. If there's more than one metric in the eval_metric
    parameter given in params, the last metric will be used for early stopping.
max_minutes: int
    Time budget in minutes, i.e., stop the study after the given number of minutes.
n_trials: int
    The number of trials. If this argument is set to None, there is no limitation on the number of trials.
study_name: str
    Name of the hyperparameter study.
silence: bool
    Controls the verbosity of the trial, i.e., the user can silence the outputs of the trial.
seed: int
    Seed used to generate the folds (passed to numpy.random.seed).
hp_seed: int
    Seed for the random number generator used in the Bayesian hyperparameter search.

Returns

opt_params : dict
    Optimal hyper-parameters.

Source code in lightgbmlss/model.py
def hyper_opt(
        self,
        hp_dict: Dict,
        train_set: lgb.Dataset,
        num_boost_round=500,
        nfold=10,
        early_stopping_rounds=20,
        max_minutes=10,
        n_trials=None,
        study_name=None,
        silence=False,
        seed=None,
        hp_seed=None
):
    """
    Function to tune hyperparameters using optuna.

    Arguments
    ----------
    hp_dict: dict
        Dictionary of hyperparameters to tune.
    train_set: lgb.Dataset
        Training data.
    num_boost_round: int
        Number of boosting iterations.
    nfold: int
        Number of folds in CV.
    early_stopping_rounds: int
        Activates early stopping. Cross-Validation metric (average of validation
        metric computed over CV folds) needs to improve at least once in
        every **early_stopping_rounds** round(s) to continue training.
        The last entry in the evaluation history will represent the best iteration.
        If there's more than one metric in the **eval_metric** parameter given in
        **params**, the last metric will be used for early stopping.
    max_minutes: int
        Time budget in minutes, i.e., stop study after the given number of minutes.
    n_trials: int
        The number of trials. If this argument is set to None, there is no limitation on the number of trials.
    study_name: str
        Name of the hyperparameter study.
    silence: bool
        Controls the verbosity of the trial, i.e., the user can silence the outputs of the trial.
    seed: int
        Seed used to generate the folds (passed to numpy.random.seed).
    hp_seed: int
        Seed for random number generator used in the Bayesian hyper-parameter search.

    Returns
    -------
    opt_params : dict
        Optimal hyper-parameters.
    """

    def objective(trial):

        hyper_params = {}

        for param_name, param_value in hp_dict.items():

            param_type = param_value[0]

            if param_type == "categorical" or param_type == "none":
                hyper_params.update({param_name: trial.suggest_categorical(param_name, param_value[1])})

            elif param_type == "float":
                param_constraints = param_value[1]
                param_low = param_constraints["low"]
                param_high = param_constraints["high"]
                param_log = param_constraints["log"]
                hyper_params.update(
                    {param_name: trial.suggest_float(param_name,
                                                     low=param_low,
                                                     high=param_high,
                                                     log=param_log
                                                     )
                     })

            elif param_type == "int":
                param_constraints = param_value[1]
                param_low = param_constraints["low"]
                param_high = param_constraints["high"]
                param_log = param_constraints["log"]
                hyper_params.update(
                    {param_name: trial.suggest_int(param_name,
                                                   low=param_low,
                                                   high=param_high,
                                                   log=param_log
                                                   )
                     })

        # Add booster if not included in dictionary
        if "boosting" not in hyper_params.keys():
            hyper_params.update({"boosting": trial.suggest_categorical("boosting", ["gbdt"])})

        # Add pruning and early stopping
        pruning_callback = LightGBMPruningCallback(trial, self.dist.loss_fn)
        early_stopping_callback = lgb.early_stopping(stopping_rounds=early_stopping_rounds, verbose=False)

        lgblss_param_tuning = self.cv(hyper_params,
                                      train_set,
                                      num_boost_round=num_boost_round,
                                      nfold=nfold,
                                      callbacks=[pruning_callback, early_stopping_callback],
                                      seed=seed,
                                      )

        # Extract the optimal number of boosting rounds
        opt_rounds = np.argmin(np.array(lgblss_param_tuning[f"valid {self.dist.loss_fn}-mean"])) + 1
        trial.set_user_attr("opt_round", int(opt_rounds))

        # Extract the best score
        best_score = np.min(np.array(lgblss_param_tuning[f"valid {self.dist.loss_fn}-mean"]))

        return best_score

    if study_name is None:
        study_name = "LightGBMLSS Hyper-Parameter Optimization"

    if silence:
        optuna.logging.set_verbosity(optuna.logging.WARNING)

    if hp_seed is not None:
        sampler = TPESampler(seed=hp_seed)
    else:
        sampler = TPESampler()

    pruner = optuna.pruners.MedianPruner(n_startup_trials=10, n_warmup_steps=20)
    study = optuna.create_study(sampler=sampler, pruner=pruner, direction="minimize", study_name=study_name)
    study.optimize(objective, n_trials=n_trials, timeout=60 * max_minutes, show_progress_bar=True)

    print("\nHyper-Parameter Optimization successfully finished.")
    print("  Number of finished trials: ", len(study.trials))
    print("  Best trial:")
    opt_param = study.best_trial

    # Add optimal stopping round
    opt_param.params["opt_rounds"] = study.trials_dataframe()["user_attrs_opt_round"][
        study.trials_dataframe()["value"].idxmin()]
    opt_param.params["opt_rounds"] = int(opt_param.params["opt_rounds"])

    print("    Value: {}".format(opt_param.value))
    print("    Params: ")
    for key, value in opt_param.params.items():
        print("    {}: {}".format(key, value))

    return opt_param.params
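
The objective above implies the hp_dict format: each entry maps a parameter name to a pair of (type, constraints), where "float"/"int" take a {"low", "high", "log"} dictionary and "categorical"/"none" take a list of candidate values. A sketch, reusing dtrain from the class-level example (the tuned names are ordinary LightGBM parameters chosen for illustration):

hp_dict = {
    "learning_rate":    ["float", {"low": 1e-3, "high": 0.3, "log": True}],
    "max_depth":        ["int",   {"low": 2,    "high": 10,  "log": False}],
    "feature_fraction": ["float", {"low": 0.5,  "high": 1.0, "log": False}],
    "boosting":         ["categorical", ["gbdt"]],
}

opt_params = lgblss.hyper_opt(hp_dict,
                              dtrain,
                              num_boost_round=100,
                              nfold=5,
                              early_stopping_rounds=20,
                              max_minutes=5,
                              n_trials=30,
                              silence=True,
                              seed=123)

n_rounds = opt_params.pop("opt_rounds")  # tuned number of boosting rounds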

load_model(model_path) staticmethod

Load the model from a file.

Parameters

model_path : str
    The path to the saved model.

Returns

The loaded model.

Source code in lightgbmlss/model.py
@staticmethod
def load_model(model_path: str):
    """
    Load the model from a file.

    Parameters
    ----------
    model_path : str
        The path to the saved model.

    Returns
    -------
    The loaded model.
    """
    with open(model_path, "rb") as f:
        return pickle.load(f)
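
The model is pickled as a whole, so loading it back restores the booster together with its distribution object (the file name below is illustrative):

lgblss.save_model("lgblss_gaussian.pkl")

lgblss_loaded = LightGBMLSS.load_model("lgblss_gaussian.pkl")
pred_params = lgblss_loaded.predict(X_test, pred_type="parameters")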

plot(X, feature='x', parameter='loc', max_display=15, plot_type='Partial_Dependence')

LightGBMLSS SHAP plotting function.

Arguments

X: pd.DataFrame
    Train/Test Data
feature: str
    Specifies which feature is to be plotted.
parameter: str
    Specifies which distributional parameter is to be plotted.
max_display: int
    Specifies the maximum number of features to be displayed.
plot_type: str
    Specifies the type of plot:
        "Partial_Dependence" plots the partial dependence of the parameter on the feature.
        "Feature_Importance" plots the feature importance of the parameter.

Source code in lightgbmlss/model.py
def plot(self,
         X: pd.DataFrame,
         feature: str = "x",
         parameter: str = "loc",
         max_display: int = 15,
         plot_type: str = "Partial_Dependence"):
    """
    LightGBMLSS SHap plotting function.

    Arguments:
    ---------
    X: pd.DataFrame
        Train/Test Data
    feature: str
        Specifies which feature is to be plotted.
    parameter: str
        Specifies which distributional parameter is to be plotted.
    max_display: int
        Specifies the maximum number of features to be displayed.
    plot_type: str
        Specifies the type of plot:
            "Partial_Dependence" plots the partial dependence of the parameter on the feature.
            "Feature_Importance" plots the feature importance of the parameter.
    """
    shap.initjs()
    explainer = shap.TreeExplainer(self.booster)
    shap_values = explainer(X)

    param_pos = self.dist.distribution_arg_names.index(parameter)

    if plot_type == "Partial_Dependence":
        if self.dist.n_dist_param == 1:
            shap.plots.scatter(shap_values[:, feature], color=shap_values[:, feature])
        else:
            shap.plots.scatter(shap_values[:, feature][:, param_pos], color=shap_values[:, feature][:, param_pos])
    elif plot_type == "Feature_Importance":
        if self.dist.n_dist_param == 1:
            shap.plots.bar(shap_values, max_display=max_display if X.shape[1] > max_display else X.shape[1])
        else:
            shap.plots.bar(
                shap_values[:, :, param_pos], max_display=max_display if X.shape[1] > max_display else X.shape[1]
            )
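
For illustration, assuming the Gaussian model from the class-level sketch above, whose distributional parameters are named loc and scale:

# Partial dependence of the location parameter on feature "x"
lgblss.plot(X_test, feature="x", parameter="loc", plot_type="Partial_Dependence")

# SHAP feature importances for the scale parameter
lgblss.plot(X_test, parameter="scale", max_display=10, plot_type="Feature_Importance")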

predict(data, pred_type='parameters', n_samples=1000, quantiles=[0.1, 0.5, 0.9], seed=123)

Function that predicts from the trained model.

Arguments

data : pd.DataFrame
    Data to predict from.
pred_type : str
    Type of prediction:
    - "samples" draws n_samples from the predicted distribution.
    - "quantiles" calculates the quantiles from the predicted distribution.
    - "parameters" returns the predicted distributional parameters.
    - "expectiles" returns the predicted expectiles.
n_samples : int
    Number of samples to draw from the predicted distribution.
quantiles : List[float]
    List of quantiles to calculate from the predicted distribution.
seed : int
    Seed for the random number generator used to draw samples from the predicted distribution.

Returns

predt_df : pd.DataFrame
    Predictions.

Source code in lightgbmlss/model.py
def predict(self,
            data: pd.DataFrame,
            pred_type: str = "parameters",
            n_samples: int = 1000,
            quantiles: list = [0.1, 0.5, 0.9],
            seed: int = 123):
    """
    Function that predicts from the trained model.

    Arguments
    ---------
    data : pd.DataFrame
        Data to predict from.
    pred_type : str
        Type of prediction:
        - "samples" draws n_samples from the predicted distribution.
        - "quantiles" calculates the quantiles from the predicted distribution.
        - "parameters" returns the predicted distributional parameters.
        - "expectiles" returns the predicted expectiles.
    n_samples : int
        Number of samples to draw from the predicted distribution.
    quantiles : List[float]
        List of quantiles to calculate from the predicted distribution.
    seed : int
        Seed for random number generator used to draw samples from the predicted distribution.

    Returns
    -------
    predt_df : pd.DataFrame
        Predictions.
    """

    # Predict
    predt_df = self.dist.predict_dist(booster=self.booster,
                                      data=data,
                                      start_values=self.start_values,
                                      pred_type=pred_type,
                                      n_samples=n_samples,
                                      quantiles=quantiles,
                                      seed=seed)

    return predt_df
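
Example — a minimal usage sketch of the main prediction modes. `lgblss` denotes a LightGBMLSS model already fitted via train(), and `X_test` a pandas DataFrame with the same features used for training; both are placeholders.

# Sketch only: `lgblss` (fitted model) and `X_test` (feature DataFrame) are placeholders.
pred_params    = lgblss.predict(X_test, pred_type="parameters")
pred_quantiles = lgblss.predict(X_test, pred_type="quantiles", quantiles=[0.05, 0.95])
pred_samples   = lgblss.predict(X_test, pred_type="samples", n_samples=500, seed=123)

All three calls return a pd.DataFrame of predictions.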

save_model(model_path)

Save the model to a file.

Parameters

model_path : str The path to save the model.

Returns

None

Source code in lightgbmlss/model.py
def save_model(self,
               model_path: str
               ) -> None:
    """
    Save the model to a file.

    Parameters
    ----------
    model_path : str
        The path to save the model.

    Returns
    -------
    None
    """
    with open(model_path, "wb") as f:
        pickle.dump(self, f)
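
Example — since save_model() simply pickles the whole model object (see the source above), a saved model can be restored with plain pickle. `lgblss` is a placeholder for a fitted LightGBMLSS model and the file name is arbitrary.

import pickle

# Sketch only: `lgblss` is a placeholder for a fitted LightGBMLSS model.
lgblss.save_model("lgblss.pkl")

with open("lgblss.pkl", "rb") as f:      # restore the pickled model object
    lgblss_loaded = pickle.load(f)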

set_init_score(dmatrix)

Set init_score for distributions.

Arguments

dmatrix : Dataset Dataset to set base margin for.

Returns

None

Source code in lightgbmlss/model.py
def set_init_score(self, dmatrix: Dataset) -> None:
    """
    Set init_score for distributions.

    Arguments
    ---------
    dmatrix : Dataset
        Dataset to set base margin for.

    Returns
    -------
    None
    """
    if self.start_values is None:
        _, self.start_values = self.dist.calculate_start_values(dmatrix.get_label())
    init_score = (np.ones(shape=(dmatrix.get_label().shape[0], 1))) * self.start_values
    dmatrix.set_init_score(init_score.ravel(order="F"))

set_params(params)

Set parameters for distributional model.

Arguments

params : Dict[str, Any] Parameters for model.

Returns

params : Dict[str, Any] Updated Parameters for model.

Source code in lightgbmlss/model.py
def set_params(self, params: Dict[str, Any]) -> Dict[str, Any]:
    """
    Set parameters for distributional model.

    Arguments
    ---------
    params : Dict[str, Any]
        Parameters for model.

    Returns
    -------
    params : Dict[str, Any]
        Updated Parameters for model.
    """
    params_adj = {"num_class": self.dist.n_dist_param,
                  "metric": "None",
                  "objective": self.dist.objective_fn,
                  "random_seed": 123,
                  "verbose": -1
                  }
    params.update(params_adj)

    return params
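
Example — a small sketch of how user-supplied parameters are adjusted. The distribution-specific settings (num_class, metric, objective, random_seed, verbose) overwrite any conflicting user entries, and because the implementation calls params.update(), the input dict is modified in place as well. `lgblss` is a placeholder for a LightGBMLSS model wrapping a two-parameter distribution.

# Sketch only: `lgblss` is a placeholder for a LightGBMLSS model with n_dist_param == 2.
user_params = {"learning_rate": 0.05, "max_depth": 3, "metric": "rmse"}
adjusted = lgblss.set_params(user_params)
# adjusted now contains, e.g.:
# {"learning_rate": 0.05, "max_depth": 3, "metric": "None", "num_class": 2,
#  "objective": <the distribution's objective_fn>, "random_seed": 123, "verbose": -1}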

set_valid_margin(valid_sets)

Function that sets the base margin for the validation set.

Arguments

valid_sets : list List of tuples containing the evaluation set(s).

Returns

valid_sets : list List of tuples containing the evaluation set(s).

Source code in lightgbmlss/model.py
def set_valid_margin(self,
                     valid_sets: list,
                     ) -> list:
    """
    Function that sets the base margin for the validation set.

    Arguments
    ---------
    valid_sets : list
        List of tuples containing the evaluation set(s).

    Returns
    -------
    valid_sets : list
        List of tuples containing the evaluation set(s).
    """
    for valid_set in valid_sets:
        self.set_init_score(valid_set)

    return valid_sets

train(params, train_set, num_boost_round=100, valid_sets=None, valid_names=None, init_model=None, feature_name='auto', categorical_feature='auto', keep_training_booster=False, callbacks=None)

Function to perform the training of a LightGBMLSS model with given parameters.

Parameters

params : dict
    Parameters for training. Values passed through params take precedence over those supplied via arguments.
train_set : Dataset
    Data to be trained on.
num_boost_round : int, optional (default=100)
    Number of boosting iterations.
valid_sets : list of Dataset, or None, optional (default=None)
    List of data to be evaluated on during training.
valid_names : list of str, or None, optional (default=None)
    Names of valid_sets.
init_model : str, pathlib.Path, Booster or None, optional (default=None)
    Filename of LightGBM model or Booster instance used for continued training.
feature_name : list of str, or 'auto', optional (default="auto")
    Feature names. If 'auto' and data is pandas DataFrame, data column names are used.
categorical_feature : list of str or int, or 'auto', optional (default="auto")
    Categorical features. If list of int, interpreted as indices. If list of str, interpreted as feature names (need to specify feature_name as well).
    If 'auto' and data is pandas DataFrame, pandas unordered categorical columns are used.
    All values in categorical features will be cast to int32 and thus should be less than the int32 max value (2147483647). Large values could be memory consuming; consider using consecutive integers starting from zero.
    All negative values in categorical features will be treated as missing values.
    The output cannot be monotonically constrained with respect to a categorical feature.
    Floating point numbers in categorical features will be rounded towards 0.
keep_training_booster : bool, optional (default=False)
    Whether the returned Booster will be used to keep training.
    If False, the returned value will be converted into _InnerPredictor before returning. This means you won't be able to use eval, eval_train or eval_valid methods of the returned Booster.
    When your model is very large and causes a memory error, you can try setting this param to True to avoid the model conversion performed during the internal call of model_to_string.
    You can still use _InnerPredictor as init_model for future continued training.
callbacks : list of callable, or None, optional (default=None)
    List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.

Returns

booster : Booster The trained Booster model.

Source code in lightgbmlss/model.py
def train(self,
          params: Dict[str, Any],
          train_set: Dataset,
          num_boost_round: int = 100,
          valid_sets: Optional[List[Dataset]] = None,
          valid_names: Optional[List[str]] = None,
          init_model: Optional[Union[str, Path, Booster]] = None,
          feature_name: _LGBM_FeatureNameConfiguration = 'auto',
          categorical_feature: _LGBM_CategoricalFeatureConfiguration = 'auto',
          keep_training_booster: bool = False,
          callbacks: Optional[List[Callable]] = None
          ) -> Booster:
    """Function to perform the training of a LightGBMLSS model with given parameters.

    Parameters
    ----------
    params : dict
        Parameters for training. Values passed through ``params`` take precedence over those
        supplied via arguments.
    train_set : Dataset
        Data to be trained on.
    num_boost_round : int, optional (default=100)
        Number of boosting iterations.
    valid_sets : list of Dataset, or None, optional (default=None)
        List of data to be evaluated on during training.
    valid_names : list of str, or None, optional (default=None)
        Names of ``valid_sets``.
    init_model : str, pathlib.Path, Booster or None, optional (default=None)
        Filename of LightGBM model or Booster instance used for continue training.
    feature_name : list of str, or 'auto', optional (default="auto")
        Feature names.
        If 'auto' and data is pandas DataFrame, data columns names are used.
    categorical_feature : list of str or int, or 'auto', optional (default="auto")
        Categorical features.
        If list of int, interpreted as indices.
        If list of str, interpreted as feature names (need to specify ``feature_name`` as well).
        If 'auto' and data is pandas DataFrame, pandas unordered categorical columns are used.
        All values in categorical features will be cast to int32 and thus should be less than int32 max value (2147483647).
        Large values could be memory consuming. Consider using consecutive integers starting from zero.
        All negative values in categorical features will be treated as missing values.
        The output cannot be monotonically constrained with respect to a categorical feature.
        Floating point numbers in categorical features will be rounded towards 0.
    keep_training_booster : bool, optional (default=False)
        Whether the returned Booster will be used to keep training.
        If False, the returned value will be converted into _InnerPredictor before returning.
        This means you won't be able to use ``eval``, ``eval_train`` or ``eval_valid`` methods of the returned Booster.
        When your model is very large and cause the memory error,
        you can try to set this param to ``True`` to avoid the model conversion performed during the internal call of ``model_to_string``.
        You can still use _InnerPredictor as ``init_model`` for future continue training.
    callbacks : list of callable, or None, optional (default=None)
        List of callback functions that are applied at each iteration.
        See Callbacks in Python API for more information.

    Returns
    -------
    booster : Booster
        The trained Booster model.
    """
    self.set_params(params)
    self.set_init_score(train_set)

    if valid_sets is not None:
        valid_sets = self.set_valid_margin(valid_sets)

    self.booster = lgb.train(params,
                             train_set,
                             num_boost_round=num_boost_round,
                             feval=self.dist.metric_fn,
                             valid_sets=valid_sets,
                             valid_names=valid_names,
                             init_model=init_model,
                             feature_name=feature_name,
                             categorical_feature=categorical_feature,
                             keep_training_booster=keep_training_booster,
                             callbacks=callbacks)
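
Example — an end-to-end training sketch on the simulated Gaussian data shipped with the package. The import paths for LightGBMLSS and the Gaussian distribution class, as well as constructing the model directly from a distribution instance, are assumptions based on the package layout documented on this page; as the source above shows, the fitted booster is stored on the model object as self.booster.

import lightgbm as lgb
from lightgbmlss.model import LightGBMLSS                      # assumed module path
from lightgbmlss.distributions.Gaussian import Gaussian        # assumed, analogous to the other distribution classes
from lightgbmlss.datasets.data_loader import load_simulated_gaussian_data

# Load the simulated example data and split into features and response.
train_df, test_df = load_simulated_gaussian_data()
X_train, y_train = train_df.drop(columns="y"), train_df["y"]

dtrain = lgb.Dataset(X_train, label=y_train)

# Assumed constructor: the model wraps a distribution instance.
lgblss = LightGBMLSS(Gaussian())

params = {"learning_rate": 0.05, "max_depth": 3}
lgblss.train(params, dtrain, num_boost_round=100)

booster = lgblss.booster    # the trained Booster is kept on the model object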

utils

exp_fn(predt)

Exponential function used to ensure predt is strictly positive.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in lightgbmlss/utils.py
def exp_fn(predt: torch.tensor) -> torch.tensor:
    """
    Exponential function used to ensure predt is strictly positive.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = torch.exp(nan_to_num(predt)) + torch.tensor(1e-06, dtype=predt.dtype)

    return predt

exp_fn_df(predt)

Exponential function used for Student-T distribution.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in lightgbmlss/utils.py
def exp_fn_df(predt: torch.tensor) -> torch.tensor:
    """
    Exponential function used for Student-T distribution.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = torch.exp(nan_to_num(predt)) + torch.tensor(1e-06, dtype=predt.dtype)

    return predt + torch.tensor(2.0, dtype=predt.dtype)
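
Example — a quick numerical illustration of the two exponential response functions. exp_fn maps arbitrary real-valued predictions to strictly positive values, while exp_fn_df additionally shifts the result by 2 so that the Student-T degrees-of-freedom parameter stays above 2.

import torch
from lightgbmlss.utils import exp_fn, exp_fn_df

raw = torch.tensor([-2.0, 0.0, 3.0])
print(exp_fn(raw))      # exp(raw) + 1e-06          -> strictly positive
print(exp_fn_df(raw))   # exp(raw) + 1e-06 + 2.0    -> strictly greater than 2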

gumbel_softmax_fn(predt, tau=1.0)

Gumbel-softmax function used to ensure predt sums to one.

The Gumbel-softmax distribution is a continuous distribution over the simplex, which can be thought of as a "soft" version of a categorical distribution. It is a way to draw samples from a categorical distribution in a differentiable way, which is useful in gradient-based optimization problems where discrete sampling of categorical variables would otherwise block gradients. To sample from a Gumbel-softmax distribution, one uses the Gumbel-max trick: add Gumbel noise to the logits and apply the softmax. Formally, given a vector z, the Gumbel-softmax function s(z, \tau)_i for component i at temperature \tau is defined as

    s(z, \tau)_i = \frac{e^{(z_i + g_i)/\tau}}{\sum_{j=1}^{M} e^{(z_j + g_j)/\tau}}

where g_i is a sample from the Gumbel(0, 1) distribution. The temperature parameter \tau controls the sharpness of the output distribution: as \tau approaches 0, the mixing probabilities become more discrete, and as \tau approaches \infty, the mixing probabilities become more uniform. For more information we refer to

Jang, E., Gu, S. and Poole, B. "Categorical Reparameterization with Gumbel-Softmax", ICLR, 2017.
Arguments

predt: torch.tensor
    Predicted values.
tau: float, non-negative scalar temperature
    Temperature parameter for the Gumbel-softmax distribution. As tau -> 0, the output becomes more discrete, and as tau -> inf, the output becomes more uniform.

Returns

predt: torch.tensor Predicted values.

Source code in lightgbmlss/utils.py
def gumbel_softmax_fn(predt: torch.tensor,
                      tau: float = 1.0
                      ) -> torch.tensor:
    """
    Gumbel-softmax function used to ensure predt is adding to one.

    The Gumbel-softmax distribution is a continuous distribution over the simplex, which can be thought of as a "soft"
    version of a categorical distribution. It’s a way to draw samples from a categorical distribution in a
    differentiable way. The motivation behind using the Gumbel-Softmax is to make the discrete sampling process of
    categorical variables differentiable, which is useful in gradient-based optimization problems. To sample from a
    Gumbel-Softmax distribution, one would use the Gumbel-max trick: add a Gumbel noise to logits and apply the softmax.
    Formally, given a vector z, the Gumbel-softmax function s(z,tau)_i for a component i at temperature tau is
    defined as:

        s(z,tau)_i = frac{e^{(z_i + g_i) / tau}}{sum_{j=1}^M e^{(z_j + g_j) / tau}}

    where g_i is a sample from the Gumbel(0, 1) distribution. The parameter tau (temperature) controls the sharpness
    of the output distribution. As tau approaches 0, the mixing probabilities become more discrete, and as tau
    approaches infty, the mixing probabilities become more uniform. For more information we refer to

        Jang, E., Gu, Shixiang and Poole, B. "Categorical Reparameterization with Gumbel-Softmax", ICLR, 2017.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.
    tau: float, non-negative scalar temperature.
        Temperature parameter for the Gumbel-softmax distribution. As tau -> 0, the output becomes more discrete, and as
        tau -> inf, the output becomes more uniform.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    torch.manual_seed(123)
    predt = gumbel_softmax(nan_to_num(predt), tau=tau, dim=1) + torch.tensor(0, dtype=predt.dtype)


    return predt
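
Example — a short sketch showing that each row of the output lies on the simplex. The softmax is taken over dim=1, so with predictions of shape (n_samples, n_components) every row sums to one; lower values of tau give spikier, more one-hot-like outputs. Note that the function fixes torch.manual_seed(123), so the added Gumbel noise is reproducible.

import torch
from lightgbmlss.utils import gumbel_softmax_fn

raw = torch.randn(4, 3)                  # e.g. 4 observations, 3 mixture components
probs = gumbel_softmax_fn(raw, tau=0.5)
print(probs.sum(dim=1))                  # ~1.0 for every row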

identity_fn(predt)

Identity mapping of predt.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in lightgbmlss/utils.py
def identity_fn(predt: torch.tensor) -> torch.tensor:
    """
    Identity mapping of predt.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = nan_to_num(predt) + torch.tensor(0, dtype=predt.dtype)

    return predt

nan_to_num(predt)

Replace nan, inf and -inf with the mean of predt.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in lightgbmlss/utils.py
def nan_to_num(predt: torch.tensor) -> torch.tensor:
    """
    Replace nan, inf and -inf with the mean of predt.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = torch.nan_to_num(predt,
                             nan=float(torch.nanmean(predt)),
                             posinf=float(torch.nanmean(predt)),
                             neginf=float(torch.nanmean(predt))
                             )

    return predt
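
Example — the replacement value is the NaN-ignoring mean of predt, so non-finite entries are pulled towards the bulk of the predictions rather than set to zero.

import torch
from lightgbmlss.utils import nan_to_num

raw = torch.tensor([1.0, float("nan"), 3.0])
print(nan_to_num(raw))   # tensor([1., 2., 3.]): the NaN is replaced by nanmean(raw) = 2.0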

relu_fn(predt)

Function used to ensure predt is mapped to max(0, predt).

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in lightgbmlss/utils.py
def relu_fn(predt: torch.tensor) -> torch.tensor:
    """
    Function used to ensure predt are scaled to max(0, predt).

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = torch.relu(nan_to_num(predt)) + torch.tensor(1e-06, dtype=predt.dtype)

    return predt

sigmoid_fn(predt)

Function used to ensure predt is scaled to (0, 1).

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in lightgbmlss/utils.py
def sigmoid_fn(predt: torch.tensor) -> torch.tensor:
    """
    Function used to ensure predt are scaled to (0,1).

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = torch.sigmoid(nan_to_num(predt)) + torch.tensor(1e-06, dtype=predt.dtype)
    predt = torch.clamp(predt, 1e-03, 1-1e-03)

    return predt
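
Example — the result is clamped to [1e-03, 1 - 1e-03], so even extreme raw predictions never reach the boundaries of the unit interval.

import torch
from lightgbmlss.utils import sigmoid_fn

raw = torch.tensor([-50.0, 0.0, 50.0])
print(sigmoid_fn(raw))   # ~[0.001, 0.5, 0.999]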

softmax_fn(predt)

Softmax function used to ensure predt sums to one.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in lightgbmlss/utils.py
def softmax_fn(predt: torch.tensor) -> torch.tensor:
    """
    Softmax function used to ensure predt is adding to one.


    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = softmax(nan_to_num(predt), dim=1) + torch.tensor(0, dtype=predt.dtype)

    return predt

softplus_fn(predt)

Softplus function used to ensure predt is strictly positive.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in lightgbmlss/utils.py
def softplus_fn(predt: torch.tensor) -> torch.tensor:
    """
    Softplus function used to ensure predt is strictly positive.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = softplus(nan_to_num(predt)) + torch.tensor(1e-06, dtype=predt.dtype)

    return predt

softplus_fn_df(predt)

Softplus function used for Student-T distribution.

Arguments

predt: torch.tensor Predicted values.

Returns

predt: torch.tensor Predicted values.

Source code in lightgbmlss/utils.py
def softplus_fn_df(predt: torch.tensor) -> torch.tensor:
    """
    Softplus function used for Student-T distribution.

    Arguments
    ---------
    predt: torch.tensor
        Predicted values.

    Returns
    -------
    predt: torch.tensor
        Predicted values.
    """
    predt = softplus(nan_to_num(predt)) + torch.tensor(1e-06, dtype=predt.dtype)

    return predt + torch.tensor(2.0, dtype=predt.dtype)
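
Example — analogous to the exponential pair above: softplus_fn maps raw predictions to strictly positive values, and softplus_fn_df shifts the result by 2 so that the Student-T degrees-of-freedom parameter stays above 2.

import torch
from lightgbmlss.utils import softplus_fn, softplus_fn_df

raw = torch.tensor([-5.0, 0.0, 5.0])
print(softplus_fn(raw))     # softplus(raw) + 1e-06        -> strictly positive
print(softplus_fn_df(raw))  # softplus(raw) + 1e-06 + 2.0  -> strictly greater than 2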