
Basic Walkthrough - Gaussian Regression


In this example, we model and predict all parameters of a univariate Normal distribution. Recall that distributional regression models and predicts all parameters $\theta_{ik}$, $k=1, \ldots, K$, of a distribution $\mathcal{D}$ as a function of covariates:

\begin{equation} y_{i} \stackrel{ind}{\sim} \mathcal{D} \begin{pmatrix} h_{1}\bigl(\theta_{i1}(x_{i})\bigr) = \eta_{i1} \\ h_{2}\bigl(\theta_{i2}(x_{i})\bigr) = \eta_{i2} \\ \vdots \\ h_{K}\bigl(\theta_{iK}(x_{i})\bigr) = \eta_{iK} \end{pmatrix}, \quad i=1, \ldots, N, \end{equation}

where $h_{k}(\cdot)$ transforms each distributional parameter to its appropriate scale. For the univariate Normal case, the above specializes to $y_{i} \stackrel{ind}{\sim} \mathcal{N}\bigl(\mu_{i}(x_{i}), \sigma_{i}(x_{i})\bigr)$. Since $\mu_{i}(\cdot) \in \mathbb{R}$, while the standard deviation cannot be negative, $h_{k}(\cdot)$ is applied to $\sigma_{i}(\cdot)$ only. Typical choices are the exponential or the softplus function.
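To make the role of $h_{k}(\cdot)$ concrete, here is a minimal sketch (purely illustrative, not part of the XGBoostLSS API) of how a raw, unconstrained predictor $\eta$ is mapped to a valid standard deviation with either the exponential or the softplus response:

import numpy as np

def exp_response(eta):
    # Exponential response: maps any real eta to a strictly positive scale.
    return np.exp(eta)

def softplus_response(eta):
    # Softplus response log(1 + exp(eta)): also strictly positive,
    # growing roughly linearly for large eta.
    return np.log1p(np.exp(eta))

eta = np.array([-2.0, 0.0, 2.0])    # unconstrained predictor values
print(exp_response(eta))            # [0.135  1.     7.389] -> valid sigma values
print(softplus_response(eta))       # [0.127  0.693  2.127]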

Imports

First, we import the necessary functions.

In [2]:
from xgboostlss.model import *
from xgboostlss.distributions.Gaussian import *
from xgboostlss.datasets.data_loader import load_simulated_gaussian_data
from scipy.stats import norm

import multiprocessing
import plotnine
from plotnine import *
plotnine.options.figure_size = (12, 8)

Data

The data is simulated from a Gaussian, where $x_{true}$ is the only informative feature and all others are noise variables (a short simulation sketch follows the list below):

  • $\mu(x_{true}) = 10$
  • $\sigma(x_{true}) = 1 + 4 * \bigl((0.3 < x_{true}) \& (x_{true} < 0.5)\bigr) + 2 * (x_{true} > 0.7)$
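The loader used below returns this data pre-simulated. As a hedged sketch of the generating process above (the sample size and the number of noise features are illustrative assumptions, not the settings of the shipped dataset):

import numpy as np
import pandas as pd

rng = np.random.default_rng(123)
n, n_noise = 7000, 9                                    # assumed sizes, for illustration only

x_true = rng.uniform(0, 1, n)
scale = 1 + 4 * ((0.3 < x_true) & (x_true < 0.5)) + 2 * (x_true > 0.7)
y = rng.normal(loc=10, scale=scale)                     # mu = 10 is constant

sim_df = pd.DataFrame({"y": y, "x_true": x_true})
for j in range(n_noise):
    sim_df[f"x_noise{j+1}"] = rng.uniform(0, 1, n)      # uninformative covariates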

We first load the simulated dataset, filter for the target and covariates and then create the xgb.DMatrix. XGBoostLSS is designed to closely resemble the usage of XGBoost, ensuring ease of adoption and full compatibility.

In [3]:
train, test = load_simulated_gaussian_data()
n_cpu = multiprocessing.cpu_count()

X_train, y_train = train.filter(regex="x"), train["y"].values
X_test, y_test = test.filter(regex="x"), test["y"].values

dtrain = xgb.DMatrix(X_train, label=y_train, nthread=n_cpu)
dtest = xgb.DMatrix(X_test, nthread=n_cpu)

Distribution Selection

Next, we specify a Gaussian distribution. By modifying the specification below, the user can choose an alternative distributional assumption, including a wide range of parametric univariate and multivariate distributions, as well as modelling the data with Normalizing Flows. Each distribution exposes the following arguments:

  • stabilization: specifies the stabilization method for the Gradient and Hessian. Options are None, MAD and L2.
  • response_fn: specifies $h_{k}(\cdot)$ and transforms the distributional parameter to the correct support. Here, we specify an exponential for $\sigma_{i}(\cdot)$ only.
  • loss_fn: specifies the loss function used for training. Options are nll (negative log-likelihood) or crps (continuous ranked probability score).

For additional details, see ?Gaussian.
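For illustration only, switching to a different distributional assumption amounts to swapping the distribution class. The following hedged sketch assumes that the StudentT class shares the same three arguments as Gaussian (check ?StudentT for the options it actually supports):

from xgboostlss.distributions.StudentT import StudentT

# Hedged sketch: a heavier-tailed alternative to the Gaussian; argument
# names assumed to mirror the Gaussian interface shown below.
xgblss_t = XGBoostLSS(
    StudentT(stabilization="None",
             response_fn="exp",
             loss_fn="nll")
)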

In [4]:
xgblss = XGBoostLSS(
    Gaussian(stabilization="None",   # no stabilization of gradients and hessians
             response_fn="exp",      # exponential response keeps sigma strictly positive
             loss_fn="nll"           # negative log-likelihood as training loss
            )
)

Hyper-Parameter Optimization

Any XGBoost hyperparameter can be tuned, where the structure of the parameter dictionary needs to be as follows:

- Float/Int sample_type
    - {"param_name": ["sample_type", low, high, log]}
        - sample_type: str, Type of sampling, e.g., "float" or "int"
        - low: int, Lower endpoint of the range of suggested values
        - high: int, Upper endpoint of the range of suggested values
        - log: bool, Flag to sample the value from the log domain or not
    - Example: {"eta": "float", low=1e-5, high=1, log=True]}

- Categorical sample_type
    - {"param_name": ["sample_type", ["choice1", "choice2", "choice3", "..."]]}
        - sample_type: str, Type of sampling, i.e., "categorical"
        - choice1, choice2, choice3, ...: str, Possible choices for the parameter
    - Example: {"booster": ["categorical", ["gbtree", "dart"]]}

- For parameters without a tunable choice (this is needed, e.g., if tree_method = "gpu_hist" and gpu_id needs to be specified)
    - {"param_name": ["none", [value]]},
        - param_name: str, Name of the parameter
        - value: int, Value of the parameter
    - Example: {"gpu_id": ["none", [0]]}

Depending on which parameters are optimized, some of them may end up unused: e.g., when {"booster": ["categorical", ["gbtree", "gblinear"]]} and {"max_depth": ["int", {"low": 1, "high": 10, "log": False}]} are specified, max_depth is ignored whenever gblinear is sampled, since that booster has no such argument.

In [5]:
param_dict = {
    "eta":              ["float", {"low": 1e-5,   "high": 1,     "log": True}],
    "max_depth":        ["int",   {"low": 1,      "high": 10,    "log": False}],
    "gamma":            ["float", {"low": 1e-8,   "high": 40,    "log": True}],
    "subsample":        ["float", {"low": 0.2,    "high": 1.0,   "log": False}],
    "colsample_bytree": ["float", {"low": 0.2,    "high": 1.0,   "log": False}],
    "min_child_weight": ["float", {"low": 1e-8,   "high": 500,   "log": True}],
    "booster":          ["categorical", ["gbtree"]]
}

np.random.seed(123)
opt_param = xgblss.hyper_opt(param_dict,
                             dtrain,
                             num_boost_round=100,        # Number of boosting iterations.
                             nfold=5,                    # Number of cv-folds.
                             early_stopping_rounds=20,   # Number of early-stopping rounds
                             max_minutes=10,             # Time budget in minutes, i.e., stop study after the given number of minutes.
                             n_trials=30,                # The number of trials. If this argument is set to None, there is no limitation on the number of trials.
                             silence=True,               # Controls the verbosity of the trial, i.e., user can silence the outputs of the trial.
                             seed=123,                   # Seed used to generate cv-folds.
                             hp_seed=123                 # Seed for random number generator used in the Bayesian hyperparameter search.
                            )
param_dict = { "eta": ["float", {"low": 1e-5, "high": 1, "log": True}], "max_depth": ["int", {"low": 1, "high": 10, "log": False}], "gamma": ["float", {"low": 1e-8, "high": 40, "log": True}], "subsample": ["float", {"low": 0.2, "high": 1.0, "log": False}], "colsample_bytree": ["float", {"low": 0.2, "high": 1.0, "log": False}], "min_child_weight": ["float", {"low": 1e-8, "high": 500, "log": True}], "booster": ["categorical", ["gbtree"]] } np.random.seed(123) opt_param = xgblss.hyper_opt(param_dict, dtrain, num_boost_round=100, # Number of boosting iterations. nfold=5, # Number of cv-folds. early_stopping_rounds=20, # Number of early-stopping rounds max_minutes=10, # Time budget in minutes, i.e., stop study after the given number of minutes. n_trials=30 , # The number of trials. If this argument is set to None, there is no limitation on the number of trials. silence=True, # Controls the verbosity of the trail, i.e., user can silence the outputs of the trail. seed=123, # Seed used to generate cv-folds. hp_seed=123 # Seed for random number generator used in the Bayesian hyperparameter search. )
Hyper-Parameter Optimization successfully finished.
  Number of finished trials:  30
  Best trial:
    Value: 2924.2441896
    Params: 
    eta: 0.09351124167529237
    max_depth: 2
    gamma: 2.4167449579359e-07
    subsample: 0.994426126654586
    colsample_bytree: 0.7402148381306217
    min_child_weight: 0.00042927678410567983
    booster: gbtree
    opt_rounds: 52

Model Training

We use the optimized hyper-parameters and train the model.

In [6]:
np.random.seed(123)

opt_params = opt_param.copy()
n_rounds = opt_params["opt_rounds"]
del opt_params["opt_rounds"]

# Train Model with optimized hyperparameters
xgblss.train(opt_params,
             dtrain,
             num_boost_round=n_rounds
             )

Prediction

Similar to an XGBoost model, we now predict from the trained model. The following options are available:

  • samples: draws n_samples from the predicted distribution.
  • quantiles: calculates quantiles from the predicted distribution.
  • parameters: returns predicted distributional parameters.
In [7]:
# Set seed for reproducibility
torch.manual_seed(123)

# Number of samples to draw from predicted distribution
n_samples = 1000
quant_sel = [0.05, 0.95] # Quantiles to calculate from predicted distribution

# Sample from predicted distribution
pred_samples = xgblss.predict(dtest,
                              pred_type="samples",
                              n_samples=n_samples,
                              seed=123)

# Calculate quantiles from predicted distribution
pred_quantiles = xgblss.predict(dtest,
                                pred_type="quantiles",
                                n_samples=n_samples,
                                quantiles=quant_sel)

# Return predicted distributional parameters
pred_params = xgblss.predict(dtest,
                             pred_type="parameters")
In [8]:
pred_samples.head()
Out[8]:
y_sample0 y_sample1 y_sample2 y_sample3 y_sample4 y_sample5 y_sample6 y_sample7 y_sample8 y_sample9 ... y_sample990 y_sample991 y_sample992 y_sample993 y_sample994 y_sample995 y_sample996 y_sample997 y_sample998 y_sample999
0 11.015219 8.734886 11.348413 11.232823 -0.274652 10.213624 14.052514 7.804887 12.677410 8.390224 ... 10.980301 11.188513 8.671056 9.589313 10.565292 9.265074 11.665381 7.814116 10.042959 6.515237
1 9.517735 9.632618 10.086379 12.088537 15.251966 10.720967 9.404674 9.895078 8.042754 16.188034 ... 10.386158 7.540595 3.837141 8.939890 15.170957 9.002590 8.670884 15.069681 9.314845 8.427787
2 9.685265 10.136473 10.548946 8.413225 10.939183 8.258053 10.346950 11.499171 10.899040 9.828666 ... 11.542138 10.680413 8.733493 9.789048 8.577210 9.261786 10.035842 10.193021 10.797279 9.483115
3 7.237225 15.162405 9.267536 -3.905732 12.267725 8.175550 6.925624 6.103096 3.742807 13.155276 ... 19.981014 15.370853 8.532224 12.171924 7.949986 13.375786 9.872285 8.022955 12.587466 7.898407
4 10.866986 10.086380 9.038006 8.726389 10.334625 8.419436 7.163653 7.222143 11.298795 5.337166 ... 9.535580 11.956666 9.180673 12.755524 11.633177 7.900021 10.396134 9.165933 7.960898 11.262684

5 rows × 1000 columns
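As a quick sanity check (a sketch, not part of the original walkthrough), the row-wise mean over the 1,000 drawn samples should be close to the predicted location parameter returned by pred_type="parameters" above:

# Hedged sketch: Monte-Carlo mean per observation vs. predicted location.
sample_mean = pred_samples.mean(axis=1)
print(sample_mean.head())
print(pred_params["loc"].head())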

In [9]:
pred_quantiles.head()
Out[9]:
quant_0.05 quant_0.95
0 5.664877 14.790590
1 5.993135 14.115732
2 8.293133 11.918892
3 2.811093 17.173132
4 5.231390 14.326776
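Since the selected quantiles are the 5% and 95% levels, roughly 90% of the test observations should fall between the two columns. A hedged sketch of that coverage check:

# Hedged sketch: empirical coverage of the predicted 90% interval on the test set.
inside = (y_test >= pred_quantiles["quant_0.05"].values) & \
         (y_test <= pred_quantiles["quant_0.95"].values)
print(f"Empirical coverage: {inside.mean():.3f}")      # should be close to 0.90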
In [10]:
pred_params.head()
Out[10]:
loc scale
0 10.067525 2.809063
1 9.946658 2.412705
2 10.017097 1.093253
3 9.945558 4.605916
4 9.887371 2.810100
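Because the predicted parameters fully specify a Normal distribution for each observation, they can be plugged into scipy.stats.norm (already imported above) to evaluate, for instance, the average negative log-likelihood on the test set. A hedged sketch:

# Hedged sketch: average out-of-sample negative log-likelihood under the
# predicted per-observation Normal distributions.
nll_test = -norm.logpdf(y_test,
                        loc=pred_params["loc"].values,
                        scale=pred_params["scale"].values).mean()
print(f"Test NLL: {nll_test:.3f}")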

SHAP Interpretability

To get a deeper understanding of the data generating process, XGBoostLSS also provides feature importance and partial dependence plots based on Shapley values.

In [11]:
# Partial Dependence Plot of how x_true acts on the scale parameter
xgblss.plot(X_test,
            parameter="scale",
            feature="x_true",
            plot_type="Partial_Dependence")
[Figure: partial dependence of x_true on the scale parameter]
In [12]:
# Feature Importance of scale parameter
xgblss.plot(X_test,
            parameter="scale",
            plot_type="Feature_Importance")
[Figure: feature importance for the scale parameter]

Plot of Actual vs. Predicted Quantiles

In the following, we plot the predicted quantiles (blue) and compare them to the actual quantiles (dashed black); observations falling outside the actual quantile bounds are highlighted in red.

In [13]:
np.random.seed(123)

###
# Actual Quantiles
###
q1 = norm.ppf(quant_sel[0], loc = 10, scale = 1 + 4*((0.3 < test["x_true"].values) & (test["x_true"].values < 0.5)) + 2*(test["x_true"].values > 0.7))
q2 = norm.ppf(quant_sel[1], loc = 10, scale = 1 + 4*((0.3 < test["x_true"].values) & (test["x_true"].values < 0.5)) + 2*(test["x_true"].values > 0.7))
test["quant"] = np.where(test["y"].values < q1, 0, np.where(test["y"].values < q2, 1, 2))
test["alpha"] = np.where(test["y"].values <= q1, 1, np.where(test["y"].values >= q2, 1, 0))
df_quantiles = test[test["alpha"] == 1]

# Lower Bound
yl = list(set(q1))
yl.sort()
yl = [yl[2],yl[0],yl[2],yl[1],yl[1]]
sfunl = pd.DataFrame({"x_true":[0, 0.3, 0.5, 0.7, 1], "y":yl})

# Upper Bound
yu = list(set(q2))
yu.sort()
yu = [yu[0],yu[2],yu[0],yu[1],yu[1]]
sfunu = pd.DataFrame({"x_true":[0, 0.3, 0.5, 0.7, 1], "y":yu})

###
# Predicted Quantiles
###
test["lb"] = pred_quantiles.iloc[:,0]
test["ub"] = pred_quantiles.iloc[:,1]

###
# Plot
###
(ggplot(test,
        aes("x_true",
            "y")) + 
 geom_point(alpha = 0.2, color = "black", size = 2) + 
 theme_bw(base_size=15) +
 theme(legend_position="none",
       plot_title = element_text(hjust = 0.5),
       plot_subtitle = element_text(hjust = 0.5)) +
 labs(title = "XGBoostLSS Regression - Simulated Data Example",
      subtitle = "Comparison of Actual (black) vs. Predicted Quantiles (blue)",
      x="x")  + 
 geom_line(aes("x_true",
               "ub"),
           size = 1,
           color = "blue", 
           alpha = 0.7) + 
 geom_line(aes("x_true",
               "lb"),
           size = 1,
           color = "blue", 
           alpha = 0.7) + 
 geom_point(df_quantiles,
            aes("x_true",
                "y"), 
            color = "red", 
            alpha = 0.7,
            size = 2) + 
 geom_step(sfunl,
           aes("x_true",
               "y"), 
           size = 1, 
           linetype = "dashed")  + 
 geom_step(sfunu,
           aes("x_true",
               "y"), 
           size = 1, 
           linetype = "dashed") 
)
[Figure: actual (dashed black) vs. predicted (blue) quantiles over x_true]
Out[13]:
<Figure Size: (1200 x 800)>

True vs. Predicted Distributional Parameters

In the following figure, we compare the true parameters of the Gaussian with those predicted by XGBoostLSS. The estimated parameters closely match the true ones (recall that the location parameter $\mu=10$ is simulated as constant).

In [14]:
pred_params["x_true"] = X_test["x_true"].values
dist_params = list(xgblss.dist.param_dict.keys())

# Data with actual values
plot_df_actual = pd.melt(test[["x_true"] + dist_params],
                         id_vars="x_true",
                         value_vars=dist_params)
plot_df_actual["type"] = "TRUE"

# Data with predicted values
plot_df_predt = pd.melt(pred_params[["x_true"] + dist_params],
                        id_vars="x_true",
                        value_vars=dist_params)
plot_df_predt["type"] = "PREDICT"

plot_df = pd.concat([plot_df_predt, plot_df_actual])

plot_df["variable"] = plot_df.variable.str.upper()
plot_df["type"] = pd.Categorical(plot_df["type"], categories = ["PREDICT", "TRUE"])

(ggplot(plot_df,
        aes(x="x_true",
            y="value",
            color="type")) +
 geom_line(size=1.1) + 
 facet_wrap("variable",
            scales="free") + 
 labs(title="Parameters of univariate Gaussian predicted with XGBoostLSS",
      x="",
      y="") + 
 theme_bw(base_size=15) + 
 theme(legend_position="bottom",
       plot_title = element_text(hjust = 0.5),
       legend_title = element_blank())
)
pred_params["x_true"] = X_test["x_true"].values dist_params = list(xgblss.dist.param_dict.keys()) # Data with actual values plot_df_actual = pd.melt(test[["x_true"] + dist_params], id_vars="x_true", value_vars=dist_params) plot_df_actual["type"] = "TRUE" # Data with predicted values plot_df_predt = pd.melt(pred_params[["x_true"] + dist_params], id_vars="x_true", value_vars=dist_params) plot_df_predt["type"] = "PREDICT" plot_df = pd.concat([plot_df_predt, plot_df_actual]) plot_df["variable"] = plot_df.variable.str.upper() plot_df["type"] = pd.Categorical(plot_df["type"], categories = ["PREDICT", "TRUE"]) (ggplot(plot_df, aes(x="x_true", y="value", color="type")) + geom_line(size=1.1) + facet_wrap("variable", scales="free") + labs(title="Parameters of univariate Gaussian predicted with XGBoostLSS", x="", y="") + theme_bw(base_size=15) + theme(legend_position="bottom", plot_title = element_text(hjust = 0.5), legend_title = element_blank()) )
[Figure: true vs. predicted distributional parameters as a function of x_true]
Out[14]:
<Figure Size: (1200 x 800)>

Actual vs. Predicted

Since we predict the entire conditional distribution, we can overlay the point predictions with predicted densities, from which we can also derive quantiles of interest.
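Since the conditional distribution is available in closed form, quantiles can also be computed directly from the predicted parameters rather than from samples. A hedged sketch using scipy.stats.norm:

# Hedged sketch: closed-form 5%/95% quantiles from the predicted parameters,
# as an alternative to the sample-based quantiles computed earlier.
q05_param = norm.ppf(quant_sel[0], loc=pred_params["loc"].values, scale=pred_params["scale"].values)
q95_param = norm.ppf(quant_sel[1], loc=pred_params["loc"].values, scale=pred_params["scale"].values)
print(q05_param[:5], q95_param[:5])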

In [15]:
y_pred = []

n_examples = 9

for i in range(n_examples):    
    y_samples = pd.DataFrame(pred_samples.values[i,:].reshape(-1,1), columns=["PREDICT_DENSITY"])
    y_samples["PREDICT_POINT"] = y_samples["PREDICT_DENSITY"].mean()
    y_samples["PREDICT_Q05"] = y_samples["PREDICT_DENSITY"].quantile(q=quant_sel[0])
    y_samples["PREDICT_Q95"] = y_samples["PREDICT_DENSITY"].quantile(q=quant_sel[1])
    y_samples["ACTUAL"] = y_test[i]
    y_samples["obs"]= f"Obervation {i+1}"
    y_pred.append(y_samples)
    
pred_df = pd.melt(pd.concat(y_pred, axis=0), id_vars="obs")
pred_df["obs"] = pd.Categorical(pred_df["obs"], categories=[f"Obervation {i+1}" for i in range(n_examples)])
df_actual, df_pred_dens, df_pred_point, df_q05, df_q95 = [x for _, x in pred_df.groupby("variable")]

plot_pred = (
    ggplot(pred_df,
           aes(color="variable")) + 
    stat_density(df_pred_dens,
                 aes(x="value"),
                 size=1.1) + 
    geom_point(df_pred_point,
               aes(x="value",
                   y=0),
               size=1.4) + 
    geom_point(df_actual,
               aes(x="value",
                   y=0),
               size=1.4) + 
    geom_vline(df_q05, 
               aes(xintercept="value",
                   fill="variable",
                   color="variable"),
               linetype="dashed",
               size=1.1) + 
    geom_vline(df_q95, 
               aes(xintercept="value",
                   fill="variable",
                   color="variable"),
               linetype="dashed",
               size=1.1) + 
    facet_wrap("obs",
               scales="free",
               ncol=3) + 
    labs(title="Predicted vs. Actual \n",
         x = "") + 
    theme_bw(base_size=15) +
    scale_fill_brewer(type="qual", palette="Dark2") + 
    theme(legend_position="bottom",
          plot_title = element_text(hjust = 0.5),
          legend_title = element_blank()
         )
)

print(plot_pred)
[Figure: predicted densities, point predictions and 5%/95% quantiles vs. actual values for nine test observations]
