MIMIC-IV 2.0 analysis using PyDTS (Meir and Gorfine, 2025)
The utility of PyDTS is demonstrated through an analysis of patients' length of stay (LOS) in the intensive care unit (ICU) [1]. This analysis uses the publicly accessible, large-scale Medical Information Mart for Intensive Care (MIMIC-IV, version 2.0) dataset.
Meir and Gorfine (2025) [1] developed a discrete-time survival model to predict ICU LOS based on patients’ clinical characteristics at admission. The dataset comprises 25,170 ICU patients. For each patient, only the last admission is considered, and features related to prior admission history are included. The LOS is recorded in discrete units from 1 to 28 days, resulting in many patients sharing the same event time on each day.
Three competing events are considered:
- Discharge to home (69.0%)
- Transfer to another medical facility (21.4%)
- In-hospital death (6.1%)
Patients who left the ICU against medical advice (1.0%) are treated as right-censored, and administrative censoring is applied to those hospitalized for more than 28 days (2.5%).
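The event and censoring rules above can be sketched on a toy table. This is a minimal illustration, not the preprocessing code of [1,2]: the raw records, the `los_days` and `outcome` columns, and the integer event codes are hypothetical, and the `pid`, `X`, `J` column names are assumed here as a convenient convention for patient id, observed discrete time, and event type (with 0 meaning censored).

```python
import pandas as pd

MAX_DAYS = 28  # administrative censoring horizon used in the analysis

# Hypothetical raw records: LOS in days and how the ICU stay ended.
raw = pd.DataFrame({
    "pid": [1, 2, 3, 4, 5],
    "los_days": [3, 12, 40, 7, 28],
    "outcome": ["home", "facility", "home", "against_advice", "death"],
})

# Hypothetical integer codes for the three competing events; 0 is censoring.
event_codes = {"home": 1, "facility": 2, "death": 3}

encoded = pd.DataFrame({"pid": raw["pid"]})
# Observed discrete time, capped at the 28-day horizon.
encoded["X"] = raw["los_days"].clip(upper=MAX_DAYS)
# Outcomes without an event code (left against medical advice) become 0.
encoded["J"] = raw["outcome"].map(event_codes).fillna(0).astype(int)
# Stays longer than 28 days are administratively censored at day 28.
encoded.loc[raw["los_days"] > MAX_DAYS, "J"] = 0

print(encoded)
```

In this toy table, patient 3 is administratively censored at day 28 and patient 4 is censored at day 7, while patient 5 contributes a death event on day 28.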
The analysis includes 36 covariates per patient, comprising patient characteristics and laboratory test results at admission. A full description of the data is presented below.
The preprocessing procedure of [1,2] is implemented in pydts.example_utils.get_mimic_df().
Note that the MIMIC-IV dataset itself is not included in PyDTS; it is available at https://physionet.org/content/mimiciv/2.0/ and requires credentialed access.
Preprocessed MIMIC-IV v2.0 Dataset
Three estimation procedures were compared:
- The method of Lee et al. (2018) [3] without regularization.
- The two-step approach of Meir and Gorfine (2025) [1] without regularization.
- The two-step approach of Meir and Gorfine (2025) [1] with LASSO regularization.
When applying the two-step procedure with LASSO regularization, we need to specify the hyperparameters that control the amount of regularization applied to each competing event. As demonstrated in the regularization section of this documentation, PyDTS provides functionality for tuning these hyperparameters via K-fold cross-validation. By default, the optimal values are those that maximize the out-of-sample global-AUC metric, as defined in Meir and Gorfine (2025) [1], Appendix I. Additional tuning options are also available. Here, a grid search with 4-fold cross-validation was performed to select the hyperparameters that maximize the global-AUC. The code below illustrates this tuning procedure.
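```python
import warnings

import numpy as np
import pandas as pd
from pydts.cross_validation import PenaltyGridSearchCV

warnings.filterwarnings('ignore')

# Candidate penalizers on the log scale: -12, -11, ..., -1
step = 1
penalizers = np.arange(-12, -0.9, step=step)
n_splits = 4
seed = 1

# mimic_df is the preprocessed MIMIC-IV DataFrame (see get_mimic_df above)
penalty_cv_search = PenaltyGridSearchCV()
gauc_cv_results = penalty_cv_search.cross_validate(
    full_df=mimic_df, l1_ratio=1, penalizers=np.exp(penalizers),
    n_splits=n_splits, seed=seed)

# Best mean global-AUC and the penalizer combination that attains it
print(gauc_cv_results['Mean'].max())
print(gauc_cv_results['Mean'].idxmax())

# Selected penalizers on the log scale, one per competing event
chosen_eta = np.log(gauc_cv_results['Mean'].idxmax())
print(chosen_eta)
```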
The procedure returns a pd.DataFrame with the penalizer combinations as the index and the mean and standard deviation of the global-AUC across folds as the values.
The chosen penalizers \(\eta_j\), \(j=1,2,3\), are the ones that maximize the mean global-AUC; they are stored on the log scale in chosen_eta.
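The selection step can be illustrated on a toy results table. The sketch below assumes the index is a three-level MultiIndex holding one penalizer per competing event and that the mean global-AUC is in a column named `Mean`, as in the code above; the AUC values are made up for illustration and are not results from the MIMIC-IV analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical cross-validation summary: one row per combination of
# per-event penalizers, indexed by (event 1, event 2, event 3).
index = pd.MultiIndex.from_tuples([
    (np.exp(-5), np.exp(-5), np.exp(-5)),
    (np.exp(-5), np.exp(-4), np.exp(-6)),
    (np.exp(-3), np.exp(-4), np.exp(-6)),
])
# Made-up mean global-AUC values, for illustration only.
toy_results = pd.DataFrame({"Mean": [0.61, 0.64, 0.62]}, index=index)

# idxmax on a MultiIndex returns the index tuple of the best row.
best_combination = toy_results["Mean"].idxmax()

# Taking the log maps the tuple back to the log-scale grid values.
toy_eta = np.log(best_combination)
print(toy_eta)
```

Because `idxmax` returns a tuple here, `np.log` yields an array with one entry per competing event, which is why the elements can later be accessed by position.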
Additional metrics are also available. For example, the integrated AUC in each fold for each risk is stored in penalty_cv_search.integrated_auc.
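To get a feel for the size of the search described above, the following sketch enumerates the candidate grid. It assumes that every combination of per-event penalizers is a candidate, which matches the description of the results index; how PyDTS organizes the underlying computation internally may differ.

```python
from itertools import product

import numpy as np

# Same log-scale grid as in the tuning code: -12, -11, ..., -1
log_penalizers = np.arange(-12, -0.9, step=1)
penalizers = np.exp(log_penalizers)

n_events = 3   # discharge to home, transfer, in-hospital death
n_splits = 4   # number of cross-validation folds

# Every combination of one penalizer per competing event.
grid = list(product(penalizers, repeat=n_events))

print(len(log_penalizers))     # candidate values per event
print(len(grid))               # candidate combinations
print(len(grid) * n_splits)    # combination-fold pairs to score
```

With 12 values per event and three events, the grid holds 12^3 = 1728 combinations, so even a coarse step of 1 on the log scale produces a sizeable search.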
Model Estimation
We now train and compare the three estimation methods, using the selected penalizers for the regularized two-step procedure.
Estimation using Lee et al. (2018)
Estimation using two-step without regularization
Estimation using two-step with regularization
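```python
from pydts.fitters import TwoStagesFitter

# One LASSO penalizer (l1_ratio=1) per competing event, converted from the
# log scale used in the grid search back to the original scale
reg_fitter = TwoStagesFitter()
fit_beta_kwargs = {
    'model_kwargs': {
        1: {'penalizer': np.exp(chosen_eta[0]), 'l1_ratio': 1},
        2: {'penalizer': np.exp(chosen_eta[1]), 'l1_ratio': 1},
        3: {'penalizer': np.exp(chosen_eta[2]), 'l1_ratio': 1},
    }
}
reg_fitter.fit(df=mimic_df, fit_beta_kwargs=fit_beta_kwargs)
```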
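The per-event keyword dictionary can also be built programmatically, which helps when the number of competing events changes. The sketch below only constructs the dictionary and does not fit a model; `example_eta` holds hypothetical log-scale penalizers, not the values selected in the actual analysis.

```python
import numpy as np

# Hypothetical log-scale penalizers, one per competing event.
example_eta = np.array([-5.0, -4.0, -6.0])

# Events are numbered from 1, so enumerate starts at 1.
fit_beta_kwargs = {
    "model_kwargs": {
        event: {"penalizer": float(np.exp(eta)), "l1_ratio": 1}
        for event, eta in enumerate(example_eta, start=1)
    }
}

for event, kwargs in fit_beta_kwargs["model_kwargs"].items():
    print(event, kwargs)
```

The resulting dictionary has the same shape as the hand-written one above, with keys 1, 2, and 3 and an `l1_ratio` of 1 (pure LASSO) for every event.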
References
[1] Meir, Tomer and Gorfine, Malka, "Discrete-time Competing-Risks Regression with or without Penalization", Biometrics, Volume 81, Issue 2, 2025.
[2] Meir, Tomer and Gutman, Rom and Gorfine, Malka, "PyDTS: A Python Package for Discrete-Time Survival Analysis with Competing Risks", 2022.
[3] Lee, Minjung and Feuer, Eric J. and Fine, Jason P., "On the analysis of discrete time competing risks data", Biometrics, 2018.