Performance Measures
Model evaluation on test data or by cross-validation (CV) can be done using the evaluation functions available in PyDTS and the performance measures presented in the Methods section.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from pydts.examples_utils.generate_simulations_data import generate_quick_start_df
import warnings
pd.set_option("display.max_rows", 500)
warnings.filterwarnings('ignore')
%matplotlib inline
# True parameters used to simulate the data, keyed by event type j:
# "alpha" holds the time-dependent intercepts and "beta" the covariate coefficients.
real_coef_dict = {
    "alpha": {
        1: lambda t: -1 - 0.3 * np.log(t),
        2: lambda t: -1.75 - 0.15 * np.log(t)
    },
    "beta": {
        1: -np.log([0.8, 3, 3, 2.5, 2]),
        2: -np.log([1, 3, 4, 3, 2])
    }
}
n_patients = 50000
n_cov = 5
patients_df = generate_quick_start_df(n_patients=n_patients, n_cov=n_cov, d_times=30, j_events=2,
                                      pid_col='pid', seed=0, censoring_prob=0.8,
                                      real_coef_dict=real_coef_dict)
train_df, test_df = train_test_split(patients_df, test_size=0.2)
patients_df.head()
For example, suppose the survival models are estimated from the dataset train_df using the two-stage approach, and the event of main interest is \(j=1\). The predicted probabilities \(\pi_{i1}(t)\) are then calculated and stored in pred_df, from which \(\widehat{\mbox{AUC}}_1(t)\), \(t=1,\ldots,d\), are computed.
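To make the quantity concrete, here is a minimal, dependency-free sketch of a cause-specific AUC at a single time \(t\). It is not the PyDTS implementation. It assumes that cases are observations with event \(j\) at time \(t\), that controls are observations still at risk at \(t\) with no event at \(t\), and that ties count as one half. It applies no censoring weights, so the estimator defined in the Methods section may differ. The function name `auc_at_t` and the toy data are hypothetical.

```python
def auc_at_t(times, events, scores, t, event=1):
    """Sketch of a cause-specific AUC at time t (assumed definition, see lead-in).

    times[i]  : observed discrete time of observation i
    events[i] : observed event type (0 means censored)
    scores[i] : predicted probability of `event` at time t for observation i
    """
    rows = list(zip(times, events, scores))
    # Cases: the event of interest occurred exactly at time t.
    cases = [s for x, j, s in rows if x == t and j == event]
    # Controls: still under observation at t and no event of any type at t.
    controls = [s for x, j, s in rows if x > t or (x == t and j == 0)]
    if not cases or not controls:
        return float("nan")  # AUC is undefined without both groups
    # Count concordant case-control pairs; ties contribute one half.
    concordant = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return concordant / (len(cases) * len(controls))


# Toy data, made up for illustration only.
toy_times = [2, 2, 3, 4, 2]
toy_events = [1, 1, 0, 2, 0]
toy_scores = [0.9, 0.4, 0.5, 0.3, 0.4]
print(auc_at_t(toy_times, toy_events, toy_scores, t=2, event=1))  # 0.75
```

In the toy data, two cases are compared with three controls: 4.5 of the 6 pairs are concordant (one tie counts as half), giving 0.75.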
Other measures, such as the integrated \(\widehat{\mbox{AUC}}_1\) and \(\widehat{\mbox{BS}}_1\) and the global \(\widehat{\mbox{AUC}}\) and \(\widehat{\mbox{BS}}\), can be calculated with the corresponding PyDTS evaluation functions.
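As a rough illustration of how time-dependent measures can be collapsed into integrated and global ones, the sketch below takes weighted averages. The weighting is an assumption: weights are proportional to the number of observed events at each time for the integrated measure, and to the total number of observed events of each type for the global measure. The Methods section gives the actual definitions. The function names and all numbers are hypothetical.

```python
def integrated_measure(measure_at_t, n_events_at_t):
    """Weighted average over time of an event-specific measure (AUC or BS).

    measure_at_t  : dict mapping time t to the measure at t
    n_events_at_t : dict mapping time t to the number of observed events at t
    """
    total = sum(n_events_at_t[t] for t in measure_at_t)
    return sum(measure_at_t[t] * n_events_at_t[t] for t in measure_at_t) / total


def global_measure(integrated_by_event, n_events_by_event):
    """Weighted average over event types of the integrated measures."""
    total = sum(n_events_by_event[j] for j in integrated_by_event)
    return sum(integrated_by_event[j] * n_events_by_event[j]
               for j in integrated_by_event) / total


# Toy numbers, made up for illustration only.
auc_1_at_t = {1: 0.8, 2: 0.6}      # AUC_1(t) at t = 1, 2
n_events_1_at_t = {1: 30, 2: 10}   # observed type-1 events at t = 1, 2
auc_1 = integrated_measure(auc_1_at_t, n_events_1_at_t)

# Combine with a made-up integrated AUC for event 2.
auc_global = global_measure({1: auc_1, 2: 0.65}, {1: 40, 2: 60})
```

With these toy inputs, `auc_1` is (0.8 × 30 + 0.6 × 10) / 40 = 0.75, and the global value is (0.75 × 40 + 0.65 × 60) / 100 = 0.69.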
Models fitted with TwoStagesFitter can also be evaluated by K-fold CV.
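The general shape of K-fold CV can be sketched without any library: patient identifiers are shuffled, partitioned into K folds, and each fold serves once as the test set while the remaining folds form the training set. This is a generic illustration, not the PyDTS cross-validation code. The helper names `k_fold_patient_splits` and `cross_validate`, and the `fit_and_score` callback, are hypothetical.

```python
import random


def k_fold_patient_splits(patient_ids, n_splits=5, seed=0):
    """Yield (train_ids, test_ids) pairs; each id is in exactly one test fold."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    folds = [ids[k::n_splits] for k in range(n_splits)]
    for k in range(n_splits):
        test_ids = folds[k]
        train_ids = [i for m, fold in enumerate(folds) if m != k for i in fold]
        yield train_ids, test_ids


def cross_validate(patient_ids, fit_and_score, n_splits=5, seed=0):
    """Call fit_and_score(train_ids, test_ids) on every fold and collect the results."""
    return [fit_and_score(train, test)
            for train, test in k_fold_patient_splits(patient_ids, n_splits, seed)]


# Usage with a placeholder scorer that only reports the fold sizes.
fold_sizes = cross_validate(range(10), lambda train, test: (len(train), len(test)))
```

Splitting by patient identifier rather than by row keeps all records of one patient in the same fold, which matters when the data are later expanded to one row per patient and time point.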
For each fold, the cross-validation procedure yields AUC(t) and BS(t) for each risk, the integrated AUC and BS for each risk, and, lastly, the global AUC and global BS.
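Per-fold values are often summarized by their mean and standard error across folds. The sketch below shows one way to do this for a global measure, assuming the per-fold values are available as a plain dictionary. The helper name `summarize_folds` and the fold values are hypothetical and are not PyDTS output.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_folds(value_by_fold):
    """Return (mean, standard error) of a measure across CV folds."""
    values = list(value_by_fold.values())
    avg = mean(values)
    # The sample standard deviation needs at least two folds.
    se = stdev(values) / sqrt(len(values)) if len(values) > 1 else float("nan")
    return avg, se


# Toy per-fold global AUC values, made up for illustration only.
toy_global_auc = {0: 0.70, 1: 0.74, 2: 0.72}
avg_auc, se_auc = summarize_folds(toy_global_auc)
```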
References
[1] Meir, Tomer*, Gutman, Rom*, and Gorfine, Malka, "PyDTS: A Python Package for Discrete-Time Survival Analysis with Competing Risks" (2022)