Event Times Sampler
PyDTS provides the EventTimesSampler (ETS) class for sampling discrete-time survival data with competing risks and right censoring under the logit-link model. In the following, we present an example of how to use ETS to sample discrete-time survival data with competing events and right censoring.
Covariates
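User-supplied covariates should be passed to ETS. For example, consider a setting with \(n=10,000\) independent observations and the following covariates:
\[
Z_1 \sim \text{Bernoulli}(0.5), \qquad Z_2 \mid Z_1 = 0 \sim \mathcal{N}(72, 12^2), \qquad Z_2 \mid Z_1 = 1 \sim \mathcal{N}(82, 12^2)
\]
and
\[
Z_3 \sim 1 + \text{Poisson}(4).
\]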
Any sampling framework can be used for creating the covariates' dataframe. For example:
import numpy as np
import pandas as pd

n_observations = 10000
observations_df = pd.DataFrame(columns=['Z1', 'Z2', 'Z3'])

# Z1 ~ Bernoulli(0.5)
observations_df['Z1'] = np.random.binomial(n=1, p=0.5,
                                           size=n_observations)

# Z2 | Z1 = 0 ~ Normal(72, 12^2)
Z1_zero_index = observations_df.loc[observations_df['Z1'] == 0].index
observations_df.loc[Z1_zero_index, 'Z2'] = np.random.normal(loc=72, scale=12,
                                                            size=n_observations - observations_df['Z1'].sum())

# Z2 | Z1 = 1 ~ Normal(82, 12^2)
Z1_one_index = observations_df.loc[observations_df['Z1'] == 1].index
observations_df.loc[Z1_one_index, 'Z2'] = np.random.normal(loc=82, scale=12,
                                                           size=observations_df['Z1'].sum())

# Z3 ~ 1 + Poisson(4)
observations_df['Z3'] = 1 + np.random.poisson(lam=4, size=n_observations)

# Z2 was filled into an initially empty (object-dtype) column, so make it numeric
observations_df['Z2'] = observations_df['Z2'].astype(float)
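The same covariates can also be drawn in one vectorized pass. The sketch below is only an alternative way to build the covariates DataFrame. It uses NumPy's `default_rng` Generator with a hypothetical seed for reproducibility and keeps every column numeric. `sample_covariates` is a hypothetical helper, not part of PyDTS.

```python
import numpy as np
import pandas as pd


def sample_covariates(n_observations: int, seed: int = 0) -> pd.DataFrame:
    """Draw Z1, Z2, Z3 as in the example (hypothetical helper, not part of PyDTS)."""
    rng = np.random.default_rng(seed)
    # Z1 ~ Bernoulli(0.5)
    z1 = rng.binomial(n=1, p=0.5, size=n_observations)
    # Z2 | Z1 ~ Normal(72 + 10 * Z1, 12^2): mean 72 if Z1 = 0, mean 82 if Z1 = 1
    z2 = rng.normal(loc=72 + 10 * z1, scale=12)
    # Z3 ~ 1 + Poisson(4), so Z3 >= 1
    z3 = 1 + rng.poisson(lam=4, size=n_observations)
    return pd.DataFrame({'Z1': z1, 'Z2': z2, 'Z3': z3})


observations_df = sample_covariates(10000)
print(observations_df.groupby('Z1')['Z2'].mean().round(1))
```

Passing an array as `loc` lets NumPy broadcast the conditional mean of \(Z_2\), so no index bookkeeping per \(Z_1\) group is needed.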
Event Times
ETS assumes that the possible failure times are \(1, \ldots, d\), and the user should supply the value of \(d\). Note that the underlying time intervals can be irregularly spaced and of different lengths. For instance, discrete-time categories 1, 2, and 3 could correspond to specific days such as Tuesday, Thursday, and Friday-Sunday, respectively. In the current example, we set \(d=7\).
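Mapping calendar dates onto the categories \(1, \ldots, d\) is left to the user. The sketch below encodes the weekday example from the text: Tuesday maps to 1, Thursday to 2, and Friday through Sunday to 3. The mapping table and the helper `to_discrete_time` are hypothetical illustrations, not part of PyDTS.

```python
import datetime as dt

# Hypothetical mapping from datetime.weekday() (Monday=0 ... Sunday=6) to the
# discrete-time categories of the text: Tuesday -> 1, Thursday -> 2, Friday-Sunday -> 3
WEEKDAY_TO_CATEGORY = {1: 1, 3: 2, 4: 3, 5: 3, 6: 3}


def to_discrete_time(date: dt.date) -> int:
    """Return the discrete-time category of a date; raise if it falls in none."""
    try:
        return WEEKDAY_TO_CATEGORY[date.weekday()]
    except KeyError:
        raise ValueError(f"{date} does not fall in any discrete-time category") from None


# 2024-01-02 is a Tuesday, 2024-01-04 a Thursday, 2024-01-07 a Sunday
print([to_discrete_time(dt.date(2024, 1, day)) for day in (2, 4, 7)])  # [1, 2, 3]
```

The categories carry only their order. The sampler never sees the interval lengths, which is why irregular spacing is allowed.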
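For the competing-events setting, the user should choose the number of competing events, \(M\), and the values of the model parameters \(\alpha_{jt}\) and \(\beta_j\), \(t=1,\ldots,d\), \(j=1,\ldots,M\). For example, consider \(M=2\) competing events and
\[
\alpha_{1t} = -1 - 0.3 \log(t), \qquad \beta_1 = -\left(\log 0.8, \log 1.4, \log 3\right)^T,
\]
\[
\alpha_{2t} = -1.75 - 0.15 \log(t), \qquad \beta_2 = -\left(\log 1, \log 0.95, \log 2\right)^T.
\]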
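To see what these parameters imply, one can evaluate the event-specific hazards for a single covariate vector. The sketch below assumes a separate logit link for each event type, \(\lambda_j(t|Z) = \exp(\alpha_{jt} + Z^T\beta_j)/\{1 + \exp(\alpha_{jt} + Z^T\beta_j)\}\). It also computes the overall survival \(S(t|Z) = \prod_{k \le t}\{1 - \sum_j \lambda_j(k|Z)\}\). The helper `hazard_matrix` and the covariate vector are hypothetical, and this need not match PyDTS internals.

```python
import numpy as np

D_TIMES = 7
# Same parameters as in the text: alpha_{jt} as functions of t, beta_j for (Z1, Z2, Z3)
alpha = {
    1: lambda t: -1 - 0.3 * np.log(t),
    2: lambda t: -1.75 - 0.15 * np.log(t),
}
beta = {
    1: -1 * np.log([0.8, 1.4, 3]),
    2: -1 * np.log([1, 0.95, 2]),
}


def hazard_matrix(z, d_times=D_TIMES):
    """Return an (M, d) array with lambda_j(t | z) = expit(alpha_jt + z @ beta_j).

    Assumes a separate logit link per event type (hypothetical helper).
    """
    t = np.arange(1, d_times + 1)
    rows = []
    for j in sorted(alpha):
        linear_predictor = alpha[j](t) + np.dot(z, beta[j])
        rows.append(1.0 / (1.0 + np.exp(-linear_predictor)))
    return np.vstack(rows)


z = np.array([1.0, 0.0, 0.0])  # hypothetical covariate vector: Z1 = 1, Z2 = Z3 = 0
hazards = hazard_matrix(z)
# Overall survival S(t | z) = prod_{k <= t} (1 - sum_j lambda_j(k | z))
survival = np.cumprod(1.0 - hazards.sum(axis=0))
print(hazards.round(3))
print(survival.round(3))
```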
Putting it all together, an ETS instance is created and used to sample the event type and event time of each observation and add them to the DataFrame, as follows:
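from pydts.data_generation import EventTimesSampler

# d_times: number of discrete times d; j_event_types: number of competing events M
ets = EventTimesSampler(d_times=7, j_event_types=2)

coefficients_dict = {
    # alpha_{jt} for each event type j, given as a function of t
    "alpha": {
        1: lambda t: -1 - 0.3 * np.log(t),
        2: lambda t: -1.75 - 0.15 * np.log(t),
    },
    # beta_j for each event type j, one coefficient per covariate (Z1, Z2, Z3)
    "beta": {
        1: -1 * np.log([0.8, 1.4, 3]),
        2: -1 * np.log([1, 0.95, 2]),
    },
}

# Sample the event type and event time of each observation and add them to the DataFrame
observations_df = ets.sample_event_times(observations_df, coefficients_dict)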
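Conceptually, an event type and time can be drawn from discrete-time competing-risks hazards by inverse-transform sampling over the \((t, j)\) cells. Cell \((t, j)\) has probability \(\Pr(T=t, J=j|Z) = \lambda_j(t|Z)\,S(t-1|Z)\) with \(S(0|Z)=1\). The leftover mass \(S(d|Z)\) goes to \(T=d+1\), \(J=0\), matching the administrative-censoring convention described for PyDTS. The sketch below is a from-scratch NumPy illustration of that idea, not the PyDTS implementation. `sample_event_type_and_time` and the hazard values in the usage example are hypothetical.

```python
import numpy as np


def sample_event_type_and_time(hazards, rng):
    """Sample (J, T) for one observation from an (M, d) array of hazards lambda_j(t).

    Uses P(T = t, J = j) = lambda_j(t) * S(t - 1) with S(0) = 1. The remaining
    mass S(d) is assigned to T = d + 1, J = 0 (administrative censoring).
    """
    n_events, d_times = hazards.shape
    overall = hazards.sum(axis=0)  # sum_j lambda_j(t)
    if np.any(overall < 0) or np.any(overall > 1):
        raise ValueError("hazards imply invalid survival probabilities")
    survival = np.cumprod(1.0 - overall)  # S(1), ..., S(d)
    survival_before = np.concatenate(([1.0], survival[:-1]))  # S(0), ..., S(d-1)
    cell_probs = hazards * survival_before  # (M, d): P(T = t, J = j)
    # Order the cells by time first, then by event type, and append the censored cell
    probs = np.append(cell_probs.T.ravel(), survival[-1])
    k = rng.choice(probs.size, p=probs)
    if k == probs.size - 1:
        return 0, d_times + 1
    t_index, j_index = divmod(k, n_events)
    return j_index + 1, t_index + 1


rng = np.random.default_rng(0)
hazards = np.array([[0.10, 0.20, 0.30],
                    [0.05, 0.05, 0.05]])  # hypothetical M = 2, d = 3
print([sample_event_type_and_time(hazards, rng) for _ in range(5)])
```

The cell probabilities telescope to \(1 - S(d|Z)\), so adding the censored cell makes them sum to one exactly.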
If the sampled covariates and parameter values lead to impossible survival probabilities (i.e., negative or greater than one), the sampling process terminates with an error message. In such cases, it may help to adjust the coefficients or to constrain extreme covariate values so that all probabilities are valid and sampling completes successfully.
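One way to anticipate this error is to check, before sampling, whether \(\sum_j \lambda_j(t|Z) > 1\) for any observation and time. In that case \(1 - \sum_j \lambda_j(t|Z) < 0\) and the survival probability turns negative. The sketch below assumes the per-event logit link and a DataFrame whose columns are the covariates. When the check fails, it winsorizes each covariate at hypothetical 1% and 99% quantiles. `max_overall_hazard` and `clip_covariates` are hypothetical helpers, not PyDTS functions.

```python
import numpy as np
import pandas as pd


def max_overall_hazard(df, alpha, beta, d_times):
    """Largest sum_j lambda_j(t | Z) over all observations and times t = 1..d.

    Assumes lambda_j(t | Z) = expit(alpha_j(t) + Z @ beta_j) for each event type j.
    """
    t = np.arange(1, d_times + 1)
    z = df.to_numpy(dtype=float)  # (n, p)
    overall = np.zeros((len(df), d_times))
    for j in alpha:
        a_t = np.broadcast_to(np.asarray(alpha[j](t), dtype=float), t.shape)
        linear_predictor = a_t[None, :] + (z @ np.asarray(beta[j], dtype=float))[:, None]
        overall += 1.0 / (1.0 + np.exp(-linear_predictor))
    return float(overall.max())


def clip_covariates(df, lower_q=0.01, upper_q=0.99):
    """Winsorize each covariate column at the given (hypothetical) quantiles."""
    return df.clip(lower=df.quantile(lower_q), upper=df.quantile(upper_q), axis=1)


# Same parameters as in the text
alpha = {1: lambda t: -1 - 0.3 * np.log(t), 2: lambda t: -1.75 - 0.15 * np.log(t)}
beta = {1: -1 * np.log([0.8, 1.4, 3]), 2: -1 * np.log([1, 0.95, 2])}

covariates = pd.DataFrame({'Z1': [0, 1, 1], 'Z2': [0.0, 0.5, -2.0], 'Z3': [0.0, 1.0, 3.0]})  # hypothetical
if max_overall_hazard(covariates, alpha, beta, d_times=7) > 1:
    covariates = clip_covariates(covariates)
```

Clipping changes the covariate distribution. Adjusting the coefficients may be preferable when the covariate distribution itself is the object of the simulation.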
Censoring Time
Two types of right censoring are implemented in PyDTS: administrative and random right censoring. For administrative censoring, \(J_i=0\) and \(T_i = d + 1\). These values are assigned by default to observations whose sampled event time is greater than \(d\), i.e., that experienced no event at times \(1, \ldots, d\). Random right censoring is optional and can be either dependent on or independent of the covariates. For example, assume
The censoring times can be sampled by
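For independent random censoring, a censoring time can be drawn from a discrete censoring hazard \(\lambda_c(t)\), \(t=1,\ldots,d\). At each time, every observation not yet censored is censored with probability \(\lambda_c(t)\). Observations never censored by \(d\) get \(C_i = d+1\), so censoring cannot precede administrative censoring. The sketch below is a NumPy illustration of this scheme with a hypothetical constant hazard of 0.03. It is not the PyDTS censoring API, and `sample_censoring_times` is a hypothetical name.

```python
import numpy as np


def sample_censoring_times(censoring_hazard, n_observations, rng):
    """Draw discrete censoring times C in {1, ..., d + 1} from hazards lambda_c(1..d).

    P(C = t) = lambda_c(t) * prod_{k < t} (1 - lambda_c(k));
    observations never censored by time d get C = d + 1.
    """
    censoring_hazard = np.asarray(censoring_hazard, dtype=float)
    d_times = censoring_hazard.size
    # Bernoulli(lambda_c(t)) indicator of censoring at each time, per observation
    censored_at = rng.random((n_observations, d_times)) < censoring_hazard
    # argmax returns the first True per row (0 if the row has none)
    first = censored_at.argmax(axis=1) + 1
    return np.where(censored_at.any(axis=1), first, d_times + 1)


rng = np.random.default_rng(0)
C = sample_censoring_times(np.full(7, 0.03), n_observations=10000, rng=rng)  # hypothetical hazard 0.03
```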
To generate right-censoring times that depend on the covariates, the user should supply the censoring hazard function \(\lambda_c(t|Z)\) in the form of the logit-link model. For example,
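Under the logit-link form, the censoring hazard is \(\lambda_c(t|Z) = \exp(\alpha_{ct} + Z^T\beta_c)/\{1 + \exp(\alpha_{ct} + Z^T\beta_c)\}\). The sketch below builds the \(n \times d\) matrix of these hazards. It then censors each observation at the first time a Bernoulli(\(\lambda_c(t|Z_i)\)) draw succeeds, or at \(d+1\) if none does. The coefficient values, the covariates in the usage example, and the helper `sample_dependent_censoring` are hypothetical. This is not a PyDTS function.

```python
import numpy as np
import pandas as pd


def sample_dependent_censoring(df, alpha_c, beta_c, d_times, rng):
    """Sample C in {1, ..., d + 1} with logit-link hazard lambda_c(t | Z)."""
    t = np.arange(1, d_times + 1)
    z = df.to_numpy(dtype=float)
    a_t = np.broadcast_to(np.asarray(alpha_c(t), dtype=float), t.shape)
    linear_predictor = a_t[None, :] + (z @ np.asarray(beta_c, dtype=float))[:, None]  # (n, d)
    hazard = 1.0 / (1.0 + np.exp(-linear_predictor))
    # First time censoring occurs for each observation, or d + 1 if it never does
    censored_at = rng.random(hazard.shape) < hazard
    first = censored_at.argmax(axis=1) + 1
    times = np.where(censored_at.any(axis=1), first, d_times + 1)
    return pd.Series(times, index=df.index, name='C')


rng = np.random.default_rng(0)
df = pd.DataFrame({'Z1': rng.binomial(1, 0.5, 1000),
                   'Z2': rng.normal(size=1000),
                   'Z3': rng.normal(size=1000)})  # hypothetical covariates
# Hypothetical censoring coefficients, for illustration only
C = sample_dependent_censoring(df, alpha_c=lambda t: -3 - 0.1 * np.log(t),
                               beta_c=np.array([0.5, 0.2, -0.2]), d_times=7, rng=rng)
```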
Updating the Observations
Finally, the observed data are obtained by setting \(X_i = \min(T_i, C_i)\) and updating \(J_i\) accordingly (\(J_i = 0\) for observations censored before their event), as follows
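A minimal pandas sketch of this update follows. It assumes the sampled event type, event time, and censoring time are stored in columns named `J`, `T`, and `C`; these column names are hypothetical here. It also assumes that when \(T_i = C_i\) the event is observed, i.e., censoring is taken to occur at the end of the interval. That is a common discrete-time convention and may differ from PyDTS's. `update_observed` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def update_observed(df):
    """Add observed time X = min(T, C) and set J = 0 where censoring comes first."""
    out = df.copy()
    out['X'] = np.minimum(out['T'], out['C'])
    # Ties (T == C) keep the event: censoring is assumed to happen at the end of the interval
    out.loc[out['C'] < out['T'], 'J'] = 0
    return out


# Hypothetical sampled values with d = 7 (T = 8 marks administrative censoring)
sampled = pd.DataFrame({'J': [1, 2, 0, 1], 'T': [3, 5, 8, 4], 'C': [8, 2, 6, 4]})
print(update_observed(sampled))
```

Working on a copy keeps the originally sampled event types available for checking the simulation.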
The first observations of the sampled data are