Data Preparation
Data Generation
For simplicity of presentation, we considered \(M=2\) competing events, though PyDTS can handle any number of competing events as long as there are enough observed failures of each failure type at each discrete time point.
Here, we use \(d=30\) discrete time points, \(n=50,000\) observations, and a covariate vector \(Z\) of 5 covariates. Failure times of observations, denoted as \(T\), were generated based on the model:
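\[ \lambda_j(t|Z) = \frac{\exp(\alpha_{jt}+Z^T\beta_j)}{1+\sum_{j'=1}^{M}\exp(\alpha_{j't}+Z^T\beta_{j'})}, \quad j=1,\ldots,M, \]
with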
\(\alpha_{1t} = -1 -0.3 \log(t)\),
\(\alpha_{2t} = -1.75 -0.15\log(t)\), \(t=1,\ldots,d\),
\(\beta_1 = (-\log 0.8, \log 3, \log 3, \log 2.5, \log 2)\),
\(\beta_{2} = (-\log 1, \log 3, \log 4, \log 3, \log 2)\).
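The sampling of failure times can be sketched with NumPy by walking forward through the discrete time points and, at each \(t\), drawing "event 1", "event 2", or "no event" for every observation still at risk. This is a minimal sketch, not PyDTS's own sampler. It assumes the hazard form \(\lambda_j(t|Z) = \exp(\alpha_{jt}+Z^T\beta_j)/(1+\sum_{k}\exp(\alpha_{kt}+Z^T\beta_k))\), independent \(\mbox{Uniform}(0,1)\) covariates (the covariate distribution is an assumption), and that an observation with no event by time \(d\) gets \(T=d+1\) and event type 0. The function names `hazards` and `sample_event_times` are hypothetical.

```python
import numpy as np

# Settings from the text: d time points, M = 2 event types, p = 5 covariates, n observations.
d, M, p, n = 30, 2, 5, 50_000
t = np.arange(1, d + 1)

# True parameters as listed in the text.
alpha = np.vstack([
    -1.0 - 0.3 * np.log(t),    # alpha_{1t}
    -1.75 - 0.15 * np.log(t),  # alpha_{2t}
])  # shape (M, d)
beta = np.array([
    [-np.log(0.8), np.log(3), np.log(3), np.log(2.5), np.log(2)],  # beta_1
    [-np.log(1.0), np.log(3), np.log(4), np.log(3), np.log(2)],    # beta_2
])  # shape (M, p)


def hazards(Z):
    """Cause-specific hazards lambda_j(t|Z) with shape (n, d, M).

    Assumed form: exp(alpha_jt + Z'beta_j) / (1 + sum_k exp(alpha_kt + Z'beta_k)).
    """
    eta = alpha.T[None, :, :] + (Z @ beta.T)[:, None, :]  # linear predictors, (n, d, M)
    expeta = np.exp(eta)
    return expeta / (1.0 + expeta.sum(axis=2, keepdims=True))


def sample_event_times(Z, rng):
    """Draw (T, event type) by stepping through t = 1, ..., d."""
    lam = hazards(Z)
    T = np.full(Z.shape[0], d + 1)           # assumption: no event by d -> T = d + 1
    event = np.zeros(Z.shape[0], dtype=int)  # event type 0 = no event by d
    at_risk = np.ones(Z.shape[0], dtype=bool)
    for k in range(d):
        idx = np.flatnonzero(at_risk)
        # Probabilities of event 1, ..., event M, or no event at time t = k + 1.
        probs = np.column_stack([lam[idx, k, :], 1.0 - lam[idx, k, :].sum(axis=1)])
        u = rng.random(idx.size)
        outcome = (u[:, None] > np.cumsum(probs, axis=1)).sum(axis=1)  # 0 .. M
        hit = outcome < M  # outcomes 0 .. M-1 are events 1 .. M
        T[idx[hit]] = k + 1
        event[idx[hit]] = outcome[hit] + 1
        at_risk[idx[hit]] = False
    return T, event


rng = np.random.default_rng(0)
Z = rng.uniform(0.0, 1.0, size=(n, p))  # assumption: Z_ik ~ Uniform(0, 1)
T, event = sample_event_times(Z, rng)
```

Sampling one categorical draw per at-risk observation per time point mirrors the definition of a discrete-time hazard: an observation can only fail at \(t\) if it survived all earlier time points.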
The censoring time of each observation was sampled from a discrete uniform distribution, i.e., \(C_i \sim \mbox{Uniform}\{1,\ldots,d+1\}\). The observed time is \(X_i = \min(T_i, C_i)\), and \(J_i\) is the observed event type, with \(J_i=0\) (censored) if and only if \(C_i < T_i\).
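A minimal NumPy sketch of the censoring step, assuming the true event times and event types are already available as integer arrays; the function name `apply_censoring` is hypothetical. Note that `Generator.integers` excludes its upper bound, so `d + 2` is passed to cover \(\{1,\ldots,d+1\}\). Ties \(C_i = T_i\) are kept as observed events, matching the rule that \(J_i=0\) only when \(C_i < T_i\).

```python
import numpy as np


def apply_censoring(T, event, d, rng):
    """Return observed times X and event types J after uniform censoring.

    C_i ~ Uniform{1, ..., d+1}; X_i = min(T_i, C_i); J_i = 0 iff C_i < T_i.
    """
    C = rng.integers(1, d + 2, size=T.shape[0])  # high bound is exclusive
    X = np.minimum(T, C)
    J = np.where(C < T, 0, event)  # censored -> 0, otherwise keep the true type
    return X, J


# Toy example with hypothetical event times and types (d = 3).
rng = np.random.default_rng(1)
T_toy = np.array([1, 2, 3, 3])
event_toy = np.array([1, 2, 1, 2])
X, J = apply_censoring(T_toy, event_toy, d=3, rng=rng)
print(X, J)
```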
Our goal is to estimate \(\{\alpha_{11},\ldots,\alpha_{1d},\beta_1^T,\alpha_{21},\ldots,\alpha_{2d},\beta_2^T\}\) (70 parameters in total), along with the standard errors of the estimators.
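Each event type contributes \(d\) time-specific intercepts \(\alpha_{jt}\) and one coefficient vector \(\beta_j\) of length \(p\), where \(p=5\) denotes the number of covariates (notation introduced here for the count), so the total number of parameters is:

\[ \dim(\theta) = M\,(d + p) = 2\,(30 + 5) = 70. \]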
Checking the Data
Both estimation methods require enough observed failures of each failure type at each discrete time point. Therefore, the first step is to verify that this holds for the data at hand.
As shown below, the data in our example meet this requirement.
Preprocessing suggestions for data that do not meet this requirement are given in the Data Regrouping Example.
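This check can be sketched with pandas by cross-tabulating observed failures by time and event type. This is a minimal sketch, not a PyDTS function: it assumes observed times `X` and event types `J` (with 0 meaning censored) as integer arrays, and both the function name `events_per_time` and the threshold `min_count` are hypothetical choices.

```python
import numpy as np
import pandas as pd


def events_per_time(X, J, d, M, min_count=10):
    """Count observed failures of each type at each time t = 1, ..., d.

    Returns a (d x M) table (rows: time, columns: event type) and a flag telling
    whether every cell reaches `min_count` (a hypothetical threshold).
    """
    df = pd.DataFrame({"X": X, "J": J})
    events = df[df["J"] > 0]  # J = 0 marks censored observations
    # Reindex so that time points or event types with no failures appear as 0.
    counts = pd.crosstab(events["X"], events["J"]).reindex(
        index=range(1, d + 1), columns=range(1, M + 1), fill_value=0
    )
    ok = bool((counts.to_numpy() >= min_count).all())
    return counts, ok


# Toy data: time 2 has no observed failure of type 2.
X_obs = np.array([1, 1, 2, 2, 3, 3])
J_obs = np.array([1, 2, 1, 0, 2, 1])
counts, ok = events_per_time(X_obs, J_obs, d=3, M=2, min_count=1)
print(counts)
print("enough failures everywhere:", ok)
```

The `reindex` step matters: without it, a time point with no failures of any type would silently vanish from the table instead of showing up as a zero count.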