Event Times Sampler
pydts.data_generation.EventTimesSampler(d_times, j_event_types)
Bases: object
This class implements a sampling procedure for discrete event times and censoring times for given observations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
d_times | int | number of possible event times | required |
j_event_types | int | number of possible event types | required |
Source code in src/pydts/data_generation.py
d_times = d_times (instance attribute)
events = range(1, self.j_event_types + 1) (instance attribute)
j_event_types = j_event_types (instance attribute)
times = range(1, self.d_times + 2) (instance attribute)
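The attribute definitions above imply that `events` runs from 1 to `j_event_types` and `times` runs from 1 to `d_times + 1`. The short sketch below reproduces the two ranges with hypothetical values for `d_times` and `j_event_types`. That the extra time `d_times + 1` stands for observations with no event by `d_times` is an assumption, not something stated on this page.

```python
# Hypothetical sizes, chosen only for illustration.
d_times = 5
j_event_types = 2

# Same expressions as the instance attributes listed above.
events = range(1, j_event_types + 1)
times = range(1, d_times + 2)

print(list(events))  # [1, 2]
print(list(times))   # [1, 2, 3, 4, 5, 6]
```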
_validate_prob_dfs_list(dfs_list, numerical_error_tolerance=0.001)
calc_prob_t_given_j(prob_j_at_t, total_prob_j, numerical_error_tolerance=0.001)
Calculates the conditional probability of event occurrence at time t given J_i=j.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prob_j_at_t | list | A list of dataframes, one for each event type, with the probability of event occurrence at time t for each of the observations. | required |
total_prob_j | list | A list of dataframes, one for each event type, with the total probability of event occurrence for each of the observations. | required |
numerical_error_tolerance | float | Tolerate numerical errors of probabilities up to this value. | 0.001 |
Returns:
Name | Type | Description |
---|---|---|
conditional_prob | list | A list of dataframes, one for each event type, with the conditional probability of event occurrence at t given event type j for each of the observations. |
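As an illustration of the quantity described here, the sketch below divides the joint probabilities P(T=t, J=j) by the total probability P(J=j) for a single observation and a single event type. It uses plain Python lists instead of dataframes, and the helper name `prob_t_given_j` and the numbers are hypothetical. How the library applies `numerical_error_tolerance` is not shown on this page, so the check at the end is only one plausible reading.

```python
def prob_t_given_j(prob_j_at_t, total_prob_j):
    """Divide each joint probability P(T=t, J=j) by the marginal P(J=j)."""
    return [p / total_prob_j for p in prob_j_at_t]


# Hypothetical joint probabilities P(T=t, J=1) for t = 1, 2, 3.
joint = [0.125, 0.25, 0.125]
total = sum(joint)  # P(J=1) = 0.5

conditional = prob_t_given_j(joint, total)
print(conditional)  # [0.25, 0.5, 0.25]

# Assumed use of the tolerance: the conditional distribution should sum to 1.
numerical_error_tolerance = 0.001
assert abs(sum(conditional) - 1.0) <= numerical_error_tolerance
```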
calculate_hazards(observations_df, hazard_coefs, events=None)
Calculates the hazard function for the observations given the hazard coefficients.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
observations_df | DataFrame | Dataframe with the observations' covariates. | required |
hazard_coefs | dict | Time coefficients and covariate coefficients for each event type. | required |
Returns:
Name | Type | Description |
---|---|---|
hazards_dfs | list | A list of dataframes, one for each event type, with the hazard function at time t for each of the observations. |
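This page does not state the functional form of the hazard. A common choice in discrete-time survival models is a logistic link, where the hazard of event type j at time t is the logistic function of a time coefficient plus a linear combination of the covariates. The sketch below assumes that form. The helper names and the layout of the coefficients (`alpha_1`, `beta_1`) are hypothetical and may differ from what `hazard_coefs` actually expects.

```python
import math


def expit(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def hazards_for_observation(z, alpha_j, beta_j):
    """Hazard of one event type at each time t for one covariate vector z.

    Assumes hazard(t) = expit(alpha_j[t] + sum_k beta_j[k] * z[k]).
    """
    linear = sum(b * v for b, v in zip(beta_j, z))
    return [expit(a_t + linear) for a_t in alpha_j]


alpha_1 = [-2.0, -1.5, -1.0]  # hypothetical time coefficients for t = 1, 2, 3
beta_1 = [0.5, -0.25]         # hypothetical covariate coefficients
z = [1.0, 2.0]                # covariates of a single observation

hazards = hazards_for_observation(z, alpha_1, beta_1)
print(hazards)
```

With these numbers the covariate term is zero, so the hazards depend only on the time coefficients and increase with t.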
calculate_overall_survival(hazards, numerical_error_tolerance=0.001)
Calculates the overall survival function given the hazards.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hazards | list | A list of hazards dataframes, one for each event type (as returned by EventTimesSampler.calculate_hazards). | required |
numerical_error_tolerance | float | Tolerate numerical errors of probabilities up to this value. | 0.001 |
Returns:
Name | Type | Description |
---|---|---|
overall_survival | DataFrame | The overall survival functions. |
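In a discrete-time competing-risks setting, the overall survival at time t is usually the running product over k ≤ t of one minus the sum of the hazards of all event types at time k. The sketch below assumes that definition and uses plain lists for a single observation. The function name mirrors the method only for readability and is not the library implementation.

```python
def overall_survival(hazards):
    """hazards: one list per event type, each holding the hazard at t = 1..d."""
    survival = []
    s = 1.0
    for hazards_at_t in zip(*hazards):
        # Probability of no event of any type at this time, given survival so far.
        s *= 1.0 - sum(hazards_at_t)
        survival.append(s)
    return survival


# Hypothetical hazards for two event types at t = 1, 2.
hazards = [[0.25, 0.25], [0.25, 0.25]]
surv = overall_survival(hazards)
print(surv)  # [0.5, 0.25]
```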
calculate_prob_event_at_t(hazards, overall_survival, numerical_error_tolerance=0.001)
Calculates the probability for event j at time t.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hazards | list | A list of hazards dataframes, one for each event type (as returned by EventTimesSampler.calculate_hazards). | required |
overall_survival | DataFrame | The overall survival functions. | required |
numerical_error_tolerance | float | Tolerate numerical errors of probabilities up to this value. | 0.001 |
Returns:
Name | Type | Description |
---|---|---|
prob_event_at_t | list | A list of dataframes, one for each event type, with the probability of event occurrence at time t for each of the observations. |
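A standard way to obtain the probability of event j at time t is to multiply the hazard of event j at t by the overall survival up to t - 1, with the survival before the first time taken as 1. The sketch below assumes that relation for a single observation. The helper is hypothetical and works on lists rather than dataframes.

```python
def prob_event_at_t(hazards, overall_survival):
    """Return P(T=t, J=j) = hazard_j(t) * S(t-1) for every event type j."""
    # Shift the survival by one time step; S(0) = 1 by convention.
    lagged = [1.0] + list(overall_survival[:-1])
    return [[h * s for h, s in zip(h_j, lagged)] for h_j in hazards]


# Hypothetical hazards for two event types at t = 1, 2 and the matching survival.
hazards = [[0.25, 0.25], [0.25, 0.25]]
surv = [0.5, 0.25]

probs = prob_event_at_t(hazards, surv)
print(probs)  # [[0.25, 0.125], [0.25, 0.125]]
```

In this example the event probabilities sum to 0.75, and the remaining 0.25 equals the overall survival at the last time.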
calculate_prob_event_j(prob_j_at_t, numerical_error_tolerance=0.001)
Calculates the total probability for event j.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prob_j_at_t | list | A list of dataframes, one for each event type, with the probability of event occurrence at time t for each of the observations. | required |
numerical_error_tolerance | float | Tolerate numerical errors of probabilities up to this value. | 0.001 |
Returns:
Name | Type | Description |
---|---|---|
total_prob_j | list | A list of dataframes, one for each event type, with the total probability of event occurrence for each of the observations. |
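The total probability of event j can be read as the sum over all times t of the probability of event j at t. The sketch below shows that sum for a single observation. The guard that uses the tolerance is an assumption about what `numerical_error_tolerance` is for; the library's actual validation is not shown on this page.

```python
def total_prob_event_j(prob_j_at_t, tolerance=0.001):
    """Sum P(T=t, J=j) over t for each event type j."""
    totals = [sum(p_j) for p_j in prob_j_at_t]
    # Assumed sanity check: the totals over all event types cannot exceed 1.
    if sum(totals) > 1.0 + tolerance:
        raise ValueError("total event probabilities exceed 1")
    return totals


# Hypothetical P(T=t, J=j) for two event types at t = 1, 2.
probs = [[0.25, 0.125], [0.25, 0.125]]
totals = total_prob_event_j(probs)
print(totals)  # [0.375, 0.375]
```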
sample_event_times(observations_df, hazard_coefs, covariates=None, events=None, seed=None)
Samples the event type and event occurrence time for each observation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
observations_df | DataFrame | Dataframe with the observations' covariates. | required |
covariates | list | List of covariate names; must be a subset of observations_df.columns. | None |
hazard_coefs | dict | Time coefficients and covariate coefficients for each event type. | required |
seed | (int, None) | numpy seed number for pseudo-random sampling. | None |
Returns:
Name | Type | Description |
---|---|---|
observations_df | DataFrame | Dataframe with additional columns for sampled event time (T) and event type (J). |
sample_hazard_lof_censoring(observations_df, censoring_hazard_coefs, seed=None, covariates=None)
Samples loss of follow-up censoring time from hazard coefficients.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
observations_df | DataFrame | Dataframe with the observations' covariates. | required |
censoring_hazard_coefs | dict | Time coefficients and covariate coefficients for the censoring hazard. | required |
seed | int | Pseudo-random seed number for numpy.random.seed(). | None |
covariates | list | List of covariate names; must be a subset of observations_df.columns. | None |
Returns:
Name | Type | Description |
---|---|---|
observations_df | DataFrame | Updated dataframe including sampled censoring time. |
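One way to sample a censoring time from a discrete hazard is to first turn the hazard into a probability mass function and then draw from it. The sketch below assumes that approach for one observation, and it also assumes that any probability left after the last time is assigned to an extra time `d + 1`. It uses the standard-library `random` module rather than numpy, and the helper `censoring_pmf` is hypothetical.

```python
import random


def censoring_pmf(hazard):
    """P(C=t) for t = 1..d from a discrete hazard, plus leftover mass at d + 1."""
    pmf = []
    surviving = 1.0
    for h in hazard:
        pmf.append(surviving * h)   # still uncensored, then censored at this time
        surviving *= 1.0 - h
    pmf.append(surviving)           # never censored within 1..d
    return pmf


hazard = [0.5, 0.5, 0.5]  # hypothetical censoring hazard at t = 1, 2, 3
pmf = censoring_pmf(hazard)
print(pmf)  # [0.5, 0.25, 0.125, 0.125]

rng = random.Random(0)  # seeded generator for reproducible draws
possible_times = list(range(1, len(pmf) + 1))
sampled_c = rng.choices(possible_times, weights=pmf, k=5)
print(sampled_c)
```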
sample_independent_lof_censoring(observations_df, prob_lof_at_t, seed=None)
Samples loss of follow-up censoring time from probabilities independent of covariates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
observations_df | DataFrame | Dataframe with the observations' covariates. | required |
prob_lof_at_t | array | Array of probabilities for sampling each of the possible times. | required |
seed | int | Pseudo-random seed number for numpy.random.seed(). | None |
Returns:
Name | Type | Description |
---|---|---|
observations_df | DataFrame | Updated dataframe including sampled censoring time. |
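Because the censoring probabilities here do not depend on covariates, every observation draws its censoring time from the same distribution over the possible times. The sketch below illustrates that idea together with the effect of a fixed seed. It relies on the standard-library `random` module instead of `numpy.random`, and `sample_censoring` is a hypothetical helper, not the library method.

```python
import random


def sample_censoring(n_observations, times, prob_lof_at_t, seed=None):
    """Draw one censoring time per observation from a shared distribution."""
    rng = random.Random(seed)
    return rng.choices(list(times), weights=prob_lof_at_t, k=n_observations)


times = range(1, 5)                   # hypothetical possible times 1..4
prob_lof_at_t = [0.1, 0.2, 0.3, 0.4]  # one probability per possible time

c_first = sample_censoring(6, times, prob_lof_at_t, seed=42)
c_second = sample_censoring(6, times, prob_lof_at_t, seed=42)

# The same seed reproduces the same censoring times.
print(c_first == c_second)  # True
```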
sample_jt(total_prob_j, probs_t_given_j, numerical_error_tolerance=0.001)
Sample event type and event time for each observation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
total_prob_j | list | A list of dataframes, one for each event type, with the total probability of event occurrence for each of the observations. | required |
probs_t_given_j | list | A list of dataframes, one for each event type, with the conditional probability of event occurrence at t given event type j for each of the observations. | required |
Returns:
Name | Type | Description |
---|---|---|
sampled_df | DataFrame | A dataframe with sampled event time and event type for each observation. |
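The two inputs suggest a two-stage draw: first sample the event type from the total probabilities, then sample the time from the conditional distribution of that event type. The sketch below follows that reading for a single observation. Treating the leftover probability as "no event" (J = 0 with time `d_times + 1`) is an assumption, and the helper `sample_one_jt` is hypothetical.

```python
import random


def sample_one_jt(total_prob_j, probs_t_given_j, d_times, rng):
    """Sample (J, T) for one observation in two stages."""
    events = list(range(1, len(total_prob_j) + 1))
    # Assumed: probability mass not covered by any event type means no event.
    no_event = max(0.0, 1.0 - sum(total_prob_j))
    j = rng.choices([0] + events, weights=[no_event] + list(total_prob_j))[0]
    if j == 0:
        return 0, d_times + 1
    possible_times = list(range(1, d_times + 1))
    t = rng.choices(possible_times, weights=probs_t_given_j[j - 1])[0]
    return j, t


# Hypothetical inputs for two event types and d_times = 2.
total_prob_j = [0.375, 0.375]
probs_t_given_j = [[2 / 3, 1 / 3], [2 / 3, 1 / 3]]

rng = random.Random(0)
samples = [sample_one_jt(total_prob_j, probs_t_given_j, 2, rng) for _ in range(10)]
print(samples)
```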
update_event_or_lof(observations_df)
Sets the time column 'X' to the minimum of the event time column 'T' and the censoring time column 'C'. The event type 'J' is set to 0 for observations with 'C' < 'T'.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
observations_df | DataFrame | Dataframe with observations after sampling event times 'T' and censoring time 'C'. | required |
Returns:
Name | Type | Description |
---|---|---|
observations_df | DataFrame | Dataframe with updated time column 'X' and event type column 'J'. |
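The rule described for this method can be sketched on plain dictionaries, one per observation, instead of a dataframe. The function below is a stand-in written for illustration, not the library implementation. It follows the description literally, so a tie between 'C' and 'T' keeps the event type.

```python
def update_event_or_lof(rows):
    """Set X = min(T, C) and J = 0 wherever censoring precedes the event."""
    updated = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay unchanged
        new_row["X"] = min(row["T"], row["C"])
        if row["C"] < row["T"]:
            new_row["J"] = 0
        updated.append(new_row)
    return updated


# Hypothetical observations with sampled T, C and J.
rows = [
    {"T": 3, "C": 5, "J": 1},  # event observed first
    {"T": 4, "C": 2, "J": 2},  # censored before the event
    {"T": 2, "C": 2, "J": 1},  # tie: 'C' < 'T' is false, event kept
]

result = update_event_or_lof(rows)
for row in result:
    print(row)
```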