Data Expansion Procedure of Lee et al. (2018)
pydts.fitters.DataExpansionFitter()
Bases: ExpansionBasedFitter
This class implements the estimation procedure of Lee et al. (2018) [1]. See also the Example section.
Source code in src/pydts/fitters.py
covariates = None
instance-attribute
duration_col = None
instance-attribute
event_models = {}
instance-attribute
event_type_col = None
instance-attribute
events = None
instance-attribute
expanded_df = pd.DataFrame()
instance-attribute
formula = None
instance-attribute
models_kwargs = dict(family=sm.families.Binomial())
instance-attribute
pid_col = None
instance-attribute
times = None
instance-attribute
_expand_data(df, event_type_col, duration_col, pid_col)
This method expands the raw data as explained in Lee et al. (2018).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Dataframe to expand. | required |
event_type_col | str | The event type column name (must be a column in df). A right-censored sample i is indicated by event value 0, i.e. df.loc[i, event_type_col] = 0. | required |
duration_col | str | Last follow-up time column name (must be a column in df). | required |
pid_col | str | Sample ID column name (must be a column in df). | required |
Returns:
Type | Description |
---|---|
DataFrame | The expanded dataframe. |
Source code in src/pydts/base_fitters.py
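The expansion idea can be illustrated with a minimal pure-Python sketch. It assumes the usual person-period layout: one row per sample and per time point up to the last follow-up time, with a 0/1 indicator per event type. The function name `expand_person_period`, the `t` column and the `j_<event>` indicator names are hypothetical and chosen for illustration only; the layout actually produced by `_expand_data` may differ.

```python
def expand_person_period(records, event_types):
    """Expand one-row-per-sample records into one row per sample and time.

    records: list of dicts with keys 'pid', 'J' (0 = right censored), 'X'.
    event_types: iterable of the non-zero event codes.
    """
    expanded = []
    for rec in records:
        for t in range(1, rec["X"] + 1):
            row = {"pid": rec["pid"], "t": t}
            is_last = t == rec["X"]
            for j in event_types:
                # The indicator is 1 only at the last follow-up time and only
                # for the event type that occurred; censored samples get 0s.
                row[f"j_{j}"] = int(is_last and rec["J"] == j)
            expanded.append(row)
    return expanded


records = [
    {"pid": 1, "J": 1, "X": 3},  # event of type 1 observed at t = 3
    {"pid": 2, "J": 0, "X": 2},  # right censored at t = 2
]
rows = expand_person_period(records, event_types=[1, 2])
for row in rows:
    print(row)
```

Each event type can then be modelled with a binary regression on its own indicator column, which is what makes the expanded format convenient for GLM fitting.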
_fit_event(model_fit_kwargs={})
This method fits a GLM for a specific event.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_fit_kwargs | (dict, Optional) | Keyword arguments to pass to the model.fit() method. | {} |
Returns:
Type | Description |
---|---|
fitted GLM model |
Source code in src/pydts/fitters.py
_validate_cols(df, event_type_col, duration_col, pid_col)
Source code in src/pydts/base_fitters.py
_validate_covariates_in_df(df)
Source code in src/pydts/base_fitters.py
_validate_t(t, return_iter=True)
Source code in src/pydts/base_fitters.py
evaluate(test_df, oracle_col='T', **kwargs)
fit(df, event_type_col='J', duration_col='X', pid_col='pid', skip_expansion=False, covariates=None, formula=None, models_kwargs=None, model_fit_kwargs={})
This method fits a model to the discrete-time data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Training data for fitting the model. | required |
event_type_col | str | The event type column name (must be a column in df). A right-censored sample i is indicated by event value 0, i.e. df.loc[i, event_type_col] = 0. | 'J' |
duration_col | str | Last follow-up time column name (must be a column in df). | 'X' |
pid_col | str | Sample ID column name (must be a column in df). | 'pid' |
skip_expansion | bool | Skips the dataframe expansion step. Use this option only if the provided dataframe (df) is already correctly expanded. When set to True, df is expected to be in the format produced by pydts.utils.get_expanded_df(), as if it had been applied to the unexpanded data. | False |
covariates | (list, Optional) | A list of covariates; all must be columns in df. Defaults to all the columns of df except event_type_col, duration_col, and pid_col. | None |
formula | (str, Optional) | Model formula to be fitted, as a Patsy format string. | None |
models_kwargs | (dict, Optional) | Keyword arguments to pass when the model instance is created. | None |
model_fit_kwargs | (dict, Optional) | Keyword arguments to pass to the model.fit() method. | {} |
Returns:
Name | Type | Description |
---|---|---|
event_models | dict | Dictionary of fitted models. Keys are event names; values are the fitted models for each event. |
Source code in src/pydts/fitters.py
get_alpha_df()
This function returns the alpha coefficients and their standard errors for all the events.
Returns:
Name | Type | Description |
---|---|---|
se_df | DataFrame | Dataframe of alpha coefficients and their standard errors. |
Source code in src/pydts/fitters.py
get_beta_SE()
This function returns the beta coefficients and their standard errors for all the events.
Returns:
Name | Type | Description |
---|---|---|
se_df | DataFrame | Dataframe of beta coefficients and their standard errors. |
Source code in src/pydts/fitters.py
predict(df, **kwargs)
predict_cumulative_incident_function(df)
This function adds columns of the predicted hazard function, overall survival, probabilities of event occurrence, and cumulative incidence function (CIF) to the given dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Dataframe with the covariate columns included. | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | Dataframe with the additional prediction columns. |
Source code in src/pydts/base_fitters.py
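How these four quantities relate can be sketched in plain Python for a single sample. The sketch assumes the standard discrete-time competing-risks relations: the probability of event j at time t is the hazard of j at t times the overall survival up to t - 1, and the CIF is the running sum of those probabilities. The function name `cumulative_incidence` and the input layout (a dict mapping each event to its list of hazards at t = 1, 2, ...) are hypothetical; the method above works on dataframe columns instead.

```python
def cumulative_incidence(hazards):
    """Return {event: [CIF at t=1, CIF at t=2, ...]} for one sample.

    hazards: dict mapping event -> list of hazards at t = 1..T.
    """
    events = list(hazards)
    n_times = len(hazards[events[0]])
    cif = {j: [] for j in events}
    running = {j: 0.0 for j in events}
    surv_prev = 1.0  # overall survival before the first time point
    for k in range(n_times):
        for j in events:
            # P(T = t, J = j) = hazard_j(t) * S(t - 1)
            running[j] += hazards[j][k] * surv_prev
            cif[j].append(running[j])
        # S(t) = S(t - 1) * (1 - sum of all event hazards at t)
        surv_prev *= 1.0 - sum(hazards[j][k] for j in events)
    return cif


hazards = {1: [0.1, 0.2], 2: [0.1, 0.1]}
cif = cumulative_incidence(hazards)
print(cif)
```

A useful sanity check under these assumptions is that, at any time t, the CIFs of all events plus the overall survival sum to one.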
predict_event_cumulative_incident_function(df, event)
This function adds, for a specific event, columns of the predicted hazard function, overall survival, probabilities of event occurrence, and cumulative incidence function (CIF) to the given dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Dataframe with the covariate columns included. | required |
event | Union[str, int] | Event name. | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | Dataframe with the additional prediction columns. |
Source code in src/pydts/base_fitters.py
predict_hazard_all(df)
This function calculates the hazard for all the events at all time values included in the training set for each event.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Samples to predict for. | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | Samples with the prediction columns. |
Source code in src/pydts/base_fitters.py
predict_hazard_jt(df, event, t, n_jobs=-1)
This method calculates the hazard for the given event at the given time values, provided they were included in the training set of the event.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Samples to predict for. | required |
event | Union[str, int] | Event name. | required |
t | array | Times to calculate the hazard for. | required |
n_jobs | int | Number of CPUs to use; defaults to every available CPU. | -1 |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | Samples with the prediction columns. |
Source code in src/pydts/fitters.py
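The hazard computation can be sketched as follows, assuming each event model is a binomial GLM with the logit link (the statsmodels default for `sm.families.Binomial()`, which is the default `models_kwargs` listed above). Under that assumption the hazard of event j at time t is the inverse logit of a time-specific intercept plus the covariate effects. The names `hazard_jt`, `alpha_j`, `beta_j` and `z` are hypothetical, and raising `ValueError` for a time absent from training is a choice of this sketch, not documented library behavior.

```python
import math


def hazard_jt(alpha_j, beta_j, z, t):
    """Logit-link hazard of one event at time t for covariate vector z.

    alpha_j: dict mapping each training time to its intercept.
    beta_j: list of covariate coefficients for this event.
    """
    if t not in alpha_j:
        # Only times seen during training have an estimated intercept.
        raise ValueError(f"time {t} was not included in the training set")
    linear = alpha_j[t] + sum(b * x for b, x in zip(beta_j, z))
    return 1.0 / (1.0 + math.exp(-linear))


alpha_1 = {1: -2.0, 2: -1.5}  # illustrative intercepts, not fitted values
beta_1 = [0.5, -0.25]         # illustrative coefficients, not fitted values
h = hazard_jt(alpha_1, beta_1, z=[1.0, 2.0], t=1)
print(h)
```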
predict_hazard_t(df, t)
This function calculates the hazard for all the events at the requested time values, provided they were included in the training set of each event.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Samples to predict for. | required |
t | (int, array) | Times to calculate the hazard for. | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | Samples with the prediction columns. |
Source code in src/pydts/base_fitters.py
predict_marginal_prob_all_events(df)
This function calculates the marginal probability per event given the covariates for all the events.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Dataframe with the covariate columns included. | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | Dataframe with the additional prediction columns. |
Source code in src/pydts/base_fitters.py
predict_marginal_prob_event_j(df, event)
This function calculates the marginal probability of an event given the covariates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Dataframe with the covariate columns included. | required |
event | Union[str, int] | Event name. | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | Dataframe with the additional prediction columns. |
Source code in src/pydts/base_fitters.py
predict_overall_survival(df, t=None, return_hazards=False)
This function adds columns of the overall survival until time t.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Dataframe with the covariate columns. | required |
t | int | Time. | None |
return_hazards | bool | Whether to keep the hazard columns. | False |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | Dataframe with the additional overall survival columns. |
Source code in src/pydts/base_fitters.py
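As an illustration, overall survival in a discrete-time competing-risks setting is commonly computed as the running product of one minus the sum of all event hazards at each time. The sketch below assumes that relation for a single sample; the function name `overall_survival` and the dict-of-lists input are hypothetical and do not mirror the dataframe columns used by the method above.

```python
def overall_survival(hazards):
    """Return [S(1), S(2), ...] for one sample.

    hazards: dict mapping event -> list of hazards at t = 1..T.
    """
    n_times = len(next(iter(hazards.values())))
    survival = []
    s = 1.0
    for k in range(n_times):
        # Probability of experiencing no event of any type at this time.
        s *= 1.0 - sum(h[k] for h in hazards.values())
        survival.append(s)
    return survival


surv = overall_survival({1: [0.1, 0.2], 2: [0.1, 0.1]})
print(surv)
```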
predict_prob_event_j_all(df, event)
This function adds columns with the occurrence probabilities of a specific event.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Dataframe with the covariate columns. | required |
event | Union[str, int] | Event name. | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | Dataframe with the probability columns. |
Source code in src/pydts/base_fitters.py
predict_prob_event_j_at_t(df, event, t)
This function adds a column with the probability of occurrence of a specific event at a specific time.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Dataframe with the covariate columns. | required |
event | Union[str, int] | Event name. | required |
t | int | Time. | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | Dataframe with an additional probability column. |
Source code in src/pydts/base_fitters.py
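A minimal sketch of this quantity for a single sample, assuming the standard relation that the probability of event j occurring at time t equals the hazard of j at t multiplied by the overall survival up to t - 1. The function name `prob_event_at_t` and the dict-of-lists input are hypothetical, used here only to make the arithmetic concrete.

```python
def prob_event_at_t(hazards, event, t):
    """Probability that `event` occurs exactly at time t (t starts at 1).

    hazards: dict mapping event -> list of hazards at t = 1..T.
    """
    surv_prev = 1.0
    for k in range(t - 1):
        # Survive every event type at each earlier time point.
        surv_prev *= 1.0 - sum(h[k] for h in hazards.values())
    return hazards[event][t - 1] * surv_prev


hazards = {1: [0.1, 0.2], 2: [0.1, 0.1]}
p = prob_event_at_t(hazards, event=1, t=2)
print(p)
```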
predict_prob_events(df)
This function adds columns with the occurrence probabilities of all the events.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Dataframe with the covariate columns. | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | Dataframe with the probability columns. |
Source code in src/pydts/base_fitters.py
print_summary(summary_func='summary', summary_kwargs={})
This method prints the summary of the fitted models for all the events.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
summary_func | (str, Optional) | Name of the summary method of the fitted model type ("summary", "print_summary"). | 'summary' |
summary_kwargs | (dict, Optional) | Keyword arguments to pass to the model summary function. | {} |
Returns:
Type | Description |
---|---|
None | None |