The Two Stages Procedure of Meir and Gorfine (2023) - Exact
pydts.fitters.TwoStagesFitterExact()
Bases: TwoStagesFitter
Source code in src/pydts/fitters.py
Instance attributes and their initial values:
Name | Initial value |
---|---|
alpha_df | pd.DataFrame() |
beta_models | {} |
beta_models_params_attr | 'params' |
covariates | None |
duration_col | None |
event_models | {} |
event_type_col | None |
events | None |
expanded_df | pd.DataFrame() |
formula | None |
pid_col | None |
times | None |
_alpha_jt(x, df, y_t, beta_j, n_jt, t, event)
Source code in src/pydts/fitters.py
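The source gives no description for _alpha_jt. The following is a minimal sketch of what such an objective could look like, assuming the logit link and a second-stage estimating equation in which \(\alpha_{jt}\) is chosen so that the mean predicted hazard among the y_t subjects at risk at time t matches the observed proportion n_jt / y_t. The names expit, alpha_objective, solve_alpha and linear_preds are hypothetical, and plain bisection stands in for the scipy.optimize.minimize call mentioned in the fit parameters; this is not the library's implementation.

```python
import math


def expit(a):
    """Inverse logit: maps a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def alpha_objective(x, linear_preds, n_jt):
    """Squared gap between the mean predicted hazard and n_jt / y_t.

    linear_preds holds Z_i^T beta_j for every subject still at risk at time t,
    so y_t is simply its length (an assumption of this sketch).
    """
    y_t = len(linear_preds)
    mean_hazard = sum(expit(x + lp) for lp in linear_preds) / y_t
    return (mean_hazard - n_jt / y_t) ** 2


def solve_alpha(linear_preds, n_jt, lo=-20.0, hi=20.0, tol=1e-10):
    """Find the root by bisection; the mean hazard is increasing in alpha.

    Requires 0 < n_jt < len(linear_preds) so that a finite root exists.
    """
    y_t = len(linear_preds)
    target = n_jt / y_t
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        mean_hazard = sum(expit(mid + lp) for lp in linear_preds) / y_t
        if mean_hazard < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical Z_i^T beta_j values for four subjects at risk at time t,
# of whom one experienced event j at t.
linear_preds = [0.3, -0.1, 0.8, 0.0]
alpha_hat = solve_alpha(linear_preds, n_jt=1)
```

Bisection is used here only because the equation has a single root in one dimension; a general-purpose minimizer applied to alpha_objective with a starting point such as x0 should reach the same value.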
_expand_data(df, event_type_col, duration_col, pid_col)
This method expands the raw data as explained in Lee et al. (2018).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Dataframe to expand. | required |
event_type_col | str | The event type column name (must be a column in df). A right-censored sample i is indicated by event value 0: df.loc[i, event_type_col] = 0. | required |
duration_col | str | Last follow-up time column name (must be a column in df). | required |
pid_col | str | Sample ID column name (must be a column in df). | required |
Returns:
Type | Description |
---|---|
DataFrame | The expanded dataframe. |
Source code in src/pydts/base_fitters.py
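The expansion described above can be sketched in plain Python, assuming the usual person-period layout: one row per subject per time at risk, with one indicator per event type. The function name expand_records, the indicator column names j_0, j_1, ... and the covariate Z1 are illustrative assumptions, and lists of dicts are used in place of a DataFrame so the sketch has no dependencies; the library's own output format may differ.

```python
def expand_records(records, events, event_type_col="J", duration_col="X"):
    """Expand one-row-per-subject records into one row per subject per time."""
    expanded = []
    for rec in records:
        last_time = rec[duration_col]
        for t in range(1, last_time + 1):
            row = dict(rec)
            row[duration_col] = t  # time index of this person-period row
            for j in events:
                # 1 only on the subject's last row, and only for the observed event
                row[f"j_{j}"] = int(t == last_time and rec[event_type_col] == j)
            # j_0 marks rows on which no event was observed (at risk or censored)
            row["j_0"] = 1 - sum(row[f"j_{j}"] for j in events)
            expanded.append(row)
    return expanded


records = [
    {"pid": 0, "J": 1, "X": 2, "Z1": 0.5},
    {"pid": 1, "J": 0, "X": 3, "Z1": -1.0},  # J == 0: right-censored at X == 3
]
expanded = expand_records(records, events=[1, 2])
```

Subject 0 contributes two rows and subject 1 contributes three; only the second row of subject 0 has j_1 equal to 1.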
_fit_beta(expanded_df, events, model=CoxPHFitter, model_kwargs={}, model_fit_kwargs={})
Source code in src/pydts/fitters.py
_fit_event_beta(expanded_df, event, model=ConditionalLogit, model_kwargs={}, model_fit_kwargs={})
Source code in src/pydts/fitters.py
_hazard_inverse_transformation(a)
This function defines the inverse transformation of the hazard function such that \(\lambda_j (t | Z) = h^{-1} ( \alpha_{jt} + Z^{T} \beta_{j} )\)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
a | Union[int, array, Series, DataFrame] | | required |
Returns:
Name | Type | Description |
---|---|---|
i | Union[int, array, Series, DataFrame] | The inverse function applied to a, \(h^{-1}(a)\). |
Source code in src/pydts/fitters.py
_hazard_transformation(a)
This function defines the transformation of the hazard function such that \(h ( \lambda_j (t | Z) ) = \alpha_{jt} + Z^{T} \beta_{j}\)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
a | Union[int, array, Series, DataFrame] | | required |
Returns:
Name | Type | Description |
---|---|---|
i | Union[int, array, Series, DataFrame] | The transformation applied to a, \(h(a)\). |
Source code in src/pydts/fitters.py
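As an illustration of the pair of transformations above, here is a minimal sketch assuming h is the logit link, so that its inverse is the logistic (expit) function. The source does not state which link the class uses, so treat the choice of logit as an assumption; the scalar-only functions below are stand-ins and do not handle arrays, Series or DataFrames.

```python
import math


def hazard_transformation(lam):
    """h(lambda) = log(lambda / (1 - lambda)); assumes the logit link."""
    return math.log(lam / (1.0 - lam))


def hazard_inverse_transformation(a):
    """h^{-1}(a) = 1 / (1 + exp(-a)); maps a linear predictor to a hazard."""
    return 1.0 / (1.0 + math.exp(-a))


lam = 0.2
a = hazard_transformation(lam)                 # linear-predictor scale
round_trip = hazard_inverse_transformation(a)  # back on the hazard scale
```

Applying one function after the other recovers the original hazard, which is the property the two definitions rely on.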
_validate_cols(df, event_type_col, duration_col, pid_col)
Source code in src/pydts/base_fitters.py
_validate_covariates_in_df(df)
Source code in src/pydts/base_fitters.py
_validate_t(t, return_iter=True)
Source code in src/pydts/base_fitters.py
evaluate(test_df, oracle_col='T', **kwargs)
fit(df, covariates=None, event_type_col='J', duration_col='X', pid_col='pid', skip_expansion=False, x0=0, fit_beta_kwargs={}, verbose=2, nb_workers=WORKERS)
This method fits a model to the discrete data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | training data for fitting the model | required |
covariates | list | list of covariates to be used in estimating the regression coefficients | None |
event_type_col | str | The event type column name (must be a column in df). A right-censored sample i is indicated by event value 0: df.loc[i, event_type_col] = 0. | 'J' |
duration_col | str | Last follow-up time column name (must be a column in df). | 'X' |
pid_col | str | Sample ID column name (must be a column in df). | 'pid' |
skip_expansion | boolean | Skips the dataframe expansion step. Use this option only if the provided dataframe (df) is already correctly expanded. When set to True, the df is expected to be in the format produced by the pydts.utils.get_expanded_df() method, as if it were applied to the unexpanded data. | False |
x0 | (Union[array, int], Optional) | initial guess to pass to the scipy.optimize.minimize function | 0 |
fit_beta_kwargs | (dict, Optional) | Keyword arguments to pass on to the estimation procedure. | {} |
verbose | (int, Optional) | The verbosity level of pandarallel. | 2 |
nb_workers | (int, Optional) | The number of workers to pass to pandarallel. If not specified, defaults to the number of workers available. | WORKERS |
Returns:
Name | Type | Description |
---|---|---|
event_models | dict | Fitted models dictionary. Keys: event names. Values: fitted models for each event. |
Source code in src/pydts/fitters.py
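A hedged usage sketch of fit, based only on the signature and defaults listed above. The toy records, the covariate names Z1 and Z2, and the helpers check_layout and fit_exact are hypothetical. The imports of pandas and pydts are kept inside fit_exact so that the layout check runs without either package installed, and the fit itself is not executed here because three subjects are far too few for estimation.

```python
# One row per subject, using the default column names of fit():
# 'pid' (sample ID), 'J' (event type, 0 = right-censored), 'X' (last follow-up time).
records = [
    {"pid": 0, "J": 1, "X": 3, "Z1": 0.2, "Z2": 1.0},
    {"pid": 1, "J": 2, "X": 1, "Z1": -0.5, "Z2": 0.0},
    {"pid": 2, "J": 0, "X": 4, "Z1": 1.3, "Z2": 1.0},  # censored at time 4
]


def check_layout(records, covariates, event_type_col="J", duration_col="X", pid_col="pid"):
    """Return True if every record has the required columns and IDs are unique."""
    required = {event_type_col, duration_col, pid_col, *covariates}
    pids = [r[pid_col] for r in records]
    return all(required <= set(r) for r in records) and len(pids) == len(set(pids))


def fit_exact(records, covariates):
    """Fit the exact two-stage model; needs pandas, pydts and a realistic sample."""
    import pandas as pd
    from pydts.fitters import TwoStagesFitterExact

    df = pd.DataFrame(records)
    fitter = TwoStagesFitterExact()
    fitter.fit(df, covariates=covariates, event_type_col="J",
               duration_col="X", pid_col="pid")
    return fitter


layout_ok = check_layout(records, covariates=["Z1", "Z2"])
```

With a dataset of realistic size, fit_exact(records, ["Z1", "Z2"]) would return the fitted object, from which get_alpha_df() and get_beta_SE() can then be read.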
get_alpha_df()
This function returns the Alpha coefficients for all the events.
Returns:
Name | Type | Description |
---|---|---|
alpha_df | DataFrame | Alpha coefficients Dataframe |
Source code in src/pydts/fitters.py
get_beta_SE()
This function returns the Beta coefficients and their Standard Errors for all the events.
Returns:
Name | Type | Description |
---|---|---|
se_df | DataFrame | Beta coefficients and Standard Errors Dataframe |
Source code in src/pydts/fitters.py
plot_all_events_alpha(ax=None, scatter_kwargs={}, colors=COLORS, show=True, title=None, xlabel='t', ylabel='$\\alpha_{jt}$', fontsize=18, ticklabelsize=15)
This function plots a scatter plot of the \(\alpha_{jt}\) coefficients of all the events.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ax | (Axes, Optional) | ax to use | None |
scatter_kwargs | (dict, Optional) | keywords to pass to the scatter function | {} |
colors | (list, Optional) | color names | COLORS |
show | (bool, Optional) | whether to call plt.show() | True |
title | (str, Optional) | axes title | None |
xlabel | (str, Optional) | axes xlabel | 't' |
ylabel | (str, Optional) | axes ylabel | '$\\alpha_{jt}$' |
fontsize | (int, Optional) | axes title, xlabel, ylabel fontsize | 18 |
ticklabelsize | (int, Optional) | axes xticklabels, yticklabels fontsize | 15 |
Returns:
Name | Type | Description |
---|---|---|
ax | Axes | output figure |
Source code in src/pydts/fitters.py
plot_all_events_beta(ax=None, colors=COLORS, show=True, title=None, xlabel='Value', ylabel='$\\beta_{j}$', fontsize=18, ticklabelsize=15)
This function plots the \(\beta_{j}\) coefficients and standard errors of all the events.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ax | (Axes, Optional) | ax to use | None |
colors | (list, Optional) | color names | COLORS |
show | (bool, Optional) | whether to call plt.show() | True |
title | (str, Optional) | axes title | None |
xlabel | (str, Optional) | axes xlabel | 'Value' |
ylabel | (str, Optional) | axes ylabel | '$\\beta_{j}$' |
fontsize | (int, Optional) | axes title, xlabel, ylabel fontsize | 18 |
ticklabelsize | (int, Optional) | axes xticklabels, yticklabels fontsize | 15 |
Returns:
Name | Type | Description |
---|---|---|
ax | Axes | output figure |
Source code in src/pydts/fitters.py
plot_event_alpha(event, ax=None, scatter_kwargs={}, show=True, title=None, xlabel='t', ylabel='$\\alpha_{jt}$', fontsize=18, color=None, label=None, ticklabelsize=15)
This function plots a scatter plot of the \(\alpha_{jt}\) coefficients of a specific event.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
event | Union[str, int] | event name | required |
ax | (Axes, Optional) | ax to use | None |
scatter_kwargs | (dict, Optional) | keywords to pass to the scatter function | {} |
show | (bool, Optional) | whether to call plt.show() | True |
title | (str, Optional) | axes title | None |
xlabel | (str, Optional) | axes xlabel | 't' |
ylabel | (str, Optional) | axes ylabel | '$\\alpha_{jt}$' |
fontsize | (int, Optional) | axes title, xlabel, ylabel fontsize | 18 |
color | (str, Optional) | color name to use | None |
label | (str, Optional) | label name | None |
ticklabelsize | (int, Optional) | axes xticklabels, yticklabels fontsize | 15 |
Returns:
Name | Type | Description |
---|---|---|
ax | Axes | output figure |
Source code in src/pydts/fitters.py
predict(df, **kwargs)
predict_cumulative_incident_function(df)
This function adds columns of the predicted hazard function, overall survival, probabilities of event occurrence, and cumulative incidence function (CIF) to the given dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | dataframe with covariates columns included | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | dataframe with additional prediction columns |
Source code in src/pydts/base_fitters.py
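To show how the four quantities named above relate to one another, here is a minimal sketch for a single subject, assuming the standard discrete-time competing-risks identities: overall survival \(S(t) = \prod_{k \le t} (1 - \sum_j \lambda_j(k))\), event probability \(P(T = t, J = j) = \lambda_j(t) S(t-1)\), and CIF \(F_j(t) = \sum_{k \le t} P(T = k, J = j)\). The function name competing_risks_curves and the hazard values are hypothetical, and the library's column layout is not reproduced.

```python
def competing_risks_curves(hazards):
    """Derive survival, event probabilities and CIF from per-event hazards.

    hazards maps each event j to a list of hazards lambda_j(t) for t = 1..T.
    """
    events = list(hazards)
    n_times = len(hazards[events[0]])
    survival = []                    # S(t) for t = 1..T
    prob = {j: [] for j in events}   # P(T = t, J = j)
    cif = {j: [] for j in events}    # F_j(t)
    s_prev = 1.0                     # S(0) = 1: everyone is event-free at the start
    for t in range(n_times):
        for j in events:
            p = hazards[j][t] * s_prev
            prob[j].append(p)
            cif[j].append(p + (cif[j][-1] if cif[j] else 0.0))
        # Surviving time t means experiencing none of the events at t.
        s_prev *= 1.0 - sum(hazards[j][t] for j in events)
        survival.append(s_prev)
    return survival, prob, cif


# Hypothetical hazards for two event types over two time points.
hazards = {1: [0.10, 0.20], 2: [0.05, 0.10]}
survival, prob, cif = competing_risks_curves(hazards)
```

At every time point the survival probability and the CIFs of all events sum to one, which is a convenient sanity check on predicted columns.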
predict_event_cumulative_incident_function(df, event)
This function adds, for a specific event, columns of the predicted hazard function, overall survival, probabilities of event occurrence, and cumulative incidence function (CIF) to the given dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | dataframe with covariates columns included | required |
event | Union[str, int] | event name | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | dataframe with additional prediction columns |
Source code in src/pydts/base_fitters.py
predict_hazard_all(df)
This function calculates the hazard for all the events at all time values included in the training set for each event.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | samples to predict for | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | samples with the prediction columns |
Source code in src/pydts/base_fitters.py
predict_hazard_jt(df, event, t)
This method calculates the hazard for the given event at the given time values if they were included in the training set of the event.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | samples to predict for | required |
event | Union[str, int] | event name | required |
t | Union[Iterable, int] | times to calculate the hazard for | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | samples with the prediction columns |
Source code in src/pydts/fitters.py
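The hazard for one event at given times follows from the relation \(\lambda_j (t | Z) = h^{-1} ( \alpha_{jt} + Z^{T} \beta_{j} )\). A minimal sketch for a single sample, assuming the logit link; predict_hazard, alpha_j, beta_j and z are hypothetical names and values, not fitted output.

```python
import math


def predict_hazard(alpha_jt, beta_j, z):
    """lambda_j(t | Z) = expit(alpha_jt + Z^T beta_j); assumes the logit link."""
    linear = alpha_jt + sum(b * x for b, x in zip(beta_j, z))
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical estimates for one event j: alpha_{jt} per training time, and beta_j.
alpha_j = {1: -2.0, 2: -1.5}
beta_j = [0.5, -0.25]
z = [1.0, 2.0]  # covariates of the sample to predict for

# Only times seen in training have an alpha_{jt}, so only those can be predicted.
requested_times = [1, 2, 3]
hazard_by_time = {
    t: predict_hazard(alpha_j[t], beta_j, z)
    for t in requested_times
    if t in alpha_j
}
```

Time 3 is silently skipped in this sketch because no \(\alpha_{jt}\) exists for it; how the library itself reports such times is not stated in the source.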
predict_hazard_t(df, t)
This function calculates the hazard for all the events at the requested time values if they were included in the training set of each event.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | samples to predict for | required |
t | (int, array) | times to calculate the hazard for | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | samples with the prediction columns |
Source code in src/pydts/base_fitters.py
predict_marginal_prob_all_events(df)
This function calculates the marginal probability per event given the covariates for all the events.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | dataframe with covariates columns included | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | dataframe with additional prediction columns |
Source code in src/pydts/base_fitters.py
predict_marginal_prob_event_j(df, event)
This function calculates the marginal probability of an event given the covariates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | dataframe with covariates columns included | required |
event | Union[str, int] | event name | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | dataframe with additional prediction columns |
Source code in src/pydts/base_fitters.py
predict_overall_survival(df, t=None, return_hazards=False)
This function adds columns of the overall survival until time t.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | dataframe with covariates columns | required |
t | int | time | None |
return_hazards | bool | whether to keep the hazard columns | False |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | dataframe with the additional overall survival columns |
Source code in src/pydts/base_fitters.py
predict_prob_event_j_all(df, event)
This function adds columns of a specific event occurrence probabilities.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | dataframe with covariates columns | required |
event | Union[str, int] | event name | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | dataframe with probabilities columns |
Source code in src/pydts/base_fitters.py
predict_prob_event_j_at_t(df, event, t)
This function adds a column with the probability of occurrence of a specific event at a specific time.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | dataframe with covariates columns | required |
event | Union[str, int] | event name | required |
t | int | time | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | dataframe with an additional probability column |
Source code in src/pydts/base_fitters.py
predict_prob_events(df)
This function adds columns of the occurrence probabilities of all the events.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | dataframe with covariates columns | required |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | dataframe with probabilities columns |
Source code in src/pydts/base_fitters.py
print_summary(summary_func='print_summary', summary_kwargs={})
This method prints the summary of the fitted models for all the events.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
summary_func | (str, Optional) | print summary method of the fitted model type ("summary", "print_summary"). | 'print_summary' |
summary_kwargs | (dict, Optional) | Keyword arguments to pass to the model summary function. | {} |
Returns:
Type | Description |
---|---|
None | None |