Package 'modelfitter' reference manual

Title:	Bootstrapping models and model fits
Description:	This is a very early work in progress.
Authors:	Robert Challen [aut, cre]
Maintainer:	Robert Challen <[email protected]>
License:	MIT + file LICENSE
Version:	0.0.0.9031
Built:	2025-02-20 05:47:23 UTC
Source:	https://github.com/bristol-vaccine-centre/modelfitter

Reuse tidy-select syntax outside of a tidy-select function

Description

Reuse tidy-select syntax outside of a tidy-select function

Usage

as_vars(tidyselect, data = NULL)
as_vars(tidyselect, data = NULL)

Arguments

`tidyselect`	a tidyselect syntax which will be evaluated in context by looking for a call in the call stack that includes a dataframe as the first argument
`data`	(optional) a specific dataframe with which to evaluate the tidyselect

Value

a list of symbols resulting from the evaluation of the tidyselect in the context of the current call stack (or a provided data frame)

Provide a access to bootstrap resamples of a dataset

Description

Provide a access to bootstrap resamples of a dataset

Usage

bootstrap_provider(data, max_n, formulae = ~.)
bootstrap_provider(data, max_n, formulae = ~.)

Arguments

`data`	the data frame
`max_n`	the maximum number of different bootstraps
`formulae`	a list of formulae with all the columns used

Value

a function that returns a dataset for inputs between 1:max_n

Examples

bp = iris %>% bootstrap_provider(10)
bp(1) %>% head()
tmp = bp(1)
tmp2 = bp(1)
identical(tmp,tmp2)
bp = iris %>% bootstrap_provider(10)
bp(1) %>% head()
tmp = bp(1)
tmp2 = bp(1)
identical(tmp,tmp2)

Configure a single model fit using the `modelfitter` framework

Description

Configure a single model fit using the modelfitter framework

Usage

configure_model(model_name, model_formula, dataset, model_function)
configure_model(model_name, model_formula, dataset, model_function)

Arguments

`model_name`	a label for the model
`model_formula`	the model formula you want to fit
`dataset`	the dataset
`model_function`	the model function to fit, e.g. logistic_regression

Value

a configuration dataframe and data provider to execute the fit with

Examples

# Simple example
tmp = configure_model(
  "Iris model", 
  I(Species == "versicolor") ~ ., 
  iris, 
  modelfitter::logistic_regression
)
tmp2 = tmp %>% execute_configuration()
summfit = tmp2 %>% summarise_fits()
# Simple example
tmp = configure_model(
  "Iris model", 
  I(Species == "versicolor") ~ ., 
  iris, 
  modelfitter::logistic_regression
)
tmp2 = tmp %>% execute_configuration()
summfit = tmp2 %>% summarise_fits()

Configure a set of model fits from a specification of different combinations

Description

Configure a set of model fits from a specification of different combinations

Usage

configure_models(
  model_formula,
  dataset,
  model_function,
  data_subset = data_subset_provider(default = TRUE),
  formula_update = formula_provider(default = . ~ .)
)
configure_models(
  model_formula,
  dataset,
  model_function,
  data_subset = data_subset_provider(default = TRUE),
  formula_update = formula_provider(default = . ~ .)
)

Arguments

`model_formula`	the model formula or formulae you want to fit as a `formula_provider(...)`.
`dataset`	the dataset imputed or bootstrapped if need be as a `bootstrap_provider(...)` or `imputation_provider(...)`
`model_function`	the model functions to fit as a `model_function_provider(...)`
`data_subset`	the data subsets to fit as a `data_subset_provider(...)`
`formula_update`	and model formula updates to apply before fitting as a `formula_provider(...)`.

Value

a configuration dataframe and data provider to run

Examples


form = ~ bmi + age + chl

fp = formula_provider(
  `Univariate` = modelfitter::univariate_from_multivariate(form),
  `Adj Model 1` = form,
  `Adj Model 2` = update(form, . ~ . - chl),
)

mfp = model_function_provider(
   `Logistic` = modelfitter::logistic_regression,
   `Poisson` = modelfitter::quasi_poisson
)

ip = imputation_provider(
    mice::nhanes2 %>% dplyr::mutate(death=TRUE), 10)

fup = formula_provider(
  hypertension = hyp ~ .,
  death = death ~ .
)

dsp = data_subset_provider(
  all = TRUE,
  hypertensives = hyp == "yes",
  nonhypertensives = hyp == "no" 
)

tmp = configure_models(fp, ip, mfp, dsp, fup) %>% dplyr::glimpse()

# the number of models to fit will be this:
sum(tmp$n_boots)

form = ~ bmi + age + chl

fp = formula_provider(
  `Univariate` = modelfitter::univariate_from_multivariate(form),
  `Adj Model 1` = form,
  `Adj Model 2` = update(form, . ~ . - chl),
)

mfp = model_function_provider(
   `Logistic` = modelfitter::logistic_regression,
   `Poisson` = modelfitter::quasi_poisson
)

ip = imputation_provider(
    mice::nhanes2 %>% dplyr::mutate(death=TRUE), 10)

fup = formula_provider(
  hypertension = hyp ~ .,
  death = death ~ .
)

dsp = data_subset_provider(
  all = TRUE,
  hypertensives = hyp == "yes",
  nonhypertensives = hyp == "no" 
)

tmp = configure_models(fp, ip, mfp, dsp, fup) %>% dplyr::glimpse()

# the number of models to fit will be this:
sum(tmp$n_boots)

Provide a mechanism for subsetting data

Description

A data subset may be used for a sensitivity analysis. This provides a mechanism to filter the data within the pipeline.

Usage

data_subset_provider(...)
data_subset_provider(...)

Arguments

...

a set of named data filter criteria. In most cases you will want to include a default = TRUE option in this list. (this is the default value)

Value

a data subset provider which can fiter data based on a named set of subset criteria

Examples

dsp = data_subset_provider(one = TRUE, two = Species == "versicolor", three = Sepal.Width < 2.6)
dsp()
f = dsp("three")
f(iris) 

dsp2 = data_subset_provider()
dsp2("default")(mtcars) %>% head(10)
dsp = data_subset_provider(one = TRUE, two = Species == "versicolor", three = Sepal.Width < 2.6)
dsp()
f = dsp("three")
f(iris) 

dsp2 = data_subset_provider()
dsp2("default")(mtcars) %>% head(10)

Fit statistical models defined in a `modelfitter` configuration

Description

Fit statistical models defined in a modelfitter configuration

Usage

execute_configuration(
  cfg,
  retain_fit = sum(cfg$n_boots) < 50,
  performance = TRUE,
  ...,
  cache = FALSE
)
execute_configuration(
  cfg,
  retain_fit = sum(cfg$n_boots) < 50,
  performance = TRUE,
  ...,
  cache = FALSE
)

Arguments

`cfg`	a 4 column dataframe with model_name - minumum model_fn data_subset_fn model_formula
`retain_fit`	keep the model fit object? If the config results in less than 50 models then the default of this is TRUE. Otherwise we summarise the models as we go saving memory.
`performance`	calculate performance metrics for all the models. These will be summarised over bootstrap replicates. Setting this to false may be a good idea if you have lots of models.
`...`	Arguments passed on to `model_labels` `model` a model or list of models. `label_fn` a function that allows a predictor label to be renamed. This should expect a vector and return a vector of the same length. Levels will be terms in the model function and may be column names, or combinations thereof `subgroup_label_fn` a function that allows a subgroup label to be renamed. This should expect a vector and return a vector of the same length. The input to this function will be either a factor level name or a combination of them or whatever else the model decides to name it's coefficients.
`cache`	cache model output

Value

a 5 column dataframe with additional nested fitted results in model_fit. This nested dataframe will have the columns:

boot - an id for the data bootstrap or imputation;
fit (optional) - the model fit itself;
coef the model coefficients;
global_p the global p values for this model (see global_p_value());
performance (optional) the performance metrics for this model;

Examples


tmp = configure_model(
  "Iris", I(Species == "versicolor") ~ ., iris, logistic_regression)
tmp2 = tmp %>% execute_configuration()
 
tmp3  = configure_models(
   formula_provider(
      "<F" = I(color < "F") ~ cut + carat + clarity + price,
      "<H" = I(color < "H") ~ cut + carat + clarity + price
   ),
   bootstrap_provider(ggplot2::diamonds, max_n = 10),
   model_function_provider(
     "Log reg" = modelfitter::logistic_regression,
     "Poisson" = modelfitter::quasi_poisson
   )
)

tmp4 = tmp3 %>% execute_configuration(cache = TRUE)

tmp = configure_model(
  "Iris", I(Species == "versicolor") ~ ., iris, logistic_regression)
tmp2 = tmp %>% execute_configuration()
 
tmp3  = configure_models(
   formula_provider(
      "<F" = I(color < "F") ~ cut + carat + clarity + price,
      "<H" = I(color < "H") ~ cut + carat + clarity + price
   ),
   bootstrap_provider(ggplot2::diamonds, max_n = 10),
   model_function_provider(
     "Log reg" = modelfitter::logistic_regression,
     "Poisson" = modelfitter::quasi_poisson
   )
)

tmp4 = tmp3 %>% execute_configuration(cache = TRUE)

Get all used variables on LHS of a formula

Description

Get all used variables on LHS of a formula

Usage

f_lhs_vars(formula)
f_lhs_vars(formula)

Arguments

formula

a formula or list of formulae

Value

the variables on the RHS of the formulae as a character vector

Examples

f_lhs_vars(survival::Surv(time,event) ~ x + z + y)
f_lhs_vars(survival::Surv(time,event) ~ x + z + y)

Get all used variables on LHS of a formula

Description

Get all used variables on LHS of a formula

Usage

f_rhs_vars(formula)
f_rhs_vars(formula)

Arguments

formula

a formula or list of formulae

Value

the variables on the LHS of the formulae as a character vector

Examples

f_rhs_vars(survival::Surv(time,event) ~ x + z +y)
f_rhs_vars(survival::Surv(time,event) ~ pspline(x) + z + y)
f_rhs_vars(survival::Surv(time,event) ~ x + z +y)
f_rhs_vars(survival::Surv(time,event) ~ pspline(x) + z + y)

Format a confidence interval

Description

Format a confidence interval

Usage

format_ci(
  median,
  lower,
  upper,
  is_reference_value = FALSE,
  fmt.ci = "%s [%s – %s]",
  na.ci = "—",
  fn = format_ratio,
  ...
)
format_ci(
  median,
  lower,
  upper,
  is_reference_value = FALSE,
  fmt.ci = "%s [%s – %s]",
  na.ci = "—",
  fn = format_ratio,
  ...
)

Arguments

`median`	the median value
`lower`	the lower value
`upper`	the upper quantile
`is_reference_value`	is the value a reference value
`fmt.ci`	the layout of the 3 elements as a `sprintf` using `⁠%s⁠` for each element
`na.ci`	the value to show if the median ci is `NA`
`fn`	a function to format each number
`...`	Arguments passed on to `format_ratio` `x` a vector of numbers `fmt.ratio` a sprintf format string `max.ratio` a max ratio after which to display as e.g. `⁠>50⁠` or `⁠<0.02⁠` `na.ratio` a symbol in case the value is `NA`.

Value

a formatted CI string

Examples

format_ci(
  median = 2^(-5:5),
  lower = 2^(-5:5-1),
  upper = 2^(-5:5+1),
  fmt.ratio = "%1.3g"
)

format_ci(
  median = 2^(-5:5),
  lower = 2^(-5:5-1),
  upper = 2^(-5:5+1),
  fmt.ratio = "%1.3g"
)

Generate a table for a single model output

Description

The modelfitter configuration, execution, summary, format pipeline is for the complex use cases with multiple models. This is the simple version when we have a single fitted model and we want to print it in a table. This is more or less what gtsummary does.

Usage

format_model(model_name, fit, statistic, ...)
format_model(model_name, fit, statistic, ...)

Arguments

`model_name`	a title for the model.
`fit`	the model fit.
`statistic`	the type of stastic this model outputs (e.g. OR, HR, RR). This mostly defines the label in the table.
`...`	Arguments passed on to `format_summary` `summfit` A set of fitted, and summarised configured models `global.p` present the global p value (anova III) rather than the line by line values. `inv_link` the inverse of the link function employed in the models. This is almost always the inverse `exp(...)` unless we are dealing with a linear model. `col_header` a glue spec using columns from the summfit data table to label the model columns. `model_name` and `n_obs_summary` should be defined as a minimum. Other bits of metadata will be present if the table has been configured using `configure_models(...)` including `model_type_name`, `data_subset_name`, `model_base_name`, `model_update_name`, `n_boots`. `row_design` a glue spec for presenting the statistic. valid columns are `reference` - the referent status, `group.type`, `beta.lower`, `beta.median`, `beta.upper`, `value.lower`, `value.median`, `value.upper`, `p.value.mixture`, `global.p.mixture`, `global.p.method`. The helper functions `format_ratio(x,fmt.ratio = "%1.3g")` and `format_ci(med,low,hi,ref)` may be useful in this glue string `p_format` a function (or lambda) converting a number into a p-value string `font_size` (optional) the font size for the table in points `font` (optional) the font family for the table (which will be matched to closest on your system) `footer_text` any text that needs to be added at the end of the table, setting this to FALSE dsables the whole footer (as does `options("tableone.hide_footer"=TRUE)`). `summarise_fn` in the event that we want to present multiple models in the same column of a table it is possible that there are multiple entries for each variable. This function will combine them (at a text level) so they can be placed in a table. Examples could be `dplyr::first` or `~ paste0(.x,collapse="\n")`

Value

a huxtable formatted model summary table

Examples

coxfit = cox_model(survival::flchain, survival::Surv(futime, death) ~ sex * age)
format_model("FLChain", coxfit, "HR")
coxfit = cox_model(survival::flchain, survival::Surv(futime, death) ~ sex * age)
format_model("FLChain", coxfit, "HR")

Format a ratio, truncating at a set level above and below 1 on log scale.

Description

Format a ratio, truncating at a set level above and below 1 on log scale.

Usage

format_ratio(x, fmt.ratio = "%1.2f", max.ratio = 50, na.ratio = "Unk")
format_ratio(x, fmt.ratio = "%1.2f", max.ratio = 50, na.ratio = "Unk")

Arguments

`x`	a vector of numbers
`fmt.ratio`	a sprintf format string
`max.ratio`	a max ratio after which to display as e.g. `⁠>50⁠` or `⁠<0.02⁠`
`na.ratio`	a symbol in case the value is `NA`.

Value

a string of formatted ratios

Examples

format_ratio(2^(-6:6))
format_ratio(2^(-6:6))

Format a summary of multiple fits into a table.

Description

Format a summary of multiple fits into a table.

Usage

format_summary(
  summfit,
  ...,
  statistic = "OR",
  global.p = getOption("modelfitter.global_p_values", TRUE),
  inv_link = exp,
  col_header = "{model_name} (N={sprintf('%d',max(n_obs_summary))})",
  row_design =
    "{format_ci(value.median,value.lower,value.upper,reference,fmt.ratio = '%1.2g')}",
  p_format = NULL,
  font_size = getOption("modelfitter.font_size", 8),
  font = getOption("modelfitter.font", "Arial"),
  footer_text = NULL,
  summarise_fn = NULL
)
format_summary(
  summfit,
  ...,
  statistic = "OR",
  global.p = getOption("modelfitter.global_p_values", TRUE),
  inv_link = exp,
  col_header = "{model_name} (N={sprintf('%d',max(n_obs_summary))})",
  row_design =
    "{format_ci(value.median,value.lower,value.upper,reference,fmt.ratio = '%1.2g')}",
  p_format = NULL,
  font_size = getOption("modelfitter.font_size", 8),
  font = getOption("modelfitter.font", "Arial"),
  footer_text = NULL,
  summarise_fn = NULL
)

Arguments

`summfit`	A set of fitted, and summarised configured models
`...`	Arguments passed on to `model_labels` `model` a model or list of models. `label_fn` a function that allows a predictor label to be renamed. This should expect a vector and return a vector of the same length. Levels will be terms in the model function and may be column names, or combinations thereof `subgroup_label_fn` a function that allows a subgroup label to be renamed. This should expect a vector and return a vector of the same length. The input to this function will be either a factor level name or a combination of them or whatever else the model decides to name it's coefficients.
`statistic`	what model output is this table presenting? Is it an OR, a RR or a HR? or something else?
`global.p`	present the global p value (anova III) rather than the line by line values.
`inv_link`	the inverse of the link function employed in the models. This is almost always the inverse `exp(...)` unless we are dealing with a linear model.
`col_header`	a glue spec using columns from the summfit data table to label the model columns. `model_name` and `n_obs_summary` should be defined as a minimum. Other bits of metadata will be present if the table has been configured using `configure_models(...)` including `model_type_name`, `data_subset_name`, `model_base_name`, `model_update_name`, `n_boots`.
`row_design`	a glue spec for presenting the statistic. valid columns are `reference` - the referent status, `group.type`, `beta.lower`, `beta.median`, `beta.upper`, `value.lower`, `value.median`, `value.upper`, `p.value.mixture`, `global.p.mixture`, `global.p.method`. The helper functions `format_ratio(x,fmt.ratio = "%1.3g")` and `format_ci(med,low,hi,ref)` may be useful in this glue string
`p_format`	a function (or lambda) converting a number into a p-value string
`font_size`	(optional) the font size for the table in points
`font`	(optional) the font family for the table (which will be matched to closest on your system)
`footer_text`	any text that needs to be added at the end of the table, setting this to FALSE dsables the whole footer (as does `options("tableone.hide_footer"=TRUE)`).
`summarise_fn`	in the event that we want to present multiple models in the same column of a table it is possible that there are multiple entries for each variable. This function will combine them (at a text level) so they can be placed in a table. Examples could be `dplyr::first` or `~ paste0(.x,collapse="\n")`

Value

a huxtable tabular output of the model(s)

Examples

cfg  = configure_models(
   formula_provider(
      "<F" = I(color < "F") ~ cut + carat + clarity + price,
      "<H" = I(color < "H") ~ cut + carat + clarity + price
   ),
   bootstrap_provider(ggplot2::diamonds, max_n = 10),
   model_function_provider(
     "Log reg" = modelfitter::logistic_regression,
     "Poisson" = modelfitter::quasi_poisson
   )
)

exectn = cfg %>% execute_configuration(cache = TRUE)
summfit = exectn %>% summarise_fits()
hux = summfit %>% format_summary()
hux
cfg  = configure_models(
   formula_provider(
      "<F" = I(color < "F") ~ cut + carat + clarity + price,
      "<H" = I(color < "H") ~ cut + carat + clarity + price
   ),
   bootstrap_provider(ggplot2::diamonds, max_n = 10),
   model_function_provider(
     "Log reg" = modelfitter::logistic_regression,
     "Poisson" = modelfitter::quasi_poisson
   )
)

exectn = cfg %>% execute_configuration(cache = TRUE)
summfit = exectn %>% summarise_fits()
hux = summfit %>% format_summary()
hux

Construct a configuration for multiple formulae

Description

multiple formulae for a parameterised model fitting pipeline. Different models may be a key part of the analysis or a sensitivity. A single name maybe associated with one or many models - This is because it is frequent that we want to group together models such as univariate comparisons into the same format table as another single multivarible model. Another use case for multiple models is a single set of predictors

Usage

formula_provider(...)
formula_provider(...)

Arguments

...

a named list of formulae or formulae lists. In the simplest use case this is a single formula, see examples for more possibilities.

Examples

form = is_coloured ~ cut + carat + clarity * price
fp = formula_provider(
  `Univariate` = modelfitter::univariate_from_multivariate(form),
  `Adj Model 1` = form,
  `Adj Model 2` = update(form, . ~ . - clarity * price),
)


predictors = ~ x + y + z 
fp2 = formula_provider(
  `Death` = death ~ .,
  `ICU admission` = icu ~ .,
  `Delayed discharge` = delayed_discharge ~ .,
)
sapply(fp2(), fp2) %>% sapply(update, predictors)

# the simplest configuration
fp3 = formula_provider(outcome ~ x + y + z)
fp3()

fp4 = formula_provider(outcome1 ~ x + y + z, outcome2 ~ x + y + z)
fp4()
try(fp4("blah blah"))

# must define at least one forumula
try(formula_provider())

formula_provider(default = . ~ .)
form = is_coloured ~ cut + carat + clarity * price
fp = formula_provider(
  `Univariate` = modelfitter::univariate_from_multivariate(form),
  `Adj Model 1` = form,
  `Adj Model 2` = update(form, . ~ . - clarity * price),
)


predictors = ~ x + y + z 
fp2 = formula_provider(
  `Death` = death ~ .,
  `ICU admission` = icu ~ .,
  `Delayed discharge` = delayed_discharge ~ .,
)
sapply(fp2(), fp2) %>% sapply(update, predictors)

# the simplest configuration
fp3 = formula_provider(outcome ~ x + y + z)
fp3()

fp4 = formula_provider(outcome1 ~ x + y + z, outcome2 ~ x + y + z)
fp4()
try(fp4("blah blah"))

# must define at least one forumula
try(formula_provider())

formula_provider(default = . ~ .)

Calculate a global P-value for a model fit for each of the predictors.

Description

In the summary output of a model there is a p-value for each level of the factors whereas usually we want to know what the overall contribution of the individual predictor to the model is.

Usage

global_p_value(x, ...)
global_p_value(x, ...)

Arguments

`x`	a model
`...`	not used.

Details

This function usually calculates a Type 2 Anova for the predictors within the models but sometimes falls back to different methods if the model is not supported by the car::Anova or stats::anova packages. The method used is reported in the output. This is designed as an automated process and may not work in all situations, or produce appropriate output for all model types. Interaction terms may be particularly problematic.

Value

a dataframe of the predictors of the model (but not the levels) in one column and a global p value in another. The global.p will be given as zero or one despite this being not really possible. It si assumed that it will be formatted to truncate very low or very high values.

Examples

diamonds2 = ggplot2::diamonds %>% dplyr::mutate(
   is_coloured = color <= "F",
   dplyr::across(dplyr::where(is.ordered), ~ factor(.x,ordered=FALSE))
) %>% dplyr::select(-color)

lmodel = stats::lm(Petal.Width ~ ., iris)
glmodel = logistic_regression(diamonds2, is_coloured ~ cut + carat + clarity * price)
coxfit = cox_model(survival::flchain, survival::Surv(futime, death) ~ sex * age)

# Error conditions
try(anova(stats::lm( cut ~ carat + clarity * price, diamonds2)))
global_p_value(stats::lm( cut ~ carat + clarity * price, diamonds2))


# cox model
global_p_value(coxfit)

# linear model
global_p_value(lmodel)

# logistic regression (OR)
global_p_value(
  stats::glm(is_coloured ~ cut + carat + clarity + price, diamonds2,  family="binomial"))
global_p_value(
  logistic_regression(diamonds2, is_coloured ~ cut + carat + clarity * price))

# poisson regression (RR)
global_p_value(
  quasi_poisson(diamonds2, is_coloured ~ cut + carat + clarity * price))
global_p_value(
  robust_poisson(diamonds2, is_coloured ~ cut + carat + clarity * price))
global_p_value(
  robust_poisson_2(diamonds2, is_coloured ~ cut + carat + clarity * price))

# log binomial regression (RR)
# TODO: this does not work at the moment as an example as the log_binomial starting value is
# a problem
global_p_value(
  log_binomial(diamonds2, is_coloured ~ cut + carat + clarity + price))

# global_p_value(
#  log_binomial_2(diamonds2, is_coloured ~ cut + carat + clarity + price))


diamonds2 = ggplot2::diamonds %>% dplyr::mutate(
   is_coloured = color <= "F",
   dplyr::across(dplyr::where(is.ordered), ~ factor(.x,ordered=FALSE))
) %>% dplyr::select(-color)

lmodel = stats::lm(Petal.Width ~ ., iris)
glmodel = logistic_regression(diamonds2, is_coloured ~ cut + carat + clarity * price)
coxfit = cox_model(survival::flchain, survival::Surv(futime, death) ~ sex * age)

# Error conditions
try(anova(stats::lm( cut ~ carat + clarity * price, diamonds2)))
global_p_value(stats::lm( cut ~ carat + clarity * price, diamonds2))


# cox model
global_p_value(coxfit)

# linear model
global_p_value(lmodel)

# logistic regression (OR)
global_p_value(
  stats::glm(is_coloured ~ cut + carat + clarity + price, diamonds2,  family="binomial"))
global_p_value(
  logistic_regression(diamonds2, is_coloured ~ cut + carat + clarity * price))

# poisson regression (RR)
global_p_value(
  quasi_poisson(diamonds2, is_coloured ~ cut + carat + clarity * price))
global_p_value(
  robust_poisson(diamonds2, is_coloured ~ cut + carat + clarity * price))
global_p_value(
  robust_poisson_2(diamonds2, is_coloured ~ cut + carat + clarity * price))

# log binomial regression (RR)
# TODO: this does not work at the moment as an example as the log_binomial starting value is
# a problem
global_p_value(
  log_binomial(diamonds2, is_coloured ~ cut + carat + clarity + price))

# global_p_value(
#  log_binomial_2(diamonds2, is_coloured ~ cut + carat + clarity + price))

Provide access to missing data imputations

Description

Provide access to missing data imputations

Usage

imputation_provider(data, max_n, formulae = ~., ...)
imputation_provider(data, max_n, formulae = ~., ...)

Arguments

`data`	the data frame with missing rows
`max_n`	the maximum number of different imputations to
`formulae`	a lit of formulae with all the columns used (defaults to everything)
`...`	cache control

Value

a function that returns a dataset for inputs between 1:max_n

Examples

ip = imputation_provider(mice::nhanes2, 10, list(~ hyp + bmi, ~ age + chl))
ip(1) %>% head(10)

ip = imputation_provider(mice::nhanes2, 10, list(~ hyp + bmi, ~ age + chl))
ip(1) %>% head(10)

Fit standard log binomial regression using `stats::glm`

Description

The log binomial model using standard glm is less that satisfactory and hard to make converge. The algorithm needs starting values to have any hope and these do make a difference to the outcome. This needs more investigation before being used.

Usage

log_binomial(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)
log_binomial(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)

Arguments

`data`	a data frame
`formula`	a formula of the form `binary_outcome ~ obs1 + obs2 + ...`
`positive`	test strings to interpret as true if outcome is not a factor or logical.
`...`	not used

Value

a model object

Examples


diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% log_binomial(is_coloured ~ cut + carat + clarity * price)
global_p_value(model5)
diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% log_binomial(is_coloured ~ cut + carat + clarity * price)
global_p_value(model5)

Fit standard log binomial regression using `logbin::logbin`

Description

This can be very slow for simple models, and will not handle interaction terms It does not seem to report std error. This is very much a work in progress,

Usage

log_binomial_2(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)
log_binomial_2(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)

Arguments

`data`	a data frame
`formula`	a formula of the form `binary_outcome ~ obs1 + obs2 + ...`
`positive`	test strings to interpret as true if outcome is not a factor or logical.
`...`	not used

Value

a model object

Examples


diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% log_binomial_2(is_coloured ~ cut + carat + clarity + price)
global_p_value(model5)
diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% log_binomial_2(is_coloured ~ cut + carat + clarity + price)
global_p_value(model5)

Fit standard logistic regression with unordered factors.

Description

This makes an effort to cast the result column into a logical from a factor or other data type. It also will convert ordered predictors into unordered before running.

Usage

logistic_regression(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)
logistic_regression(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)

Arguments

`data`	a data frame
`formula`	a formula of the form `binary_outcome ~ obs1 + obs2 + ...`
`positive`	test strings to interpret as true if outcome is not a factor or logical.
`...`	not used

Value

a model object

Examples


diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% 
  logistic_regression(is_coloured ~ cut + carat + clarity * price)
model6 = ggplot2::diamonds %>% 
  logistic_regression(I(color <= "F") ~ cut + carat + clarity * price)

# this wont work as the factors are converted to unordered
# model6 = ggplot2::diamonds %>% 
#   logistic_regression(I(color <= "F") ~ I(cut<"Very Good") + carat + clarity * price)


diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% 
  logistic_regression(is_coloured ~ cut + carat + clarity * price)
model6 = ggplot2::diamonds %>% 
  logistic_regression(I(color <= "F") ~ cut + carat + clarity * price)

# this wont work as the factors are converted to unordered
# model6 = ggplot2::diamonds %>% 
#   logistic_regression(I(color <= "F") ~ I(cut<"Very Good") + carat + clarity * price)

Fit standard logistic regression.

Description

This makes an effort to cast the result column into a logical from a factor or other data type. It also will model ordered factors as contrasts rather than as a polynomial expansion, allowing an interpretable output from ordered factors.

Usage

logistic_regression_ordered(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)
logistic_regression_ordered(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)

Arguments

`data`	a data frame
`formula`	a formula of the form `binary_outcome ~ obs1 + obs2 + ...`
`positive`	test strings to interpret as true if outcome is not a factor or logical.
`...`	not used

Value

a model object

Examples


diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% 
  logistic_regression_ordered(is_coloured ~ cut + carat + clarity * price)

diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% 
  logistic_regression_ordered(is_coloured ~ cut + carat + clarity * price)

Construct a configuration for multiple statistical models

Description

multiple statistical models for a parameterised model fitting pipeline.

Usage

model_function_provider(...)
model_function_provider(...)

Arguments

...

a named list of functions. In the simplest use case this is a single function, see examples for more possibilities. the functions must accept data as first parameter and formula as second.

Examples

# in the simplest version the name is pulled from the input
mfp = model_function_provider(logistic_regression)
mfp
mfp()

mfp = model_function_provider(logistic_regression, quasi_poisson)
mfp()

mfp2 = model_function_provider(
   Logistic = modelfitter::logistic_regression,
   Poisson = modelfitter::quasi_poisson
)
mfp2()
 
# the functions are named the other way round to normal model functions
# this was by design to fit into a tidy pipeline:
mfp = model_function_provider(linear = ~ stats::lm(.y,.x))
linear_model = mfp("linear")
iris %>% linear_model(Petal.Length ~ Petal.Width)
# in the simplest version the name is pulled from the input
mfp = model_function_provider(logistic_regression)
mfp
mfp()

mfp = model_function_provider(logistic_regression, quasi_poisson)
mfp()

mfp2 = model_function_provider(
   Logistic = modelfitter::logistic_regression,
   Poisson = modelfitter::quasi_poisson
)
mfp2()
 
# the functions are named the other way round to normal model functions
# this was by design to fit into a tidy pipeline:
mfp = model_function_provider(linear = ~ stats::lm(.y,.x))
linear_model = mfp("linear")
iris %>% linear_model(Petal.Length ~ Petal.Width)

Creates a table or plot row template for a set of models

Description

This produces a dataframe which can be used to arrange the coefficients and p-values from one statistical model, or set of models. The rows are a superset of all the coefficients of the models and is designed to be used to left_join the outputs of broom::tidy (by term) or global_p_value (by predictor) to construct a tabular output.

Usage

model_labels(model, label_fn, subgroup_label_fn, ...)
model_labels(model, label_fn, subgroup_label_fn, ...)

Arguments

`model`	a model or list of models.
`label_fn`	a function that allows a predictor label to be renamed. This should expect a vector and return a vector of the same length. Levels will be terms in the model function and may be column names, or combinations thereof
`subgroup_label_fn`	a function that allows a subgroup label to be renamed. This should expect a vector and return a vector of the same length. The input to this function will be either a factor level name or a combination of them or whatever else the model decides to name it's coefficients.
`...`	not used

Details

It is expect that use cases such as multiple univariate models + a few fully adjusted models are passed to this function and the result is to be displayed in a single plot or figure.

Value

a data frame with model.order, group.order, subgroup.order, characteristic, subgroup, dplaye columns and predictor and term key columns to link to the output of broom::tidy or global_p_value (i.e. Anova II/III outputs)

Examples

diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model = stats::glm(is_coloured ~ cut + carat + clarity * price, diamonds3, family="binomial")
model_labels(model, toupper, tolower)


model2 = logistic_regression(diamonds3, is_coloured ~ I(cut=="Good") + carat + clarity * price)
model_labels(model2, toupper, tolower)

model3 = stats::glm(is_coloured ~ carat + cut * clarity +  price, diamonds3, family="binomial")
model_labels(model3, toupper, tolower)


model4 = stats::glm(
  is_coloured ~ cut + carat + clarity + price, 
  diamonds3, 
  family="binomial", 
  contrasts=list(clarity=MASS::contr.sdif))
  
coef(model4)
model_labels(model4, toupper, tolower)

# tmp = .ordered_contrasts(diamonds3, )
# model5 = stats::glm(
#   is_coloured ~ cut + carat + clarity * price, 
#   diamonds3, 
#   family="binomial", contrasts=tmp)
#   
# model_labels(model5, toupper, tolower)

model6 = stats::glm(
  is_coloured ~ cut + carat + clarity + price_cat, 
  diamonds3, 
  family="binomial", 
  contrasts=list(clarity=MASS::contr.sdif, price_cat=MASS::contr.sdif)
)

model_labels(model6, toupper, tolower)

model7 = stats::glm(
  is_coloured ~ cut + carat + clarity + splines::ns(price,df=2), 
  diamonds3, 
  family="binomial", 
  contrasts=list(clarity=MASS::contr.sdif))
  
model_labels(model7, toupper)

diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model = stats::glm(is_coloured ~ cut + carat + clarity * price, diamonds3, family="binomial")
model_labels(model, toupper, tolower)


model2 = logistic_regression(diamonds3, is_coloured ~ I(cut=="Good") + carat + clarity * price)
model_labels(model2, toupper, tolower)

model3 = stats::glm(is_coloured ~ carat + cut * clarity +  price, diamonds3, family="binomial")
model_labels(model3, toupper, tolower)


model4 = stats::glm(
  is_coloured ~ cut + carat + clarity + price, 
  diamonds3, 
  family="binomial", 
  contrasts=list(clarity=MASS::contr.sdif))
  
coef(model4)
model_labels(model4, toupper, tolower)

# tmp = .ordered_contrasts(diamonds3, )
# model5 = stats::glm(
#   is_coloured ~ cut + carat + clarity * price, 
#   diamonds3, 
#   family="binomial", contrasts=tmp)
#   
# model_labels(model5, toupper, tolower)

model6 = stats::glm(
  is_coloured ~ cut + carat + clarity + price_cat, 
  diamonds3, 
  family="binomial", 
  contrasts=list(clarity=MASS::contr.sdif, price_cat=MASS::contr.sdif)
)

model_labels(model6, toupper, tolower)

model7 = stats::glm(
  is_coloured ~ cut + carat + clarity + splines::ns(price,df=2), 
  diamonds3, 
  family="binomial", 
  contrasts=list(clarity=MASS::contr.sdif))
  
model_labels(model7, toupper)

The cumulative density function of a mixture of normal distributions

Description

The cumulative density function of a mixture of normal distributions

Usage

pmixnorm(q, means, sds, weights = rep(1, length(means)), na.rm = FALSE)
pmixnorm(q, means, sds, weights = rep(1, length(means)), na.rm = FALSE)

Arguments

`q`	vector of quantiles.
`means`	a vector of normal distribution means
`sds`	a vector of normal distribution sds
`weights`	a vector of weights
`na.rm`	remove distributions which have NA for mean or sd

Value

the pdf of the mixture distribution.

Examples

pmixnorm(q=c(2,20), means=c(10,13,14), sds=c(1,1,2), weights=c(2,2,3))
pmixnorm(q=c(2,20), means=c(10,13,14), sds=c(1,1,2), weights=c(2,2,3))

A quantile function for a mixture of normal distributions

Description

A quantile function for a mixture of normal distributions

Usage

qmixnorm(p, means, sds, weights = rep(1, length(means)), na.rm = FALSE)
qmixnorm(p, means, sds, weights = rep(1, length(means)), na.rm = FALSE)

Arguments

`p`	vector of probabilities.
`means`	a vector of normal distribution means
`sds`	a vector of normal distribution sds
`weights`	a vector of weights
`na.rm`	remove distributions with NA values for mean or sd

Value

the value of the yth quantile

Examples

qmixnorm(p=c(0.025,0.5,0.975), means=c(10,13,14), sds=c(1,1,2))
qmixnorm(p=c(0.025,0.5,0.975), means=c(10,13,14), sds=c(1,1,2))

Fit standard poisson regression

Description

This is roughly equivalent to a log_binomial model but converges well. Its output can be interpreted at a RR. It is using a stats::glm with a log link function and a family of quasipoisson. The binary outcome is interpreted as a count where true is 1 and false is 0.

Usage

quasi_poisson(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)
quasi_poisson(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)

Arguments

`data`	a data frame
`formula`	a formula of the form `binary_outcome ~ obs1 + obs2 + ...`
`positive`	test strings to interpret as true if outcome is not a factor or logical.
`...`	not used

Value

a model object

Examples


diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% quasi_poisson(is_coloured ~ cut + carat + clarity + price)
global_p_value(model5)
diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% quasi_poisson(is_coloured ~ cut + carat + clarity + price)
global_p_value(model5)

Fit standard poisson regression with ordered contrasts

Description

Usage

quasi_poisson_ordered(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)
quasi_poisson_ordered(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)

Arguments

`data`	a data frame
`formula`	a formula of the form `binary_outcome ~ obs1 + obs2 + ...`
`positive`	test strings to interpret as true if outcome is not a factor or logical.
`...`	not used

Value

a model object

Examples


diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% quasi_poisson_ordered(is_coloured ~ cut + carat + clarity + price)
summary(model5)
diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% quasi_poisson_ordered(is_coloured ~ cut + carat + clarity + price)
summary(model5)

Provide a access to a dataset

Description

Provide a access to a dataset

Usage

raw_data_provider(data, formulae = ~.)
raw_data_provider(data, formulae = ~.)

Arguments

`data`	the data frame
`formulae`	a list of formulae with all the columns used

Value

a function that returns a dataset

Fit robust poisson regression using sandwich estimators

Description

This is roughly equivalent to a log_binomial model but converges well. Its output can be interpreted at a RR. It is using a glm with a log link function and a family of poisson with robust sandwich estimators. The binary outcome is interpreted as a count where true is 1 and false is 0.

Usage

robust_poisson(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)
robust_poisson(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)

Arguments

`data`	a data frame
`formula`	a formula of the form `binary_outcome ~ obs1 + obs2 + ...`
`positive`	test strings to interpret as true if outcome is not a factor or logical.
`...`	not used

Details

heteroskedasticity-consistent (HC) standard errors using sandwich: https://data.library.virginia.edu/understanding-robust-standard-errors/

Value

a model object

Examples


diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% robust_poisson(is_coloured ~ cut + carat + clarity + price)
global_p_value(model5)
diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% robust_poisson(is_coloured ~ cut + carat + clarity + price)
global_p_value(model5)

Fit robust poisson regression using `geepack::glm`

Description

equivalent to a log_binomial model

Usage

robust_poisson_2(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)
robust_poisson_2(
  data,
  formula,
  positive = c("yes", "true", "present", "confirmed"),
  ...
)

Arguments

`data`	a data frame
`formula`	a formula of the form `binary_outcome ~ obs1 + obs2 + ...`
`positive`	test strings to interpret as true if outcome is not a factor or logical.
`...`	not used

Value

a model object

Examples


diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% robust_poisson_2(is_coloured ~ cut + carat + clarity + price)
global_p_value(model5)
diamonds3 = ggplot2::diamonds %>% dplyr::mutate(
  is_coloured = color <= "F",
  cut = factor(cut,ordered=FALSE),
  price_cat = tableone::cut_integer(price, c(500,1000,2000,4000))
) %>% dplyr::select(-color)

model5 = diamonds3 %>% robust_poisson_2(is_coloured ~ cut + carat + clarity + price)
global_p_value(model5)

Spline term marginal effects plot

Description

Spline term marginal effects plot

Usage

spline_term_plot(
  coxmodel,
  var_name,
  xlab = var_name,
  max_y = NULL,
  n_breaks = 7
)
spline_term_plot(
  coxmodel,
  var_name,
  xlab = var_name,
  max_y = NULL,
  n_breaks = 7
)

Arguments

`coxmodel`	an output of a coxph model
`var_name`	a variable that is involved in a spline term
`xlab`	x axis label
`max_y`	maximium hazard ratio to display on y axis. Inferred from the central estimates if missing, which will most likely cut off confidence intervals
`n_breaks`	The number of divisions on the y axis

Value

a ggplot

Sprintf with a list input

Description

A variant of sprintf that work well with inputs that are in the format of a list. Good examples of which are the quantile functions

Usage

sprintf_list(format, params, na.replace = "―")
sprintf_list(format, params, na.replace = "―")

Arguments

`format`	the format string
`params`	the inputs as a list (rather than as a set of individual numbers)
`na.replace`	a value to replace NA values with.

Value

the formatted string

Examples

# generate a mixture confidence interval from a set of distributions
sprintf_list("%1.2f [%1.2f\u2013%1.2f]",
 qmixnorm(p=c(0.5,0.025,0.975),
 means=c(10,13,14), sds=c(1,1,2)))
# generate a mixture confidence interval from a set of distributions
sprintf_list("%1.2f [%1.2f\u2013%1.2f]",
 qmixnorm(p=c(0.5,0.025,0.975),
 means=c(10,13,14), sds=c(1,1,2)))

Summarise a modelfitter execution

Description

A modelfitter execution may consist of multiple models, multiple data sets, and multiple bootstraps of data or imputations of data. This summarises the coefficients, global p values and performance metrics of each model fit over the multiple data bootstraps to get to a single set of coefficients, and metrics for each different model.

Usage

summarise_fits(exectn, ...)
summarise_fits(exectn, ...)

Arguments

`exectn`	a modelfitter execution of multiple model fits.
`...`	not used at present

Details

Coefficients are combined assuming normally distributed beta coefficients, prior to link function inversion, as mixtures of normal distributions and 95% confidence intervals calculated. performance metrics and global p values are given as means of the values of bootstrap.

Value

a nested dataframe with a single row per model and summary columns in coef_summary, global_p_summary and

Examples


# Complex example, multiple statsistical models, multiple formulae,
# bootstrapped data.
cfg  = configure_models(
   formula_provider(
      "<F" = I(color < "F") ~ cut + carat + clarity + price,
      "<H" = I(color < "H") ~ cut + carat + clarity + price
   ),
   bootstrap_provider(ggplot2::diamonds, max_n = 10),
   model_function_provider(
     "Log reg" = modelfitter::logistic_regression,
     "Poisson" = modelfitter::quasi_poisson
   )
)

exectn = cfg %>% execute_configuration(cache = TRUE)
summfit = exectn %>% summarise_fits()
 
# Simple example
tmp = configure_model(
  "Iris", I(Species == "versicolor") ~ ., iris, logistic_regression)
tmp2 = tmp %>% execute_configuration()
summfit = tmp2 %>% summarise_fits()

# Complex example, multiple statsistical models, multiple formulae,
# bootstrapped data.
cfg  = configure_models(
   formula_provider(
      "<F" = I(color < "F") ~ cut + carat + clarity + price,
      "<H" = I(color < "H") ~ cut + carat + clarity + price
   ),
   bootstrap_provider(ggplot2::diamonds, max_n = 10),
   model_function_provider(
     "Log reg" = modelfitter::logistic_regression,
     "Poisson" = modelfitter::quasi_poisson
   )
)

exectn = cfg %>% execute_configuration(cache = TRUE)
summfit = exectn %>% summarise_fits()
 
# Simple example
tmp = configure_model(
  "Iris", I(Species == "versicolor") ~ ., iris, logistic_regression)
tmp2 = tmp %>% execute_configuration()
summfit = tmp2 %>% summarise_fits()

Convert multivariate formula to list of univariate formulae

Description

Convert multivariate formula to list of univariate formulae

Usage

univariate_from_multivariate(formula)
univariate_from_multivariate(formula)

Arguments

formula

a formula of the type y ~ x1 + x2 + x3 + ....

Value

a list of formulae of the type y ~ x1, y ~ x2, y ~ x3, ....

Examples

univariate_from_multivariate(y ~ x1 + x2 + x3)
univariate_from_multivariate(~ x1 + x2 + x3)
univariate_from_multivariate(y ~ x1 + x2 + x3)
univariate_from_multivariate(~ x1 + x2 + x3)

Package 'modelfitter'

Help Index

Reuse tidy-select syntax outside of a tidy-select function

Description

Usage

Arguments

Value

Provide a access to bootstrap resamples of a dataset

Description

Usage

Arguments

Value

Examples

Configure a single model fit using the modelfitter framework

Description

Usage

Arguments

Value

Examples

Configure a set of model fits from a specification of different combinations

Description

Usage

Arguments

Value

Examples

Fit standard cox model.

Description

Usage

Arguments

Value

Provide a mechanism for subsetting data

Description

Usage

Arguments

Value

Examples

Fit exact logistic regression

Description

Usage

Arguments

Value

Fit statistical models defined in a modelfitter configuration

Description

Usage

Arguments

Value

Examples

Get all used variables on LHS of a formula

Description

Usage

Arguments

Value

Examples

Get all used variables on LHS of a formula

Description

Usage

Arguments

Value

Examples

Format a confidence interval

Description

Usage

Arguments

Value

Examples

Generate a table for a single model output

Description

Usage

Arguments

Value

Examples

Format a ratio, truncating at a set level above and below 1 on log scale.

Description

Usage

Arguments

Value

Examples

Format a summary of multiple fits into a table.

Description

Usage

Configure a single model fit using the `modelfitter` framework

Fit statistical models defined in a `modelfitter` configuration

Fit standard log binomial regression using `stats::glm`

Fit standard log binomial regression using `logbin::logbin`