---
title: "tableone: Configuration"
output: html_document
vignette: >
  %\VignetteIndexEntry{tableone: Configuration}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
here::i_am("vignettes/configuration.Rmd")
source(here::here("vignettes/vignette-utils.R"))
original_opts = options()
.Last = function() {
  options(original_opts)
}
options(tableone.quiet = TRUE)
library(tidyverse)
library(tableone)
library(survival)
```

# Configuration and formatting options

This vignette provides examples of some of the formatting options. To
demonstrate them we will use the `survival::cgd` dataset:

```{r}
# set up the data
gcd = survival::cgd %>%
  # filter to include only the first visit
  dplyr::filter(enum==1) %>%
  # make the steroids and propylac columns into a logical value
  # see later for a better way of doing this.
  dplyr::mutate(
    steroids = as.logical(steroids),
    propylac = as.logical(propylac)
  )

# A basic unstratified population description table is as follows:
formula = Surv(tstart, tstop, status) ~ treat + sex + age + height + weight + inherit + steroids + hos.cat
gcd %>% compare_population(formula)
```

## Column labelling

* A custom labeller function can be defined for the table.
```{r}
# set a table relabelling function
rename_cols = function(col) {
  dplyr::case_when(
    col == "hos.cat" ~ "Location",
    col == "steroids" ~ "Steroid treatment",
    TRUE ~ stringr::str_to_sentence(col)
  )
}

# set it using an option
# we are not going to reset this as we will use it in all the subsequent examples:
options("tableone.labeller"=rename_cols)

gcd %>% compare_population(formula)
```

## Content format

* Change the decimal point
* Change the font and font size
* Change the labelling of the p-value column
* Change the format of the p-value
* Hide the daggers that mark the method used for the p-value

```{r}
old = options(
  # set a mid point as decimal point
  "tableone.dp"="\u00B7",
  "tableone.font"="Arial Narrow",
  "tableone.font_size"=12,
  "tableone.pvalue_column_name"="p-value",
  # the p-value formatter must be a function that takes a vector of numbers and returns
  # a vector of characters. The example here is a function that returns a function.
  "tableone.pvalue_formatter" = scales::label_pvalue(accuracy = 0.01,decimal.mark = "\u00B7"),
  "tableone.show_pvalue_method"=FALSE
)

gcd %>% compare_population(formula)

# reset
options(old)
```

## Summary types

The default statistics may not suit the data, particularly the choice between
presenting a mean or a median, which depends on a test of normality in the data.
The presentation can be overridden by supplying a named list to `override_type`,
whose names are the original column names to override. This does not change the
significance test that is used, which still depends on the detection of
normality. The normality test and its significance level are also configurable.

```{r}
# override_type - named list of column names and summary types.
# With this looser definition of normality (i.e. less likely to reject the null
# that the data is normally distributed), height and weight are found to be
# normally distributed and hence the t-test is used.
old = options(
  "tableone.normality_test"="lillie",
  "tableone.normality_significance"=0.00001
)

gcd %>% compare_population(
  formula,
  # age is still not normally distributed but we can override it to be
  # presented as a mean and SD.
  override_type = list(age="mean_sd")
)

options(old)
```

```R
# the following option also controls which non-parametric test is chosen
# (between the Wilcoxon and KS tests):
# options("tableone.tolerance_to_ties"=0.25)
```

## Customising the number of decimal places

The number of decimal places can be changed on a column-by-column basis (e.g.
here for real values, using a named list) or on a systematic basis (e.g. for all
percentages). The specification can be either fixed decimal places (e.g. "2f")
or significant figures (e.g. "3g"). N.b. this setting is independent of the
p-value formatter.

```{r}
gcd %>% compare_population(
  formula,
  # can supply either the "5f" (for 5 decimal places, fixed point) or "6g"
  # (for 6 significant figures) syntax:
  override_real_dp = list(age="0f",height="0f",weight="2f"),
  # or a plain number. If the option is unnamed it is applied to
  # all the variables:
  override_percent_dp = 0
)
```

## Summary format customisation

Several standard layouts are defined (`r paste0("\"", names(default.format),"\"",collapse=", ")`),
and these can be passed as the `layout` parameter to give a particular format to
the columns and content of the table.

```{r}
gcd %>% compare_population(
  formula,
  layout = "relaxed"
)
```

## Custom layouts

The "relaxed" standard layout is defined using a list. This is shown below:

```{r echo=FALSE}
paste0("```R\n",paste0(.f(tableone::default.format$relaxed),collapse="\n"),"\n```") %>% knitr::asis_output()
```

We can produce a customised list based on this and supply it to a formatting
function as the `layout` parameter. The named list defines the column names and
the column contents; at the moment one item in this list must be named `characteristic`.
The column contents can refer to the following variables:

* `subtype_count` can use `{level}`, `{prob.0.5}`, `{prob.0.025}`, `{prob.0.975}`, `{x}`, `{n}`, `{N}` - `x` is subgroup count, `n` is data count excluding missing, `N` includes missing.
* `median_iqr` can use `{q.0.5}`, `{q.0.25}`, ..., `{unit}`, `{n}`, `{N}` - `n` excludes missing, `N` does not.
* `mean_sd` can use `{mean}`, `{sd}`, `{unit}`, `{n}`, `{N}` - `n` excludes missing, `N` does not.
* `skipped` can use `{unit}`, `{n}`, `{N}` - `n` excludes missing, `N` does not.

Other than the characteristic column, the column names are derived from the
names of the custom configuration list. The names can also be configured using
`glue`, and this can use intervention level data like `{N}` for the subgroup
counts, or data level variables such as `{N_total}`, which is the number of
items across all groups, or `{N_missing}`, for example.

There are a few useful formatting functions that the spec can also use beyond
the usual text processing functions:

* `.sprintf_na` - `sprintf`s a set of numbers, replacing the output with `getOption("tableone.na","\u2014")` if all values are missing, and if some values are missing replacing each individual missing value with `getOption("tableone.missing","")`
* `.sprintf_no_na` - `sprintf`s a set of numbers, replacing the output with `getOption("tableone.na","\u2014")` if any values are missing
* `.maybe` - returns a string if it is present or "" if NA

```{r}
custom = list(
  subtype_count = list(
    characteristic = "{level}",
    "Value (N={N}/{N_total})" = "{.sprintf_na('%1.1f%% (%d/%d)',prob.0.5*100,x,n)}"
  ),
  median_iqr = list(
    characteristic = "Median (N)",
    "Value (N={N}/{N_total})" = "{.sprintf_na('%1.3g (%d)',q.0.5,n)}"
  ),
  mean_sd = list(
    characteristic = "Mean (N)",
    "Value (N={N}/{N_total})" = "{.sprintf_na('%1.3g (%d)',mean,n)}"
  ),
  skipped = list(
    characteristic = "(N)",
    "Value (N={N}/{N_total})" = "{.sprintf_na('— (%d)',n)}"
  )
)

# printing control: the following options control the missing values
# produced by the .sprintf_na function:
# getOption("tableone.missing","")
# getOption("tableone.na","\u2014")

gcd %>% compare_population(
  formula,
  layout = custom
)
```

## Footer customisation

* Additional information can be added to the default footer. Handy for acronyms:

```{r}
gcd %>% compare_population(
  formula,
  footer_text = c(
    "IQR: Interquartile range; CI: Confidence interval",
    "Additional information could be supplied")
)
```

* or we can choose to hide the footer altogether:

```{r}
old = options("tableone.hide_footer"=TRUE)
gcd %>% compare_population(formula)
options(old)
```
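
The chunks in this vignette set options, build a table, and then restore the previous values by hand. As a minimal sketch, that pattern could be wrapped in a small helper. `with_options` here is a hypothetical name defined purely for illustration and is not part of `tableone`; it relies only on base R (`options()` and `on.exit()`).

```R
# hypothetical helper (not part of tableone): evaluate `code` with some
# options temporarily set, restoring the previous values afterwards.
with_options = function(new, code) {
  # options() accepts a named list and returns the previous values
  old = options(new)
  # restore the previous values on exit, even if `code` throws an error
  on.exit(options(old), add = TRUE)
  # `code` is a lazily evaluated argument, so it only runs here,
  # after the options have been set
  force(code)
}

# the option is only set while the second argument is being evaluated:
inside = with_options(
  list(tableone.hide_footer = TRUE),
  getOption("tableone.hide_footer")
)
inside
```

This only affects code that reads the option while `code` is being evaluated. If a table reads its options when it is printed rather than when it is built (an assumption we have not checked here), the printing would also need to happen inside the call.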