Package 'avoncap' reference manual

Title:	AvonCap Study Analysis
Description:	A work-in-progress set of functions for loading and wrangling the AvonCap data set.
Authors:	Rob Challen [aut, cre]
Maintainer:	Rob Challen <[email protected]>
License:	MIT + file LICENSE
Version:	0.0.0.9029
Built:	2025-02-01 05:59:55 UTC
Source:	https://github.com/bristol-vaccine-centre/avoncap

Clear data from the passthrough cache for complex or long running operations

Description

Clear data from the passthrough cache for complex or long running operations

Usage

.cache_clear(
  .cache = getOption("cache.dir", rappdirs::user_cache_dir(utils::packageName())),
  .prefix = ".*",
  interactive = TRUE
)

Arguments

`.cache`	the location of the cache as a directory. May get its value from options("cache.dir") or the default value of rappdirs::user_cache_dir("ggrrr")
`.prefix`	a regular expression matching the prefix of the cached item, so that you can do selective clean up operations. Defaults to everything.
`interactive`	suppress `are you sure?` warning with a FALSE value (defaults to TRUE)

Value

nothing. called for side effects

Delete stale files in a cache

Description

Staleness is determined by the number of days from 2am on the current day in the current time-zone. An item cached for only one day becomes stale at 2am the day after it is cached. The time is configurable and options(cache.time_day_starts = 0) would be midnight. Automated analysis using caches and updated data should ensure that analysis does not cross this time point, otherwise it may end up using old data.

Usage

.cache_delete_stale(
  .cache = getOption("cache.dir", rappdirs::user_cache_dir(utils::packageName())),
  .prefix = ".*",
  .stale = Inf
)

Arguments

`.cache`	the location of the cache as a directory. May get its value from options("cache.dir") or the default value of rappdirs::user_cache_dir("ggrrr")
`.prefix`	a name of the operation so that you can namespace the cached files and do selective clean up operations on them
`.stale`	the length of time in days to keep cached data before considering it as stale.

Value

nothing. called for side effects.
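
Examples

The staleness rule described above can be illustrated with a minimal base R sketch. This is not the package's implementation: `cache_day()` and `is_stale_sketch()` are hypothetical helpers, and treating an item cached before 2am as belonging to the previous day is an assumption.

```r
# Hypothetical helper (not part of avoncap): the "cache day" a time falls in,
# when days are taken to start `day_starts` hours after midnight.
cache_day = function(time, day_starts = 2, tz = "UTC") {
  as.Date(time - day_starts * 3600, tz = tz)
}

# An item is considered stale once `stale` or more cache days have elapsed.
is_stale_sketch = function(cached_at, now, stale = 1, day_starts = 2, tz = "UTC") {
  age_days = as.numeric(cache_day(now, day_starts, tz) - cache_day(cached_at, day_starts, tz))
  age_days >= stale
}

cached_at = as.POSIXct("2024-03-01 10:00:00", tz = "UTC")
before_2am = as.POSIXct("2024-03-02 01:59:00", tz = "UTC")
after_2am = as.POSIXct("2024-03-02 02:00:00", tz = "UTC")

is_stale_sketch(cached_at, before_2am)              # FALSE
is_stale_sketch(cached_at, after_2am)               # TRUE
is_stale_sketch(cached_at, after_2am, stale = Inf)  # FALSE, never stale
```

An analysis that starts before 2am and finishes after it could therefore see the same item as fresh at the start and stale at the end.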

Download a file into a local cache.

Description

This function copies a remote file to a local cache once and makes sure it is reused.

Usage

.cache_download(
  url,
  ...,
  .nocache = getOption("cache.disable", default = FALSE),
  .cache = getOption("cache.download", rappdirs::user_cache_dir(utils::packageName())),
  .stale = Inf,
  .extn = NULL
)

Arguments

`url`	the url to download
`...`	ignored
`.nocache`	if set to TRUE all caching is disabled
`.cache`	the location of the downloaded files
`.stale`	how long to leave this file before replacing it.
`.extn`	the file name extension

Value

the path to the downloaded file

A simple pass-through cache for complex or long running operations

Description

Executes .expr and saves the output as an RDS file indexed by a hash of the code in .expr and a hash of the input variables (which should contain any variable inputs).

Usage

.cached(
  .expr,
  ...,
  .nocache = getOption("cache.disable", default = FALSE),
  .cache = getOption("cache.dir", rappdirs::user_cache_dir(utils::packageName())),
  .prefix = "cached",
  .stale = Inf
)

Arguments

`.expr`	the code the output of which requires caching. Other than a return value this should not create side effects or change global variables.
`...`	inputs that the code in .expr depends on; a change in any of them requires the code to be re-run. Could be Sys.Date()
`.nocache`	an option to defeat the caching which can be set globally as options("cache.disable"=TRUE)
`.cache`	the location of the cache as a directory. May get its value from options("cache.dir") or the default value of rappdirs::user_cache_dir("ggrrr")
`.prefix`	a name of the operation so that you can namespace the cached files and do selective clean up operations on them
`.stale`	the length of time in days to keep cached data before considering it as stale. can also be set by options("cache.stale")

Value

the output of .expr which will usually be a value
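
Examples

The pass-through idea (key the result on the code plus its inputs, reuse it if present) can be sketched in base R. This is an illustration under assumptions, not how `.cached()` is implemented: `hash_object()` and `sketch_cached()` are hypothetical, the hash is an MD5 of the serialised key, and staleness is ignored.

```r
# Hypothetical helper: hash any R object using only base R and tools.
hash_object = function(obj) {
  tmp = tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(obj, NULL), tmp)
  unname(tools::md5sum(tmp))
}

# Hypothetical pass-through cache: the key combines the deparsed code of
# `expr` with the values supplied in `...`.
sketch_cached = function(expr, ..., cache_dir = tempdir(), prefix = "cached") {
  code = deparse(substitute(expr))
  key = hash_object(list(code = code, inputs = list(...)))
  path = file.path(cache_dir, paste0(prefix, "-", key, ".rds"))
  if (file.exists(path)) return(readRDS(path))
  value = expr  # forces evaluation of the expression only on a cache miss
  saveRDS(value, path)
  value
}

cache_dir = tempfile("cache-sketch-")
dir.create(cache_dir)

calls = 0
slow_square = function(x) { calls <<- calls + 1; x^2 }

first = sketch_cached(slow_square(4), 4, cache_dir = cache_dir)   # evaluated
second = sketch_cached(slow_square(4), 4, cache_dir = cache_dir)  # read from cache
third = sketch_cached(slow_square(4), 5, cache_dir = cache_dir)   # input changed, evaluated again
```

Because the third call supplies a different input, it gets a different key and the expression is run again, which is the behaviour the `...` argument is documented to provide.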

Scans the input directory and returns csv or xlsx files in that directory

Description

Extracting metadata from the filename where present - particularly hospital, and year number

Usage

all_files()

Value

a dataframe containing filename, path, date, hospital, and study_year fields

Sanitise AvonCap data columns

Description

AvonCap data has lots of columns which are named in a difficult to remember fashion, composed of data items that have enumerated values with no semantics. This makes displaying them difficult and any filtering done on the raw data inscrutable. Depending on the source of the data some different columns may be present due to differences in the NHS and UoB data sets. The redcap database has some options that may be checklists and some that are radio buttons, both of these end up with mysterious names in the data.

Usage

augment_data(x, ...)

Arguments

x

the raw data from load_data()

...

Named arguments passed on to augment_generic

df: a data frame
...: unnamed parameters are a list of functions, named parameters are passed to those functions (if they match formal arguments).

Details

This function maps the data into a tidy dataframe with consistently named columns, and named factors where appropriate. If not present in the data the ethnicity

files Most of the sanitisation code is held in the zzz-avoncap-mappings.R file.

Value

a tracked dataframe with

Applies a set of functions to the whole dataframe

Description

This sequences, catches errors and allows parameters to be passed by name

Usage

augment_generic(df, ...)

Arguments

`df`	a data frame
`...`	unnamed parameters are a list of functions, named parameters are passed to those functions (if they match formal arguments).

Value

the altered df

Examples

fn1 = function(df,v) {df %>% dplyr::filter(cut=="Fair") %>% dplyr::mutate(x_col = color)}
fn2 = function(df,v) {df %>% dplyr::filter(color==v$color$J)}
df = ggplot2::diamonds %>% augment_generic(fn1, fn2)

Dodged bar and whiskers proportions

Description

This function plots a stacked bar of proportions for an input set of data

Usage

binomial_proportion_points(data, mapping, ..., width = 0.8, size = 0.5)

Arguments

`data`	the data
`mapping`	an aes mapping with at least `x` and `fill`. If facetting then `group` must contain the facet variable
`...`	passed to `geom_bar`
`width`	width of position dodge
`size`	the bar size

Value

a ggplot

Cut and label an integer valued quantity

Description

Deals with some annoying issues classifying integer data sets, such as ages, into groups, where you want to specify just the change over points as integers and clearly label the resulting ordered factor.

Usage

cut_integer(
  x,
  cut_points,
  glue = "{label}",
  lower_limit = -Inf,
  upper_limit = Inf,
  ...
)

Arguments

`x`	a vector of integer valued numbers, e.g. ages, counts
`cut_points`	a vector of integer valued cut points which define the lower boundaries of conditions
`glue`	a glue spec that may be used to generate a label. It can use `low`, `high`, `next_low`, or `label` as values.
`lower_limit`	the minimum value we should include (this is inclusive for the bottom category) (default -Inf)
`upper_limit`	the maximum value we should include (this is also inclusive for the top category) (default Inf)
`...`	not used

Value

an ordered factor of the integer

Examples

cut_integer(stats::rbinom(20,20,0.5), c(5,10,15))
cut_integer(floor(stats::runif(100,-10,10)), cut_points = c(2,3,4,6), lower_limit=0, upper_limit=10)

default column naming mappings

Description

default column naming mappings

Usage

default_column_names(...)

Arguments

...

additional named items to add

Value

a set of mappings

The avoncap denominator dataset

Description

The denominator is a time varying quantity

Usage

data(denom_by_age_by_day)

Format

A dataframe containing the following columns:

method (character) - estimation method. The default is "Campling 2019"
age (character) - the age category
date (date) - the date for which this estimate is valid
population (integer) - the estimate of the population size for that age group on that day

No default value.

32592 rows and 4 columns

Create a counter in the event of repeated admissions

Description

This also will calculate a time interval between admissions. There is also a repeat admission instrument that this does not use.

Usage

derive_admission_episode(df, v)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`

Value

a dataframe
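
Examples

A per-patient admission counter and the interval since the previous admission can be sketched in base R as follows. The column names `patient_id`, `admission_date`, `episode` and `days_since_previous` are hypothetical and are not the columns that `derive_admission_episode()` reads or writes.

```r
# Toy data: patient "a" is admitted three times, patient "b" once.
admissions = data.frame(
  patient_id = c("a", "a", "b", "a"),
  admission_date = as.Date(c("2021-01-10", "2021-03-01", "2021-02-15", "2021-01-20"))
)

# Order by patient and date so that the counter follows time.
admissions = admissions[order(admissions$patient_id, admissions$admission_date), ]

# Sequential episode number within each patient.
admissions$episode = ave(
  seq_along(admissions$patient_id), admissions$patient_id,
  FUN = seq_along
)

# Days since the previous admission (NA for the first episode).
admissions$days_since_previous = ave(
  as.numeric(admissions$admission_date), admissions$patient_id,
  FUN = function(d) c(NA, diff(d))
)

admissions
```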

The aLRTD incidence paper classifications

Description

The 3 category classifications

Usage

derive_aLRTD_categories(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

aetiological:
- Confirmed SARS-CoV-2 - implies Infective
- No evidence SARS-CoV-2 - implies Infective but not confirmed as SARS-CoV-2
- Non-infective - presumed non infective
clinical presentation:
- Pneumonia - implies Infective
- NP-LRTI - implies Infective
- No evidence LRTI (include CRDE and HF)

Some cases do not get a clinical presentation in this. Typically they are people who have an infective cause, but LRTI and pneumonia have been excluded. These could be URTI and or incidental COVID cases.

Value

a dataframe

Create a flag for patients who have been given antivirals

Description

Create a flag for patients who have been given antivirals

Usage

derive_antiviral_status(df, v)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`

Value

a dataframe

Identify patients who are in the BNSSG ICB based on their GP practice name

Description

Names are normalised by removing commonly mixed up components and

Usage

derive_catchment_status(df, v)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`

Value

a dataframe

Derive detailed vaccination status on admission

Description

Vaccination is deemed to have had effect if given > 14 days before admission for 1st dose or >7 days before admission for subsequent doses. This does not account for previous infection which is not in the data set.

Usage

derive_completed_vaccination_status(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Value

a dataframe
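
Examples

The timing rule in the description (more than 14 days for a first dose, more than 7 days for later doses) can be sketched for a single admission. `effective_doses_sketch()` is a hypothetical helper that simply counts qualifying doses; it does not reproduce the detailed status categories the package derives.

```r
# Hypothetical helper: count doses given long enough before admission to be
# deemed effective (> 14 days for the 1st dose, > 7 days for later doses).
effective_doses_sketch = function(admission_date, dose_dates) {
  dose_dates = sort(dose_dates[!is.na(dose_dates)])
  if (length(dose_dates) == 0) return(0L)
  days_before = as.numeric(difftime(admission_date, dose_dates, units = "days"))
  threshold = ifelse(seq_along(dose_dates) == 1, 14, 7)
  sum(days_before > threshold)
}

admission = as.Date("2021-06-01")

# 1st dose 120 days before, 2nd dose only 5 days before: one effective dose.
one = effective_doses_sketch(admission, as.Date(c("2021-02-01", "2021-05-27")))

# Single dose 12 days before admission: not yet effective.
none = effective_doses_sketch(admission, as.Date("2021-05-20"))

# 1st dose 120 days before, 2nd dose 12 days before: two effective doses.
two = effective_doses_sketch(admission, as.Date(c("2021-02-01", "2021-05-20")))
```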

Categorical scores for continuous variables

Description

Typically used in regression models with non-linear effects over splines

Usage

derive_continuous_categories(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

Age category - UK demographic data ends at 85, and 65 is a key cut-off in 5 year bands, so the 10 year band age categories end at 85 (N.B. there is a more principled reason here). Boundaries fall at approximately the 0.1, 0.2, 0.4, 0.6 and 0.8 quantiles. Could merge first two groups but outcomes are usually different. Covid vaccination cohorts were in 5 year age groups, but vaccination priority was in these groups approximately.
Age of eligibility for vaccines: 65+ Age of pneumovax eligibility
CCI - 4 bands as defined in original Charlson paper: ** https://pubmed.ncbi.nlm.nih.gov/3558716/ ** in https://link.springer.com/article/10.1007/s10654-021-00802-z there is rationale given for not using the Charlson score as a continuous value.
Alternate CCI - 0,1,2,3+ is also used as a grouping in the original Charlson paper
Rockwood score - Completely independent versus dependent frailty levels.
CURB65 categorisation - As per derivation study (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1746657/): 0-1 consider home treatment; 2 consider admit as inpatient; 3-5 admit, consider ICU.

Value

a dataframe
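
Examples

As an illustration of one of the bandings above, the CURB65 grouping (0-1, 2, 3-5) can be reproduced with base R `cut()`. The labels and the variable name `curb65` are assumptions made for this sketch and may differ from the factor levels the package produces.

```r
# Toy CURB65 scores, including a missing value.
curb65 = c(0, 1, 2, 3, 4, 5, NA)

# Right-closed intervals: (-Inf,1] -> "0-1", (1,2] -> "2", (2,5] -> "3-5".
curb65_category = cut(
  curb65,
  breaks = c(-Inf, 1, 2, 5),
  labels = c("0-1", "2", "3-5"),
  ordered_result = TRUE
)

as.character(curb65_category)
# "0-1" "0-1" "2" "3-5" "3-5" "3-5" NA
```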

Age and CURB score categories

Description

This should be consistent with AvonCAP age / CURB categories.

Usage

derive_continuous_categories_pneumo(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Value

a dataframe

Determine if an admission is proven SARS-CoV-2 PCR positive

Description

SARS-CoV-2 PCR positive only lab confirmed diagnosis.

Usage

derive_covid_status(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

admission.covid_pcr_result:

based on fields: c19_adm_swab and covid_19_diagnosis
Patient reported, clinical diagnoses are assumed PCR negative (although possible in some cases they may not have been done).
Lateral flows done in hospital are counted as PCR negative.
negative admission swabs are counted as negative
NA signifies test not done.

admission.is_covid:

Binary confirmed or no-evidence.
PCR results count as confirmed,
Lateral flow results count as confirmed,
anything else is no evidence (includes negatives and test not done)

Value

a dataframe
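
Examples

The binary `admission.is_covid` rule described in the details (a positive PCR or lateral flow counts as confirmed, everything else as no evidence) can be sketched on toy vectors. The input vectors, their coding and the output labels are hypothetical; the package works from the `c19_adm_swab` and `covid_19_diagnosis` fields, whose coding is not shown here.

```r
# Hypothetical inputs: one entry per admission, NA meaning test not done.
pcr_result = c("positive", "negative", NA, "negative", NA)
lateral_flow_result = c(NA, "positive", NA, "negative", "positive")

is_covid_sketch = function(pcr, lateral_flow) {
  # %in% treats NA as "not positive", so negatives and tests not done
  # both fall through to "no evidence".
  confirmed = (pcr %in% "positive") | (lateral_flow %in% "positive")
  factor(
    ifelse(confirmed, "confirmed", "no evidence"),
    levels = c("no evidence", "confirmed")
  )
}

is_covid = is_covid_sketch(pcr_result, lateral_flow_result)
as.character(is_covid)
# "confirmed" "confirmed" "no evidence" "no evidence" "confirmed"
```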

Create 4 non exclusive diagnostic categories

Description

Pneumonia if one of:

Standard of care diagnosis of CAP (radiologically or clinically)
Empyema or abscess
Admission chest X-ray shows pneumonia

Usage

derive_diagnosis_categories(df, v)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`

Details

NP-LRTI if:

Not pneumonia and Standard of care LRTI diagnosis

Exacerbation of CRDE:

Standard of care exacerbation COPD
Standard of care exacerbation Non-COPD
(N.B. may be pneumonia or NP-LRTI)

Heart failure:

Standard of care congestive heart failure.

Value

a dataframe

A simple vaccination status on admission as an ordered number of doses

Description

This does not account for previous infection which is not in the data set.

Usage

derive_effective_vaccination_status(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Value

a dataframe

Give an inferred Alpha, Delta or Omicron status based on time alone.

Description

This relies on date periods during which we are very confident that the only variants circulating are of a given type. These are quite conservative estimates based on the frequency of sequenced cases in the Bristol area (according to the Sanger centre and to cases identified in the hospital testing)

Usage

derive_genomic_variant(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

Sanger centre data

Pre-alpha before 05 Dec 2020
Alpha between 13 Feb 2021 and 15 May 2021
Delta between 01 Jun 2021 and 07 Nov 2021
Omicron from 07 Feb 2022 to present

Value

a dataframe
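
Examples

The date windows listed in the details can be turned into a lookup with a few lines of base R. `variant_sketch()` is a hypothetical helper; treating the window boundaries as inclusive and returning NA for dates between windows are assumptions, as are the output labels.

```r
# Hypothetical helper: infer a variant label from the admission date alone.
variant_sketch = function(admission_date) {
  d = as.Date(admission_date)
  out = rep(NA_character_, length(d))
  out[which(d < as.Date("2020-12-05"))] = "Pre-alpha"
  out[which(d >= as.Date("2021-02-13") & d <= as.Date("2021-05-15"))] = "Alpha"
  out[which(d >= as.Date("2021-06-01") & d <= as.Date("2021-11-07"))] = "Delta"
  out[which(d >= as.Date("2022-02-07"))] = "Omicron"
  out
}

dates = c("2020-06-01", "2021-01-01", "2021-03-01", "2021-08-01", "2021-12-25", "2022-03-01")
inferred = variant_sketch(dates)
inferred
# "Pre-alpha" NA "Alpha" "Delta" NA "Omicron"
```

Admissions in the gaps between windows are left unclassified, which reflects the conservative intent of the date ranges.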

Identify patients from the GP surgeries in linked primary care study

Description

Identify patients from the GP surgeries in linked primary care study

Usage

derive_gp_linkage(df, v)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`

Value

a dataframe

Binary outcomes for haematology data

Description

Elevated troponin: > 18 ng/L. 18 ng/L is simply the 99th percentile value for the Beckman assay we use, as quoted by the IFCC. We elected to not use sex-specific 99th percentile values although they are also quoted here and you could incorporate into your analysis. I am sure you are aware of the 4th Universal definition of MI that requires a rise or fall above the 99th percentile etc.

Usage

derive_haematology_categories(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Value

a dataframe

Binary outcomes for hospital burden

Description

These outcomes were tested in the Delta vs Omicron severity paper and sensitivity analysis. These are only defined for COVID cases.

Usage

derive_hospital_burden_outcomes(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

O2 requirement within 7 days (various cut-offs)
Any respiratory support in 7 days (various cut-offs)
LOS > X days in first 7 days (various cut-offs)

Value

a dataframe

Determine if an admission is due to an infective cause

Description

Infective admissions are defined as any of:

pneumonias
NP-LRTI
laboratory confirmed COVID diagnosis
admission swab COVID positive

Usage

derive_infective_classification(df, v)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`

Details

Infective admissions are excluded if:

Standard of care states non-infectious process
SOC non-LRTI (and none of the other categories above)

Any unknowns are defined as non-Infective

Value

a dataframe

Pneumococcal invasive status and binary test category

Description

Pneumococcal invasive status and binary test category

Usage

derive_invasive_status(df, ...)

Arguments

`df`	the dataframe.
`...`	ignored

Value

a dataframe

Did the patient catch COVID in hospital

Description

Only relevant to SARS-CoV-2 PCR positive patients. Timing of positive test compared to admission: this relies on knowing dates and hence only works on the identifiable data sets.

Usage

derive_nosocomial_covid_status(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

Logic is:

Community if PCR result predates admission
Probably community if PCR result within 7 days of admission
Probably nosocomial if 7-28 days after admission
Otherwise it is undefined.

Value

a dataframe
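
Examples

The timing logic in the details can be sketched as a classification of the number of days between admission and the positive PCR. `nosocomial_sketch()` and its labels are hypothetical, and the handling of the boundaries (day 0 to 6 as probable community, day 7 to 28 inclusive as probable nosocomial) is an assumption, since the text does not say which side day 7 falls on.

```r
# Hypothetical helper: classify by days from admission to positive PCR.
nosocomial_sketch = function(admission_date, pcr_date) {
  days = as.numeric(as.Date(pcr_date) - as.Date(admission_date))
  out = rep(NA_character_, length(days))
  out[which(days < 0)] = "Community"
  out[which(days >= 0 & days < 7)] = "Probable community"
  out[which(days >= 7 & days <= 28)] = "Probable nosocomial"
  out  # anything else (including missing dates) stays undefined
}

admission = rep("2021-01-10", 5)
pcr = c("2021-01-08", "2021-01-12", "2021-01-20", "2021-03-01", NA)
status = nosocomial_sketch(admission, pcr)
status
# "Community" "Probable community" "Probable nosocomial" NA NA
```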

Identify patients who were admitted already prior to study entry

Description

Hospital acquired COVID is recorded explicitly in 2 places for some patients. A large difference between admission date and enrollment date (<21 days) is suggestive in other cases. The data is probably only collected in COVID cases so should be treated with caution.

Usage

derive_nosocomial_status(df, v)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`

Value

a dataframe

Date columns

Description

Date columns

Usage

derive_pandemic_timings(date_col, prefix)

Arguments

`date_col`	the date column
`prefix`	a prefix for the columns to be added

Value

a derive_... style function to augment a data set containing date_col with a set of columns describing the timing.

Create a unique patient level id (if it does not already exist)

Description

The patient identifier is derived from the record number or the first record number (ensuring it matches) an entry in the record number. This deals with multiple admissions in the data set. In the patient identifiable NHS data this is the NHS number.

Usage

derive_patient_identifier(df, v)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`

Value

a dataframe

Group pneumo serotypes according to e.g. vaccine coverage

Description

A range of useful serotype groups is defined in the list uad_groups. The default_pcv_map gives a set of mappings to group headings that gives the overall serotype distribution by vaccine.

Usage

derive_pcv_groupings(
  df,
  ...,
  pcv_map = uad_pcv_map,
  not_matched = "Other",
  col_name = "pneumo.pcv_group"
)

Arguments

`df`	the normalised urine antigen data
`...`	ignored
`pcv_map`	a 2 column data frame mapping `group` to `uad_analysis`
`not_matched`	what to call the column of non-matched serotypes? Default is `Other`, but `Non vaccine type` might be preferred.
`col_name`	the target column name for the pcv grouping (defaults to `pneumo.pcv_group`)

Details

The logic employed in combining elements is:

any(result == "Unknown") ~ "Unknown"
any(result == "Positive") ~ "Positive"
all(result == "Negative") ~ "Negative"
TRUE ~ "Other"

Value

an augmented data frame with an additional column defined by col_name
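
Examples

The combining logic in the details is an ordered set of rules, where the first matching rule wins. A base R sketch of the same precedence, applied to the results for one group of serotypes, is given below. `combine_results_sketch()` is a hypothetical helper, and treating missing values as non-matching is an assumption.

```r
# Hypothetical helper: collapse several test results into one summary value.
# isTRUE() makes a missing comparison count as "no match".
combine_results_sketch = function(result) {
  if (isTRUE(any(result == "Unknown"))) return("Unknown")
  if (isTRUE(any(result == "Positive"))) return("Positive")
  if (isTRUE(all(result == "Negative"))) return("Negative")
  "Other"
}

combine_results_sketch(c("Negative", "Positive"))   # "Positive"
combine_results_sketch(c("Positive", "Unknown"))    # "Unknown"
combine_results_sketch(c("Negative", "Negative"))   # "Negative"
combine_results_sketch(c("Negative", "Equivocal"))  # "Other"
```

Note that a single unknown result takes precedence over a positive one, because the "Unknown" rule is evaluated first.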

Get vaccine coverage group for known serotype

Description

For the longitudinal pneumococcal data, a range of useful serotype groups is defined in the list avoncap::serotype_data. The avoncap::serotype_pcv_map gives a set of mappings to (multiple) group headings that gives the overall serotype distribution by vaccine.

Usage

derive_phe_pcv_group(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Value

a dataframe

Add in clinical syndrome indicator

Description

A list of presentations based on site which

LRTI
Meningitis
Effusion/Empyema
Septic arthritis
URTI
Other

Usage

derive_pneumo_clinical_syndrome(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Value

a dataframe

Make pneumo data compatible with AvonCAP

Description

Needed for:

derive_simpler_comorbidities
derive_pneumococcal_high_risk
derive_pneumococcal_risk_category

Usage

derive_pneumo_polyfill(df, ...)

Arguments

`df`	the dataframe.
`...`	ignored

Value

a dataframe

Calculate UAD panel for test

Description

The panels are UAD1 for PCV13 serotypes, UAD2 for PPV23 serotypes.

Usage

derive_pneumo_uad_panel(df, ...)

Arguments

`df`	a pneumo serotype dataframe
`...`	ignored

Value

a dataframe with additional columns pneumo.uad1_panel_result, pneumo.uad2_panel_result, pneumo.non_uad_panel_result, pneumo.serotype_summary_result

Calculate summary status from UAD (or other serotype) panel results

Description

logic is defined in derive_pcv_groupings().

Usage

derive_pneumo_uad_status(df, ...)

Arguments

`df`	a pneumo serotype dataframe
`...`	ignored

Value

a dataframe with additional columns pneumo.uad1_panel_result, pneumo.uad2_panel_result, pneumo.non_uad_panel_result, pneumo.serotype_summary_result

The pneumococcal incidence diagnostic classifications

Description

The 4 category disjoint classification.

Usage

derive_pneumococcal_categories(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

pneumo.presentation_class:
- CAP+/RAD+ - radiologically proved pneumonia
- CAP+/RAD- - pneumonia without x-ray confirmation
- NP-LRTI - non-pneumonic lower respiratory tract infection
- No evidence LRTI - believed to be non-infective at admission, this last group is usually discarded from analysis, however it only really describes people without a clinical diagnosis of LRTI on admission. There could still be undiagnosed infection there, and some of these patients have COVID (possibly without lower respiratory symptoms?).

Value

a dataframe

Determine if patient is in a high pneumococcal risk group

Description

High pneumococcal risk defined if any of the following:

over 65 years old
other pneumococcal risks
comorbid copd
interstitial lung disease
cystic fibrosis
hypertension
CCF
ischaemic heart disease
chronic kidney disease
chronic liver disease
diabetes
asthmatic with immunodeficiency
on immunosuppression

Usage

derive_pneumococcal_high_risk(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Value

a dataframe

Determine pneumococcal risk group

Description

Original algorithm from B1851202 SAP defines a 3 class risk group:

Usage

derive_pneumococcal_risk_category(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

High-risk (immunocompromised)

Asplenia - not supported
Cancer/Malignancy, Hematologic - OK
Cancer/Malignancy, Solid Tumor - OK
Chronic Kidney Disease - OK
Human Immunodeficiency Virus (HIV) – AIDS - OK
Human Immunodeficiency Virus (HIV) – No AIDS - OK
Immunodeficiency - OK
Immunosuppressant Drug Therapy - OK
Organ Transplantation - OK
Multiple Myeloma - not supported

At Risk (immunocompetent)

Asthma - OK
Alcoholism - OK
Celiac Disease - not supported
Chronic Liver Disease without Hepatic Failure - OK
Chronic Liver Disease with Hepatic Failure - OK
Chronic Obstructive Pulmonary Disease - OK
Cochlear Implant - not supported
Congestive Heart Failure - OK
Coronary Artery Disease (CAD) - OK
Chronic Neurologic Diseases - OK
Coagulation factor replacement therapy - not supported
CSF Leak - not supported
Diabetes Treated with Medication - OK
Down syndrome - OK
Institutionalized in nursing home or LTC facility (Nursing home or long-term care facility for those with disability or dependency on subject characteristics/risk determinants eCRF page) - OK
Occupational risk with exposure to metal fumes - OK
Other Chronic Heart Disease - OK
Other Chronic Lung Disease - OK
Other pneumococcal disease risk factors - OK
Previous Invasive Pneumococcal Disease - not supported
Tobacco smoking (Tobacco/E-Cigarettes) - OK

Anything else is low risk

Value

a dataframe

Polyfill data

Description

Some basic context to allow comparison to ED data.

Usage

derive_polyfill_central(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

All of the patients admitted

Value

a dataframe

Polyfill ED data

Description

The ED data has some different fields from the main avoncap data.

Usage

derive_polyfill_ed(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

It is missing an admission cxr summary field needed to calculate pneumonia
It has a fixed admission route of "A&E" (i.e. ED to non UK people)
None of the patients admitted
Hospital admission length of stay is zero

Value

a dataframe

Create presumed diagnostic categories

Description

Pneumonia if one of:

Initial diagnosis of CAP (supported by initial radiology or clinically)
Empyema or abscess

Usage

derive_presumed_diagnosis_categories(df, v)

Arguments

`df`	the dataframe.
`v`	the value set. usually precomputed by the augment framework the value set can be explicitly supplied with `v = get_value_sets(df)`

Details

Presumed clinical presentation:

Pneumonia - implies Infective
NP-LRTI - implies Infective
No evidence LRTI (include CRDE and HF)

Value

a dataframe

Calculate a QCOVID2 score from AvonCap data source

Description

uses inbuilt imd_to_townsend map. This implements a cut down version of the QCovid2 score depending on what data is available.

Usage

derive_qcovid(df, v = avoncap_df %>% get_value_sets())

Arguments

`df`	a normalised avoncap data source
`v`	a value set

Value

the same dataframe with additional columns,

qcovid2.log_hazard, covid2.hazard_ratio: a log hazard rate for the QCOVID2 score where missing data is substituted with the reference value for the QCOVID2 population.
qcovid2.log_comorbid_hazard, qcovid2.comorbid_hazard_ratio: a log hazard rate for the comorbid conditions and not including age and BMI.

Split a continuous variable into quintiles

Description

Split a continuous variable into quintiles

Usage

derive_quintile_category(col, labels = c("1-short", "2", "3", "4", "5-long"))

Arguments

`col`	the continuous data column that is to be categorised by quintile.
`labels`	the category labels

Value

a derive_... style function that augments a data set with col xxx with col xxx_quintile containing the quintiles
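
Examples

The "function that returns a derive_... style function" pattern can be sketched in base R. `quintile_factory_sketch()` is hypothetical: it takes the column name as a string (the package function may instead accept a bare column name), and it will fail if the quintile boundaries are not unique.

```r
# Hypothetical factory: returns a function that adds `<col>_quintile` to a
# data frame containing `col`.
quintile_factory_sketch = function(col, labels = c("1-short", "2", "3", "4", "5-long")) {
  function(df, ...) {
    x = df[[col]]
    breaks = stats::quantile(x, probs = seq(0, 1, 0.2), na.rm = TRUE)
    df[[paste0(col, "_quintile")]] = cut(
      x, breaks = breaks, labels = labels,
      include.lowest = TRUE, ordered_result = TRUE
    )
    df
  }
}

toy = data.frame(los = 1:10)
add_los_quintile = quintile_factory_sketch("los")
toy = add_los_quintile(toy)
table(toy$los_quintile)
# two rows fall in each of the five quintiles
```

Returning a function rather than a data frame is what lets such a step be configured once and then slotted into a sequence of augmentation functions.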

Binary outcomes for severe disease

Description

Confirmed death within 30 days (subject to potential censoring)
Confirmed death within 1 year (subject to potential censoring). The date of censoring depends on when the mortality data was updated. Currently this is 04 Oct 2024
Confirmed death (any length follow up)
Any ICU admission

Usage

derive_severe_disease_outcomes(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. Usually precomputed by the augment framework, but it can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

These outcomes, listed in the description, are described in the aLRTD paper.

Value

a dataframe

Rationalise some of the more detailed comorbidities

Description

and generate some summary values

Usage

derive_simpler_comorbidities(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. Usually precomputed by the augment framework, but it can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

simple DM without insulin dependence
Solid / Haematological / Any cancer present binary indicators
any chronic resp dx: i.e. any of asthma, bronchiectasis, chronic pleural disease, COPD, interstitial lung dx, cystic fibrosis, other chronic resp dx
any chronic heart disease: pulmonary htn, CCF, IHD, previous MI, congenital heart dx, hypertension, AF, other arrhythmia, other heart dx, other other heart dx
Stroke or TIA binary
Any immune compromise binary (immunodeficient or on immune suppressants)
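
An "any of" summary indicator of this kind can be sketched in base R as below. The logical toy columns stand in for the yes/no comorbidity fields, the output column name is hypothetical, and treating missing values as absent is an assumption made for the example rather than documented behaviour.

```r
# Toy data: logical indicators standing in for yes/no comorbidity columns.
df <- data.frame(
  comorbid.asthma         = c(TRUE, FALSE, FALSE),
  comorbid.copd           = c(FALSE, FALSE, NA),
  comorbid.bronchiectasis = c(FALSE, FALSE, TRUE)
)

resp_cols <- c("comorbid.asthma", "comorbid.copd", "comorbid.bronchiectasis")

# TRUE when at least one condition is present; NA is counted as absent here.
df$any_chronic_resp_dx <- rowSums(df[resp_cols], na.rm = TRUE) > 0
```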

Value

a dataframe

Survival outcomes

Description

Expects as days since admission:

survival.length_of_stay - length of stay until discharge or death (NA if still in hospital),
survival.uncensored_time_to_death - time until death (NA if alive at last obs),
survival.last_observed_event - last time patient observed alive.

Usage

derive_survival_censoring(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. Usually precomputed by the augment framework, but it can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

Calculates

a 30 day survival duration and censoring status for survfit
a 1 year survival duration and censoring status for survfit
Hospital length of stay and censoring status for survfit
Categorical length of stay and 30 day survival 0-3, 4-6, 7-13, 14-29, gte 30
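
The categorical bands listed above can be reproduced with `cut()`. This is a sketch of the banding only; the variable names are illustrative and not the package's column names.

```r
los_days <- c(0, 3, 4, 13, 29, 30, 100)

los_category <- cut(
  los_days,
  breaks = c(0, 4, 7, 14, 30, Inf),
  labels = c("0-3", "4-6", "7-13", "14-29", "gte 30"),
  right = FALSE,          # left-closed intervals: [0,4), [4,7), ...
  ordered_result = TRUE
)
```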

Survival data will be of the form:

survival.30_day_death_xxx, survival.1_yr_death_xxx, survival.30_day_discharge

xxx_time: the follow-up time to event in days (max 30 or 365).

xxx_event: The event type indicator

0 = alive at event (censored),
1 = dead.

or for length of stay:

0 = still inpatient / died (censored),
1 = discharged from hospital

A survival model will be of the form:

survival::Surv(time = xxx_time, event=xxx_event) ~ ...
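
A minimal sketch of the death censoring described above, assuming the inputs are days since admission as listed in the description. `censor_at` is a hypothetical helper and not the package's implementation.

```r
# Hypothetical helper: build a (time, event) pair censored at max_time days.
censor_at <- function(time_to_death, last_observed, max_time = 30) {
  died <- !is.na(time_to_death) & time_to_death <= max_time
  # Deaths within the window use the death time; everyone else is censored
  # at their last observation or at max_time, whichever comes first.
  time <- ifelse(died, time_to_death, pmin(last_observed, max_time))
  data.frame(time = time, event = as.integer(died))
}

s <- censor_at(
  time_to_death = c(5, NA, 200, NA),
  last_observed = c(5, 400, 200, 12)
)
# The result could then feed a model such as:
# survival::survfit(survival::Surv(time = time, event = event) ~ 1, data = s)
```

Here the patient who died on day 200 is alive at day 30, so they are censored at 30 days in the same way as a patient who never died.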

Value

a dataframe

Survival analysis times

Description

Fixes a data issue with length of stay and survival duration being filled in across 2 columns, and with missing last observation dates, so that we can calculate survival censoring consistently in other data sets.

Usage

derive_survival_times_avoncap(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. Usually precomputed by the augment framework, but it can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

Calculates:

A consistent length of stay - shortest of length of stay and 30 day and 1 yr survival duration
A consistent uncensored time to death - shortest of 30 day and 1 yr survival duration
A consistent time to last observation
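
Taking the shortest of several partially missing durations can be sketched with `pmin()`. The vectors below are toy values with illustrative names, and ignoring missing entries is an assumption about how the columns are reconciled.

```r
length_of_stay  <- c(10, NA, 40)
survival_30_day <- c(NA, 20, 30)
survival_1_yr   <- c(12, 25, NA)

# Element-wise minimum across the three columns, skipping missing values.
consistent_los <- pmin(length_of_stay, survival_30_day, survival_1_yr,
                       na.rm = TRUE)
```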

Value

a dataframe

Survival analysis times

Description

Usage

derive_survival_times_pneumo(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. Usually precomputed by the augment framework, but it can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

Calculates:

A consistent length of stay - shortest of length of stay and 30 day and 1 yr survival duration
A consistent uncensored time to death - shortest of 30 day and 1 yr survival duration
A consistent time to last observation

Value

a dataframe

Derived data function template

Description

Derived data function template

Usage

derive_template(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. Usually precomputed by the augment framework, but it can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Value

a dataframe

Derive times from vaccination to symptom onset

Description

If symptom duration is not given it is assumed to be zero.

Usage

derive_vaccination_timings(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. Usually precomputed by the augment framework, but it can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Value

a dataframe
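
The timing calculation can be sketched as follows, assuming symptom onset is the admission date minus the symptom duration. The variable names are illustrative; only the rule that a missing duration counts as zero comes from the description above.

```r
admission_date   <- as.Date(c("2021-06-10", "2021-06-10"))
symptom_duration <- c(3, NA)   # days of symptoms before admission
first_dose_date  <- as.Date(c("2021-05-01", "2021-06-01"))

# A missing symptom duration is assumed to be zero.
symptom_duration[is.na(symptom_duration)] <- 0

symptom_onset <- admission_date - symptom_duration
days_dose_to_onset <- as.numeric(
  difftime(symptom_onset, first_dose_date, units = "days")
)
```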

Deprecated - Vaccine combinations are less relevant now

Description

There are too many potential combinations with 4th, 5th and 6th doses to make this useful.

Usage

derive_vaccine_combinations(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. Usually precomputed by the augment framework, but it can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Value

a dataframe

Determine WHO outcome score

Description

Scores 0-3 are for community cases.

Usage

derive_WHO_outcome_score(df, v, ...)

Arguments

`df`	the dataframe.
`v`	the value set. Usually precomputed by the augment framework, but it can be explicitly supplied with `v = get_value_sets(df)`
`...`	ignored

Details

We generally can't tell the difference between 7 and 8.

4: Hospitalised; no oxygen therapy
5: Hospitalised; oxygen by mask or nasal prongs
6: Hospitalised; oxygen by NIV or high flow
7: Intubation and mechanical ventilation, pO2/FiO2 >= 150 or SpO2/FiO2 >= 200
8: Mechanical ventilation pO2/FiO2 <150 (SpO2/FiO2 <200) or vasopressors
9: Mechanical ventilation pO2/FiO2 <150 and vasopressors, dialysis, or ECMO
10: Dead

Value

a dataframe
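
A simplified sketch of how such a score might be assigned for hospitalised patients is shown below. The input vectors and their category labels are hypothetical, scores 8 and 9 are omitted, and all intubated patients are scored 7 on the assumption that 7 and 8 cannot be told apart. It is not the package's implementation.

```r
# Hypothetical inputs: died and oxygen are logical, ventilation is one of
# "none", "niv_or_high_flow" or "intubated".
who_score <- function(died, ventilation, oxygen) {
  score <- rep(4L, length(died))                   # hospitalised, no oxygen
  score[oxygen %in% TRUE] <- 5L                    # mask or nasal prongs
  score[ventilation %in% "niv_or_high_flow"] <- 6L
  score[ventilation %in% "intubated"] <- 7L
  score[died %in% TRUE] <- 10L                     # death overrides the rest
  score
}

scores <- who_score(
  died        = c(FALSE, FALSE, FALSE, FALSE, TRUE),
  ventilation = c("none", "none", "niv_or_high_flow", "intubated", "intubated"),
  oxygen      = c(FALSE, TRUE, TRUE, TRUE, TRUE)
)
```

Later assignments overwrite earlier ones, so the most severe applicable level wins.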

Get provenance of data column

Description

When a data set is normalised or augmented the original column names are stored as metadata. This helps us determine how a particular item was created. In future this will be useful for documentation.

Usage

extract_dependencies(data, col, original = TRUE)

Arguments

`data`	the dataframe
`col`	the column as a symbol
`original`	map the names to the original column names from the data. If this is false the function returns a list of current normalised column names.

Value

a named list of dependencies and original column names for a given column

Get the transformed columns from original field names

Description

Get the transformed columns from original field names

Usage

find_new_field_names(normalised, fields)

Arguments

`normalised`	the transformed data set.
`fields`	a vector of field names

Value

a named list mapping original to new columns

Frameworks

Description

The list of validation, normalisation and augmentation frameworks. There should be one validation per data set. There may be multiple normalisations and augmentations depending on the aspect of the data we are extracting (e.g. re-nesting flattened data).

Get a value set list of a dataframe

Description

This function examines a dataframe and returns a list of the columns with sub-lists as all the options for factors. This provides programmatic access (and autocomplete) to the values available in a dataframe, and throws an early error if we try to access data by a variable that does not exist.

Usage

get_value_sets(df)

Arguments

`df`	a dataframe to examine

Value

a list of lists with the column name and the factor levels as list, as a `checked list`.
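
The shape of a value set can be sketched in base R as below. Both `sketch_value_sets` and `checked_get` are hypothetical helpers written for this illustration. They only mimic the described behaviour (nested lists of factor levels, plus an error on unknown names) and say nothing about how the package's checked list is implemented.

```r
# Hypothetical helper: one entry per factor column, each a named list of levels.
sketch_value_sets <- function(df) {
  factors <- Filter(is.factor, df)
  lapply(factors, function(col) {
    lv <- levels(col)
    as.list(stats::setNames(lv, lv))
  })
}

# Hypothetical accessor that fails early instead of silently returning NULL.
checked_get <- function(lst, name) {
  if (!name %in% names(lst)) stop("no such value: ", name)
  lst[[name]]
}

df <- data.frame(
  gender = factor(c("Male", "Female", "Male")),
  age    = c(70, 65, 81)
)

v <- sketch_value_sets(df)
v$gender$Female
missing_lookup <- try(checked_get(v, "sex"), silent = TRUE)
```

A plain list returns `NULL` for `v$sex`, which is why the explicit check is needed to surface the mistake immediately.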

GP surgeries in the Bristol ICB area

Description

The denominator relates only to patients coming from these GP surgeries

Usage

data(icb_surgeries)

Format

A dataframe containing the following columns:

code - an official ODS code for the GP surgery
name - the official surgery name.

82 rows and 2 columns

High level IMD to Townsend score map

Description

A high level mapping from IMD to Townsend score. This is inaccurate as townsend score

Details

A data frame with 10 rows and 2 columns:

imd_decile: The IMD
mean_townsend: the average townsend score for this IMD

...

Source

https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/845345/File_7_-_All_IoD2019_Scores__Ranks__Deciles_and_Population_Denominators_3.csv

https://s3-eu-west-1.amazonaws.com/statistics.digitalresources.jisc.ac.uk/dkan/files/Townsend_Deprivation_Scores/Scores/Scores-%202011%20UK%20LSOA.csv

Locate the input directory

Description

Locate the input directory

Usage

input(...)

Arguments

`...`	the sub paths within the input directory

Value

a path to the input directory and sub paths if provided

Examples


# devtools::load_all()
try({
  avoncap::set_input("~/Data/avoncap")
  avoncap::input("nhs-extract")

  avoncap::all_files()


  # exact match on filename column of all_data()
  avoncap::most_recent_files("AvonCAPLRTDCentralDa")


  # or matches by lower case startWith on directory
  avoncap::most_recent_files("nhs-extract","deltave")


  avoncap::most_recent_files("metadata")

  avoncap::valid_inputs()
})

Key dates:

Description

A list of key dates:

mortality_updated - the last time the NHS mortality data was extracted and added to AvonCAP
min_alpha - earliest observation of the alpha variant
max_wuhan - last observation of the wuhan variant
min_delta - earliest observation of the delta variant
max_alpha - last observation of the alpha variant
min_omicron - earliest observation of the omicron variant
max_delta - last observation of the delta variant

The default catchment population for AvonCAP is limited to the Bristol, North Somerset and South Gloucestershire Integrated Care Board (BNSSG ICB). This list is the list of GP surgeries considered part of the denominator.

Details

code - the NHS ODS organisational code of the practice.
name - the official name of the practice

Faceted Kaplan-Meier plot

Description

Faceted Kaplan-Meier plot

Usage

km_plot(
  df,
  coxmodel,
  facet = NULL,
  ...,
  maxtime = NULL,
  ylab = if (!invert) "surviving (%)" else "affected (%)",
  xlab = "time (days)",
  facetlab = NULL,
  ylim = (if (invert) c(0, NA) else c(NA, 100)),
  n_breaks = 5,
  heights = c(10, 1),
  invert = FALSE,
  show_label = FALSE,
  show_legend = TRUE
)

Arguments

`df`	the data
`coxmodel`	the cox model output of survival::coxph from the data
`facet`	the division to highlight in the KM strata. Defaults to the first term on the right-hand side of the cox model formula
`...`	Named arguments passed on to `survival::survfit`: `formula`, either a formula or a previously fitted model; `...`, other arguments to the specific method
`maxtime`	the longest x value to plot (optional)
`ylab`	the y axis label
`xlab`	the x axis label
`facetlab`	a label to add as a facet title
`ylim`	the range to show on the KM plot
`n_breaks`	number of x axis breaks to display; this also determines the timing and number of "at risk" counts to display.
`heights`	the relative height between the KM plot and the "at risk" table
`invert`	reverse survival statistics to count number of affected
`show_label`	show the label on the at risk table (which is somewhat redundant as items are coloured)
`show_legend`	show the legend for the strata. (This is sometimes redundant if the at risk table is labelled)

Value

a ggplot patchwork.

Examples


cox = survival::coxph(survival::Surv(time, status) ~ trt + celltype + karno +
  diagtime + age + prior , data = survival::veteran)

km_plot(survival::veteran, cox)
km_plot(survival::veteran, cox, facet = 1)

km_plot(survival::veteran, cox, "celltype", show_label=TRUE) &
   ggplot2::theme(legend.position="bottom")

km_plot(survival::veteran, cox, "trt", show_label=TRUE) &
   ggplot2::theme(legend.position="bottom")


Load data and check structure

Description

Loads the AvonCap data from a set of csv files, which may optionally be qualified by site (`'BRI'` or `'NBT'`) and database year (`'y1'`, `'y2'`, `'y3'`) as part of the file name. This selects the most recent files earlier than the `reproduce_at` date and detects whether they belong to a set of files.

Usage

load_data(
  type,
  subtype = NULL,
  reproduce_at = as.Date(getOption("reproduce.at", default = Sys.Date())),
  merge = TRUE,
  ...
)

Arguments

`type`	the file category see `valid_inputs()` for current list in input directory
`subtype`	the subtype from `valid_inputs()`
`reproduce_at`	the date at which to cut off newer data files
`merge`	setting to `TRUE` forces multiple files be merged into a single data frame by losing mismatching columns.
`...`	passed to `cached`; you may specifically want to use `.nocache = TRUE`

Details

The files are loaded as csv and checked for whether they have (A) the same columns, (B) the same column types (or are empty), and (C) any major parse issues. It then merges the files into a single dataframe, if possible, otherwise it will return the individually loaded files as a list of dataframes.
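
The column check and the merge that drops mismatching columns can be sketched in base R as follows. The in-memory data frames stand in for loaded csv files, and the approach (intersecting column names, then row-binding) is an assumption about what "losing mismatching columns" means, not the package's code.

```r
# Two toy "files"; the second lacks the site column.
files <- list(
  data.frame(record_number = 1:2, age = c(70, 65), site = c("BRI", "BRI")),
  data.frame(record_number = 3:4, age = c(81, 59))
)

# (A) do all files have the same set of columns?
same_columns <- length(unique(lapply(files, function(f) sort(names(f))))) == 1

# Keep only the columns present in every file, then stack the rows.
common <- Reduce(intersect, lapply(files, names))
merged <- do.call(rbind, lapply(files, function(f) f[common]))
```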

Value

either a list of dataframes or a single merged dataframe

Examples

try(load_data("nhs-extract","deltave"))

Core avoncap normalisation

Description

record_number -> admin.record_number (name)
what_was_the_first_surveil -> admin.first_record_number (name)
ac_study_number -> admin.consented_record_number (study_id)
nhs_number -> admin.patient_identifier (ppi)
duplicate -> admin.duplicate (yesno)
enrollment_date -> admin.enrollment_date (date)
admission_type -> admission.admission_route (list)
study_year -> admin.study_year (name)
file -> admin.data_file (name)
week_number -> admin.week_number (name)
c19_diagnosis -> diagnosis.standard_of_care_COVID_diagnosis (list)
clinical_radio_diagnosis -> diagnosis.clinical_or_radiological_LRTI_or_pneumonia (yesno)
c19_adm_swab -> diagnosis.admission_swab (list)
c19_test_type -> diagnosis.test_type (list)
qualifying_symptoms_signs -> diagnosis.qualifying_symptoms_signs (name)
cc_critieria -> diagnosis.meets_case_control_criteria (yesno)
cc_pos_date -> diagnosis.first_COVID_positive_swab_date (date)
gender -> demog.gender (list)
age_at_admission -> demog.age (double)
age_march -> demog.age_in_march_2021 (double)
imd -> demog.imd_decile (name)
gp_practice -> admin.gp_practice_old (name)
gp_practice_drop_down -> admin.gp_practice (list)
smoking -> demog.smoker (list)
ethnicity2 -> demog.ethnicity (list)
care_home -> demog.care_home_resident (yesno)
hapcovid_screening -> admission.non_lrtd_hospital_acquired_covid (yesno)
hospital_covid -> admission.hospital_acquired_covid (yesno)
drugs -> demog.no_drug_abuse, demog.alcohol_abuse, demog.ivdu_abuse, demog.marijuana_abuse, demog.other_inhaled_drug_abuse (checkboxes)
vaping -> demog.vaping (list)
alc_units -> demog.units_of_alcohol (name)
np_swab -> admin.np_swab_taken_1 (list)
adm_np_type -> admin.np_swab_site_1 (list)
np_date -> admin.np_swab_date_1 (date)
days_adm_npswab -> admin.np_swab_day_since_admission (double)
np_swab_2 -> admin.np_swab_taken_2 (list)
adm_np_type_2 -> admin.np_swab_site_2 (list)
np_date_2 -> admin.np_swab_date_2 (date)
np_swab_3 -> admin.np_swab_taken_3 (list)
adm_np_type_3 -> admin.np_swab_site_3 (list)
np_date_3 -> admin.np_swab_date_3 (date)
saliva -> admin.saliva_sample_taken (list)
saliva_date -> admin.saliva_sample_date (date)
days_adm_saliva -> admin.saliva_sample_day_since_admission (double)
sputum -> admin.sputum_sample_taken (list)
sputum_date -> admin.sputum_sample_date (date)
days_adm_sputum -> admin.sputum_sample_day_since_admission (double)
pt_ad_ur -> admin.urine_sample_needed (yesno)
adm_ur_taken -> admin.urine_sample_taken (list)
nourine_reason -> admin.urine_sample_failure_reason (list)
adm_np_type_2 -> admin.urine_sample_site (list)
adm_ur_date -> admin.urine_sample_date (date)
days_adm_urine -> admin.urine_sample_day_since_admission (double)
adm_serum_tak -> admin.serum_sample_taken (list)
adm_seru_date -> admin.serum_sample_date (date)
days_adm_serum -> admin.serum_sample_day_since_admission (double)
contraindication -> vaccination.covid_vaccine_contraindicated (yesno)
covid19_vax -> vaccination.covid_vaccination (list)
covidvax_date -> vaccination.first_dose_date (date)
covidvax_dose_2 -> vaccination.second_dose_date (date)
covidvax_dose_3 -> vaccination.third_dose_date (date)
covidvax_dose_4 -> vaccination.fourth_dose_date (date)
covidvax_dose_5 -> vaccination.fifth_dose_date (date)
covidvax_dose_6 -> vaccination.sixth_dose_date (date)
brand_of_covid19_vaccinati -> vaccination.first_dose_brand (list)
covid19vax_brand_2 -> vaccination.second_dose_brand (list)
covid19vax_brand_3 -> vaccination.third_dose_brand (list)
covid19vax_brand_4 -> vaccination.fourth_dose_brand (list)
covid19vax_brand_5 -> vaccination.fifth_dose_brand (list)
covid19vax_brand_6 -> vaccination.sixth_dose_brand (list)
c19vaxd1_adm -> admission.time_since_first_vaccine_dose (name)
c19vaxd2_adm -> admission.time_since_second_vaccine_dose (name)
c19vaxd3_adm -> admission.time_since_third_vaccine_dose (name)
c19vaxd4_adm -> admission.time_since_fourth_vaccine_dose (name)
c19vax5_adm -> admission.time_since_fifth_vaccine_dose (name)
c19vax6_adm -> admission.time_since_sixth_vaccine_dose (name)
flu_date -> vaccination.last_flu_dose_date (date)
fluvax_adm_d1 -> admission.time_since_last_flu_vaccine_dose (name)
ppv23_date -> vaccination.last_pneumococcal_dose_date (date)
ppv23vax_adm_d -> admission.time_since_last_pneumococcal_vaccine_dose (name)
c19_variant -> genomic.variant (variant)
year -> admission.year (double)
study_week -> admission.study_week (double)
admission_date -> admission.date (date)
hospital -> admin.hospital, toupper (text_to_factor)
adm_diagnosis -> admission.presumed_CAP_radiologically_confirmed, admission.presumed_CAP_clinically_confirmed, admission.presumed_CAP_no_radiology, admission.presumed_LRTI, admission.presumed_Empyema_or_abscess, admission.presumed_exacerbation_COPD, admission.presumed_exacerbation_non_COPD, admission.presumed_congestive_heart_failure, admission.presumed_non_infectious_process, admission.presumed_non_LRTI (checkboxes)
ics -> admission.on_inhaled_corticosteroids (yesno)
immsup -> admission.on_immunosuppression (yesno)
psi_class -> admission.pneumonia_severity_index_class (list)
crb_test_mai -> admission.curb_65_severity_score (list)
news_2_total -> admission.news2_score (name)
pulse_ox -> admission.oximetry (name)
rr -> admission.respiratory_rate (name)
fio2 -> admission.max_oxygen (name)
systolic_bp -> admission.systolic_bp (name)
diastolic_bp -> admission.diastolic_bp (name)
hr -> admission.heart_rate (name)
temperature -> admission.temperature (list)
symptom_days_preadmit -> admission.duration_symptoms (double)
previous_infection -> admission.previous_covid_infection (yesno_unknown)
previousinfection_date -> admission.previous_covid_infection_date (date)
c19d_preadm -> admission.time_since_covid_diagnosis (name)
rockwood -> admission.rockwood_score (name)
cci_total_score -> admission.charlson_comorbidity_index (name)
height -> admission.height (name)
weight -> admission.weight (name)
bmi -> admission.BMI (double)
first_radio -> admission.cxr_normal, admission.cxr_pneumonia, admission.cxr_heart_failure, admission.cxr_pleural_effusion, admission.cxr_covid_changes, admission.cxr_other (checkboxes)
c19_peep -> day_7.max_peep (name)
c19_hospadm -> day_7.length_of_stay (list)
c17_high -> day_7.max_care_level (list)
c19icuon -> day_7.still_on_icu (yesno)
c19_icudays -> day_7.icu_length_of_stay (list)
c19_vent -> day_7.max_ventilation_level (list)
c19_ox -> day_7.max_o2_level (list)
c19_ionotropes -> day_7.ionotropes_needed (yesno)
c19_complication -> day_7.PE, day_7.DVT, day_7.ARF, day_7.NSTEMI, day_7.STEMI, day_7.cardiac_failure, day_7.new_AF, day_7.new_other_arrythmia, day_7.inpatient_fall, day_7.other_complication, day_7.no_complication (checkboxes)
c19_death7d -> day_7.death (yesno)
c19_meds -> treatment.dexamethasone, treatment.remdesevir, treatment.tocilizumab, treatment.sarilumab, treatment.in_drug_trial, treatment.no_drug_treatment, treatment.sotrovimab (checkboxes)
hospital_length_of_stay -> outcome.length_of_stay, floor (integer)
survival_days -> outcome.survival_duration, round (integer)
ip_death -> outcome.inpatient_death (yesno)
days_in_icu -> outcome.icu_duration (double)
did_the_patient_have_respi -> outcome.respiratory_support_needed (yesno)
number_of_days_of_ventilat -> outcome.ventilator_duration (double)
ett_days -> outcome.endotracheal_tube_duration (double)
renal_replacement_therapy -> outcome.renal_support_duration (double)
complications -> outcome.acute_renal_failure, outcome.liver_dysfunction, outcome.hospital_acquired_infection, outcome.acute_respiratory_distress_syndrome, outcome.NSTEMI, outcome.STEMI, outcome.new_AF, outcome.new_other_arrhthmia, outcome.stroke, outcome.DVT, outcome.PE, outcome.heart_failure, outcome.fall_in_hospital, outcome.reduced_mobility, outcome.increasing_care_requirement, outcome.no_complications (checkboxes)
ventilatory_support -> outcome.highest_level_ventilatory_support (list)
did_the_patient_receive_ec -> outcome.received_ecmo (yesno)
inotropic_support_required -> outcome.received_ionotropes (yesno_unknown)
lrtd_30d_outcome -> outcome.functional_status (list)
survive_1yr -> outcome.one_year_survival (yesno)
survival_1yr_days -> outcome.one_year_survival_duration (integer)
yr_survival_complete -> outcome.one_year_survival_complete (list)
fever2 -> symptom.abnormal_temperature (yesno)
pleurtic_cp -> symptom.pleuritic_chest_pain (yesno)
cough2 -> symptom.cough (yesno)
sput_prod -> symptom.productive_sputum (yesno)
dyspnoea -> symptom.dyspnoea (yesno)
tachypnoea2 -> symptom.tachypnoea (yesno)
confusion -> symptom.confusion (yesno)
anosmia -> symptom.anosmia (yesno_unknown)
ageusia -> symptom.ageusia (yesno_unknown)
dysgeusia -> symptom.dysguesia (yesno_unknown)
fever -> symptom.fever (yesno_unknown)
hypothermia -> symptom.hypothermia (yesno_unknown)
chills -> symptom.chills (yesno_unknown)
headache -> symptom.headache (yesno_unknown)
malaise -> symptom.malaise (yesno_unknown)
wheeze -> symptom.wheeze (yesno_unknown)
myalgia -> symptom.myalgia (yesno_unknown)
worse_confusion -> symptom.worsening_confusion (yesno_unknown)
general_det -> symptom.general_deterioration (yesno_unknown)
ox_on_admission -> symptom.oxygen_required_on_admission (yesno_unknown)
resp_disease -> comorbid.no_resp_dx, comorbid.copd, comorbid.asthma, comorbid.resp_other (checkboxes)
other_respiratory_disease -> comorbid.bronchiectasis, comorbid.interstitial_lung_dx, comorbid.cystic_fibrosis, comorbid.pulmonary_hypertension, comorbid.chronic_pleural_dx, comorbid.other_chronic_resp_dx (checkboxes)
chd -> comorbid.no_heart_dx, comorbid.ccf, comorbid.ihd, comorbid.hypertension, comorbid.other_heart_dx (checkboxes)
mi -> comorbid.previous_mi (yesno)
other_chd -> comorbid.congenital_heart_dx, comorbid.af, comorbid.other_arrythmia, comorbid.pacemaker, comorbid.valvular_heart_dx, comorbid.other_other_heart_dx (checkboxes)
diabetes -> comorbid.diabetes (list)
dm_meds -> comorbid.diabetes_medications (list)
neurological_disease -> comorbid.neuro_other, comorbid.cva, comorbid.tia, comorbid.hemiplegia, comorbid.paraplegia, comorbid.no_neuro_dx (checkboxes)
dementia -> comorbid.no_dementia, comorbid.dementia, comorbid.cognitive_impairment (checkboxes)
cancer -> comorbid.solid_cancer (list)
haem_malig -> comorbid.no_haemotological_cancer, comorbid.leukaemia, comorbid.lymphoma (checkboxes)
ckd -> comorbid.ckd (list)
liver_disease -> comorbid.liver_disease (list)
gastric_ulcers -> comorbid.gastric_ulcers (yesno)
pvd -> comorbid.periph_vasc_dx (yesno)
ctd -> comorbid.connective_tissue_dx (yesno)
immunodeficiency -> comorbid.immunodeficiency (yesno)
other_pn_disease -> comorbid.other_pneumococcal_risks (yesno)
transplant -> comorbid.transplant_recipient (yesno)
pregnancy -> comorbid.pregnancy (list)
hiv -> comorbid.no_HIV, comorbid.HIV, comorbid.AIDS (checkboxes)
final_soc_lrtd_diagnosis -> diagnosis.SOC_CAP_radiologically_confirmed, diagnosis.SOC_CAP_clinically_confirmed, diagnosis.SOC_CAP_no_radiology, diagnosis.SOC_LRTI, diagnosis.SOC_Empyema_or_abscess, diagnosis.SOC_exacerbation_COPD, diagnosis.SOC_exacerbation_non_COPD, diagnosis.SOC_congestive_heart_failure, diagnosis.SOC_non_infectious_process, diagnosis.SOC_non_LRTI (checkboxes)
covid_19_diagnosis -> diagnosis.covid_19_diagnosis (list)
ppv23 -> vaccination.pneumovax (list)
flu_vaccine -> vaccination.influenza_vaccination (list)
abx_14d_prior -> admission.pre_admission_antibiotics_given (yesno_unknown)
antibiotic_used -> admission.pre_admission_antibiotic (checkboxes_to_nested_list)
antiplatelets -> admission.antiplatelet_therapy (list)
anticoagulants -> admission.anticoagulant_therapy (list)
statins -> admission.cholesterol_lowering_therapy (list)
hypertensives -> admission.antihypertensive_therapy (list)
antiviral_14d_prior -> admission.pre_admission_antiviral (checkboxes_to_nested_list)

Usage

map_avoncap_central()

Value

a list
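
To illustrate how a mapping of the form `original -> new.name (type)` could be applied, here is a sketch using a hand-written rule list. The structure of `mapping`, the converter functions and the 0/1 coding of yes/no fields are all assumptions made for this example, and the list returned by `map_avoncap_central()` may look quite different.

```r
# Hypothetical rules: original column -> new name plus a type converter.
mapping <- list(
  age_at_admission = list(to = "demog.age", convert = as.double),
  admission_date   = list(to = "admission.date", convert = as.Date),
  care_home        = list(
    to = "demog.care_home_resident",
    convert = function(x) factor(x, levels = c(0, 1), labels = c("no", "yes"))
  )
)

raw <- data.frame(
  age_at_admission = c("70", "65"),
  admission_date   = c("2021-06-10", "2021-07-01"),
  care_home        = c(1, 0),
  stringsAsFactors = FALSE
)

# Apply each rule in turn, building the normalised data frame column by column.
normalised <- data.frame(row.names = seq_len(nrow(raw)))
for (from in names(mapping)) {
  rule <- mapping[[from]]
  normalised[[rule$to]] <- rule$convert(raw[[from]])
}
```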

Core avoncap consent

Description

consented -> admin.consented (list)
ppc -> admin.pp_consented (list)
withdrawal -> admin.withdrawal (yesno)
consent_urine -> admin.consent_for_urine (yesno)
consent_blood -> admin.consent_for_blood (yesno)
consent_resp_samples1 -> admin.consent_for_respiratory_samples (yesno)

Usage

map_avoncap_consent()

Value

a list

Avoncap ED normalisation

Description

All the ED data is also mapped using the map_avoncap_central() list as it is quite similar

Usage

map_avoncap_ed()

Details

ed_hours -> outcome.emergency_dept_length_of_stay (name)
ed_reattendance -> admin.ed_episodes_in_last_30_days (name)
hosp_adm_30d -> outcome.admitted_within_30_days (yesno)
hosp_adm_7d -> outcome.admitted_within_7_days (yesno)
home_d_1 -> outcome.days_since_last_ed_episode (name)
radiology_result_1___2 -> radio.consistent_with_pneumonia_1 (yesno)
radiology_result_2___2 -> radio.consistent_with_pneumonia_2 (yesno)

Value

a list

ED consent

Description

consented -> admin.consented (list)
ppc -> admin.pp_consented (list)
withdrawal -> admin.withdrawal (yesno)
consent_urine -> admin.consent_for_urine (yesno)
consent_blood -> admin.consent_for_blood (yesno)
consent_resp_samples1 -> admin.consent_for_respiratory_samples (yesno)

Usage

map_avoncap_ed_consent()

Value

a list

Normalise the avoncap haematology data

Description

record_number -> admin.record_number (name)
ac_study_number -> admin.consented_record_number (study_id)
ph_7_35 -> haem.blood_gas_ph (double)
glucose -> haem.glucose (double)
albumin -> haem.albumin (double)
wcc -> haem.white_cell_count (double)
eos -> haem.eosinophils (double)
hb -> haem.haemoglobin (double)
haematocrit -> haem.haemotocrit (double)
pmn -> haem.neutrophils (double)
lymphocytes -> haem.lymphocytes (double)
crp -> haem.crp (double)
na_result -> haem.sodium (double)
ur_result -> haem.urea (double)
egfr -> haem.egfr (double)
sars_cov2_antigen -> haem.sars_cov2_antigen (trunc_double)
ferritin -> haem.ferritin (double)
troponin -> haem.troponin (double)
nt_probnp -> haem.pro_bnp (double)
d_dimer -> haem.d_dimer (double)
patient_blood_group -> haem.blood_group (list)

Usage

map_avoncap_haem()

Value

a list

Normalise the avoncap microbiology data

Description

microtest_done -> micro.test_performed (yesno)
microtest_date -> micro.test_date (date)
microday -> micro.test_days_from_admission (pos_integer)
micro_test -> micro.test_type (list)
micro_isolates -> micro.pathogen_detected (yesno_unknown)
isolate_identified -> micro.pathogen, .micro_isolate_list (checkboxes_to_nested_list)
pn_result -> micro.pneumo_serotype_status (list)
pn_st -> micro.pneumo_serotype (pneumo_serotype)
micro_lab -> micro.sent_to_central_lab (yesno_unknown)
pen_susceptibility -> micro.penicillin_susceptibility (checkboxes_to_list)
septrin_susceptibility -> micro.septrin_susceptibility (checkboxes_to_list)
doxy_susceptibility -> micro.doxycycline_susceptibility (checkboxes_to_list)
levoflox_suscept -> micro.levofloxacin_susceptibility (checkboxes_to_list)
cef_susceptibility -> micro.ceftriaxone_susceptibility (checkboxes_to_list)
pn_uat_result -> micro.pneumo_binax_now (list)
lg_uat_result -> micro.pneumo_legionella_uat (list)
micro_final_report -> micro.is_final_report (yesno)

Usage

map_avoncap_micro(instrument)

Arguments

`instrument`	the numeric instrument number

Value

a list

Normalise the avoncap pneumococcal data

Description

participant_number -> admin.record_number (name)
hospital -> admin.hospital (list)
nhs_number -> admin.patient_identifier (ppi)
age_at_admission -> demog.age (double)
sex -> demog.gender (list)
test_date -> pneumo.test_date (date)
test -> pneumo.test_type (list)
serotype -> pneumo.phe_serotype (pneumo_serotype)
smoker -> demog.smoker (list)
resp_disease -> comorbid.no_resp_dx, comorbid.copd, comorbid.asthma, comorbid.bronchiectasis, comorbid.pulmonary_fibrosis, comorbid.resp_other (checkboxes)
chd -> comorbid.no_heart_dx, comorbid.ccf, comorbid.ihd, comorbid.hypertension, comorbid.af, comorbid.other_heart_dx (checkboxes)
mi -> comorbid.previous_mi (yesno)
ckd -> comorbid.ckd (list)
liver_disease -> comorbid.liver_disease (list)
diabetes -> comorbid.diabetes (list)
dm_meds -> comorbid.diabetes_medications (list)
dementia -> comorbid.no_dementia, comorbid.dementia, comorbid.cognitive_impairment (checkboxes)
neurological_disease -> comorbid.neuro_other, comorbid.cva, comorbid.tia, comorbid.hemiplegia, comorbid.paraplegia, comorbid.no_neuro_dx (checkboxes)
gastric_ulcers -> comorbid.gastric_ulcers (yesno)
dysphagia -> comorbid.dysphagia (yesno)
pvd -> comorbid.periph_vasc_dx (yesno)
ctd -> comorbid.connective_tissue_dx (yesno)
immunodeficiency -> comorbid.immunodeficiency (yesno)
other_pn_disease -> comorbid.other_pneumococcal_risks (yesno)
hiv -> comorbid.no_HIV, comorbid.HIV, comorbid.AIDS (checkboxes)
cancer -> comorbid.solid_cancer (list)
haem_malig -> comorbid.no_haemotological_cancer, comorbid.leukaemia, comorbid.lymphoma (checkboxes)
recent_chemo -> comorbid.recent_chemotherapy (yesno)
recent_radiotherapy -> comorbid.recent_radiotherapy (yesno)
transplant -> comorbid.transplant_recipient (yesno)
pregnancy -> comorbid.pregnancy (list)
drugs -> demog.no_drug_abuse, demog.alcohol_abuse, demog.ivdu_abuse, demog.marijuana_abuse, demog.other_inhaled_drug_abuse (checkboxes)
immsup -> admission.on_immunosuppression (yesno)
weight_problem -> comorbid.bmi_status (list)
concomittant_flu -> comorbid.influenza_infection (yesno)
hcv -> comorbid.hepatitis_c (yesno)
ppv23 -> vaccination.ppv23_vaccination (list)
flu_vaccine -> vaccination.flu (list)
cci_total_score -> admission.charlson_comorbidity_index (name)
los_days -> outcome.length_of_stay (double)
amts -> admission.triage_score (list)
resp_rate -> admission.respiratory_rate (double)
sats_ra -> admission.saturations_on_room_air (double)
systolic_bp -> admission.systolic_bp (double)
diastolic_bp -> admission.diastolic_bp (double)
crb65_score -> admission.crb_65_severity_score (list)
curb65_score -> admission.curb_65_severity_score (list)
antibiotic_route -> outcome.antibiotic_route (list)
antibiotic_days -> outcome.antibiotic_duration (double)
infection_site -> admission.infection_site (list)
deranged_lfts -> outcome.abnormal_lft (yesno)
aki -> outcome.acute_kidney_injury (yesno)
pleural_effusion -> outcome.pleural_effusion (yesno)
empyema -> outcome.empyema (yesno)
discharge_destination -> outcome.discharge_to (list)
icu -> outcome.admitted_icu (yesno)
niv -> outcome.non_invasive_ventilation (yesno)
intubation -> outcome.intubation (yesno)
recurrent_pneumonia -> outcome.recurrent_pneumonia (yesno)
ecmo -> outcome.received_ecmo (yesno)
inotropes -> outcome.received_ionotropes (yesno)
trachy -> outcome.tracheostomy (yesno)
inpatient_death -> outcome.inpatient_death (yesno)
death_30days -> outcome.death_within_30_days (yesno)
death_1year -> outcome.death_within_1_year (yesno)
survival_days -> outcome.survival_duration (name)
albumin -> haem.albumin (double)
wcc -> haem.white_cell_count (double)
hb -> haem.haemoglobin (double)
pmn -> haem.neutrophils (double)
lymphocytes -> haem.lymphocytes (double)
crp -> haem.crp (double)
na_result -> haem.sodium (double)
ur_result -> haem.urea (double)
egfr -> haem.egfr (double)
creatinine -> haem.creatinine (double)
cxr_sides -> radio.cxr_infection (list)
cxr_lobes -> radio.cxr_lobar_changes (list)
death_5year -> outcome.death_within_5_years (yesno)
survival_days_2 -> outcome.5_yr_survival_duration (name)
imd_decile -> demog.imd_decile (name)

Usage

map_avoncap_pneumococcal()

Value

a list
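
Each line in the description above reads `source column -> normalised column(s) (type)`. As a rough illustration of what applying such a mapping involves, the base R sketch below renames and coerces three of the listed columns. The `mapping` structure, the `apply_mapping()` helper, the toy data, and the conversion chosen for `yesno` (a two-level factor) are assumptions made for illustration; they are not the package's internal representation.

```r
# Hypothetical miniature of the `old -> new (type)` notation.
# NOT the structure returned by map_avoncap_pneumococcal().
mapping <- list(
  age_at_admission = list(new = "demog.age", convert = as.numeric),
  los_days = list(new = "outcome.length_of_stay", convert = as.numeric),
  icu = list(
    new = "outcome.admitted_icu",
    # assumption: a yesno field becomes a factor with levels no / yes
    convert = function(x) factor(x, levels = c("no", "yes"))
  )
)

# Toy raw data with everything stored as text, as in a raw export
raw <- data.frame(
  age_at_admission = c("71", "64"),
  los_days = c("3", "12"),
  icu = c("yes", "no"),
  stringsAsFactors = FALSE
)

# Rename each mapped column and coerce it to its target type
apply_mapping <- function(raw, mapping) {
  out <- list()
  for (old in intersect(names(mapping), names(raw))) {
    m <- mapping[[old]]
    out[[m$new]] <- m$convert(raw[[old]])
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

normalised <- apply_mapping(raw, mapping)
str(normalised)
```

Columns that map to several targets (the `checkboxes` entries) would need a one-to-many step that this sketch leaves out.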

Normalise the avoncap radiology data

Description

radio_exam -> radio.test_performed (yesno)
radiology_date -> radio.test_date (date)
radiodays -> radio.test_days_from_admission (pos_integer)
radio_test -> radio.test_type (list)
radiology_result -> radio.alrtd_finding (checkboxes_to_nested_list)
radiology_other_result -> radio.non_alrtd_finding (checkboxes_to_nested_list)

Usage

map_avoncap_radio(instrument)

Arguments

`instrument`	the numeric instrument number

Value

a list

Normalise the avoncap virology data

Description

viral_testing_performed -> virol.test_performed (yesno)
virology_date_of_asst -> virol.test_date (date)
viroldays -> virol.test_days_from_admission (pos_integer)
specimen_type -> virol.test_type (list)
virus_isolated -> virol.pathogen_detected (yesno)
test_type -> virol.test_type (list)
virus_pathogen -> virol.pathogen, .virol_isolate_list (checkboxes_to_nested_list)
virol_patient_lab -> virol.test_provenance (list)

Usage

map_avoncap_virol(instrument)

Arguments

`instrument`	the numeric instrument number

Value

a list

Normalise the urinary antigen data

Description

RESULT -> pneumo.urine_antigen_result, .x (text)
EVENT_DATE -> pneumo.test_date (date)
ANALYSIS -> pneumo.urine_antigen_test (name)
SUBJECT -> admin.consented_record_number (study_id)
BARCODE -> pneumo.urine_antigen_sample_id (name)

Usage

map_urine_antigens()

Value

a list

Normalise the urinary antigen data (binax results)

Description

RESULT -> pneumo.binax_result, .x (text)
EVENT_DATE -> pneumo.test_date (date)
SUBJECT -> admin.consented_record_number (study_id)
BARCODE -> pneumo.urine_antigen_sample_id (name)

Usage

map_urine_binax()

Value

a list

Find the most recent files of a specific type

Description

Find the most recent files of a specific type

Usage

most_recent_files(
  type = "",
  subtype = NULL,
  reproduce_at = as.Date(getOption("reproduce.at", default = Sys.Date()))
)

Arguments

`type`	see valid_inputs() for current list of supported types in input directory
`subtype`	see valid_inputs() for list of supported filenames
`reproduce_at`	after this date new files are ignored. This enforces a specific version of the data.

Value

a list of the file paths to the most up to date files of the given type relevant to each site and study year
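
The effect of `reproduce_at` can be pictured as a date filter followed by a "latest per group" selection. The base R sketch below assumes a hypothetical file listing with `path`, `site` and `date` columns and groups by site only; the real function also distinguishes study years and derives its listing from the input directory, so treat this as an illustration rather than the package's logic.

```r
# Hypothetical file listing; the column names and paths are invented.
files <- data.frame(
  path = c("a_2022-01-10.csv", "a_2022-03-01.csv", "b_2022-02-15.csv"),
  site = c("A", "A", "B"),
  date = as.Date(c("2022-01-10", "2022-03-01", "2022-02-15")),
  stringsAsFactors = FALSE
)

# Keep files dated on or before `reproduce_at`, then take the newest per site.
# Assumption: a file dated exactly on `reproduce_at` is still eligible.
pick_most_recent <- function(files, reproduce_at = Sys.Date()) {
  eligible <- files[files$date <= reproduce_at, , drop = FALSE]
  newest_first <- order(eligible$site, -as.numeric(eligible$date))
  eligible <- eligible[newest_first, , drop = FALSE]
  eligible[!duplicated(eligible$site), "path"]
}

# Pinning the analysis to 20 Feb 2022 ignores the March file for site A
pick_most_recent(files, as.Date("2022-02-20"))
pick_most_recent(files, as.Date("2022-12-31"))
```

Fixing `reproduce_at` (or the `reproduce.at` option) to a past date is what makes a rerun pick up the same files even after newer extracts arrive.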

Examples


# devtools::load_all()
try({
  avoncap::set_input("~/Data/avoncap")
  avoncap::input("nhs-extract")

  avoncap::all_files()


  # exact match on filename column of all_data()
  avoncap::most_recent_files("AvonCAPLRTDCentralDa")


  # or match by lower-case prefix (startsWith) on the directory name
  avoncap::most_recent_files("nhs-extract","deltave")


  avoncap::most_recent_files("metadata")

  avoncap::valid_inputs()
})

Sanitise AvonCap data columns

Description

Usage

normalise_data(rawData, instrument = NULL, ...)

Arguments

`rawData`	the raw data from `load_data()`
`instrument`	the numeric instrument number, if applicable
`...`	Named arguments passed on to `normalise_generic`: `remove_mapped` gets rid of original columns for which we have a mapping (leaving the new versions). `remove_unmapped` gets rid of columns for which we do not have a mapping. `mappings` a set of mappings (see zzz-avoncap-mappings.R). `messages` a set of dtrackr glue specs that populate the first box of the flow chart (can use {files}, {reproduce_at}, {date}, {.total}). `data_source_info` if not null, a filename; the function will write out a file with the details of the input files used. `...` passed on to `.cached(...)`, e.g. `nocache = TRUE` can be used to defeat caching.

Details

This function maps the data into a tidy dataframe with consistently named columns, and named factors where appropriate. The mapping is defined in data files. Most of the sanitisation code is held in the normalise-xxx.R files, but these in turn may depend on the mapping-xxx.R files.

Value

a tracked dataframe with n

Get the mapping of transformed columns back to original

Description

Get the mapping of transformed columns back to original

Usage

original_field_names(data, inverse = TRUE)

Arguments

`data`	the transformed data set.
`inverse`	if TRUE, gives the data as an old -> new mapping, for finding the normalised names of original columns. If FALSE, gives it as a new -> old mapping, for finding the original names of normalised columns

Value

a named list mapping original to new columns
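
The two directions controlled by `inverse` are the same named list read forwards or backwards. A minimal base R sketch of flipping such a list follows; `old_to_new` and `invert_mapping()` are invented for illustration and only cover one-to-one mappings, whereas the package also has one-to-many (`checkboxes`) fields.

```r
# Hypothetical one-to-one mapping, keyed by original column name
old_to_new <- list(
  los_days = "outcome.length_of_stay",
  icu = "outcome.admitted_icu"
)

# Swap names and values so the list is keyed by normalised column name
invert_mapping <- function(m) {
  stats::setNames(as.list(names(m)), unname(unlist(m)))
}

new_to_old <- invert_mapping(old_to_new)
new_to_old[["outcome.admitted_icu"]]
```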

Pneumococcal UAD serotypes

Description

A somewhat complete list of pneumococcal serotypes as seen in Bristol

Get a label for a column

Description

Get a label for a column

Usage

readable_label(columnVar, colNames = default_column_names())

Arguments

`columnVar`	the column name as a string
`colNames`	bespoke column names mapping (see `default_column_names(...)`)

Value

a mapped column name

Get a readable label for the AvonCap data as a named list (for ggplot)

Description

Get a readable label for the AvonCap data as a named list (for ggplot)

Usage

readable_label_mapping(x, ...)

## S3 method for class 'data.frame'
readable_label_mapping(x, colNames = default_column_names(...), ...)

## S3 method for class 'list'
readable_label_mapping(x, colNames = default_column_names(...), ...)

## S3 method for class 'character'
readable_label_mapping(x, colNames = default_column_names(...), ...)

## Default S3 method:
readable_label_mapping(x, colNames = default_column_names(...), ...)

Arguments

`x`	either the column names as strings, or a dataframe
`...`	ignored
`colNames`	a mapping to convert a column name (as a string) to a readable label

Value

a named list of the labels for the columns

Methods (by class)

readable_label_mapping(data.frame): for data frames
readable_label_mapping(list): for lists
readable_label_mapping(character): for character vectors
readable_label_mapping(default): defaults
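
The per-class methods listed above follow the usual S3 pattern: a generic that dispatches on the class of `x`, with the data frame method reducing to the character method. The sketch below reproduces that pattern in base R under a different name; `label_mapping()` and its simple `labeller` (which only replaces dots and underscores with spaces) are assumptions for illustration and do not use the package's `default_column_names()`.

```r
# Generic: dispatches on the class of `x`
label_mapping <- function(x, ...) UseMethod("label_mapping")

# Data frames: label the column names
label_mapping.data.frame <- function(x, ...) label_mapping(colnames(x), ...)

# Character vectors: build a named list of column name -> readable label
label_mapping.character <- function(
    x, labeller = function(s) gsub("[._]", " ", s), ...) {
  stats::setNames(as.list(labeller(x)), x)
}

# Anything else: coerce to character first
label_mapping.default <- function(x, ...) label_mapping(as.character(x), ...)

labels <- label_mapping(data.frame(demog.age = 1, outcome.length_of_stay = 2))
labels[["outcome.length_of_stay"]]
```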

Relevel serotype data into a factor based on PCV group status and serotype name.

Description

Relevel serotype data into a factor based on PCV group status and serotype name.

Usage

relevel_serotypes(serotypes, ..., exprs)

Arguments

`serotypes`	a vector of serotypes as a factor or character.
`...`	an unwrapped version of the `exprs` parameter
`exprs`	a list of formulae with a predicate on the LHS and a PCV group name on the RHS, which are interpreted as the parameters for a `dplyr::case_when` call. This must be protected against interpretation by wrapping it in `rlang::exprs()`. The predicates are tested against `avoncap::serotype_data$map` and could use any of the following columns: 'serotype', 'PCV7', 'PCV13', 'PCV15', 'PCV20', 'PPV23', 'PCV13on7', 'PCV15on13', 'PCV20on15', 'PPV23on20', 'PCV10SSI', 'PCV10GSK', 'PCV15Zhifei', 'PCV24Vaxcyte', 'PCV24Affinivax'. A default option of the form `TRUE ~ "Non PCV serotype"` must exist to capture unmatched items.

Examples

x = rlang::exprs(
  PCV7 ~ "PCV7",
  PCV15 ~ "PCV15-7",
  TRUE ~ "Non-PCV15 serotype"
)
relevel_serotypes(avoncap::phe_serotypes, exprs=x)
relevel_serotypes(avoncap::phe_serotypes)

relevel_serotypes(avoncap::phe_serotypes,
  PCV24Affinivax ~ "Affinivax",
  TRUE ~ "Non-affinivax"
)
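
Because the formulae are handed to `dplyr::case_when`, the first predicate that matches a serotype decides its group, and the `TRUE ~ ...` entry catches everything left over. The base R sketch below mimics that first-match rule on a small hand-made lookup table; the table is a stand-in for `avoncap::serotype_data$map`, whose real contents and shape are not reproduced here, and the `rules` list is an invented representation.

```r
# Hand-made stand-in for the serotype lookup table (illustrative only)
map <- data.frame(
  serotype = c("4", "14", "3", "19A", "8"),
  PCV7 = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  PCV13 = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Ordered rules: earlier rules take precedence, the last one is the default
rules <- list(
  list(test = map$PCV7, label = "PCV7"),
  list(test = map$PCV13, label = "PCV13-7"),
  list(test = rep(TRUE, nrow(map)), label = "Non-PCV13 serotype")
)

# First matching rule wins: only fill in groups that are still unassigned
group <- rep(NA_character_, nrow(map))
for (r in rules) {
  hit <- is.na(group) & r$test
  group[hit] <- r$label
}

# Factor levels follow rule order, which keeps PCV groups together
labels_in_order <- vapply(rules, function(r) r$label, character(1))
map$group <- factor(group, levels = labels_in_order)
map[, c("serotype", "group")]
```

Serotypes 4 and 14 satisfy both predicates but land in "PCV7" because that rule comes first, which is why the second group can be read as "PCV13 but not PCV7".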

Write file source information out to a text file

Description

Write file source information out to a text file

Usage

save_data_source_info(..., .file)

Arguments

`...`	A list of data frames loaded with the `load_data(...)` call
`.file`	the output file location

Value

the file name written (invisibly)

A ggplot scale for pneumococcal serotypes that keeps PCV groups together

Description

The scale groups colours by PCV group, but the source data must use the same levels as this scale, otherwise the colour legend will be ordered in a different sequence. This can be achieved using relevel_serotypes.

Usage

scale_fill_serotype(
  ...,
  palette_fn = scales::brewer_pal(palette = "Dark2"),
  undefined = "#606060",
  exprs = rlang::exprs()
)

Arguments

`...`	an unwrapped version of the `exprs` parameter. Named arguments passed on to `ggplot2::scale_fill_manual`: `values` a set of aesthetic values to map data values to. The values will be matched in order (usually alphabetical) with the limits of the scale, or with `breaks` if provided. If this is a named vector, then the values will be matched based on the names instead. Data values that don't match will be given `na.value`. `aesthetics` Character string or vector of character strings listing the name(s) of the aesthetic(s) that this scale works with. This can be useful, for example, to apply colour settings to the `colour` and `fill` aesthetics at the same time, via `aesthetics = c("colour", "fill")`. `breaks` One of: `NULL` for no breaks; `waiver()` for the default breaks (the scale limits); a character vector of breaks; a function that takes the limits as input and returns breaks as output. `na.value` The aesthetic value to use for missing (`NA`) values.
`palette_fn`	a function that returns a set of colours for a number of levels. Such functions can be obtained from things like `scales::brewer_pal(...)`
`undefined`	the colour for the last group which is assumed to be the `Unknown` types
`exprs`	a list of formulae with a predicate on the LHS and a PCV group name on the RHS, which are interpreted as the parameters for a `dplyr::case_when` call. This must be protected against interpretation by wrapping it in `rlang::exprs()`. The predicates are tested against `avoncap::serotype_data$map` and could use any of the following columns: 'serotype', 'PCV7', 'PCV13', 'PCV15', 'PCV20', 'PPV23', 'PCV13on7', 'PCV15on13', 'PCV20on15', 'PPV23on20', 'PCV10SSI', 'PCV10GSK', 'PCV15Zhifei', 'PCV24Vaxcyte', 'PCV24Affinivax'. A default option of the form `TRUE ~ "Non PCV serotype"` must exist to capture unmatched items.

Value

A ggplot2 scale

Pneumococcal UAD serotype groups and crossmaps

Description

A list of pneumococcal serotype / UAD cross mappings

Pneumococcal serotype PCV groups

Description

Pneumococcal serotype PCV groups

Serotype UAD mappings

Description

Serotype UAD mappings

Sets the location of data for an analysis

Description

Also performs some structure checks and makes sure that the README files are in place.

Usage

set_input(path)

Arguments

`path`	the path to the input directory

Value

the full path to the directory

Spline term marginal effects plot

Description

Spline term marginal effects plot

Usage

spline_term_plot(
  coxmodel,
  var_name,
  xlab = var_name,
  max_y = NULL,
  n_breaks = 7
)

Arguments

`coxmodel`	an output of a coxph model
`var_name`	a variable that is involved in a spline term
`xlab`	x axis label
`max_y`	maximum hazard ratio to display on the y axis. Inferred from the central estimates if missing, which will most likely cut off confidence intervals
`n_breaks`	The number of divisions on the y axis

Value

a ggplot

Stacked bar plot

Description

This function plots a stacked bar of proportions for an input set of data

Usage

stacked_barplot(data, mapping, ...)

Arguments

`data`	the data
`mapping`	an aes mapping with at least `x` and `fill`. If facetting, then `group` must contain the facet variable
`...`	passed to `geom_bar`

Value

a ggplot
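
The quantity behind a stacked bar of proportions is a row-normalised cross-tabulation. The base R sketch below computes one on the built-in `mtcars` data, assuming that proportions are taken within each `x` category so that every bar sums to one; it does not call `stacked_barplot()` and says nothing about how the package computes or annotates its bars.

```r
# Cross-tabulate the x variable (cyl) against the fill variable (gear)
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Normalise within each row, i.e. within each bar on the x axis
props <- prop.table(counts, margin = 1)

round(props, 2)
rowSums(props)
```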

Examples

stacked_barplot(
    ggplot2::diamonds,
    ggplot2::aes(x=cut, fill=clarity, group=color)
  )+
  ggplot2::facet_wrap(dplyr::vars(color))

Convert a study week back into a date

Description

This is poorly named, as it only gives the start date of the week if the input is an integer.

Usage

start_date_of_week(study_week)

Arguments

`study_week`	the study week number. Decimals are accepted, in which case the nearest whole date to the value is returned

Value

a vector of dates corresponding to the given study_week numbers, where week zero is the first week of the study

Convert a study week back into a date

Description

This is poorly named, as it only gives the start date of the week if the input is an integer.

Usage

start_date_of_week_legacy(study_week)

Arguments

`study_week`	the study week number. Decimals are accepted, in which case the nearest whole date to the value is returned

Convert a date to a study week

Description

Convert a date to a study week

Usage

study_week(dates)

Arguments

`dates`	a list of date objects

Value

an integer number of completed weeks since the start of the study.

Convert a date to a study week

Description

Convert a date to a study week

Usage

study_week_legacy(dates)

Arguments

`dates`	a list of date objects

Value

an integer number of weeks since 2019-12-30
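
The week arithmetic described here and in the `start_date_of_week_legacy` entry can be sketched in base R as follows. Only the 2019-12-30 origin comes from the documentation above; the helper names, the use of `floor()` for completed weeks, and the use of `round()` for fractional weeks are assumptions, and the package's own functions may treat edge cases differently.

```r
# Origin of the legacy week count, as stated in the Value section
study_start <- as.Date("2019-12-30")

# Date -> number of completed weeks since the origin (assumed floor)
to_study_week <- function(dates) {
  days <- as.numeric(as.Date(dates) - study_start)
  as.integer(floor(days / 7))
}

# Week number -> date; whole weeks give the first day of that week,
# fractional weeks are rounded to the nearest whole day (assumed)
to_start_date <- function(study_week) {
  study_start + round(study_week * 7)
}

to_study_week(c("2019-12-30", "2020-01-05", "2020-01-06"))
to_start_date(c(0, 1, 52))
```

Converting a date to a week and back returns the start of the week containing that date, not the original date.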

UAD serotype groups

Description

UAD serotype groups

UAD PCV map

Description

UAD PCV map

Upset plot with counts stratified by a categorical column

Description

Upset plot with counts stratified by a categorical column

Usage

upset_plot(df, boolean_cols, categorical_col, lbl_size = 5)

Arguments

`df`	the data
`boolean_cols`	a tidyselect specification selecting the columns to be used as binary one-hot encoded classes
`categorical_col`	a column containing a disjoint category as a factor
`lbl_size`	font size of the label

Value

a ggplot
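
An upset plot is built from counts of each combination of the boolean columns, here split further by the categorical column. The base R sketch below computes those counts for a small invented data frame; the column names and data are hypothetical and the code does not call `upset_plot()` or reproduce its layout.

```r
# Invented example data: two one-hot encoded conditions and a category
df <- data.frame(
  copd = c(TRUE, TRUE, FALSE, TRUE),
  asthma = c(FALSE, TRUE, FALSE, TRUE),
  sex = factor(c("F", "M", "F", "F"))
)
bool_cols <- c("copd", "asthma")

# Label each row by the set of boolean columns that are TRUE
combo <- apply(df[bool_cols], 1, function(r) {
  paste(bool_cols[r], collapse = "&")
})
combo[combo == ""] <- "(none)"

# Count each combination, stratified by the categorical column
counts <- table(combination = combo, sex = df$sex)
counts
```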

A valid set of types of file that can be loaded by `load_data(...)`

Description

A valid set of types of file that can be loaded by load_data(...)

Usage

valid_inputs()

Value

a dataframe of type, subtype

Examples


# devtools::load_all()
try({
  avoncap::set_input("~/Data/avoncap")
  avoncap::input("nhs-extract")

  avoncap::all_files()


  # exact match on filename column of all_data()
  avoncap::most_recent_files("AvonCAPLRTDCentralDa")


  # or match by lower-case prefix (startsWith) on the directory name
  avoncap::most_recent_files("nhs-extract","deltave")


  avoncap::most_recent_files("metadata")

  avoncap::valid_inputs()
})

Validate AvonCap raw data

Description

Runs a set of QA checks. This function dispatches the call to a data-set-specific function using the type and subtype of the data set. The checks are in source files named validate-xxx.R, depending on the data source.

Usage

validate_data(rawData, ...)

Arguments

`rawData`	the raw data from `load_data()`
`...`	not used / passed to the validation function specific to the type of data.

Value

the same input with a new data_quality_failures attribute containing issues.
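
Returning the input unchanged with the issues attached as an attribute lets validation sit in the middle of a pipeline without altering the data. A minimal base R sketch of that pattern follows; the `check_age()` rule, the `validate_sketch()` wrapper, and the layout of the failures table are invented for illustration, and only the attribute name `data_quality_failures` is taken from the text above.

```r
# Hypothetical QA rule: flag ages outside a plausible range
check_age <- function(df) {
  bad <- which(!is.na(df$age) & (df$age < 0 | df$age > 120))
  data.frame(
    row = bad,
    issue = rep("implausible age", length(bad)),
    stringsAsFactors = FALSE
  )
}

# Attach the failures to the data without changing its contents
validate_sketch <- function(df) {
  attr(df, "data_quality_failures") <- check_age(df)
  df
}

raw <- data.frame(id = 1:3, age = c(54, 230, NA))
checked <- validate_sketch(raw)

# The data are untouched; the issues travel with them as an attribute
attr(checked, "data_quality_failures")
```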

Write out data quality issues

Description

Write out data quality issues

Usage

write_issues(df, file)

Arguments

`df`	the raw data frame
`file`	the output data quality file

Value

the list of failures as a dataframe

Wrapper around `table`

Description

Wrapper around table

Usage

xglimpse(data, ...)

Arguments

`data`	a dataframe
`...`	columns or named expressions to cross-tabulate

Value

the cross-tabulation

Year and week number lookup table

Description

Inference of admission date from year and week number. This is far from easy as the year and week_number data is noisy and lacking in consistency. In general we don't use this and rely instead on the admission_date but this is not always available.

Usage

data(year_week_number_lookup)

Format

A dataframe containing the following columns:

year - The given year. This can be inferred from database year and week number
week_number - The given week_number. This starts numbering from 31 up to 53 and then resets to 1 for any given year
start_of_week - the start date of the week of the study. This is used as a proxy for the admission date if it is unknown.
study_week - the number of complete weeks since the start of the study

213 rows and 5 columns
