--- title: "Single test error adjustment" output: html_document vignette: > %\VignetteIndexEntry{Single test error adjustment} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.dim = c(6, 4) ) ``` ```{r setup} library(tidyverse) library(testerror) options(mc.cores = 2) # parallel::detectCores()) options(testerror.debug=FALSE) rstan::rstan_options(auto_write = FALSE) here::i_am("vignettes/testerror.Rmd") source(here::here("vignettes/formatting.R")) ``` Supposing we have a disease with a true prevalence in a population of 10%. We test for its presence using a test with sensitivity 0.8 and specificity 0.95. How many positive test results do we expect to see in a sample of 1000 patients? ```{r} ap = apparent_prevalence(p = 0.1, sens = 0.8, spec = 0.95) tmp = tibble::tibble(x = stats::rbinom(1000, 1000, ap)) ggplot(tmp, ggplot2::aes(x=x)) + ggplot2::geom_histogram(binwidth = 1) + ggplot2::geom_vline(xintercept=100, colour="red")+ ggplot2::coord_cartesian(xlim = c(25,175)) ``` given this setup the test positivity rate will usually be an overestimate of true prevalence. If instead the specificity is 0.995 the picture is completely different, and the expected number of positives will usually be an underestimate: ```{r} ap = apparent_prevalence(p = 0.1, sens = 0.8, spec = 0.995) tmp = tibble::tibble(x = stats::rbinom(1000, 1000, ap)) ggplot(tmp, ggplot2::aes(x=x)) + ggplot2::geom_histogram(binwidth = 1) + ggplot2::geom_vline(xintercept=100, colour="red")+ ggplot2::coord_cartesian(xlim = c(25,175)) ``` A related question is if sensitivity of the test is 0.8 what specificity is required at 10% prevalence for apparent prevalence to be an unbiased estimate? I.e. how specific does a test need to be? ```{r} optimal_performance(p=0.1, sens=0.8) ``` Any specificity lower than this will result in apparent prevalence being an overestimate due to false positives outweighing false negatives, and any specificity higher than this will result in apparent prevalence being an underestimate, due to false negatives outweighing false positives. # Rogan-gladen adjustment We can use the rogan-gladen adjustment to estimate the true prevalence if we know sensitivity and specificity. ```{r} rogan_gladen(0.125, 0.8, 0.95) ``` However this suffers from a truncation issue if the apparent prevalence is outside of the range of `(1-spec)` to `sens`: ```{r} rogan_gladen(0.05, 0.8, 0.95) rogan_gladen(0.8, 0.8, 0.95) ``` It also requires an exact measure of sensitivity and specificity. In reality this is very hard to obtain. In the associated paper we discuss 3 methods for accounting for uncertainty in sensitivity and specificity in estimating true prevalence from apparent prevalence. These are the lang-reiczigel method, a resampling procedure combined with rogan gladen estimates and a full bayesian model. To illustrate these we look a the example of BinaxNOW testing. # BinaxNOW Example In this case we have uncertain estimates of sensitivity and specificity from the literature, supplied as median values with 95% confidence intervals. We assume this has an underlying Beta distribution to estimate uncertainty. 
```{r}
# Fit beta distributions to Sinclair 2013 data:
sens = beta_params(median = 0.740, lower = 0.666, upper = 0.823)
spec = beta_params(median = 0.972, lower = 0.927, upper = 0.998)

sens
spec
```

For demonstration purposes we estimate the true prevalence for every apparent positive count between 0 and 150 in a set of 1000 tests, using the three methods:

```{r}
tmp1 = tibble::tibble(
  observed = 0:150,
  n = 1000
) %>% dplyr::mutate(testerror::true_prevalence(
  pos_obs = observed,
  n_obs = n,
  sens = sens,
  spec = spec,
  method = "bayes",
  model_type = "logit"
))

tmp2 = tibble::tibble(
  observed = 0:150,
  n = 1000
) %>% dplyr::mutate(testerror::true_prevalence(
  pos_obs = observed,
  n_obs = n,
  sens = sens,
  spec = spec,
  method = "lang"
))

tmp3 = tibble::tibble(
  observed = 0:150,
  n = 1000
) %>% dplyr::mutate(testerror::true_prevalence(
  pos_obs = observed,
  n_obs = n,
  sens = sens,
  spec = spec,
  method = "rogan"
))

tmp = dplyr::bind_rows(tmp1, tmp2, tmp3)
```

Comparing the three methods over a wide range of apparent prevalence qualitatively shows their different behaviour. The Bayesian model seems more appropriate when prevalence is low, as it does not suffer from the same truncation issue as the other methods; it is, however, much slower.

```{r}
thresh = testerror::underestimation_threshold(sens$q(0.5), spec$q(0.5))
N = unique(tmp$n)

ggplot(tmp, ggplot2::aes(y = observed/n, x = prevalence.median, xmin = prevalence.lower, xmax = prevalence.upper)) +
  ggplot2::geom_point() +
  ggplot2::geom_errorbarh(alpha = 0.4) +
  ggplot2::geom_point(data = tmp %>% dplyr::filter(observed == 40), colour = "orange") +
  ggplot2::geom_errorbarh(data = tmp %>% dplyr::filter(observed == 40), colour = "orange", height = 0) +
  ggplot2::geom_abline(colour = "blue") +
  ggplot2::geom_point(x = thresh, y = thresh, colour = "magenta", size = 4, shape = 10) +
  ggplot2::geom_abline(intercept = 1 - spec$q(0.5), slope = sens$q(0.5) - (1 - spec$q(0.5)), colour = "red") +
  ggplot2::scale_y_continuous(
    name = "observed apparent prevalence",
    sec.axis = ggplot2::sec_axis(trans = ~ .x * N, name = sprintf("count (N=%d)", N))
  ) +
  ggplot2::labs(caption = sprintf(
    "threshold value where true prevalence equals apparent prevalence: (%1.2f,%1.2f)\nspecificity: %s\nsensitivity: %s",
    thresh, thresh, format(spec), format(sens)
  )) +
  ggplot2::xlab("estimated true prevalence") +
  ggplot2::facet_wrap(~prevalence.method)
```

In the figure above, the red line is the Rogan-Gladen relationship at the median sensitivity and specificity, the blue line shows the case where the test is assumed to be 100% sensitive and specific, and the magenta cross marks the point at which true prevalence equals apparent prevalence.

# Test uncertainty

In the example above we have no gold standard to compare the output to. For testing, we can generate a simulation using the sensitivity and specificity distributions over a range of simulated true prevalences. Applying the correction to the simulated apparent prevalence then allows a comparison against a simulated gold standard. In this example the distribution of sensitivity and specificity used to estimate the true prevalence is the same as that used to simulate the "observed" data.
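Before using the package's `test_example` helper in the next chunk, the following hand-rolled sketch (a simplification with fixed rather than uncertain sensitivity and specificity, and purely illustrative values) shows the basic mechanism of simulating apparent positive counts from a known true prevalence:

```{r}
# Hand-rolled sketch of the simulation mechanism (illustrative values only):
# draw the number of truly diseased, then add detected cases and false positives.
simulate_positives = function(n, true_prev, sens, spec) {
  diseased = stats::rbinom(1, n, true_prev)
  true_pos = stats::rbinom(1, diseased, sens)          # detected cases
  false_pos = stats::rbinom(1, n - diseased, 1 - spec) # misclassified non-cases
  true_pos + false_pos
}
set.seed(42)
# Apparent positive counts out of 5000 at true prevalences of 5%, 10% and 20%:
sapply(c(0.05, 0.1, 0.2), function(p) simulate_positives(5000, p, 0.74, 0.972))
```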
```{r}
ex1 = testerror:::test_example(
  spec = spec$shape1/spec$conc,
  sens = sens$shape1/sens$conc,
  n_samples = 5000,
  n_controls = floor(spec$conc),
  n_diseased = floor(sens$conc),
  exact = TRUE
)

res1 = ex1$summary %>% dplyr::mutate(true_prevalence(
  pos_obs = observed_pos,
  n_obs = n_samples,
  false_pos_controls = ex1$performance$false_pos_controls,
  n_controls = ex1$performance$n_controls,
  false_neg_diseased = ex1$performance$false_neg_diseased,
  n_diseased = ex1$performance$n_diseased,
  method = "bayes",
  model_type = "logit"
))

res2 = ex1$summary %>% dplyr::mutate(true_prevalence(
  pos_obs = observed_pos,
  n_obs = n_samples,
  false_pos_controls = ex1$performance$false_pos_controls,
  n_controls = ex1$performance$n_controls,
  false_neg_diseased = ex1$performance$false_neg_diseased,
  n_diseased = ex1$performance$n_diseased,
  method = "lang-reiczigel"
))

res3 = ex1$summary %>% dplyr::mutate(true_prevalence(
  pos_obs = observed_pos,
  n_obs = n_samples,
  false_pos_controls = ex1$performance$false_pos_controls,
  n_controls = ex1$performance$n_controls,
  false_neg_diseased = ex1$performance$false_neg_diseased,
  n_diseased = ex1$performance$n_diseased,
  method = "rogan-gladen"
))

testerror:::demo_bar_plot(dplyr::bind_rows(res1, res2, res3), .pcv = FALSE) +
  ggplot2::labs(caption = sprintf(
    "Red lines / grey bars are apparent prevalence / counts\nBlue lines are the underlying true prevalence rate.\nTest sensitivity: %s\nTest specificity: %s",
    format(sens), format(spec)
  ))
```

Over the range of "true" prevalences tested here (0-20%), the apparent prevalence ranges from about 0.25% to 17.5%. Adjusting for test uncertainty with the three methods described here gives a point estimate near the true prevalence (blue line) in most cases. The width of the confidence intervals is driven by the uncertainty in the test performance metrics, not by the prevalence of the disease. Not shown is the fact that the width of these confidence intervals is not improved by increasing the sample size (although the accuracy of the central estimate is); improving confidence in our predictions of true prevalence can be achieved through better estimates of the test performance metrics.
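To illustrate that last point, the sketch below (with purely illustrative counts, holding the false positive and false negative rates fixed) re-runs the Lang-Reiczigel correction with the test performance characterised on a small and a large validation panel; the interval should narrow as the panel grows, whereas increasing `n_obs` alone would not have this effect.

```{r}
# Illustrative counts only: the same false positive / false negative rates,
# characterised on validation panels of different sizes.
small_panel = true_prevalence(
  pos_obs = 100, n_obs = 1000,
  false_pos_controls = 3, n_controls = 100,
  false_neg_diseased = 26, n_diseased = 100,
  method = "lang-reiczigel"
)
large_panel = true_prevalence(
  pos_obs = 100, n_obs = 1000,
  false_pos_controls = 30, n_controls = 1000,
  false_neg_diseased = 260, n_diseased = 1000,
  method = "lang-reiczigel"
)
dplyr::bind_rows(small_panel, large_panel)
```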