| Title: | Download Infectious Disease Data from 'SurvStat' (Robert Koch Institute) |
|---|---|
| Description: | Provides an interface to the 'SurvStat' web service from the Robert Koch Institute (<https://tools.rki.de/SurvStat/SurvStatWebService.svc>) allowing downloads of disease time series stratified by pathogen type and subtype, age, and geography from notifiable disease reports in Germany. |
| Authors: | Robert Challen [aut, cre] (ORCID: <https://orcid.org/0000-0002-5504-7768>), Bristol Vaccine Centre [fnd, cph] |
| Maintainer: | Robert Challen <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.4 |
| Built: | 2026-05-21 08:42:09 UTC |
| Source: | https://github.com/bristol-vaccine-centre/rsurvstat |
SurvStat age group list
single_year
children_coarse: from 0, 15, 20, 25, 30, 40, 50, 60, 70, 80 years
children_medium: from 0, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80 years
children_fine: from 0, 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80 years
five_year: from 0, 1, 5, 10, 15, 20, … , 75, 80 years
zero_fifteen: from 0, 15+ years
zero_fifteen_sixty: from 0, 15, 60+ years
zero_one_4_20_40_60_80: from 0, 4, 20, 40, 60, 80+ years
age_groups
An object of class list of length 8.
https://survstat.rki.de/Content/Query/Create.aspx
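The break points listed above are the lower bounds of each age band. As a minimal sketch of how such bounds partition ages, the base R code below assigns ages to bands using the children_coarse bounds. The helpers `label_bands()` and `assign_band()` and the label format are hypothetical and are not part of rsurvstat; the labels SurvStat itself uses may differ.

```r
# Lower bounds of the children_coarse age bands, as listed in this manual
breaks <- c(0, 15, 20, 25, 30, 40, 50, 60, 70, 80)

# Hypothetical helper: build a label for each band from its lower bound
label_bands <- function(lower) {
  upper <- c(lower[-1] - 1, Inf)
  ifelse(is.infinite(upper), paste0(lower, "+"), paste0(lower, "-", upper))
}

# Hypothetical helper: map whole-year ages to their band label
assign_band <- function(age, lower) {
  # findInterval() gives the index of the last lower bound that is <= age
  label_bands(lower)[findInterval(age, lower)]
}

bands <- assign_band(c(3, 15, 47, 92), breaks)
bands
```

The open-ended final band mirrors the "80 years" upper group in the list; ages below the first bound are not handled by this sketch.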
sf map
A Berlin outline sf map
data(BerlinMap)
A sf dataframe containing the following columns:
Name (character) - the Name column
1 rows
SurvStat requests
This function is only intended to be used interactively. The cache can be
controlled with set_cache_settings().
cache_clear(confirm = utils::askYesNo("Are you sure?"))
| Argument | Description |
|---|---|
| confirm | can be set to TRUE to make the function non-interactive. |
nothing. called for side effects
cache_clear(confirm = interactive())
CountyKey71Map dataset
This matches the CountyKey71 dimension in SurvStat. It covers the 400
Stadtkreise and Landkreise administrative regions in Germany, plus 12
Berlin boroughs (Bezirke), which replace the Berlin Kreis (Id: 11000).
The boroughs have sequential Ids from [11001] to [11012].
data(CountyKey71Map)
A sf dataframe containing the following columns:
Id - the full SurvStat identifier for this region (includes
hierarchical information)
ComponentId - the id of the most granular geographical unit (which can be
used to link out to other data sets)
HierarchyId - the id of the geographical unit type
Name - the name of the region
Any grouping allowed.
411 rows
SurvStat disease list
Supported diseases:
Acinetobacter (key: Acinetobacter-Infektion oder –Kolonisation)
Adenovirus (key: Adenovirus (andere Form, Meldepflichtig gemäß Landesmeldeverordnung))
Amoebiasis (key: Amoebiasis)
Anthrax (key: Milzbrand)
Arbovirus (key: Arbovirus-Erkrankung)
Astrovirus (key: Astrovirus-Infektion)
Bornavirus (key: Bornavirus)
Botulism (key: Botulismus)
Brucellosis (key: Brucellose)
CJD (key: CJK)
CJD, variant (key: vCJK)
COVID-19 (key: COVID-19)
Campylobacter (key: Campylobacter-Enteritis)
Candida auris (invasive) (key: Candida auris, invasive Infektion)
Chickenpox (key: Windpocken)
Chickenpox (state) (key: Windpocken (Meldepflicht gemäß Landesmeldeverordnung))
Chickungunya (key: Chikungunya-Fieber)
Chlamydia Trachomatis (key: Chlamydia-trachomatis-Infektion)
Cholera (key: Cholera)
Clostridium difficile / mild (key: Clostridium difficile, nicht schwerer Verlauf)
Clostridium difficile / moderate (key: Clostridium difficile, schwerer Verlauf)
Cryptosporidiosis (key: Kryptosporidiose)
Cytomegalovirus (key: Cytomegalie)
Dengue (key: Denguefieber)
Diptheria (key: Diphtherie)
E. Coli, enteritis (key: E.-coli-Enteritis)
E. Coli, enterohemorrhagic (key: EHEC-Erkrankung)
Ebola (key: Ebolafieber)
Echinococcosis (key: Echinokokkose)
Enterobacteria colonisation (key: Enterobacteriaceae-Infektion oder –Kolonisation)
Enterovirus (key: Enterovirus)
Gas gangrene (key: Gasbrand)
Gastroenteritis (other) (key: Weitere bedrohliche Krankheit (gastro))
Giardia (key: Giardiasis)
Gonorrhoea (key: Gonorrhoe)
Group B Streptococcus (key: Gruppe-B-Streptokokken)
HIV (key: HIV-Infektion)
Haemolytic-uraemic syndrome (key: HUS (Hämolytisch-urämisches Syndrom), enteropathisch)
Haemophilus influenza, invasive (key: Haemophilus influenzae, invasive Erkrankung)
Hand foot mouth disease (key: Hand-Fuß-Mund-Krankheit)
Hantavirus (key: Hantavirus-Erkrankung)
Head lice (key: Kopflausbefall)
Hepatitis (general) (key: Hepatitis (allgemein))
Hepatitis A (key: Hepatitis A)
Hepatitis B (key: Hepatitis B)
Hepatitis C (key: Hepatitis C)
Hepatitis D (key: Hepatitis D)
Hepatitis E (key: Hepatitis E)
Hepatitis non A-E (key: Hepatitis Non A-E)
Herpes Zoster (key: Herpes Zoster)
Influenza, seasonal (key: Influenza, saisonal)
Influenza, zoonotic (key: Influenza, zoonotisch)
Keratoconjunctivitis (IfSG) (key: Keratokunjunktivitis (Meldepflicht gemäß IfSG))
Keratoconjunctivitis (state) (key: Keratokunjunktivitis (Meldepflicht gemäß Landesmeldeverordnung))
Lassa fever (key: Lassafieber)
Legionalla (key: Legionellose)
Leprousy (key: Lepra)
Leptospirosis (key: Leptospirose)
Listeriosis (key: Listeriose)
Lyme Disease (key: Borreliose)
MERS (key: Middle East Respiratory Syndrome)
MRSA, invasive (key: MRSA, invasive Infektion)
Malaria (IfSG) (key: Malaria (§7(3) IfSG))
Malaria (state) (key: Malaria, Länderverordnung)
Marburg virus (key: Marburgfieber)
Measles (key: Masern)
Meningitis (other) (key: Meningitis, andere)
Meningococcal, invasive (key: Meningokokken, invasive Erkrankung)
Mpox (key: Affenpocken)
Mpox (key: Affenpocken)
Mumps (IfSG) (key: Mumps (Meldepflicht gemäß IfSG))
Mumps (state) (key: Mumps (Meldepflicht gemäß Landesmeldeverordnung))
Mycoplasma (key: Mycoplasma)
Norovirus (key: Norovirus-Gastroenteritis)
Orthinovirus (key: Ornithose)
Orthopox (key: Orthopocken)
Parainfluenze (key: Parainfluenza)
Paratyphus (key: Paratyphus)
Plague (key: Pest)
Pneumococcus (IfSG) (key: Pneumokokken (Meldepflicht gemäß IfSG))
Pneumococcus (state) (key: Pneumokokken (Meldepflicht gemäß Landesverordnung))
Poliomyelitis (key: Poliomyelitis)
Q-fever (key: Q-Fieber)
RSV (IfSG) (key: RSV (Meldepflicht gemäß IfSG))
RSV (state) (key: RSV (Meldepflicht gemäß Landesmeldeverordnung))
Rabies (confirmed) (key: Tollwut)
Rabies (suspected) (key: Tollwutexpositionsverdacht)
Relapsing fever (key: Läuserückfallfieber)
Ringworm (key: Ringelröteln)
Rotavirus gastroenteritis (key: Rotavirus-Gastroenteritis)
Rubella (key: Röteln, postnatal)
Rubella (state) (key: Röteln (Meldepflicht gemäß Landesmeldeverordnung))
Rubella, congenital (key: Röteln, konnatal)
SARS (key: SARS)
Salmonellosis (key: Salmonellose)
Scabies (key: Krätzmilbenbefall)
Scarlet fever (key: Scharlach)
Sepsis (other) (key: Weitere bedrohliche Krankheit)
Shigellosis (key: Shigellose)
Smallpox (key: Pocken)
Subacute Sclerosing Panencephalitis (key: Subakute Sklerosierende Panenzephalitis)
Syphilis (key: Syphilis)
Tetanus (key: Tetanus)
Tick bourne encephalitis (key: FSME (Frühsommer-Meningoenzephalitis))
Toxoplasmosis (key: Toxoplasmose)
Toxoplasmosis, congenital (key: Toxoplasmose, konnatal)
Trichinellosis (key: Trichinellose)
Tuberculosis (key: Tuberkulose)
Tulareamia (key: Tularämie)
Typhoid (key: Fleckfieber)
Typhoid, abdominal (key: Typhus abdominalis)
Typhus/Paratyphus (key: Typhus/Paratyphus)
Varicella, congenital (key: Fetales (kongenitales) Varizellensyndrom)
Vibria (key: Vibrionen)
Viral haemmorhagic fever (key: Virale hämorrhagische Fieber)
West Nile Virus (key: West-Nil-Virus)
Whooping cough (IfSG) (key: Keuchhusten (Meldepflicht gemäß IfSG))
Whooping cough (state) (key: Keuchhusten (Meldepflicht gemäß Landesmeldeverordnung))
Yellow fever (key: Gelbfieber)
Yersinia (key: Yersiniose)
Zika (key: Zikavirus-Erkrankung)
diseases
An object of class list of length 121.
https://survstat.rki.de/Content/Query/Create.aspx
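Because many of the English names contain spaces or punctuation, list elements usually need backticks or `[[ ]]` to be accessed. The sketch below uses a hypothetical three-entry stand-in, `diseases_subset`, built from names and keys shown above. Storing each key as a plain character string is an assumption; the real `diseases` object may hold richer values.

```r
# Hypothetical stand-in for the packaged `diseases` list.
# Names and keys are copied from the list above; the value type is assumed.
diseases_subset <- list(
  `COVID-19` = "COVID-19",
  Measles = "Masern",
  `Lyme Disease` = "Borreliose"
)

# Names that are not syntactic R names need backticks with `$` ...
covid_key <- diseases_subset$`COVID-19`

# ... or can be looked up as strings with `[[`
lyme_key <- diseases_subset[["Lyme Disease"]]

# Searching the English names with a regular expression
m_names <- grep("^M", names(diseases_subset), value = TRUE)
m_names
```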
FedStateKey71Map dataset.
This matches the FedStateKey71 dimension in SurvStat. This is the 16
federal states in Germany.
data(FedStateKey71Map)
A sf dataframe containing the following columns:
Id - the full SurvStat identifier for this region (includes
hierarchical information)
ComponentId - the id of the most granular geographical unit (which can be
used to link out to other data sets)
HierarchyId - the id of the geographical unit type
Name - the name of the region
16 rows
SurvStat output
SurvStat can be queried for count or incidence. By combining these two
metrics, queried across the whole range of disease notifications for any
given year, we can infer the stratified population size that SurvStat uses
to calculate its incidence. This is modelled with a local polynomial
over time, which allows weekly population denominators to be filled in.
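The idea can be illustrated with a minimal base R sketch. It assumes incidence is expressed as cases per 100,000 population, uses made-up yearly figures, and uses `stats::loess()` as one possible local polynomial smoother. The helper `infer_pop()` is hypothetical and this is not necessarily how `fit_population()` is implemented.

```r
# Hypothetical helper: back out the denominator from count and incidence.
# Assumption: incidence is cases per 100,000 population.
infer_pop <- function(count, incidence, per = 1e5) {
  ifelse(incidence > 0, count / incidence * per, NA_real_)
}

# Made-up yearly figures for a single stratum (illustration only)
yearly <- data.frame(
  year = 2018:2024,
  count = c(410, 420, 433, 441, 450, 462, 470),
  incidence = c(5.0, 5.1, 5.2, 5.25, 5.3, 5.4, 5.45)
)
yearly$population <- infer_pop(yearly$count, yearly$incidence)

# Place each yearly estimate at mid-year and fit a local linear smoother
yearly$t <- as.numeric(as.Date(paste0(yearly$year, "-07-01")))
fit <- stats::loess(population ~ t, data = yearly, span = 1, degree = 1)

# Evaluate the smoother at weekly dates inside the fitted range
weeks <- seq(as.Date("2019-01-07"), as.Date("2023-12-25"), by = "week")
weekly <- data.frame(
  date = weeks,
  population = predict(fit, newdata = data.frame(t = as.numeric(weeks)))
)
head(weekly)
```

Strata with zero incidence give no information about the denominator, which is why the sketch returns `NA` for them rather than dividing by zero.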
fit_population(count_df, .progress = TRUE)

infer_population(
  age_group = NULL,
  geography = NULL,
  years = NULL,
  .progress = TRUE
)
| Argument | Description |
|---|---|
| count_df | a dataframe from the output of |
| .progress | by default a progress bar is shown, which may be important if many downloads are needed to fulfil the request. It can be disabled by setting this to |
| age_group | (optional) the age group of interest as a |
| geography | (optional) one of |
| years | (optional) a vector of years to limit the response to. This may be useful to limit the size of returned pages in the event the |
fit_population(): the count_df dataframe with an additional population column
infer_population(): a dataframe with geography, age grouping, year and population columns
infer_population(): Query SurvStat for data to impute a population denominator
# snapshot:
get_snapshot(
  disease = diseases$`COVID-19`,
  geography = "state",
  season = 2024
) %>%
  fit_population() %>%
  dplyr::glimpse()

# timeseries
# A weekly population estimate is inferred from the yearly data:
get_timeseries(
  diseases$`COVID-19`,
  measure = "Count",
  age_group = age_groups$children_coarse
) %>%
  fit_population() %>%
  dplyr::glimpse()

infer_population(years = 2020:2025) %>% dplyr::glimpse()
SurvStat web service relating to a single time period.
This function gets a snapshot of disease count or incidence data
from the Robert Koch Institute SurvStat web service, based on either a whole
epidemiological season or an individual week within a season. Seasons are
whole years starting at the beginning of the calendar year, at week 27,
or at week 40.
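To make the season convention concrete, the sketch below assigns a calendar year and week to the start year of its season. The helper `season_of()` is hypothetical and not part of rsurvstat; it assumes that weeks before `season_start` belong to the season that began in the previous calendar year.

```r
# Hypothetical helper: which season (identified by its start year) does a
# given calendar year and week fall into?
season_of <- function(year, week, season_start = 1) {
  # Only the three season starts described above are accepted
  stopifnot(season_start %in% c(1, 27, 40))
  # Weeks before the season start belong to the previous year's season
  ifelse(week >= season_start, year, year - 1)
}

s1 <- season_of(2024, 45, season_start = 40)  # late 2024, season 2024
s2 <- season_of(2025, 10, season_start = 40)  # early 2025, still season 2024
s3 <- season_of(2025, 10, season_start = 1)   # calendar-year season 2025
c(s1, s2, s3)
```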
get_snapshot(
  disease = NULL,
  measure = c("Count", "Incidence"),
  ...,
  season,
  season_week = NULL,
  season_start = 1,
  age_group = NULL,
  age_range = c(0, Inf),
  disease_subtype = FALSE,
  geography = NULL,
  .progress = TRUE
)
| Argument | Description |
|---|---|
| disease | the disease of interest as a |
| measure | one of |
| ... | not used, must be empty. |
| season | the start year of the season in which the snapshot is taken |
| season_week | the start week within the season of the snapshot. If missing then the whole season is used |
| season_start | the week of the calendar year in which the season starts this can be one of |
| age_group | (optional) the age group of interest as a |
| age_range | (optional) a length 2 vector with the minimum and maximum ages to consider |
| disease_subtype | if |
| geography | (optional) a geographical breakdown. This can be given as a character where it must be one of |
| .progress | by default a progress bar is shown, which may be important if many downloads are needed to fulfil the request. It can be disabled by setting this to |
The snapshot can be stratified by any combination of age, geography, disease,
and disease subtype. Queries to SurvStat are cached and paged, but
multidimensional extracts can still require a large number of downloads.
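A rough sense of scale comes from counting stratum combinations. The sketch below multiplies sizes documented elsewhere in this manual; it only counts combinations and makes no assumption about how SurvStat pages its responses.

```r
# Stratum sizes taken from this manual
n_counties <- 411   # rows in CountyKey71Map
n_age_bands <- 16   # lower bounds listed for children_fine
n_diseases <- 121   # length of the diseases list

# Number of county x age band x disease combinations in one snapshot
n_cells <- n_counties * n_age_bands * n_diseases
n_cells
```

Restricting the query to a single disease or a coarser geography reduces this count proportionally.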
a data frame with at least year (the start of the epidemiological
season) and start_week (the calendar week in which the epidemiological
season starts), and one of count or incidence columns. Most likely it
will also have disease_name and disease_code columns, and some of
age_name, age_code, age_low, age_high, geo_code, geo_name,
disease_subtype_code, disease_subtype_name depending on options.
get_snapshot(
  diseases$`COVID-19`,
  measure = "Count",
  season = 2024,
  age_group = age_groups$children_coarse
)

get_snapshot(
  diseases$`COVID-19`,
  measure = "Count",
  age_group = age_groups$children_coarse,
  season = 2024,
  geography = rsurvstat::FedStateKey71Map[1:10, ]
)
SurvStat web service.
This function gets a weekly timeseries of disease count or incidence data
from the Robert Koch Institute SurvStat web service. The timeseries can be
stratified by any combination of age, geography, disease, and disease subtype.
Queries to SurvStat are cached and paged, but multidimensional
extracts can still require a large number of downloads.
get_timeseries(
  disease = NULL,
  measure = c("Count", "Incidence"),
  ...,
  age_group = NULL,
  age_range = c(0, Inf),
  disease_subtype = FALSE,
  years = NULL,
  geography = NULL,
  trim_zeros = c("leading", "both", "none"),
  .progress = TRUE
)
| Argument | Description |
|---|---|
| disease | the disease of interest as a |
| measure | one of |
| ... | not used, must be empty. |
| age_group | (optional) the age group of interest as a |
| age_range | (optional) a length 2 vector with the minimum and maximum ages to consider |
| disease_subtype | if |
| years | (optional) a vector of years to limit the response to. This may be useful to limit the size of returned pages in the event the |
| geography | (optional) a geographical breakdown. This can be given as a character where it must be one of |
| trim_zeros | get rid of zero counts. Either "both" (from start and end), "leading" (from start only - the default) or "none". |
| .progress | by default a progress bar is shown, which may be important if many downloads are needed to fulfil the request. It can be disabled by setting this to |
a data frame with at least date (weekly), and one of count or
incidence columns. Most likely it will also have disease_name and
disease_code columns, and some of age_name, age_code, age_low,
age_high, geo_code, geo_name, disease_subtype_code,
disease_subtype_name depending on options. The dataframe will be grouped
to make sure each group contains a single timeseries.
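The three `trim_zeros` options can be illustrated on a plain numeric vector. The function `trim_zeros_sketch()` below is a hypothetical stand-in, not the package's implementation; in particular, returning an all-zero series unchanged is an assumption.

```r
# Hypothetical illustration of the "leading", "both" and "none" options
trim_zeros_sketch <- function(x, trim = c("leading", "both", "none")) {
  trim <- match.arg(trim)
  nz <- which(x != 0)
  # Nothing to trim if trimming is off or the series has no non-zero values
  if (trim == "none" || length(nz) == 0) return(x)
  # "both" also drops the trailing zeros; "leading" keeps them
  last <- if (trim == "both") max(nz) else length(x)
  x[min(nz):last]
}

counts <- c(0, 0, 3, 0, 5, 0, 0)
lead <- trim_zeros_sketch(counts)           # default is "leading"
both <- trim_zeros_sketch(counts, "both")
none <- trim_zeros_sketch(counts, "none")
```

Zeros between non-zero values are kept in every mode, so the weekly spacing of the series is preserved.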
# age stratified
get_timeseries(
  diseases$`COVID-19`,
  measure = "Count",
  age_group = age_groups$children_coarse
) %>% dplyr::glimpse()

# geographic
get_timeseries(
  diseases$`COVID-19`,
  measure = "Count",
  geography = "state"
) %>% dplyr::glimpse()

# disease stratified, subset of years:
get_timeseries(
  measure = "Count",
  years = 2024
) %>% dplyr::glimpse()
NutsKey71Map dataset
This matches the NutsKey71 dimension in SurvStat. This is the 38 NUTS2
level administrative regions in Germany.
data(NutsKey71Map)
A sf dataframe containing the following columns:
Id - the full SurvStat identifier for this region (includes
hierarchical information)
ComponentId - the id of the most granular geographical unit (which can be
used to link out to other data sets)
HierarchyId - the id of the geographical unit type
Name - the name of the region
38 rows
rsurvstat cache
By default, successful requests to SurvStat are cached for 7 days to prevent
repeated querying of the service. This is stored in the usual R package cache
location by default (e.g. "~/.cache/rsurvstat" on mac / linux). Caching can
be switched off altogether.
set_cache_settings(..., active = NULL, dir = NULL, stale = NULL)
| Argument | Description |
|---|---|
| ... | you can also submit the settings as a named list. |
| active | boolean (optional), set to FALSE to disable caching |
| dir | file path (optional), the location of the cache |
| stale | numeric (optional), the number of days before a cached item is considered out of date |
the old cache settings as a list
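The meaning of `stale` can be sketched as an age check on a cached file. The helper `is_stale()` below is hypothetical and uses file modification times; how rsurvstat actually tracks the age of cached items is not described here, so treat this as an illustration of the concept only.

```r
# Hypothetical helper: is a cached file older than `stale_days` days?
is_stale <- function(path, stale_days = 7) {
  # A missing file is treated as stale so that it would be re-downloaded
  if (!file.exists(path)) return(TRUE)
  age <- difftime(Sys.time(), file.mtime(path), units = "days")
  as.numeric(age) > stale_days
}

# Simulate a freshly cached response
f <- tempfile()
writeLines("cached response", f)
fresh <- is_stale(f)

# Backdate the file by 8 days so that it exceeds the 7 day default
Sys.setFileTime(f, Sys.time() - 8 * 24 * 3600)
old <- is_stale(f)
c(fresh = fresh, old = old)
```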
old_settings <- set_cache_settings(active = FALSE)
set_cache_settings(old_settings)