Sample size calculation
With a true prevalence of 1%, 2% or 5% and an expected sample size of n = 2500 participants, a precision of the estimate of ±0.4, ±0.5 or ±0.9 percentage points, respectively (i.e., the half-width of a 95% confidence interval), will be achieved, which is considered sufficient. Note that on April 9th, 2021, the SARS-CoV-2 prevalence of confirmed cases in Cologne was 3.9% [20].
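The precision figures can be checked with a short calculation. The following is a minimal sketch that assumes the normal-approximation (Wald) interval for a proportion; the protocol does not name the interval method, so this choice, the function name `wald_half_width` and the multiplier z = 1.96 are assumptions made for illustration only.

```python
import math


def wald_half_width(p, n, z=1.96):
    """Half-width of the Wald confidence interval for a proportion.

    p: assumed true prevalence (as a fraction), n: sample size,
    z: normal quantile (1.96 corresponds to a 95% interval).
    """
    return z * math.sqrt(p * (1.0 - p) / n)


n = 2500  # expected sample size stated in the protocol
for p in (0.01, 0.02, 0.05):
    hw = wald_half_width(p, n)
    print(f"prevalence {p:.0%}: +/- {100 * hw:.1f} percentage points")
```

Under this assumption the half-widths for n = 2500 come out at roughly 0.4, 0.5 and 0.9 percentage points. For prevalences this low, an exact or Wilson interval would give slightly asymmetric limits.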
Analysis
The distributions of the collected data will first be described with the usual measures of location and dispersion, i.e. mean, standard deviation and percentiles (0th, 25th, 50th, 75th, 100th) for continuous variables, and absolute and relative frequencies for qualitative variables.
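As an illustration of these descriptive summaries, the sketch below uses only the Python standard library. The helper names and the example values are hypothetical and are not study data. The "inclusive" quantile method is one of several common percentile definitions and is an assumption here.

```python
import statistics
from collections import Counter


def describe_continuous(values):
    """Mean, standard deviation and the 0th/25th/50th/75th/100th percentiles."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "p0": min(values),
        "p25": q1,
        "p50": median,
        "p75": q3,
        "p100": max(values),
    }


def describe_categorical(values):
    """Absolute and relative frequency of each category."""
    counts = Counter(values)
    total = len(values)
    return {level: (count, count / total) for level, count in counts.items()}


# Hypothetical example values, not study data
ages = [23, 35, 41, 52, 67]
genders = ["f", "m", "f", "f"]
print(describe_continuous(ages))
print(describe_categorical(genders))
```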
Associations and correlations will be described by means of contingency tables and regression methods (e.g. logistic regression for dichotomous target variables). With regard to the prevalence of SARS-CoV-2, the following variables are of particular interest: age, gender, neighbourhoods, housing size, number of household members, pre-existing conditions, previous positive testing, potentially contagious contacts. Where possible, important statistical measures will be provided with 95% confidence intervals to indicate the precision of the estimate.
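For a single dichotomous factor, the association with a positive test can be read off a 2x2 contingency table as an odds ratio with a 95% confidence interval. The sketch below assumes the log-scale (Woolf) interval, which is not specified in the protocol. The function name and the cell counts are hypothetical. For one binary covariate, this odds ratio equals the exponentiated coefficient of a univariable logistic regression.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval from a 2x2 table.

    a: exposed and positive,   b: exposed and negative,
    c: unexposed and positive, d: unexposed and negative.
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, e.g. contagious contact (yes/no) by test result
or_, lo, hi = odds_ratio_ci(a=30, b=970, c=20, d=1480)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Adjusting for several variables at once (age, household size and so on) requires a multivariable logistic regression in a statistics package, which this sketch does not attempt.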
Due to the expected limited participation rate (we estimate 40%), analyses of the representativeness of the sample obtained are of particular importance. On the basis of the information provided by the non-participants, an extrapolation to the Cologne population will be carried out. (Note: the risk of non-participation due to a lack of reading and language skills should be minimized by the assistance of native speakers. Otherwise, a bias correction will be attempted as described below.) Official statistics on the total population of Cologne are available at https://www.stadt-koeln.de/politik-und-verwaltung/statistik/ (accessed 22.01.2021). Sensitivity analyses will examine the correction for various bias constellations. For this purpose, corresponding weights will be used in combination with regression methods.
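One simple way to use such weights is post-stratification, in which stratum-specific prevalences are re-weighted by the population shares of the strata. The sketch below assumes this technique purely for illustration; the protocol does not state which weighting scheme will be used. The strata, sample counts and population shares are made-up numbers, not official Cologne statistics.

```python
def poststratified_prevalence(sample, population_shares):
    """Prevalence estimate re-weighted to known population shares.

    sample: {stratum: (n_positive, n_tested)}
    population_shares: {stratum: share of the target population}, summing to 1
    """
    estimate = 0.0
    for stratum, share in population_shares.items():
        positives, tested = sample[stratum]
        estimate += share * positives / tested
    return estimate


# Hypothetical sample in which older participants are over-represented
sample = {"18-39": (10, 500), "40-64": (30, 1000), "65+": (40, 1000)}
# Hypothetical population shares (assumed, not taken from official statistics)
shares = {"18-39": 0.4, "40-64": 0.4, "65+": 0.2}

crude = sum(p for p, _ in sample.values()) / sum(n for _, n in sample.values())
weighted = poststratified_prevalence(sample, shares)
print(f"crude: {crude:.3f}, post-stratified: {weighted:.3f}")
```

In this made-up example the crude prevalence of 3.2% drops to 2.8% after weighting, because the stratum with the highest prevalence is over-represented in the sample. A sensitivity analysis could repeat the calculation under alternative sets of shares or weights.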
The calculations will be performed using scripts for R (R Foundation for Statistical Computing, Vienna, Austria), SAS (SAS Institute Inc., Cary, NC, USA), Stata (StataCorp LLC, College Station, TX, USA) and SPSS Statistics (IBM Corp., Armonk, NY, USA).