Introduction
Background
Related work
Our work and contributions
- We build on existing DP concepts and techniques and apply them to three common but distinct pandemic data types: surveillance data, case location data, and contact tracing networks (CTNs). The aim is to demonstrate the publication of pandemic data with formal privacy guarantees. These three data types were routinely collected during the pandemic, provide different information on COVID-19, are distinct in terms of data structure and statistical analysis, and are all subject to privacy risks.
- For case surveillance data, we use the flat Laplace sanitizer, which carries DP guarantees, and examine the statistical utility of log-linear models fitted to the sanitized data, using both simulated data and real data published by the U.S. CDC. Our results suggest that simple approaches such as the flat Laplace sanitizer can be effective for releasing granular case surveillance data, providing a good balance between privacy and data utility.
- For location data, we apply the planar Laplace mechanism, which provides geo-indistinguishability guarantees, to simulated data and a real South Korean case location dataset. We then examine inference from cluster point process models and the accuracy of hotspot heat maps based on the sanitized locations. The method would be particularly useful for protecting location privacy when sharing information at a local level or releasing hotspot heat maps on a relatively fine scale.
- For CTNs, we apply the DP exponential random graph model (DP-ERGM) to generate privacy-preserving synthetic networks and investigate the utility of the sanitized networks in terms of inference from ERGMs and the preservation of descriptive structural network statistics. The results suggest that DP-ERGM is relatively insensitive to \(\epsilon\), which implies that a small \(\epsilon\) can be used to provide strong privacy guarantees without sacrificing much utility.
- Our study generates statistical evidence on the practical feasibility of sharing different types of pandemic data with formal privacy guarantees. The approaches examined in this study do not target learning individual-level information but focus on preserving aggregated and population-level information.
Preliminaries
Differential privacy
Geo-indistinguishability
Privacy-preserving statistical inference
Overview of case surveillance data, case location data, and contact tracing networks (CTNs)
Privacy-preserving case surveillance data release
Age group (years) by race/ethnicity | NH White | NH Black | NH AIAN | NH Asian | NH NHPI | NH Mix | Hispanic | Total |
---|---|---|---|---|---|---|---|---|
<17 | 387 | 274 | 15 | 36 | 11 | 30 | 303 | 1056 |
18-29 | 2263 | 1492 | 187 | 190 | 49 | 73 | 2015 | 6269 |
30-39 | 6661 | 4144 | 560 | 558 | 151 | 157 | 5919 | 18150 |
40-49 | 17269 | 8937 | 1021 | 1206 | 265 | 309 | 13981 | 42988 |
50-64 | 97418 | 35753 | 3198 | 5312 | 715 | 952 | 43657 | 187005 |
65-74 | 141409 | 37765 | 2901 | 7423 | 501 | 913 | 38422 | 229334 |
>75 | 380630 | 54576 | 3210 | 16504 | 449 | 1380 | 56711 | 513460 |
Total | 646037 | 142941 | 11092 | 31229 | 2141 | 3814 | 161008 | 998262 |
Method
Simulation study
Application to CDC case surveillance data
Age group (years) by race/ethnicity | NH White | NH Black | NH AIAN | NH Asian | NH NHPI | NH Mix | Hispanic | Total |
---|---|---|---|---|---|---|---|---|
<17 | 385 | 271 | 14 | 37 | 8 | 29 | 308 | 1052 |
18-29 | 2258 | 1491 | 186 | 198 | 49 | 72 | 2009 | 6263 |
30-39 | 6664 | 4140 | 562 | 558 | 145 | 156 | 5928 | 18153 |
40-49 | 17269 | 8937 | 1021 | 1202 | 266 | 299 | 13982 | 42976 |
50-64 | 97421 | 35753 | 3195 | 5311 | 713 | 952 | 43658 | 187003 |
65-74 | 141413 | 37766 | 2897 | 7427 | 501 | 914 | 38425 | 229343 |
>75 | 380642 | 54577 | 3209 | 16505 | 449 | 1379 | 56712 | 513472 |
Total | 646053 | 142935 | 11084 | 31238 | 2130 | 3801 | 161021 | 998262 |
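As an illustration of how a sanitized age-by-race/ethnicity table of this kind could be produced, the following is a minimal sketch of a flat Laplace sanitizer. It is not the code used in this study. The function names, the per-cell sensitivity of 1 (neighboring datasets differing by adding or removing one record), and the rounding and truncation at zero are assumptions made for the example; rounding and truncation are post-processing steps and do not affect the DP guarantee.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) variate as the difference of two
    independent exponential variates with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def flat_laplace_sanitize(counts, epsilon, sensitivity=1.0, seed=None):
    """Add independent Laplace(sensitivity / epsilon) noise to every cell.

    counts      : list of rows, each a list of non-negative cell counts
    epsilon     : privacy budget spent on the whole table
    sensitivity : assumed global sensitivity of the table of counts
                  (1 for add/remove-one-record neighbors; an assumption here)
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    sanitized = []
    for row in counts:
        noisy = [c + laplace_noise(scale, rng) for c in row]
        # Post-processing (assumed): round to integers and truncate at zero.
        sanitized.append([max(0, round(v)) for v in noisy])
    return sanitized


if __name__ == "__main__":
    # Two age groups by three race/ethnicity columns, for illustration only.
    cells = [[387, 274, 15], [2263, 1492, 187]]
    print(flat_laplace_sanitize(cells, epsilon=1.0, seed=0))
```

Marginal totals would then be recomputed from the sanitized cells rather than sanitized separately, so that no additional privacy budget is spent on them.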
Summary
Privacy-preserving release of case location data
Method
Simulation study
spatstat.core [6].

Metric | Parameter | Original | \(\epsilon =5\) | \(\epsilon =2\) | \(\epsilon =1\) | \(\epsilon =0.5\) |
---|---|---|---|---|---|---|
bias | \(\beta _0\) | -0.029 | -0.022 | 0.016 | 0.142 | 0.571 |
bias | \(\beta _1\) | 0.065 | 0.052 | -0.022 | -0.279 | -1.180 |
bias | \(\beta _2\) | 0.031 | 0.014 | -0.074 | -0.374 | -1.389 |
bias | \(\beta _3\) | -0.085 | -0.077 | -0.028 | 0.154 | 0.801 |
bias | \(\beta _4\) | 0.034 | 0.038 | 0.060 | 0.124 | 0.337 |
bias | \(\beta _5\) | -0.037 | -0.024 | 0.048 | 0.303 | 1.160 |
RMSE | \(\beta _0\) | 0.466 | 0.465 | 0.459 | 0.457 | 0.680 |
RMSE | \(\beta _1\) | 1.234 | 1.232 | 1.211 | 1.189 | 1.549 |
RMSE | \(\beta _2\) | 1.164 | 1.162 | 1.152 | 1.166 | 1.693 |
RMSE | \(\beta _3\) | 1.006 | 1.003 | 0.986 | 0.958 | 1.159 |
RMSE | \(\beta _4\) | 0.944 | 0.943 | 0.934 | 0.898 | 0.838 |
RMSE | \(\beta _5\) | 0.985 | 0.982 | 0.972 | 0.989 | 1.431 |
CP | \(\beta _0\) | 0.948 | 0.940 | 0.925 | 0.841 | 0.599 |
CP | \(\beta _1\) | 0.938 | 0.932 | 0.914 | 0.845 | 0.719 |
CP | \(\beta _2\) | 0.957 | 0.952 | 0.935 | 0.851 | 0.640 |
CP | \(\beta _3\) | 0.938 | 0.929 | 0.909 | 0.842 | 0.769 |
CP | \(\beta _4\) | 0.941 | 0.934 | 0.908 | 0.840 | 0.878 |
CP | \(\beta _5\) | 0.947 | 0.939 | 0.916 | 0.827 | 0.638 |
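For readers who wish to reproduce summaries of this form, the sketch below shows one way the three metrics could be computed for a single parameter across simulation repeats. It assumes that CP denotes the coverage probability of nominal 95% Wald confidence intervals, which is our reading of the abbreviation rather than something stated in the table; the function name `summarize_estimates` and its arguments are hypothetical.

```python
import math


def summarize_estimates(estimates, std_errors, truth, z=1.96):
    """Summarize repeated estimates of one parameter.

    estimates  : point estimates, one per simulation repeat
    std_errors : matching standard errors, one per repeat
    truth      : true parameter value used to generate the data
    z          : normal quantile for the interval (1.96 for a nominal 95% CI)
    """
    n = len(estimates)
    # Bias: average signed deviation from the truth.
    bias = sum(e - truth for e in estimates) / n
    # RMSE: square root of the average squared deviation from the truth.
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    # CP (assumed meaning): share of Wald intervals that contain the truth.
    covered = sum(
        1 for e, s in zip(estimates, std_errors)
        if e - z * s <= truth <= e + z * s
    )
    return {"bias": bias, "rmse": rmse, "cp": covered / n}


if __name__ == "__main__":
    print(summarize_estimates([0.9, 1.1, 1.5], [0.1, 0.1, 0.1], truth=1.0))
```

Under this reading, CP values well below 0.95 at small \(\epsilon\) would indicate that intervals computed from sanitized locations cover the true parameter less often than their nominal level.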
Application to South Korea case location data
Parameter, estimate (95% CI) | Original | \(\epsilon =5\) | \(\epsilon =2\) | \(\epsilon =1\) | \(\epsilon =0.5\) |
---|---|---|---|---|---|
\(\beta _0\) | -64.2 (-153.5, 25.1) | -65.1 (-157.0, 26.9) | -63.0 (-147.0, 21.0) | -63.8 (-140.6, 13.0) | -57.5 (-129.8, 14.7) |
\(\beta _1\) | 0.51 (-0.17, 1.19) | 0.52 (-0.18, 1.21) | 0.50 (-0.14, 1.14) | 0.50 (-0.08, 1.08) | 0.44 (-0.10, 0.99) |
\(\beta _2\) | 0.03 (-0.50, 0.56) | 0.03 (-0.52, 0.59) | 0.03 (-0.48, 0.54) | 0.05 (-0.42, 0.51) | 0.07 (-0.39, 0.53) |
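To make the sanitization step behind these estimates concrete, the following is a minimal sketch of the planar Laplace mechanism, not the code used in this study. It relies on the fact that, in polar coordinates, the planar Laplace density \(\frac{\epsilon^2}{2\pi}e^{-\epsilon r}\) has a uniform angle and a radius following a Gamma distribution with shape 2 and scale \(1/\epsilon\). The function name `sanitize_locations` is hypothetical, and the sketch assumes that locations are already given as planar (projected) coordinates in the distance unit in which \(\epsilon\) is expressed.

```python
import math
import random


def sanitize_locations(points, epsilon, seed=None):
    """Perturb each (x, y) location with planar Laplace noise.

    points  : list of (x, y) tuples in planar coordinates (assumed projected)
    epsilon : privacy parameter per unit of distance
    """
    rng = random.Random(seed)
    sanitized = []
    for x, y in points:
        # Direction of the displacement: uniform on [0, 2*pi).
        theta = rng.uniform(0.0, 2.0 * math.pi)
        # Length of the displacement: density eps^2 * r * exp(-eps * r),
        # i.e. Gamma(shape=2, scale=1/eps), with mean 2/eps.
        r = rng.gammavariate(2.0, 1.0 / epsilon)
        sanitized.append((x + r * math.cos(theta), y + r * math.sin(theta)))
    return sanitized


if __name__ == "__main__":
    cases = [(0.0, 0.0), (1.5, -2.0), (3.0, 4.0)]
    for eps in (5, 2, 1, 0.5):
        print(eps, sanitize_locations(cases, eps, seed=42))
```

Because the expected displacement is \(2/\epsilon\), halving \(\epsilon\) doubles the average distance by which each case location is moved.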
Summary
Privacy-preserving sharing of contact tracing networks
Method
Simulation study
statnet [29]. We conduct two utility analyses. In the first analysis, we examine the preservation of qualitative information and descriptive statistics in sanitized CTNs; in the second analysis, we run the ERGM on sanitized networks to examine the inference on the model parameter. We set \(m\) to 1 and 3, respectively, in these two analyses.

Metric | Original | \(\epsilon =5\) | \(\epsilon =2\) | \(\epsilon =1\) | \(\epsilon =0.5\) |
---|---|---|---|---|---|
bias | -0.021 | -0.021 | -0.026 | -0.031 | -0.051 |
RMSE | 0.171 | 0.172 | 0.174 | 0.187 | 0.260 |
CP | 0.942 | 0.954 | 0.954 | 0.952 | 0.944 |