Background
Methods
Datasets
Gold standard dataset
Record linkage with conventional personal identifiers
String comparison techniques applied to first names and surnames (JW = Jaro-Winkler similarity; DM = Double Metaphone), by identifiers used:

Identifiers used | Exact | JW ≥ 0.7 | JW ≥ 0.9 | DM | Soundex | JW ≥ 0.9 or DM or Soundex |
---|---|---|---|---|---|---|
Routinely collected identifiers* | S1 | S2 | S3 | S4 | S5 | S6 |
Routinely collected identifiers + household member first name | S7 | S8 | S9 | S10 | S11 | S12 |
Routinely collected identifiers + household member first name and surname | S13 | S14 | S15 | | | |

Additional linkage scenarios | Scenario |
---|---|
Deterministic linkage on National ID number or telephone number followed by best of S1-S15** | S16 |
S16 + clerical review of 5%, 10%, 15%, and 20% of record pairs above and below the threshold value above which record pairs are automatically accepted as matches | S17-S20 |
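
To make the name-comparison options in the scenario table concrete, the following is a minimal Python sketch of two of them: Jaro-Winkler similarity with a threshold, and Soundex code agreement. It is an illustration under stated assumptions, not the linkage software used in the study. The function names (`jaro`, `jaro_winkler`, `soundex`, `names_agree`) are hypothetical, the Winkler prefix weight of 0.1 and prefix cap of 4 are the conventional defaults rather than values taken from the paper, and Double Metaphone is left out because a faithful implementation is too long for a short example.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they sit within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1 - base)


_SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for letter in letters
}


def soundex(name: str) -> str:
    """Four-character American Soundex code; empty string if no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":  # H and W do not separate equal codes
            prev = digit
    return (code + "000")[:4]


def names_agree(a: str, b: str, jw_threshold: float = 0.9) -> bool:
    """Hypothetical rule: JW at or above the threshold, or equal Soundex codes."""
    a, b = a.strip().upper(), b.strip().upper()
    return jaro_winkler(a, b) >= jw_threshold or soundex(a) == soundex(b)
```

With this rule, `names_agree("Robert", "Rupert")` is true because both names share the Soundex code `R163`, even though their Jaro-Winkler similarity falls below 0.9. Lowering `jw_threshold` to 0.7 accepts more spelling variation at the cost of more false agreements, which is the trade-off the JW ≥ 0.7 and JW ≥ 0.9 columns explore.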
Bias in the record-linked dataset
Implementation
Ethical approval
Results
Identifier | Agincourt HDSS (n = 93 507): individuals with complete information (%) | Agincourt Health Centre (n = 10 790): individuals with complete information (%) |
---|---|---|
First name | 100.00 | 100.00 |
Surname | 100.00 | 100.00 |
Other first name | 35.57 | 6.14 |
Sex | 100.00 | 99.95 |
Date of birth | 100.00 | 100.00 |
Village | 100.00 | 81.17 |
Household member first name | 98.48 | 77.29 |
Household member surname | 98.48 | 76.60 |
ID number | 67.14 | 1.55 |
Telephone number | 37.48 | 26.67 |
Variable | n | Scenario 6: matched, n (%) | Scenario 6: multivariable OR (95% CI) | Scenario 16: matched, n (%) | Scenario 16: multivariable OR (95% CI) | Scenario 17: matched, n (%) | Scenario 17: multivariable OR (95% CI) |
---|---|---|---|---|---|---|---|
Total | 623 | 492 (79.0) | | 551 (88.4) | | 552 (88.6) | |
Sex | | | | | | | |
Female | 511 | 395 (77.3) | 1 | 445 (87.1) | 1 | 447 (87.5) | 1 |
Male | 112 | 97 (86.6) | 2.86 (1.41-5.82)* | 106 (94.6) | 4.38 (1.52-12.61)* | 105 (93.8) | 3.34 (1.25-8.97)* |
Age | | | | | | | |
18-34 | 334 | 284 (85.0) | 1 | 308 (92.2) | 1 | 308 (92.2) | 1 |
35-49 | 125 | 100 (80.0) | 0.99 (0.53-1.84) | 112 (89.6) | 0.84 (0.36-1.93) | 115 (92.0) | 1.21 (0.5-2.92) |
50-64 | 89 | 66 (74.2) | 0.76 (0.35-1.66) | 78 (87.6) | 0.75 (0.27-2.14) | 77 (86.5) | 0.75 (0.27-2.12) |
65+ | 75 | 42 (56.0) | 0.35 (0.15-0.85)* | 53 (70.7) | 0.21 (0.07-0.63)* | 52 (69.3) | 0.25 (0.08-0.74)* |
Ethnicity | | | | | | | |
Other | 96 | 67 (70.0) | 1 | 76 (79.2) | 1 | 75 (78.1) | 1 |
South African | 527 | 425 (80.7) | 1.3 (0.71-2.37) | 475 (90.1) | 1.82 (0.88-3.77) | 477 (90.5) | 2.1 (1.02-4.33)* |
Residence status | | | | | | | |
Permanent | 574 | 450 (78.4) | 1 | 506 (88.1) | 1 | 507 (88.3) | 1 |
Temporary and other | 49 | 42 (85.7) | 1.63 (0.54-4.88) | 45 (91.8) | 1.28 (0.28-5.89) | 45 (91.8) | 1.4 (0.31-6.44) |
Highest level of education | | | | | | | |
None | 97 | 54 (55.7) | 1 | 71 (73.2) | 1 | 69 (71.1) | 1 |
Some primary | 191 | 144 (75.4) | 1.46 (0.76-2.83) | 164 (85.8) | 1.16 (0.51-2.63) | 166 (87.0) | 1.43 (0.64-3.22) |
Post primary | 302 | 267 (88.4) | 2.73 (1.18-6.36)* | 288 (95.4) | 2.62 (0.87-7.92) | 288 (95.4) | 3.05 (1.01-9.24)* |
Employment | | | | | | | |
Not working | 514 | 413 (80.4) | 1 | 462 (89.8) | 1 | 460 (89.5) | 1 |
Working | 93 | 70 (75.3) | 0.68 (0.37-1.25) | 79 (85.0) | 0.53 (0.25-1.14) | 81 (87.1) | 0.71 (0.32-1.58) |
Wealth quintile | | | | | | | |
Lowest | 44 | 28 (63.6) | 1 | 33 (75.0) | 1 | 34 (77.3) | 1 |
Second | 84 | 62 (73.8) | 1.48 (0.63-3.49) | 75 (89.3) | 2.42 (0.84-6.98) | 73 (90.0) | 1.63 (0.57-4.62) |
Middle | 125 | 100 (80.0) | 1.89 (0.82-4.37) | 108 (86.4) | 1.60 (0.6-4.25) | 110 (88.0) | 1.58 (0.58-4.36) |
Fourth | 172 | 136 (79.1) | 1.81 (0.8-4.11) | 152 (88.3) | 2.08 (0.78-5.54) | 150 (87.2) | 1.47 (0.55-3.93) |
Highest | 184 | 159 (86.4) | 2.9 (1.24-6.75)* | 174 (94.5) | 4.4 (1.51-12.84)* | 175 (95.1) | 4.03 (1.34-12.17)* |
Goodness-of-fit | | | | | | | |
Pseudo R², Wald χ² (p-value) | | | 0.11, 56.89 (<0.0001) | | 0.16, 51.94 (<0.0001) | | 0.16, 53.76 (<0.0001) |
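
As a reading aid for the table above, the sketch below shows how an unadjusted (crude) odds ratio and a Woolf-type 95% confidence interval can be computed directly from the matched counts. It is only an illustration: the function name `crude_odds_ratio` is hypothetical, and the table itself reports multivariable (adjusted) odds ratios, so the crude value is not expected to equal the tabulated one.

```python
import math


def crude_odds_ratio(matched_a, total_a, matched_b, total_b, z=1.96):
    """Crude odds ratio of being matched, group A versus reference group B.

    Returns (odds_ratio, lower, upper) using the Woolf (log-based) interval.
    """
    unmatched_a = total_a - matched_a
    unmatched_b = total_b - matched_b
    odds_ratio = (matched_a * unmatched_b) / (unmatched_a * matched_b)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / matched_a + 1 / unmatched_a
                   + 1 / matched_b + 1 / unmatched_b)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Scenario 6, males (97 of 112 matched) versus females (395 of 511 matched).
or_male, lower, upper = crude_odds_ratio(97, 112, 395, 511)
print(f"crude OR = {or_male:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

For scenario 6 the crude odds ratio for males versus females comes out at about 1.90, lower than the adjusted 2.86 in the table; the gap reflects adjustment for the other covariates in the multivariable model.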
Variable | Matched on fingerprint (n = 623), n (%) | Matched with scenario 6 (n = 492), n (%) | p-value* | Matched with scenario 16 (n = 551), n (%) | p-value* | Matched with scenario 17 (n = 552), n (%) | p-value* |
---|---|---|---|---|---|---|---|
Sex | | | | | | | |
Female | 511 (82.0) | 395 (80.3) | | 445 (80.8) | | 447 (81.0) | |
Male | 112 (18.0) | 97 (19.7) | 0.460 | 106 (19.2) | 0.579 | 105 (19.0) | 0.645 |
Age | | | | | | | |
18-34 | 334 (53.6) | 284 (57.7) | | 308 (55.9) | | 308 (55.8) | |
35-49 | 125 (20.1) | 100 (20.3) | | 112 (20.3) | | 115 (20.8) | |
50-64 | 89 (14.3) | 66 (13.4) | | 78 (14.2) | | 77 (14.0) | |
65+ | 75 (12.0) | 42 (8.5) | 0.240 | 53 (9.6) | 0.601 | 52 (9.4) | 0.528 |
Ethnicity | | | | | | | |
Other | 96 (15.4) | 67 (13.6) | | 76 (13.8) | | 75 (13.6) | |
South African | 527 (84.6) | 425 (86.4) | 0.401 | 475 (86.2) | 0.434 | 477 (86.4) | 0.377 |
Residence status | | | | | | | |
Permanent | 574 (92.1) | 450 (91.5) | | 506 (91.8) | | 507 (91.8) | |
Temporary and other | 48 (7.7) | 42 (8.5) | 0.595 | 45 (8.2) | 0.617 | 45 (8.2) | 0.618 |
Highest level of education | | | | | | | |
None | 97 (15.6) | 54 (11.0) | | 71 (12.9) | | 69 (12.5) | |
Some primary | 191 (30.7) | 144 (29.3) | | 164 (29.8) | | 166 (30.1) | |
Post primary | 302 (48.5) | 267 (54.3) | 0.098 | 288 (52.3) | 0.491 | 288 (52.2) | 0.426 |
Employment | | | | | | | |
Not working | 514 (82.5) | 413 (83.9) | | 462 (83.4) | | 460 (83.3) | |
Working | 93 (14.9) | 70 (14.2) | 0.660 | 79 (14.3) | 0.643 | 81 (14.7) | 0.795 |
Wealth quintile | | | | | | | |
Lowest | 44 (7.1) | 28 (5.7) | | 33 (6.0) | | 34 (6.2) | |
Second | 84 (13.5) | 62 (12.6) | | 75 (13.6) | | 73 (13.2) | |
Middle | 125 (20.1) | 100 (20.3) | | 108 (19.6) | | 110 (19.9) | |
Fourth | 172 (27.6) | 136 (27.6) | | 152 (27.6) | | 150 (27.2) | |
Highest | 184 (29.5) | 159 (32.3) | 0.753 | 174 (31.6) | 0.912 | 175 (31.7) | 0.952 |
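
The p-values in the table above compare the distribution of each characteristic among fingerprint-matched individuals with its distribution among individuals matched under a linkage scenario. The test used is not stated in this section; the sketch below assumes a Pearson chi-square test without continuity correction on the two columns, treated as if they were independent samples. The function name `chi_square_2x2` is hypothetical, and the closed-form tail probability applies only to the one-degree-of-freedom (2 × 2) case.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Sex: fingerprint-matched (511 female, 112 male) versus scenario 6 (395, 97).
stat, p = chi_square_2x2(511, 112, 395, 97)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

Under these assumptions the sex comparison for scenario 6 gives a p-value of about 0.46, in line with the tabulated 0.460. Variables with more than two categories, such as age or wealth quintile, need the general chi-square distribution with more degrees of freedom, which this short sketch does not cover.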