Measuring diagnostic accuracy where there is no "gold standard"
The performance of a diagnostic test is judged by how accurately its result identifies a person as diseased or non-diseased. The true disease status is the "gold standard" against which a test should be compared. However, for many conditions a definitive diagnosis is very difficult or expensive to establish. This is especially true of a complex clinical condition such as sepsis: even within the construct of "systemic response to infection", there is no real "gold standard" against which the diagnostic criteria can be calibrated [18].
Psychological and social sciences have a long tradition of dealing with primary study objects that are not directly observable. Constructs such as intelligence, fear or trust can only be measured indirectly. Inference proceeds by modeling the relationship between observable and latent variables in such a way that the parameters of interest can be estimated from the implied relations between observable variables. When the unobservable variable is categorical, the term latent class analysis (LCA) applies [6]. In other words, LCA postulates the existence of an unobserved categorical variable that divides the population of interest into classes. Members of the population will respond differently on a set of observed variables depending on the latent class to which they belong. This technique can be applied to diagnostic testing, with the unobserved categorical variable being "disease present" or "disease absent" [44]. The observed variables are typically the results of three or more diagnostic tests, none of which is a gold standard. LCA can then be applied to estimate the proportion of patients in each latent class (that is, estimated to be diseased or free of disease), and the sensitivity and specificity of each diagnostic test. In summary, the goal of latent class analysis is to use the observed probabilities to estimate the unobserved ones.
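As an illustration of how the unobserved quantities can be recovered from the observed test results, the following is a minimal sketch of a two-class latent class model fitted by the EM algorithm under conditional independence. It is not the estimation routine of any particular software package; the function name `fit_lca_em`, the starting values and the stopping rule are assumptions made for this example.

```python
import numpy as np

def fit_lca_em(y, n_iter=2000, tol=1e-8):
    """Two-class latent class model for binary tests, fitted by EM.

    y: (n_subjects, n_tests) array of 0/1 test results.
    Assumes the tests are independent within each latent class.
    Returns (prevalence, sensitivities, specificities).
    """
    y = np.asarray(y, dtype=float)
    n_tests = y.shape[1]
    # Starting values are an assumption; values above 0.5 label the
    # "diseased" class as the one in which positive results are likely.
    prev = 0.5
    sens = np.full(n_tests, 0.8)
    spec = np.full(n_tests, 0.8)
    last_loglik = -np.inf
    for _ in range(n_iter):
        # E-step: joint probability of each subject's pattern and each class.
        lik_dis = prev * np.prod(sens**y * (1 - sens)**(1 - y), axis=1)
        lik_non = (1 - prev) * np.prod((1 - spec)**y * spec**(1 - y), axis=1)
        post = lik_dis / (lik_dis + lik_non)  # Pr(diseased | pattern)
        # Log-likelihood at the parameters used in this E-step.
        loglik = np.log(lik_dis + lik_non).sum()
        # M-step: posterior-weighted proportions.
        prev = post.mean()
        sens = (post[:, None] * y).sum(axis=0) / post.sum()
        spec = ((1 - post)[:, None] * (1 - y)).sum(axis=0) / (1 - post).sum()
        if loglik - last_loglik < tol:
            break
        last_loglik = loglik
    return prev, sens, spec
```

With three binary tests the model has seven free parameters (one prevalence, three sensitivities, three specificities) and the eight response patterns supply seven degrees of freedom, so three tests is the minimum for which these parameters can be estimated without further constraints.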
The methodology for latent class analysis has been one of the most active areas of biostatistics research and development in recent years [44-50], and specialized software is available for the estimation procedures [10, 11].
Because it is important that a diagnostic test yield a minimal proportion of false negative results, we based the sample size calculation on the estimation of sensitivity. Adequate precision in the sensitivity estimate gives enough power to detect significant values in the other operating characteristics. The number of patients with the disease (NP) needed to estimate a sensitivity of 95% with a 95% confidence interval of +/- 3% is calculated with the following formula [51]:
The NP (true positives plus false negatives) also depends on the prevalence (P) of the disease in the study population. Hence, the total number of patients (TP) required for this research is a function of these two quantities:
According to the inclusion and exclusion criteria defined for our study population, the expected prevalence of microbiologically confirmed sepsis is about 30% [3, 15]. Therefore, the total number of participants we should recruit for an adequate sample size is 677 patients. Considering the need to carry out a pilot test to standardize PCT measurement, a total of 700 patients will be recruited.
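The arithmetic behind these figures can be checked with a short sketch. It assumes the usual normal-approximation formula for the precision of a proportion, NP = z^2 * Se * (1 - Se) / d^2, followed by TP = NP / P; this form is an assumption on our part, chosen because it reproduces the stated total of 677 patients. The function names are illustrative only.

```python
import math

def diseased_patients_needed(sensitivity, half_width, z=1.96):
    """Diseased patients needed so that the confidence interval for the
    expected sensitivity has the requested half-width (normal approximation)."""
    return math.ceil(z**2 * sensitivity * (1 - sensitivity) / half_width**2)

def total_patients_needed(n_diseased, prevalence):
    """Total recruits needed to expect n_diseased cases at the given prevalence."""
    return math.ceil(n_diseased / prevalence)

# Expected sensitivity 95%, half-width 3%, two-sided 95% confidence.
np_needed = diseased_patients_needed(0.95, 0.03)
# Expected prevalence of microbiologically confirmed sepsis: 30%.
tp_needed = total_patients_needed(np_needed, 0.30)
print(np_needed, tp_needed)  # 203 677
```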
The cut points for the study tests (CRP, PCT, and DD) will be explored using Receiver Operating Characteristic (ROC) curves [52]. The classification criteria will be the clinical gold standards (presence of infection defined according to modified CDC criteria, and sepsis diagnosis defined by clinical consensus), as well as the presence of severe sepsis (organ dysfunction) and mortality. This will allow us to define the cut points with the best sensitivity without significantly compromising specificity, and to identify values that are useful in clinical decisions.
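One way to explore cut points along a ROC curve is sketched below. The selection rule used here, Youden's index (sensitivity + specificity - 1), is an assumption for illustration only and is not necessarily the criterion that will be applied in the study; the function names are likewise hypothetical.

```python
import numpy as np

def roc_points(values, diseased):
    """Sensitivity and specificity at each observed cut point.

    A result counts as positive when value >= cut point.
    """
    values = np.asarray(values, dtype=float)
    diseased = np.asarray(diseased, dtype=bool)
    cuts = np.unique(values)
    sens = np.array([(values[diseased] >= c).mean() for c in cuts])
    spec = np.array([(values[~diseased] < c).mean() for c in cuts])
    return cuts, sens, spec

def youden_cut(values, diseased):
    """Cut point maximizing sensitivity + specificity - 1 (first one on ties)."""
    cuts, sens, spec = roc_points(values, diseased)
    best = int(np.argmax(sens + spec - 1))
    return cuts[best], sens[best], spec[best]
```

A rule that favors sensitivity, as intended here, could instead pick the highest cut point whose sensitivity stays above a chosen threshold and then report the specificity obtained at that point.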
A conventional method based on Bayes' theorem will be used to determine the operating characteristics of the tests and their different combinations against the referred gold standards [53]. 95% confidence intervals will be estimated for sensitivity, specificity, predictive values and likelihood ratios. STATA software (Stata Corp, release 8.2, College Station, TX, USA, 2004) will be used for all analyses.
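The quantities listed above can be illustrated from a 2x2 table of test results against a reference standard. The sketch below is not the STATA procedure; it assumes Wald (normal-approximation) intervals for sensitivity and specificity and obtains the predictive values from Bayes' theorem at a prevalence supplied by the user. All names are illustrative.

```python
import math

def wald_ci(successes, total, z=1.96):
    """Proportion with a normal-approximation interval, clipped to [0, 1]."""
    p = successes / total
    half = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half), min(1.0, p + half)

def operating_characteristics(tp, fp, fn, tn, prevalence):
    """Operating characteristics from 2x2 counts.

    tp, fp, fn, tn: true positives, false positives, false negatives,
    true negatives. Likelihood ratios are undefined when specificity
    is exactly 0 or 1; this sketch does not handle those cases.
    """
    sens, sens_lo, sens_hi = wald_ci(tp, tp + fn)
    spec, spec_lo, spec_hi = wald_ci(tn, tn + fp)
    # Bayes' theorem: probability of disease given the test result.
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return {
        "sensitivity": (sens, sens_lo, sens_hi),
        "specificity": (spec, spec_lo, spec_hi),
        "ppv": ppv,
        "npv": npv,
        "lr_positive": sens / (1 - spec),
        "lr_negative": (1 - sens) / spec,
    }
```

Computing the predictive values through Bayes' theorem rather than directly from the table makes explicit that they depend on the prevalence in the population to which the test is applied.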
LCA is a statistical method developed to find subtypes of related cases (latent classes) that are inherent and implicit within multivariable categorical data [6]. A particular application of LCA is the evaluation of the accuracy of diagnostic tests when a "gold standard" is lacking. When at least three tests (here CRP, PCT and DD) can detect the presence or absence of an illness, but none of them can determine the condition with certainty, LCA can be used to estimate the diagnostic accuracy of these tests. Traditional LCA assumes that the results of the three tests in the same subject are independent conditional on the true disease status [44]. In other words, the conditional or local independence assumption states that within each latent class (sepsis or no sepsis), the result of each test is statistically independent of the results of the others. If the effect of belonging to the latent sepsis class were removed, the results of CRP, PCT and DD would be distributed completely at random in the study population. However, in many clinical situations this independence assumption is unlikely to hold, and in some cases it is extremely difficult to verify. In our study it is probable that PCT and CRP values are directly related to each other within an inflammatory process.
This local independence assumption can be relaxed by introducing a random effect through a continuous latent variable [47]. In LCA with random effects, it is assumed that the result of a diagnostic test is governed by two mechanisms or factors. The first is the true disease status (δ), and the second is the individual biological process in the patient or the technical characteristics of the test. To this end, the model introduces another latent variable (t) that summarizes the attributes of the subject or the diagnostic test that are not explained by the true disease status. Hence, it is assumed that the results of the different diagnostic tests are independent conditional on δ and t. It is further assumed that t follows a standard normal distribution and that the probability of a positive test result, Pr(Y = 1), given δ, is a monotonic function of t. This is represented by the following equation:
$$\Pr(Y_i = 1 \mid \delta, t) = \Phi(a_{i\delta} + b_{\delta} t)$$
In this equation, Φ represents the cumulative distribution function of the standard normal distribution, and the a and b terms are the parameters. The positive rate of a test, conditional on δ, is simply its mean value over t. The maximum likelihood estimators of the sensitivity and specificity of each test can be obtained by evaluating this integral within an iterative algorithm such as EM or the Newton-Raphson method [46, 47]. All of the preceding analyses and convergence procedures for latent class estimation will be carried out with the statistical software Latent GOLD 4.0 (Statistical Innovations, Belmont, MA, USA).
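To make the averaging over t concrete, the sketch below evaluates the positive rate of a test for given parameter values. It relies on the standard identity that, for t following a standard normal distribution, the mean of Φ(a + b t) equals Φ(a / sqrt(1 + b^2)), and checks this against a simple numerical integration. The function names and the trapezoidal grid are assumptions for this example and do not describe how Latent GOLD performs the computation.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def conditional_positive_prob(a, b, t):
    """Pr(Y = 1 | delta, t) = Phi(a + b * t) for one test and one class."""
    return _STD_NORMAL.cdf(a + b * t)

def marginal_positive_rate(a, b):
    """Mean of Phi(a + b * t) over t ~ N(0, 1), in closed form."""
    return _STD_NORMAL.cdf(a / math.sqrt(1 + b * b))

def marginal_positive_rate_numeric(a, b, n_grid=4001, t_max=8.0):
    """Same quantity by the trapezoidal rule on [-t_max, t_max]."""
    step = 2 * t_max / (n_grid - 1)
    total = 0.0
    for j in range(n_grid):
        t = -t_max + j * step
        weight = 0.5 if j in (0, n_grid - 1) else 1.0
        total += weight * conditional_positive_prob(a, b, t) * _STD_NORMAL.pdf(t)
    return total * step
```

Under this parameterization the sensitivity of a test is the positive rate in the diseased class, and its specificity is one minus the positive rate in the non-diseased class. Setting b = 0 removes the random effect and recovers the conditional independence model.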
The existence of two clinical "gold standards", as previously described, will allow us to compare them against the LCA results. In this way it is possible to analyze the differences and similarities between a sepsis diagnosis defined by LCA and one defined by clinical consensus or by simple infection criteria. Thus, the strong biological assumption of a crosstalk between inflammation and coagulation in sepsis, together with a sensible mathematical model of the latent diagnostic classification, provides a unique opportunity to understand a relevant clinical and public health problem.