Methods for estimating areas under receiver-operating characteristic curves: illustration with somatic-cell scores in subclinical intramammary infections

https://doi.org/10.1016/S0167-5877(99)00054-9

Abstract

The aim of this study was to demonstrate receiver-operating characteristic (ROC) methodology in the context of bovine intramammary infection (IMI). Quarter somatic-cell scores (SCS) were available to evaluate quarter IMI, and the final IMI diagnosis was made from milk bacteriologic cultures. Data consisted of 11,453 quarter-milk samples collected from 2,084 clinically healthy cows located in 154 Belgian herds. Bacteriological analyses showed 16.2%, 7.2%, and 11.9% of quarters infected with coagulase-positive Staphylococcus spp., Streptococcus agalactiae, and coagulase-negative Staphylococcus spp., respectively. The ROC curve indicated all the combinations of sensitivity and specificity that quarter SCS was able to provide as a test to identify quarter IMI. Among parametric, semi-parametric, and non-parametric methods to estimate area under ROC curves, the parametric method seemed the least appropriate for analyzing SCS in this study. With the non-parametric method, the total area under the ROC curves showed quarter SCS could identify quarter IMI with an overall accuracy of 69%, 76%, and 59% for coagulase-positive Staphylococcus spp., S. agalactiae, and coagulase-negative Staphylococcus spp., respectively. Parametric and non-parametric statistical tests showed that overall SCS diagnostic capability was significantly (p<0.01) different from chance and was different (p<0.01) across the three bacteria. However, the SCS thresholds yielding the highest percentage of quarters correctly classified as infected (for the observed prevalence and for equal costs assigned to false-positive and false-negative results) were so high that they had no practical value. The major advantage of ROC analysis is the comprehensive description of the discrimination capacity of SCS for all possible choices of critical values. The major disadvantage is the dependency upon the gold standard used for the final diagnosis, although recent improvements of the methodology will correct the problem.

Introduction

The usefulness of somatic-cell counts (SCC) or somatic-cell scores (SCS) as indicators of intramammary infection (IMI) is widely recognized. Acute inflammation in the mammary gland results in recruitment of neutrophils into the damaged tissue site and an increase in milk SCC (Craven and Williams, 1985). However, there is no universally accepted SCC threshold to discriminate between absence and presence of IMI. Proposed threshold values vary across studies due to differences in IMI prevalence, sample sizes, distributions of risk factors for IMI (e.g., age, parity, breed), IMI diagnostic methods, study designs (field or laboratory study), and methods (Paape and Contreras, 1996; Serieys, 1985). With receiver-operating characteristic (ROC) curves, SCC diagnostic capability (that is, the ability of SCC to detect whether or not there is IMI) may nevertheless be assessed without committing to a single threshold. In human medicine, ROC curves are used so extensively to evaluate tests' diagnostic capability that the methodology has its own keyword classification in Index Medicus. However, only one previous study used this methodology to evaluate SCC diagnostic capability in clinically ill cows (Pyörälä and Pyörälä, 1997).

An ROC curve is a plot of a test's true-positive fractions (TPF) versus false-positive fractions (FPF) for each possible test result (Hanley and McNeil, 1982). It indicates all tradeoffs between sensitivity and specificity that are available. The full and partial areas under the ROC curve (AUC) can be used to summarize the information. The full AUC summarizes the diagnostic capability of the test over the entire range of FPF. It is the average value of SCS sensitivity when SCS specificity is selected randomly from 0.0 to 1.0, or, equivalently, the average value of SCS specificity when SCS sensitivity is selected randomly from 0 to 1. A test with a full AUC of 1 is perfectly accurate for all possible cut-points of the test, whereas a test with a full AUC of 0.5 is performing no better than chance (Hanley and McNeil, 1982). The partial AUC is the area under the ROC curve restricted to a relevant portion. For example, a partial AUC can be the area for specificities greater than one specified value, for specificities lower than one specified value, or for specificities between two specified values (McClish, 1989; Zweig and Campbell, 1993). Indeed, two ROC curves may have different shapes over some restricted range of clinical interest and still have the same full AUC.
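The definitions above can be illustrated with a minimal sketch of the empirical ROC curve and its full and partial areas, computed by the trapezoidal rule. The scores below are invented for illustration and are not data from this study; the function names `roc_points` and `trapezoid_auc` are likewise hypothetical.

```python
def roc_points(pos, neg):
    """Return empirical ROC points (FPF, TPF), one per observed threshold.

    A quarter is called test-positive when its score is >= the threshold.
    """
    thresholds = sorted(set(pos) | set(neg), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every observed score
    for t in thresholds:
        tpf = sum(s >= t for s in pos) / len(pos)  # sensitivity
        fpf = sum(s >= t for s in neg) / len(neg)  # 1 - specificity
        points.append((fpf, tpf))
    return points


def trapezoid_auc(points, fpf_lo=0.0, fpf_hi=1.0):
    """Trapezoidal area under the ROC curve for fpf_lo <= FPF <= fpf_hi."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        lo, hi = max(x0, fpf_lo), min(x1, fpf_hi)
        if hi <= lo:
            continue  # vertical segment, or segment outside the range
        slope = (y1 - y0) / (x1 - x0)
        y_lo = y0 + slope * (lo - x0)
        y_hi = y0 + slope * (hi - x0)
        area += (hi - lo) * (y_lo + y_hi) / 2.0
    return area


# Hypothetical scores for culture-positive and culture-negative quarters.
pos_scs = [3.0, 4.5, 5.0, 6.0, 7.5]
neg_scs = [1.0, 2.0, 2.5, 3.5, 4.0, 5.5]

curve = roc_points(pos_scs, neg_scs)
full_auc = trapezoid_auc(curve)
# Partial AUC restricted to specificities of at least 0.8 (FPF <= 0.2).
partial_auc = trapezoid_auc(curve, 0.0, 0.2)
print(full_auc, partial_auc)
```

Because the partial area is bounded by the width of the FPF range chosen, it is not directly comparable to the full AUC without rescaling.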

Three principal estimation methods are available to create ROC plots based on SCS and to estimate full and partial AUCs: parametric, semi-parametric, and non-parametric methods (Dorfman and Alf, 1969; Hanley and McNeil, 1982; Metz et al., 1990). Each method has advantages and disadvantages. Empirical or non-parametric methods involve plotting pairs of TPF versus FPF using empirical distributions (essentially, histograms) for the culture-positive and culture-negative distributions (Hanley and McNeil, 1982). Because it is free of structural assumptions, the non-parametric method is robust, but the resulting ROC curve may be jagged, the AUC may be underestimated in comparison to the area under a smooth curve, and only observed sensitivities and specificities are compared. The most popular parametric method assumes that SCC in culture-positive and culture-negative quarters are normally distributed with different means and variances (Dorfman and Alf, 1969). This approach yields smooth ROC curves going through all possible combinations of sensitivities and specificities, but it makes distributional assumptions and is more computationally demanding (Zweig and Campbell, 1993). In the semi-parametric method proposed by Metz et al. (1990), continuous data are grouped into ordered bins and then a smooth binormal ROC curve is created. This method is less sensitive to non-normality than the parametric method, but possible lack-of-fit remains a disadvantage (Zou et al., 1997).
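The contrast between the non-parametric and parametric estimates can be sketched as follows. This is only an illustration under stated assumptions, not the estimation procedure used in this study. The non-parametric estimate is computed as the proportion of positive-negative pairs in which the positive quarter scores higher, with ties counted as one half. The parametric estimate plugs sample means and standard deviations into the binormal formula, whereas published parametric methods typically fit the curve by maximum likelihood. The scores and function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def auc_nonparametric(pos, neg):
    """Mann-Whitney estimate: share of (positive, negative) pairs in which
    the positive score is higher; ties count as one half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_binormal(pos, neg):
    """Plug-in binormal estimate, assuming both groups are normally
    distributed with their own means and variances."""
    pooled_sd = (stdev(pos) ** 2 + stdev(neg) ** 2) ** 0.5
    return NormalDist().cdf((mean(pos) - mean(neg)) / pooled_sd)


# Hypothetical scores, not data from the study.
pos_scs = [3.0, 4.5, 5.0, 6.0, 7.5]
neg_scs = [1.0, 2.0, 2.5, 3.5, 4.0, 5.5]

auc_np = auc_nonparametric(pos_scs, neg_scs)
auc_bn = auc_binormal(pos_scs, neg_scs)
print(auc_np, auc_bn)
```

The two estimates generally differ, and the gap tends to widen as the score distributions depart from normality.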

The objective of this paper is to demonstrate ROC methodology for computing full and partial AUCs. Quarter SCS in clinically healthy quarters, either uninfected or infected with one of three bacterial species, were used to demonstrate how to obtain parametric, semi-parametric, and non-parametric estimates of full and partial AUCs, how to use parametric and non-parametric approaches to test significance of diagnostic capability, and how to obtain cut-off values from ROC curves.
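One way to obtain a cut-off value is to choose the threshold that maximizes the fraction of quarters correctly classified, which corresponds to equal costs for false-positive and false-negative results at the observed prevalence. The sketch below assumes that criterion; the scores and the function name `best_cutoff` are hypothetical, and only observed scores are considered as candidate thresholds.

```python
def best_cutoff(pos, neg):
    """Return (threshold, accuracy) maximizing the fraction of quarters
    correctly classified when 'score >= threshold' means test-positive."""
    n_total = len(pos) + len(neg)
    best_t, best_acc = None, -1.0
    for t in sorted(set(pos) | set(neg)):
        true_pos = sum(s >= t for s in pos)
        true_neg = sum(s < t for s in neg)
        acc = (true_pos + true_neg) / n_total
        if acc > best_acc:  # keep the lowest threshold among ties
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical scores, not data from the study.
pos_scs = [3.0, 4.5, 5.0, 6.0, 7.5]
neg_scs = [1.0, 2.0, 2.5, 3.5, 4.0, 5.5]

cutoff, accuracy = best_cutoff(pos_scs, neg_scs)
print(cutoff, accuracy)
```

Under this criterion, negative quarters carry more weight when infection is uncommon, so the selected threshold tends to move upward as prevalence falls.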

Data for the example

From August 3, 1983 to May 5, 1989, a mastitis-control program was established by the Milk Committee in the province of Luxembourg, Belgium. Herds volunteered to participate in the survey. Quarter-milk samples were taken from all four quarters before morning milking, either from all lactating cows or from a representative sample of cows in the herd. Only cows without clinical signs of mastitis were used in the study.

Bacteriological analyses were carried out as described (Arendt et al., 1997). In

Results

A total of 11,453 quarter-milk samples were collected from 2,084 cows in 154 commercial herds. Descriptive statistics are shown in Table 1. Mean SCS was highest (at 5.2) for quarters infected with S. agalactiae. Of the 41.8% of quarters that were culture-positive, 36% were infected with coagulase-positive Staphylococcus spp. (CPS). In 64% of healthy quarters, SCC was less than 100,000 cells/ml.

Distributions of SCS are shown for culture-negative and culture-positive quarters in Fig. 2. Using the Kolmogorov D statistic test for goodness-of-fit to the

Discussion

Parametric, semi-parametric, and non-parametric estimates of the AUC were similar (Table 2) with a tendency for non-parametric estimates to be higher than semi-parametric and parametric estimates. In a previous simulation study with different configurations of pairs of overlapping distributions, biases were very small for both semi-parametric and non-parametric estimates (Hajian-Tilaki et al., 1997). Semi-parametric estimates are more robust with respect to deviation from the binormal

Conclusions

Of all measures proposed to evaluate SCS as indicator of IMI, ROC measures have the great advantage of providing the most comprehensive description of SCS accuracy. The ROC measures indicate all combinations of sensitivity and specificity of SCS that can serve as tests to identify IMI. It is thus preferable to using a single critical value which may lead to considerable confusion when different criteria are used in different studies. Among parametric, semi-parametric, and non-parametric

Acknowledgements

We thank Dr. Burvenich from the Veterinary Faculty of Ghent, Dr. Gröhn from the Veterinary Faculty of Cornell, and Dr. Kehrli from the National Animal Disease Center of Ames for their helpful comments and suggestions.

References (28)

  • Hajian-Tilaki, K.O., et al., 1997. A comparison of parametric and nonparametric approaches to ROC analyses of quantitative diagnostic tests. Med. Decis. Making.
  • Hanley, J.A., et al., 1982. The meaning and use of area under a receiver operating characteristics (ROC) curve. Radiology.
  • Hollander, M., Wolfe, D.A., 1973. Nonparametric Statistical Methods. Wiley, New York, pp....
  • Jones, G.M., et al., 1983. Relationship between somatic cell counts and milk production. J. Dairy Sci.