Background
Polycystic ovary syndrome (PCOS) is a common endocrine disorder of unknown cause [1]. Epidemiological studies have estimated a prevalence of 6.5 to 8% using biochemical and/or clinical evidence [1], while studies involving ultrasonographic evidence of polycystic ovaries have reported a prevalence of 20% or more [2]. PCOS is characteristically heterogeneous in its clinical presentation, and much debate therefore remains regarding consensus diagnostic criteria for the syndrome [3]. Historically, the combination of androgen excess and oligo-amenorrhea has been considered the hallmark of PCOS by North American standards [4]. By contrast, British and European standards have based the diagnosis primarily on ultrasonographic evidence of polycystic ovaries [5]. Clarifying diagnostic criteria for PCOS has significant implications for early identification of, and intervention in, this condition. Early diagnosis and intervention are warranted since there is considerable evidence that women with PCOS are at increased risk for infertility, dysfunctional uterine bleeding, metabolic syndrome, type II diabetes and cardiovascular disease [6, 7]. There is also growing evidence for increased risk of obstructive sleep apnea, depression, nonalcoholic fatty liver disease and certain cancers [8–11].
In 2003, ultrasonographic evidence of polycystic ovaries was formally incorporated as a diagnostic marker of PCOS at a joint meeting of the European Society for Human Reproduction and Embryology (ESHRE) and the American Society for Reproductive Medicine (ASRM) [6, 7]. Inclusion of an ovarian marker was based on substantial evidence that most women who presented with clinical and biochemical features of PCOS had polycystic ovaries on ultrasound [12–14]. The current ultrasound guidelines supported by ESHRE/ASRM consensus characterize the polycystic ovary as containing 12 or more follicles measuring 2 – 9 mm and/or an increased ovarian volume of >10 cm³ [15]. Unlike the widely used criteria previously proposed by Adams and colleagues [16], a subjective assessment of stromal echogenicity and follicle distribution pattern is not included. The cutoff value for an increased ovarian volume was derived from cumulative reports of a larger mean volume for polycystic ovaries compared to a mean volume of <10 cm³ for normal ovaries [17]. The cutoff of ≥12 follicles throughout the entire ovary, rather than in a single plane, was based on a report demonstrating this value to have 99% specificity and 75% sensitivity in distinguishing between polycystic and normal ovaries in women of reproductive age [15].
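The ESHRE/ASRM ultrasound rule described above can be written as a simple decision function. The following is only an illustrative sketch: the function and parameter names are hypothetical, and treating the 2 – 9 mm bounds as inclusive is an assumption rather than something stated in the guidelines as summarized here.

```python
from typing import Iterable


def count_qualifying_follicles(
    diameters_mm: Iterable[float], lo_mm: float = 2.0, hi_mm: float = 9.0
) -> int:
    """Count follicles whose diameter lies in [lo_mm, hi_mm] (bounds assumed inclusive)."""
    return sum(1 for d in diameters_mm if lo_mm <= d <= hi_mm)


def is_polycystic(
    diameters_mm: Iterable[float],
    ovarian_volume_cm3: float,
    min_follicles: int = 12,
    volume_cutoff_cm3: float = 10.0,
) -> bool:
    """Apply the 'and/or' rule: enough 2-9 mm follicles OR volume above the cutoff."""
    enough_follicles = count_qualifying_follicles(diameters_mm) >= min_follicles
    enlarged = ovarian_volume_cm3 > volume_cutoff_cm3  # strictly greater than 10 cm^3
    return enough_follicles or enlarged
```

Because the rule is a logical "or", an ovary with few follicles but a volume above 10 cm³ still satisfies it, as does a normal-sized ovary with 12 or more follicles in the 2 – 9 mm range.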
While there is growing agreement that polycystic ovaries represent an important component of the clinical presentation of PCOS, it is important to acknowledge that significant inter- and intra-observer variability has been reported when making the ultrasound diagnosis [18]. In an analysis of 54 ovarian scans in which images of 27 polycystic and normal ovaries were duplicated and randomized for post-hoc evaluation by four experienced observers, a diagnosis of polycystic ovarian morphology was agreed upon only 51% of the time, while individual observers agreed with their own earlier assessment only 69% of the time [18]. In their study, Amer et al. defined the polycystic ovary as having ≥10 follicles (2 – 8 mm) in a single plane, an ovarian volume ≥12 cm³ and a bright echogenic stroma. The high degree of variability in making the diagnosis suggested that the ultrasound criteria employed were either too subjective or too insensitive to allow for good agreement among observers [17]. The extent to which any of the ultrasound criteria contributed to the subjectivity of the diagnosis was not assessed and, to date, we are unaware of any other study that has attempted to further evaluate subjectivity in the ultrasound diagnosis of polycystic ovaries.
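To make the notion of observer agreement concrete, the sketch below computes raw percent agreement and Cohen's kappa for two observers who each assign a category to the same set of scans. This is an illustration only: the function names are hypothetical, and Cohen's kappa is chosen here as a familiar chance-corrected statistic, not because this section states which statistic was used in the cited report or in the present study.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of scans on which the two observers gave the same rating."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two observers (Cohen's kappa)."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each observer's marginal frequencies.
    p_expected = sum(
        counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if p_expected == 1.0:
        # Both observers used one identical category throughout; kappa is
        # formally undefined, and returning 1.0 is a convention chosen here.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)
```

Percent agreement alone can look reassuring when one category dominates, which is why a chance-corrected coefficient is usually reported alongside it.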
In the present study, we attempted to determine where discrepancies in the evaluation of polycystic ovaries might lie by determining the level of inter-observer agreement associated with the assessment of individual ultrasonographic aspects of polycystic ovarian morphology such as total follicle count, largest follicle diameter, ovarian volume, follicle distribution pattern and presence of a corpus luteum. Given past reports of significant variability in diagnosing polycystic ovaries, we hypothesized that agreement when evaluating ultrasonographic features of polycystic ovaries would be poor even among experienced medical imaging specialists with training in Radiology or Reproductive Endocrinology.
Discussion
Our results showed that, although reproductive endocrinologists demonstrated better agreement than radiologists when evaluating ultrasonographic features of polycystic ovaries, overall inter-observer agreement for both groups was only moderate to poor. In the case of counting the total number of follicles throughout the entire ovary, agreement was alarmingly poor. This was in contrast to past reports of good agreement when multiple observers counted follicles using both real-time and stored transvaginal ultrasonographic imaging [26–28]. Good agreement in these studies was associated with counts that approximated 10 follicles per ovary [26, 28]. In our current study, women diagnosed with PCOS by the ESHRE/ASRM criteria had counts that were generally on the order of 30 – 35 follicles. That we were counting more than three times as many follicles per ovary likely explains the lower levels of reliability reported by our group. The poor level of agreement for counting follicles may be interpreted to mean that follicle counts are too unreliable to be diagnostic. However, it is important to recognize that the current ultrasound guidelines only necessitate the ability to reliably count 12 follicles throughout the entire ovary [15]. Our data showed that observers were consistent in identifying at least 12 follicles per ovary; nevertheless, we were interested in assessing the reliability of total follicle counts since recent studies have suggested that a significantly higher threshold than 12 is needed to adequately discriminate between polycystic and normal ovaries [29]. Moreover, there is emerging evidence that ovarian morphology may reflect the degree of reproductive and metabolic disturbance in PCOS and may therefore give insight into the progression of the syndrome within an individual patient [30]. Future studies aimed at improving reliability in follicle counts will be needed to verify the validity and applicability of this ultrasonographic endpoint in the evaluation of PCOS.
In contrast to follicle counts, agreement when calculating ovarian volume was fair. This observation was consistent with several studies reporting good agreement when multiple observers assessed ovarian volume by ultrasonography [27, 31–34]. Better agreement when calculating ovarian volume suggests that this endpoint may serve as a more reliable marker of polycystic ovaries than follicle counts. Unfortunately, there is significant debate regarding the sensitivity of increased ovarian volume as a diagnostic criterion for polycystic ovaries. The currently accepted cutoff of >10 cm³ was associated with 98.2% specificity, but only 45% sensitivity, in discriminating between normal and polycystic ovaries [35]. Since 2003, both a lower threshold of 7 cm³ [35] and a higher threshold of 13 cm³ [29] have been proposed as being more appropriate thresholds for polycystic ovarian morphology. Some of the controversy over a reliable diagnostic cut-off likely relates to inconsistent methods for determining ovarian volume. There is currently no consensus on the most suitable method of approximating ovarian volume. Clinicians and researchers use a myriad of techniques ranging from semi-automated volumetric task functions offered by conventional ultrasound systems to manual calculations using linear measurements made in multiple cross-sectional images. In the present study, we employed the equation for a prolate spheroid, rather than the commonly used equation of a prolate ellipsoid, since this method was found to correlate better with volume measurements of polycystic ovaries made by 3D ultrasound [22].
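For illustration, the two geometric approximations contrasted above can be sketched as follows. The formulas are the standard volumes of an ellipsoid (three orthogonal diameters) and of a prolate spheroid (one long diameter and one short diameter, with the two short axes assumed equal). The function names are hypothetical, and which diameters are measured in which plane is an assumption here; this section does not describe the exact measurement protocol.

```python
import math


def ellipsoid_volume_cm3(length_cm: float, width_cm: float, height_cm: float) -> float:
    """Ellipsoid volume from three orthogonal diameters: (pi / 6) * L * W * H."""
    return math.pi / 6.0 * length_cm * width_cm * height_cm


def prolate_spheroid_volume_cm3(long_cm: float, short_cm: float) -> float:
    """Prolate spheroid volume from two diameters: (pi / 6) * long * short^2.

    Assumes the two short axes are equal, so a single transverse
    diameter stands in for both width and height.
    """
    return math.pi / 6.0 * long_cm * short_cm ** 2


# Hypothetical measurements (cm) for one ovary, chosen only as an example.
v_ellipsoid = ellipsoid_volume_cm3(4.0, 3.0, 2.5)
v_spheroid = prolate_spheroid_volume_cm3(4.0, 2.5)
```

With these example diameters the two formulas give different volumes (roughly 15.7 cm³ versus 13.1 cm³), which shows how the choice of method alone can shift an ovary's position relative to a fixed volume cutoff.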
Historically, the peripheral distribution of follicles has been considered a hallmark of polycystic ovaries [16]. The classic "string of pearls" appearance is embedded in the Medical Imaging literature and remains highly remarked upon in radiological reports confirming the presence of polycystic ovarian morphology. In the current study, agreement among observers when determining follicle pattern was poor. Difficulty assigning follicle pattern may have related to confusion over the most appropriate ovarian cross-section in which to make the determination, since observers were analyzing digital recordings rather than static images. Moreover, there may have been reluctance to assign follicle pattern in the presence of a dominant follicle or corpus luteum (CL). We were unable to find any study reporting specific reliability coefficients when assigning follicle pattern using static or dynamic transvaginal ultrasonography [17]. While the current ultrasound criteria for polycystic ovaries exclude an assessment of follicle pattern, the appropriateness of its omission as a diagnostic criterion is questionable. Recently, a surrogate and more objective measure of follicle pattern, called the stromal-total area ratio, was shown to have 100% specificity and 100% sensitivity in diagnosing polycystic ovaries [36]. This group also recently reported good reliability among observers when making calculations of the stromal-total area ratio [37]. We suspect that wider adoption of this criterion may occur in light of favorable reports pertaining to its ease of use in clinical practice [37].
Agreement in the identification of a CL was good among observers. Disagreement among observers was generally noted only when a CL appeared as a cystic structure rather than a hyperechoic structure with a small to negligible fluid-filled cavity [38]. In these instances, there was a tendency to mistake a CL for a dominant follicle (i.e., accounting for outlier measurements for the largest follicle diameter endpoint). The presence of a CL is a highly important finding given its implications for infertility and risk of endometrial hyperplasia. However, it has been our experience that very few ultrasound reports comment on the presence or absence of a CL, leading one to suspect that identification of ovulatory structures is not part of routine radiological assessments for many practices. While CL are generally present during the luteal phase, it is important to note that CL (albeit non-functional) can be visualized ultrasonographically during the early follicular phase [38]. This coincides with the recommended time for the ultrasonographic evaluation of PCOS [17]. Given growing recognition that some women with PCOS demonstrate regular menses, it is important to corroborate any evidence of ovulation to ascertain potentially lower health risks in this discrete subset of patients [39].
While it is tempting to conclude that levels of agreement reported in this study were due to differences in experience (i.e., three of four observers were trainees), it is important to recognize that all observers were deemed experienced in gynecological ultrasonography. In the case of the radiologists, both were senior Radiology residents who had fulfilled the ultrasonographic requirements for their training programs and were scheduled to enter general practice in less than a year. In the case of the reproductive endocrinologists, one was a gynecologist with more than twenty years of ultrasonography experience while the other was a fellow who at the time of the study had more than 18 months of intensive training in ovarian ultrasonography. Better agreement among reproductive endocrinologists could be due to the fact that both were working together at the same institution, in an area of study where there was greater likelihood of encountering polycystic ovarian morphology. Nevertheless, it should be noted that overall levels of agreement were highest among Observers 1 and 2 – a reproductive endocrinologist and a radiologist – suggesting that discipline alone cannot fully explain the disparity among groups. While Observer 3 may have lessened agreement among radiologists by undercounting follicles and overestimating follicle size, this observer's conservative approach surely represents a subset of Medical Imaging specialists who would interpret ultrasonographic images of polycystic ovaries in a similar fashion. Ultimately, this set of observers is representative of a real-life clinical setting.
In summary, inter-observer agreement for identifying and quantifying individual ultrasonographic features of polycystic ovaries was moderate to poor. Agreement was best for the identification of a CL followed by determination of ovarian volume, largest follicle diameter, follicle distribution pattern and lastly, total follicle count. While we recognize that not all of these features are used to diagnose polycystic ovaries, we believe each of these features should be evaluated at the time of ovarian ultrasonography since each relates to an important aspect of ovarian physiology. If ultrasonographic evidence of polycystic ovaries is to be used as an objective measure in the diagnosis of PCOS, then decreasing variability in the ultrasound diagnosis is crucial. Standardized training modules for the uniform acquisition and interpretation of ultrasonographic images may be a necessary first step toward improving reliability in identifying polycystic ovarian morphology.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
MEL conceived, designed and coordinated the study, performed the ultrasound scans, conducted the statistical analyses and drafted the final manuscript. DRC clinically evaluated the study volunteers for PCOS. DRC, AKP, AD and MEL performed the post-hoc sonographic evaluations. RAP participated in the conception and design of the study and provided resources and equipment to complete the study. All authors read and approved the final manuscript.