Background
Diagnostic criteria and management procedures for polycystic ovary syndrome (PCOS) are highly controversial and hotly debated in the literature [1, 2]. The use of ultrasonography in the diagnosis of PCOS is one such point of contention [
3]. In 1990, the first attempt by experts to generate international consensus criteria for PCOS resulted in exclusion of polycystic ovaries as a potential marker of the syndrome [
4]. While these criteria were heralded as a legitimate first step toward characterizing the clinical spectrum of PCOS, they did override standards and practices employed in the UK and most of Europe where the diagnosis had long been based on ultrasonography [
3]. At the time, evidence of polycystic ovaries was considered "suggestive" and not diagnostic of PCOS since there were numerous reports of polycystic ovarian morphology in normal asymptomatic women (up to 30%) and in conditions other than PCOS such as normal to late puberty, hyperprolactinemia and congenital adrenal hyperplasia [
5‐
9].
In the years that followed, it became apparent that polycystic ovaries were, in fact, a consistent finding in women demonstrating biochemical and clinical evidence of PCOS [
10‐
13]. Moreover, it was discovered that asymptomatic women with polycystic ovaries demonstrated subtle endocrine and metabolic abnormalities [
3,
6,
11]. In 2003, ultrasonographic evidence of polycystic ovaries was incorporated as a third diagnostic marker of PCOS at a joint meeting of the European Society of Human Reproduction and Embryology (ESHRE) and the American Society for Reproductive Medicine (ASRM) [
1]. Revisions to the consensus criteria were intended to broaden the clinical spectrum of PCOS and therefore allowed for a diagnosis based on identification of two of three criteria: 1) oligo- or chronic anovulation, 2) clinical and/or biochemical hyperandrogenism and 3) polycystic ovaries on ultrasonography. While there is concern that these criteria are too expansive [
2], they do reflect majority opinion that polycystic ovaries are a significant component of PCOS.
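The two-of-three rule described above can be sketched in a few lines. This is a minimal illustration of the counting logic only, not a clinical tool; the function and parameter names are hypothetical, and each input is assumed to have already been established by clinical, biochemical or ultrasonographic assessment.

```python
def meets_two_of_three(oligo_or_anovulation: bool,
                       hyperandrogenism: bool,
                       polycystic_ovaries: bool) -> bool:
    """Return True when at least two of the three consensus criteria are present.

    Hypothetical helper: the three flags stand for 1) oligo- or chronic
    anovulation, 2) clinical and/or biochemical hyperandrogenism and
    3) polycystic ovaries on ultrasonography.
    """
    # Booleans sum as 0/1, so this counts how many criteria are satisfied.
    return sum([oligo_or_anovulation, hyperandrogenism, polycystic_ovaries]) >= 2


# Example: anovulation plus polycystic ovaries, without hyperandrogenism.
print(meets_two_of_three(True, False, True))
```

Under this rule a woman with polycystic ovaries alone does not satisfy the criteria, whereas any pairing of the three features does, which is why the revision broadened the clinical spectrum.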
The current ultrasound guidelines supported by ESHRE/ASRM define the polycystic ovary as having 12 or more follicles measuring 2 – 9 mm and/or an increased ovarian volume of >10 cm³ [
1]. Unlike previous definitions, no assessment of stromal echotexture or follicle distribution pattern is necessary [
5]. The cutoff value for ovarian volume is based on cumulative reports of mean volumes >10 cm³ for polycystic ovaries [14] while the cutoff value of 12 follicles throughout the entire ovary was shown to have 99% specificity and 75% sensitivity in distinguishing between polycystic and normal ovaries [
15,
16]. At present, the reproducibility of these values has not been reported nor has the level of variability associated with the evaluation of these criteria been established. We are aware of only one study in which observer variation in the ultrasound diagnosis of polycystic ovaries has been assessed [
17]. Amer et al. showed that when the polycystic ovary was defined as having ≥ 10 follicles, an ovarian volume ≥ 12 cm³ and a bright echogenic stroma, a diagnosis was agreed upon among observers only 51% of the time, while observers agreed with themselves only 69% of the time [
17]. Significant variability when making the diagnosis suggested that the criteria employed were either too subjective or too insensitive to allow for good agreement [
14]. Unfortunately, the extent to which any of these features contributed to the subjectivity of the diagnosis was not evaluated.
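As a minimal sketch of the ESHRE/ASRM morphology threshold quoted above (12 or more follicles measuring 2 – 9 mm and/or an ovarian volume >10 cm³), the check below encodes the rule for a single ovary. The function and parameter names are hypothetical, and the sketch assumes that the follicle count and volume have already been obtained by an observer. It does not address how reproducibly those inputs can be measured, which is the question raised here.

```python
def is_polycystic_ovary(follicles_2_to_9_mm: int, ovarian_volume_cm3: float) -> bool:
    """Apply the quoted threshold to one ovary (hypothetical helper).

    follicles_2_to_9_mm: number of follicles measuring 2-9 mm in the whole ovary.
    ovarian_volume_cm3:  ovarian volume in cubic centimetres.
    """
    # Either feature alone is sufficient under the "and/or" wording.
    return follicles_2_to_9_mm >= 12 or ovarian_volume_cm3 > 10.0


# Example: a normal-sized ovary with an elevated follicle count.
print(is_polycystic_ovary(follicles_2_to_9_mm=14, ovarian_volume_cm3=8.5))
```

Because the rule is a hard threshold, two observers whose follicle counts differ by only one or two near the cutoff of 12 can reach opposite classifications for the same ovary.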
We recently attempted to determine where variability in the ultrasound diagnosis might lie by determining the level of inter-observer variability associated with identifying and quantifying individual ultrasonographic features of polycystic ovaries (e.g., total follicle count, ovarian volume, etc.) [
18]. In our previous study, overall agreement among radiologists and reproductive endocrinologists was only moderate to poor. We learned that observers varied significantly in their approach to analyzing each ultrasonographic feature and this accounted for discrepancies in agreement. Differences in technique were mostly related to differences in training among medical disciplines and learning institutions. The primary objective of the current study was to determine if a training workshop could reduce inter-observer variation when evaluating ultrasonographic features of polycystic ovaries. We hypothesized that agreement among observers could be vastly improved following the review of relevant acoustic physics principles, ovarian ultrasound image acquisition and interpretation, and a detailed analysis of ovarian structures by in vitro water bath scanning.
Discussion
Our objective was to determine the effect of an ultrasound training workshop on the inter-observer variability associated with evaluating ultrasonographic features of polycystic ovaries. Our approach involved the assessment of transvaginal ultrasound recordings of 30 polycystic ovaries for six features, by six observers with training in either Radiology or Reproductive Endocrinology, both before and after an ultrasound workshop. The use of ultrasonographic recordings for the determination of inter-observer variability is a commonly used and highly feasible approach [
17]. It involved volunteers undergoing only one endovaginal scan and avoided the presence of numerous observers in the ultrasound suite which may be considered intrusive or embarrassing for the study participant [
17]. Furthermore, it mimicked practices in Radiology where digital images/recordings captured by trained sonographers are presented to radiologists for post-hoc evaluation. Had it been practical to have six observers perform their own scans, we suspect that differences in training, technique and experience at the time of image acquisition would have further compounded the level of variability reported among observers.
Observers were given minimal instructions prior to their initial analysis of the images in hopes that each observer would best use his or her own skill-set in the assessments. Agreement was initially poor for FNPO, FNPS and follicle distribution pattern, moderate for largest follicle diameter and ovarian volume, and good for identification of a CL. These findings were consistent with our previous study in which four observers analyzed transvaginal ultrasound recordings of polycystic ovaries for similar morphologic endpoints [
18]. The results of our current study showed that inter-observer agreement could be significantly improved when observers participated in a workshop focused on evaluating ovarian morphology. Discussion among radiologists and reproductive endocrinologists which culminated in the formation of consensus guidelines for assessing ultrasonographic features was the primary factor responsible for improved agreement following the ultrasound workshop.
The current ESHRE/ASRM recommendations for the ultrasonographic evaluation of polycystic ovaries state that ultrasound scans (preferably, transvaginal) be performed during the early follicular phase (i.e., days 3 – 5) or three to five days following a hormonally-induced bleed in women with chronic anovulation [
1]. This recommended time of ultrasonography corresponds to a time during the natural menstrual cycle in which follicle population is dramatically increasing, yet maximum follicle diameters are generally <10 mm [
27]. Despite these recommendations, many women still present for ultrasonographic evaluation at random times during their menstrual cycle. We felt it instructive to mimic real-life situations by having participants present for their ultrasounds at random such that observers would encounter multiple follicle sizes and/or ovulation glands when assessing polycystic ovarian morphology. While we recognize that not all the ultrasonographic endpoints assessed are used to diagnose polycystic ovaries, each of these features is routinely evaluated at the time of ovarian ultrasonography since each gives important information regarding ovarian function/dysfunction.
Poor agreement in the evaluation of FNPO demonstrated by this current study was in contrast to past reports of very good inter-observer agreement in total antral follicle counts (2 – 10 mm) using real-time or stored 2D and 3D transvaginal ultrasonographic imaging [
28,
29]. In previous studies, good agreement among observers was associated with antral follicle counts in the order of approximately 10 follicles per ovary [
28,
29]. However, it should be noted that both groups also reported a distinct decrease in inter-observer agreement when follicle counts were greater than 15 [
28,
29]. In the present study, follicle counts in women diagnosed with PCOS by the ESHRE/ASRM criteria generally ranged from 35 – 40. That we were counting more than three times as many follicles per ovary may have accounted for differences in agreement between studies. That there were fewer follicles to count in a single cross-section (i.e., ~12 follicles) may also have accounted for the better agreement levels reported for FNPS compared to FNPO.
Difficulty in counting follicles lay in the high degree of crowding that occurred among adjacent follicles. Having performed an ultrasono-histopathological assessment of bovine ovaries, we found that follicles could appear as either round or irregular in shape due to atresia or compression by adjacent structures. It was also evident that follicle clustering caused adjacent follicular walls to be imperceptible on ultrasonography. Discriminating among adjacent follicles would, therefore, depend on the observer's perception of the number of lobulations present among a collection of cysts rather than the identification of septa between follicles. With these points in mind, significant improvement in the assessment of FNPS was evident following the ultrasound workshop. Improvement resulted even though mean FNPS counts were significantly higher than those reported before the ultrasound workshop. The higher follicle counts likely reflected an improved awareness of what actually constituted an ovarian follicle on ultrasonography.
An increase in mean follicle counts was also reported for the assessment of FNPO following the ultrasound workshop. However, unlike the FNPS endpoint, inter-observer agreement was not improved for FNPO. Failure to improve agreement in FNPO may be interpreted to mean that the level of subjectivity associated with counting follicles throughout the entire ovary is insurmountable. The current ultrasound recommendations argue that the ability to reliably count 12 follicles is sufficient to ensure an accurate diagnosis. Our study supports the notion that multiple observers can agree on counting at least 12 follicles per ovary. However, there is emerging evidence that total follicle population relates to degree of symptomatology and, therefore, health risks for women with PCOS [
30]. Ascertaining the clinical relevance of discrete aspects of ovarian morphology may therefore help identify persons at risk for PCOS and/or progression of the syndrome.
Differences in ovarian volume measurements before the ultrasound workshop were related to differences in measurement technique among observers. Some observers would measure the widest and longest orthogonal planes of the ovary while others would measure the longest plane first and then draw their width measurement such that it bisected the longitudinal plane at a right angle (i.e., this may or may not have represented the widest plane of the ovary). After agreeing to uniformly measure only the longest and widest orthogonal planes, ovarian volume measurements were significantly greater following the workshop and inter-observer agreement for ovarian volume proved excellent. This was consistent with several studies reporting good inter-observer agreement in the ultrasound assessment of ovarian volume [
31‐
34]. The subjectivity associated with counting follicles may be interpreted to suggest that calculation of ovarian volume should represent the primary method of diagnosing polycystic ovaries. Unfortunately, there are limitations to determining ovarian volume that must be acknowledged. For example, accurate measurements of ovarian volume can only be made during the early follicular phase when there is generally no dominant follicle (>10 mm) or cystic CL to overestimate the size of the ovary. Cutoff levels for increased ovarian volume in polycystic ovaries are debatable since there is significant overlap with the normal population [
35]. Also, it is important to note that not all polycystic ovaries will be enlarged despite demonstrating a grossly elevated follicle count [
14]. More importantly, there is no universally accepted method of calculating ovarian volume [
14]. In this study, we employed the equation for a prolate spheroid to calculate ovarian volume rather than the equation for a prolate ellipsoid which is recommended by the ESHRE/ASRM consensus. The equation for a prolate spheroid was found to correlate better with volume measurements of polycystic ovaries made by 3D-ultrasonography than the formula for a prolate ellipsoid [
21].
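For illustration, the two volume formulas named above can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: it uses the standard geometric forms, V = π/6 × length × width × thickness for the ellipsoid and V = π/6 × length × width² for the spheroid (which treats the two shorter axes as equal), and the function and parameter names are hypothetical. The exact expressions applied in the study are not restated in this section.

```python
import math


def prolate_ellipsoid_volume_cm3(length_cm: float, width_cm: float, thickness_cm: float) -> float:
    """Ellipsoid volume from three orthogonal diameters (assumed standard form)."""
    return math.pi / 6.0 * length_cm * width_cm * thickness_cm


def prolate_spheroid_volume_cm3(length_cm: float, width_cm: float) -> float:
    """Spheroid volume from the longest and widest diameters of a single plane.

    Assumes the unmeasured third diameter equals the width.
    """
    return math.pi / 6.0 * length_cm * width_cm ** 2


# Example with hypothetical diameters measured on one frame.
print(round(prolate_spheroid_volume_cm3(4.0, 3.0), 2))
```

With a length of 4 cm and a width of 3 cm, the spheroid form gives roughly 18.85 cm³. The two functions agree whenever thickness equals width, so any difference between them in practice comes from how far the third diameter departs from the width.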
The greatest improvement in agreement was seen for the evaluation of follicle distribution pattern. Before the workshop, many observers expressed clear frustration and reluctance to assign a distribution pattern since digital sweeps through the ovary would often show both even and peripheral follicle distribution patterns depending on what portion of the ovary was represented. After discussion, it was concluded that the designation of follicle pattern should occur at the single largest cross-sectional view of the ovary in keeping with previous definitions of polycystic ovarian morphology [
5]. That is, observers would now scroll to the digital frame that represented the largest cross-sectional area of the ovary and decide on both follicle pattern and measurements of the widest and longest diameters of the ovary (i.e., for measurement of ovarian volume) using that individual frame. It was decided that in instances where a preovulatory follicle or a cystic CL was present in the largest plane, a designation of 'other' would be made. Inter-observer agreement for evaluation of follicle distribution pattern became exceptional following this consensus approach.
The presence of an ovulation gland is a highly important finding to report in women with PCOS given its implications for fertility and risk of endometrial hyperplasia. While CL are typically identifiable during the luteal phase, it is important to recognize that CL, albeit non-functional, may be visualized ultrasonographically during the early follicular phase coinciding with the recommended time for evaluation of PCOS [
36]. Following the ultrasound workshop, agreement in the identification of a CL was good among virtually all pairs of observers. Disagreement among observers was typically noted only when a CL appeared as a cystic structure versus when it appeared as a hyperechoic structure with a small to negligible fluid-filled cavity [
36]. In these instances, there was a tendency to mistake a cystic CL for a dominant preovulatory follicle. Mistaking a CL for a large follicle also accounted for outlier measurements of the largest follicle diameter. Clues recognized by the observers as being helpful in distinguishing between CL and preovulatory follicles included the presence of a flocculent cystic cavity and/or crenulated hyperechoic postovulatory follicular walls which are apparent only in CL. Had observers performed their own ultrasound scans, the use of Doppler in real-time could have facilitated the identification of CL and likely improved agreement among observers for this endpoint [
36].
In summary, variability in the ultrasound diagnosis of polycystic ovaries likely reflects poor to moderate inter-observer agreement when identifying and quantifying individual characteristics of polycystic ovaries. Agreement in assessing ultrasonographic features of polycystic ovaries can be significantly improved when evaluators generate consensus guidelines for assessing ultrasonographic endpoints. Our study supports the notion that standardized training modules for characterizing polycystic ovarian morphology are needed if ultrasonographic evidence of polycystic ovaries is to be used as an objective measure in the diagnosis of PCOS, and that collaboration and communication among imaging specialists in different medical disciplines are necessary for generating a truly consensual approach. Developing reliable and unified methods for the acquisition and interpretation of ultrasonographic images of polycystic ovaries is critical for ensuring the timely identification of, and intervention in, PCOS.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
MEL conceived, designed and coordinated the study, performed the ultrasound scans, conducted the statistical analyses and drafted the final manuscript. DRC clinically evaluated the study volunteers for PCOS. DRC, AKP, SK, DAL, TGB and MEL participated in the ultrasound workshop and generated consensus criteria for the sonographic evaluation of polycystic ovaries. AKP aided in the statistical analyses and helped draft the manuscript. RAP participated in the conception and design of the study and provided resources and equipment to complete the study. All authors read and approved the final manuscript.