Introduction
Constipation is a common gastrointestinal complaint in children, with a prevalence ranging from 0.77% to 29.6% in both Western and non-Western countries [1]. Symptoms range from mild and short-lived to severe chronic constipation with faecal impaction and involuntary loss of faeces. A medical history together with a thorough physical examination is generally sufficient to diagnose and treat most children with constipation. Nevertheless, many clinicians also order a plain abdominal radiograph to assess retained stool or enlargement of the distal gastrointestinal tract and so confirm the diagnosis. Others use this test to assess the severity of constipation, to monitor treatment or to convince parents that constipation is the cause of their child’s complaints.
To date, three scoring systems have been described for assessing the severity of faecal loading on an abdominal radiograph in constipated children [2–4]. These papers reported good diagnostic accuracy, with more than 80% of constipated and non-constipated patients identified correctly. When evaluated by others, however, accuracy was lower, with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.68 for the Leech method [5] and of 0.84 and 0.74 for the Barr and Blethyn scoring methods, respectively [6]. Another important determinant of the usefulness of these methods, intra- and interobserver agreement, was good to excellent in the original descriptions [2–4]. Although some investigators could reproduce this for the Leech [7] and Barr scores [8], others could not and found much lower intra- and interobserver agreement [5,6,9].
Three scoring systems were specifically designed for and evaluated in children [2–4]. A fourth, the Starreveld score, has so far only been used in adults [10]. As it might be applicable in children as well, we assessed the accuracy of the Starreveld score in diagnosing functional constipation in children, as well as its intra- and interobserver agreement. Furthermore, we compared its performance with that of the Barr score, the oldest and most widely used method for evaluating constipation on a plain abdominal radiograph.
Discussion
In this study we show that both the Starreveld and the Barr scoring method for assessing faecal loading on a plain abdominal radiograph are of limited value in the diagnosis of paediatric constipation. Although the Starreveld score performed better than the Barr score, diagnostic discrimination of both methods was poor.
This study used strict criteria for constipation as described by Loening-Baucke [11]. For functional abdominal pain (FAP) and functional non-retentive faecal incontinence (FNRFI), the Rome II criteria were applied [13]. Similar control groups have been used by others [5]. However, it cannot be excluded that an overfilled colon is found more frequently in patients with functional abdominal pain or non-retentive faecal incontinence than in the general population. A control group such as that used by Jackson et al. [6], consisting of patients with trauma, ureteric colic, insertion of a ventriculo-peritoneal drain or nonspecific abdominal pain, might have better represented the “normal” population.
Our results in children differ from those obtained by Starreveld in adults. In the original study, the scores given by the four individual observers were highly significantly correlated [10], whereas we found only moderate interobserver agreement. Starreveld also described a significant correlation between the radiographic appearance and defecation frequency. However, no controls were included, so diagnostic performance could not be assessed with a ROC curve. Our analysis showed a diagnostic accuracy (AUC 0.54) only marginally above what can be obtained by chance.
The other three scoring systems for evaluating constipation on an abdominal radiograph also showed good sensitivity and specificity in their original publications [2–4]. However, when a ROC curve was obtained in a subsequent evaluation, the AUC of the Leech score did not exceed 0.68 [5]. For the Barr and Blethyn scores the AUCs were 0.84 and 0.74, respectively, when scoring was done by an experienced radiologist, but lower when performed by a student or trainee [6]. Interestingly, in our study more experience did not result in a better AUC: the best AUCs, 0.72 for the Starreveld and 0.63 for the Barr score, were obtained by the student. These values, still far from ideal, are similar to those obtained by others for the Leech, Blethyn and Barr scores [5,6].
In our study, interobserver agreement for both the Starreveld and the Barr score was unsatisfactory. Others obtained similar results for the Barr and Blethyn scores, although the Leech score performed unexpectedly well in another evaluation [5–7]. In contrast, we and others found good agreement between two evaluations by the same observer at different time points [5,7]. Apparently, each observer develops their own interpretation of the original guidelines, resulting in considerable interobserver variability, yet remains consistent over time, as shown by the acceptable intraobserver agreement.
Conclusion
The four scores developed for evaluating constipation on an abdominal radiograph performed well in their initial evaluations [2–4,10]. However, on subsequent independent evaluation, both in the current study and in others, these good initial results could not be reproduced [5,6]. Given both the suboptimal AUC and the large interobserver variability, the abdominal radiograph should not be part of the routine work-up of childhood constipation.