Introduction

Spinal cord injury (SCI) has a profound impact on bowel function. Anorectal sensibility and voluntary anal sphincter contraction are reduced or lost, and colorectal transit times are usually prolonged.1, 2, 3 Most individuals with SCI suffer from combinations of fecal incontinence and constipation, often with severe consequences for quality of life.4, 5, 6 Several novel treatment modalities have been introduced within the last decade. However, a Cochrane review concluded that management of neurogenic bowel dysfunction (NBD) must remain empirical until well-designed controlled trials with adequate numbers and clinically relevant outcome measures become available.7 Such studies require valid instruments for the collection of data.

The International SCI Bowel Function Basic Data Set has been developed to ensure collection of clinically relevant information in a standardized form. Furthermore, the International SCI Bowel Function Extended Data Set has been developed to obtain more detailed information and facilitate comparison of results from scientific studies. Today, data sets have been developed and published for upper and lower urinary tract function, urinary tract imaging,8, 9, 10 bowel11, 12 and cardiovascular function13 and pain.14 Data sets for sexual function and quality of life have also been developed and are available at http://www.iscos.org.uk.

The International SCI Bowel Function Data Sets, developed by a working group of experts appointed by the American Spinal Injury Association and the International Spinal Cord Society (ISCoS), were published in 2009.11, 12 The International SCI Bowel Function Basic Data Set consists of 12 items and the International SCI Bowel Function Extended Data Set of 26 items. The combined data sets contain the information needed to compute the Cleveland Constipation Score,15 the Wexner Fecal Incontinence Score16 and the NBD Score.17 Detailed guidelines have been developed to ensure a common interpretation of the data sets, but their reliability remains to be evaluated. The data sets are intended for international use and, accordingly, their reliability should be tested in an international setting.

The primary aim of the present study was to test the inter-rater reliability of the International SCI Bowel Function Basic and Extended Data Sets, as recommended by the executive committee for the International SCI Standards and Data Sets.18 A secondary aim was to assess the inter-rater reliability of the Cleveland Constipation Score, the Wexner Fecal Incontinence Score and the NBD Score in subjects with SCI.

Subjects and methods

Participants

Spinal cord injury centers in Imola, Italy; Stoke Mandeville, UK; and Viborg, Denmark participated in the study. Each center contributed two raters and 24, 24 and 25 patients, respectively. The raters were doctors or nurses experienced in the treatment of SCI and NBD.

Inclusion criteria were: age older than 18 years; SCI of at least 3 months' duration; sufficient mental capacity to cooperate with data collection; and stable bowel function (that is, regular bowel pattern, unchanged use of oral laxatives and unchanged emptying routine) for at least 2 weeks before the first test and throughout the period between the two tests.

Procedure

Data collection was performed from January to October 2010. The inter-rater reliability was assessed by having two raters at each center complete both data sets independently on the same patient with an interval of 14 days between the tests. This time interval was chosen as a compromise between a time period long enough to minimize the risk that the participants would remember the answers of the first test and short enough to minimize the risk of changes in bowel function. The data sets were completed by the raters during structured interviews with the patient. This was followed by digital anorectal examination.

Raters were instructed to perform approximately the same number of first and second tests. For practical purposes, no fixed order of tests was assigned. The rater of the second test was blinded to the results of the first test. In addition, the International Spinal Cord Injury Core Data Set was completed by the rater of the first test.19

The raters consecutively included patients fulfilling the inclusion criteria. Immediately after completion, the data sets were mailed to the primary investigator to monitor the completeness of data collection and to ensure results from the first test were unknown to the second rater. Raters were instructed not to discuss the interpretation of items and response categories during the data collection period. The raters had no previous experience with the International SCI Bowel Function Data Sets and they did not undergo any specific training. They were, however, encouraged to consult the guidelines on the ISCoS website whenever in doubt. Interviews and examinations took place at the SCI centers.

Radiographically determined colorectal transit time is included in the International SCI Bowel Function Extended Data Set. The reproducibility and inter-rater reliability of colonic transit time in subjects with SCI have been evaluated in a previous study.20 Hence, colonic transit time was not included in the present study.

The study was performed according to the Helsinki II declaration. The participating centers obtained ethics approval according to the national regulations in their respective countries. Informed consent was obtained from each patient.

Translation

The original English data sets were initially translated into Italian and Danish. The translations were performed by two bilingual health professionals, experts in SCI, whose mother tongue was the target language (Italian and Danish, respectively). The translations aimed at conceptual equivalence rather than word-for-word translation. The first drafts of the Italian and Danish data sets were reviewed by another independent bilingual health professional whose mother tongue was the target language, and any discrepancies were discussed until a final consensus was reached. The translation process followed the recommendations described by Biering-Sørensen et al.18

Statistical analysis

Cohen's kappa was computed for each categorical item as a measure of agreement between first and second test.21 Ordinal data were analyzed with weighted kappa statistics.
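To make the two statistics concrete, here is a minimal sketch in plain Python. It is an illustration, not the authors' Stata analysis, and the function names are hypothetical. The weighting scheme used for the ordinal items is not stated in the text, so linear agreement weights, 1 − |i − j|/(k − 1), are assumed. When both raters used one and the same category for every subject, the expected agreement equals 1 and kappa is undefined (0/0), so the sketch returns None in that case.

```python
from collections import Counter


def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa for two raters' categorical ratings."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(r1)
    # observed agreement: share of subjects rated identically by both raters
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # chance agreement from each rater's marginal category frequencies
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    if p_e == 1:
        return None  # undefined: a single category used throughout by both raters
    return (p_o - p_e) / (1 - p_e)


def weighted_kappa(r1, r2, k):
    """Weighted kappa for ordinal ratings coded 0..k-1 (linear weights assumed)."""
    n = len(r1)

    def w(i, j):
        # full credit for exact agreement, partial credit for near misses
        return 1 - abs(i - j) / (k - 1)

    p_o = sum(w(a, b) for a, b in zip(r1, r2)) / n
    m1 = [r1.count(i) / n for i in range(k)]  # rater 1 marginal proportions
    m2 = [r2.count(i) / n for i in range(k)]  # rater 2 marginal proportions
    p_e = sum(w(i, j) * m1[i] * m2[j] for i in range(k) for j in range(k))
    if p_e == 1:
        return None
    return (p_o - p_e) / (1 - p_e)
```

For example, with `r1 = ["yes", "yes", "no", "no", "yes"]` and `r2 = ["yes", "no", "no", "no", "yes"]`, observed agreement is 0.80 and chance agreement is 0.48, giving kappa = 0.32/0.52 ≈ 0.62. Quadratic weights are another common choice and would give somewhat different values for the ordinal items.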

Responses to some items in the data sets were not mutually exclusive (more than one response could be selected), and these items had to be split into dichotomous questions for the calculation of kappa statistics. Thus, the total number of questions in the results (Tables 1 and 2) is higher than the total number of items in the data sets. Kappa was interpreted as follows: <0.20, poor; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, good; and 0.81–1.00, very good agreement.21 An inter-rater agreement >0.20 was considered acceptable.
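The published cut-offs leave small gaps (for example between 0.20 and 0.21). The sketch below closes them by treating each upper bound as inclusive. This is an assumption chosen so that "poor" coincides exactly with "not acceptable" (kappa ≤0.20). The function names are hypothetical.

```python
def interpret_kappa(kappa):
    """Map a kappa coefficient to the verbal categories used in this study."""
    if kappa is None:
        return "not computable"
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


def is_acceptable(kappa, threshold=0.20):
    """Acceptable agreement is defined as kappa strictly above the threshold."""
    return kappa is not None and kappa > threshold
```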

Table 1 Inter-rater reliability of items in the International Spinal Cord Injury Bowel Function Basic Data Set
Table 2 Inter-rater reliability of items in the International Spinal Cord Injury Bowel Function Extended Data Set

Continuous data in the International SCI Bowel Function Extended Data Set were analyzed by means of the coefficient of variation (numerical difference divided by the mean). The percentage of agreement between first and second tests was computed as a supplement to the kappa statistics.
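One reading of "numerical difference/mean" is a per-subject ratio |x₁ − x₂| / ((x₁ + x₂)/2) that is then averaged over subjects. The averaging step, and skipping pairs in which both values are zero, are assumptions made for this sketch; the text does not specify them. The percentage of agreement is simply the share of subjects given identical responses at both tests. The function names are hypothetical.

```python
from statistics import mean


def pairwise_cv(first, second):
    """|difference| / mean for each subject, averaged over subjects (assumed aggregation)."""
    ratios = []
    for a, b in zip(first, second):
        m = (a + b) / 2
        if m == 0:
            continue  # both tests zero: ratio undefined, pair skipped (assumption)
        ratios.append(abs(a - b) / m)
    return mean(ratios) if ratios else None


def percent_agreement(first, second):
    """Percentage of subjects with identical responses at first and second test."""
    return 100 * sum(a == b for a, b in zip(first, second)) / len(first)
```

For example, `pairwise_cv([2, 4], [2, 2])` averages 0 and 2/3, giving about 0.33, and `percent_agreement(["a", "b", "c", "d"], ["a", "b", "x", "d"])` returns 75.0.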

Furthermore, the coefficient of variation was calculated as a measure of the inter-rater reliability of the Cleveland Constipation Score, the NBD Score and the Wexner Fecal Incontinence Score. For each score, the differences between the two tests were plotted against the average of the two tests (Bland–Altman plot), and limits of agreement were computed to define the range within which 95% of the differences are expected to fall.22
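The limits of agreement can be sketched as the mean difference (bias) ± 1.96 times the sample standard deviation of the differences. The text does not give the direction of subtraction, so second test minus first test is assumed here. The function name is hypothetical.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return plot coordinates, bias and 95% limits of agreement for paired scores."""
    diffs = [b - a for a, b in zip(first, second)]       # y-axis: second minus first (assumed)
    avgs = [(a + b) / 2 for a, b in zip(first, second)]  # x-axis: average of the two tests
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return avgs, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)
```

With hypothetical scores `first = [10, 12, 8, 15]` and `second = [11, 12, 10, 14]`, the differences are 1, 0, 2 and −1, so the bias is 0.5 and the limits are 0.5 ± 1.96 × √(5/3), roughly −2.03 to 3.03.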

The International SCI Bowel Function Data Sets lack one item included in the NBD Score (uneasiness, headache or perspiration during defecation); therefore, the NBD Score was computed solely from the remaining nine items found in the International SCI Bowel Function Data Sets. The response categories of the Cleveland Constipation Score were interpreted as described by Jorge et al.16

All statistical analyses were carried out with Stata/IC 10 software (StataCorp LP, College Station, TX, USA).

Results

Overall, 79 first and 73 second tests were performed. Of the six patients who did not complete the second test, three were excluded because of changes in bowel function and three did not attend their second appointment. Thus, first and second tests were obtained from 73 SCI patients: 24 at the Italian, 24 at the English and 25 at the Danish center. Approximately 77% were men; median age at injury was 44 years (range 2–75) and median age at testing was 49 years (range 20–81). The three most common causes of injury were transport-related activities (41%), non-traumatic causes (27%) and falls (22%). The distribution of American Spinal Injury Association Impairment Scale grades at acute admission was as follows: A, 60%; B, 11%; C, 16%; D, 13% (n=63).

Median time between first and second test was 14 days (range 7–36).

Only the combined results of all three centers are presented; the number of subjects from each center is not sufficient to allow reliable analysis of inter-center differences.

Kappa coefficients for each question in the International SCI Bowel Function Basic and Extended Data Sets are displayed in Tables 1 and 2.

Inter-rater reliability assessed by kappa statistics was very good (0.81–1.00) in 5 items, good (0.61–0.80) in 11 items, moderate (0.41–0.60) in 20 items, fair (0.21–0.40) in 11 items and poor (<0.20) in 5 items. The five questions that did not meet the lower limit of acceptable agreement were: ‘Defecation method and bowel care procedures-supplementary/mini enema’; ‘Medication affecting bowel function/constipating agents/other’; ‘Position for bowel care/other’; ‘Bowel care facilitators/other’; and ‘Lifestyle alteration due to constipation’.
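As a quick arithmetic cross-check, the category counts reported above should add up to the number of items for which kappa was computed. Subtracting the poor items should then give the number with acceptable agreement (kappa >0.20) cited in the Discussion. The dictionary simply restates the reported counts.

```python
# Number of items per kappa category, as reported above
counts = {"very good": 5, "good": 11, "moderate": 20, "fair": 11, "poor": 5}

total = sum(counts.values())         # items for which kappa was computed
acceptable = total - counts["poor"]  # items with kappa > 0.20
share = round(100 * acceptable / total, 1)

print(total, acceptable, share)  # 52 47 90.4
```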

For three dichotomous questions, ‘yes’ was never selected in either test; hence, no kappa coefficients could be computed. These questions were: ‘Surgical procedures on the gastrointestinal tract/ileostomy’; ‘Defecation method and bowel care procedures, supplementary/normal defecation’; and ‘sacral anterior root stimulation’.

In the basic data set, kappa was not computed for nine dichotomous items because all agreements fell in only one of the two diagonal cells of the 2 × 2 table. In this situation kappa is zero, negative or undefined regardless of the percentage of agreement, and is therefore uninformative. Hence, a missing kappa value does not indicate low agreement; it reflects that the kappa statistic was not computed. The percentage of agreement is displayed in the tables to supplement the kappa coefficients.
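The sketch below uses hypothetical counts, not study data, to show why kappa was withheld in these cases. When every agreement lies in one diagonal cell (here, both raters answering ‘no’), kappa cannot exceed zero even though the percentage of agreement is high. With no disagreements at all, as in the three items where ‘yes’ was never selected, kappa becomes 0/0. The function name is hypothetical.

```python
def kappa_2x2(a, b, c, d):
    """Percentage agreement (as a proportion) and Cohen's kappa from a 2x2 table.

    a = both raters 'yes', d = both raters 'no',
    b = rater 1 'yes' / rater 2 'no', c = rater 1 'no' / rater 2 'yes'.
    """
    n = a + b + c + d
    p_o = (a + d) / n
    # chance agreement from the raters' marginal 'yes' and 'no' totals
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    if p_e == 1:
        return p_o, None  # kappa undefined (0/0)
    return p_o, (p_o - p_e) / (1 - p_e)
```

With the hypothetical table `a=0, b=2, c=1, d=70`, agreement is 70/73 (about 96%) but kappa is −4/215 ≈ −0.02. With `a=b=c=0, d=73`, kappa is undefined.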

The three items answered on a continuous scale (‘Events and intervals of defecation’) showed rather high coefficients of variation of 0.49, 0.46 and 0.56.

The Cleveland Constipation Score, NBD Score and Wexner Fecal Incontinence Score were computed from data within the International SCI Bowel Function Data Sets. The inter-rater reliability of these scores is displayed by means of Bland–Altman plots with limits of agreement in Figures 1–3. The coefficients of variation of these three scores are displayed in Table 3.

Figure 1 Cleveland Constipation Score: Bland–Altman plot with limits of agreement.

Figure 2 Neurogenic Bowel Dysfunction Score: Bland–Altman plot with limits of agreement.

Figure 3 Wexner Fecal Incontinence Score: Bland–Altman plot with limits of agreement.

Table 3 Coefficient of variation of three bowel function scores

Discussion

The International Spinal Cord Injury Data Sets have been developed to ensure a common international collection of relevant data on various aspects of SCI.20 When introducing a new instrument for measuring health, a comprehensive validation should be performed. The first step is usually to test reliability to determine whether the instrument is collecting data in a reproducible manner. Variations within the subject, within the rater, between raters or between different settings should be considered. Furthermore, the content validity should be explored to ensure that the selected items are relevant and able to describe the underlying concept in an exhaustive manner.22 The content validity of the International SCI Bowel Function Data Sets was established through the process of development, in which experts appointed by ISCoS and American Spinal Injury Association, on the basis of a literature search, discussed and reached consensus on which items should be included.11, 12 Relevant and interested scientific and professional international organizations and societies were invited to review the data sets and they were posted on the ISCoS and American Spinal Injury Association websites for 3 months to allow comments and suggestions.

The aim of the present study was to determine the inter-rater reliability of the International Spinal Cord Injury Bowel Function Basic and Extended Data Sets.11, 12 Inter-rater reliability was acceptable (kappa >0.20) in 47 of the 52 items for which kappa coefficients could be computed. We recommend that the five items showing poor agreement (kappa <0.20) be revised and subsequently retested. If acceptable reliability cannot be obtained for these items, their exclusion from the data sets should be considered.

Variation between first and second tests is not due only to differences between raters; intra-rater variation and variation within each patient also contribute. The intra-rater reliability of the data sets was not tested because an acceptable inter-rater reliability indicates that the intra-rater reliability is also acceptable.18, 23 Establishing high inter-rater reliability is the priority, as the data sets will be used by many different raters in the future. We chose to separate the first and second tests by 2 weeks, and bowel function may have changed slightly during this period, which would increase intra-subject variation. Because we minimized this risk by excluding patients with objective changes in bowel function and management between the tests, we consider it small.

We chose kappa >0.20 as the minimum limit of acceptable agreement. Whether this limit should be higher is open to discussion. In general, lower reliability must be expected when studying self-reported and partly subjective outcomes than when studying strictly objective outcomes. Furthermore, the data sets are not intended for decisions on potentially high-risk treatment, for which high reliability (kappa >0.80) is usually required.

Combining the basic and extended data sets allows computation of the Wexner Fecal Incontinence Score,16 the Cleveland Constipation Scoring System17 and the NBD Score.18 The present study is the first to evaluate the inter-rater reliability of these scores in a population with SCI. Variation, expressed as the coefficient of variation, was surprisingly high, especially for the Wexner Fecal Incontinence Score. We suggest that these existing bowel function scores be further evaluated in individuals with SCI.

The International SCI Bowel Function Data Sets allow straightforward computation of the Wexner Fecal Incontinence Score. The wording of some items in the two other scores is not completely equivalent to that of the corresponding items in the International SCI Bowel Function Data Sets, and one item needed for complete computation of the NBD Score is missing. Revision of the International SCI Bowel Function Data Sets is recommended to allow straightforward computation of all three bowel function scores.

For some data set items, the most extreme responses were rarely or never selected; hence, this evaluation is limited to the responses actually used by the included patients. Our study population was too small to determine whether these response categories will be used in future studies and clinical practice. However, some of them are necessary for computing the abovementioned scores. Also, for some items with dichotomous answers, only one of the two responses was ever used. Unless these items perform differently in a larger population, their exclusion should be considered when the data sets are revised.

There were several common practical problems. For instance, the data collection form did not state explicitly whether response categories were mutually exclusive, and in a number of cases raters selected several response categories where only one was allowed. In another example, the item ‘medication affecting bowel function/constipating agents’ in the basic data set caused confusion, because raters initially included laxatives. Only on reaching the next item, ‘oral laxatives’, did it become clear to raters what the former item was asking. Reversing the order of the two items could avoid this problem.

In addition, we recommend adding a response category ‘stoma’ to the question ‘position for bowel care’ in the extended data set, as the raters evidently needed this option. The guidelines posted on the ISCoS website were rarely used by the raters (3–4 times per rater); the development of self-explanatory data sets that can easily be completed without separate guidelines should therefore be considered. Alternatively, patient-completed questionnaires covering most of the bowel function items, such as the one produced by Jensen et al.24 for the pain data set, might improve the usefulness of the data sets and save precious time in clinical practice.

On the basis of our experiences in the present study, we recommend that the remaining SCI data sets are subjected to similar evaluation.

Conclusion

The International SCI Bowel Function Data Sets showed inter-rater agreement that was very good in 5 items, good in 11 items, moderate in 20 items, fair in 11 items and poor in 5 items. The data sets provide a reliable and useful tool for spinal cord injury research and clinical practice. Nevertheless, the five items with poor agreement need to be revised.