Background
In the clinical domain, phenotypic abnormalities are defined as alterations in normal morphology (structural abnormalities such as cerebellar atrophy), physiology (functional abnormalities such as incoordination of movement), or behavior (such as difficulty in social interactions) [1]. A better understanding of the phenotypic abnormalities associated with rare genetic diseases is crucial to improve the interpretation of genetic tests and the translation of genomic information into clinical practice [2]. Unlike genomic technologies, the collection and analysis of phenotype data do not usually follow a standardized process. In clinical research, the process of collecting data ranges from determining the set of data to be gathered to deciding on the most appropriate computational representation. In general, this is a difficult, laborious and time-consuming task [3]. Phenotype annotation has huge potential for automatically extracting data from large amounts of existing patient records or controlled trials. Recently, substantial progress has been made in encoding phenotypes with the Human Phenotype Ontology (HPO) [4]. This ontology supplies a standardized core of human phenotypic abnormalities and the relationships between them. It is accessible online [5] and contains over 12,000 classes and 16,000 hierarchical relationships.
Electronic rating scales represent an important resource for standardized data collection, often providing primary and secondary outcome measures. While rating scales are used in all medical disciplines, they are especially relevant in specialties rich in complex phenotypic variables, such as neurology [6]. Clinical scales can measure so-called latent variables, i.e., variables that can only be assessed indirectly through their manifestations. Examples of latent variables (or clinical dimensions) in neurological diseases include the quality and intensity of a tremor, the degree of gait imbalance, or cognitive performance. These latent variables are assessed through a set of clinical questions (called statements or items) [7]. Each statement may have multiple ordered response options, each of which is assigned an ordinal number (score). The total score for the global clinical dimension is usually obtained by adding up the individual scores of all statements. Well-known examples are the Mini-Mental State Examination [8], a 30-point questionnaire used to measure cognitive impairment, and the Glasgow Coma Scale [9], used to assess coma and impairment of consciousness. These instruments improve data quality by reducing subjectivity in phenotype descriptions and by simplifying the design of data collection protocols in research studies. Rating scales usually grade several clinical dimensions through different items. For instance, in addition to the movement disorder (i.e., the disease state), the Unified Parkinson’s Disease Rating Scale assesses other clinical sub-dimensions (such as mental state, complications of treatment and activities of daily living) via 42 questions that provide a total score [10]. Reducing all the content of a rating scale to a single number (the total score) may lose useful clinical information about the dimensions implicitly collected by the scale. Hence, inferring the components of the patient’s phenotype from the sub-scores would facilitate medical evaluation, report writing, and clinical decision-making. In this work, we chose to address the Scale for the Assessment and Rating of Ataxia (SARA) [11], a well-validated instrument to evaluate the presence and severity of cerebellar ataxia [12]. This scale is broadly used and has been applied by our group in research on spinocerebellar ataxia type 36 (SCA36). A use case for the SARA is described below.
Formal description of rating scales using standard clinical information models promotes computational data standardization, which is crucial for comparing results across studies [13], for integrating information among different applications and medical records, and for implementing decision support systems. International consortia have developed standardized data models for clinical research and electronic health records, such as ISO 13606 [14], HL7 CDA [15], openEHR [16], NINDS CDE [3] and Intermountain Healthcare [17]. These efforts have focused on providing computable and formal specifications of clinical content, known as clinical models or archetypes. Such models also provide mechanisms to link clinical statements to classes of a standard terminology or ontology. Both clinical archetypes and ontologies seek to structure patient information according to the needs of clinical research and practice; however, their perspectives are different. Archetypes model information so as to mirror patient records. For example, the items paraparesis and facial palsy are recorded together in the archetype Stroke Scale Neurological Assessment, which is available on the Clinical Knowledge Manager (CKM) provided by the openEHR Foundation. Ontologies, on the other hand, aim to represent the meaning of clinical terms and the relationships between them. For example, both paraparesis and facial palsy are represented as abnormalities of the nervous system in the HPO. However, the former is represented as an abnormality of physiology, whereas the latter is represented as an abnormality of morphology. This ontological distinction cannot be reflected in the clinical archetype, yet it is valuable for interpreting the patient’s status. Thus, integrating ontologies with clinical archetypes would provide not only a static knowledge store but also a dynamic resource for automatically inferring the patient’s phenotype and standardizing data collection.
Modeling rating scales by clinical archetypes
Braun et al. [18] modeled a rating scale for the assessment of multiple sclerosis patients following a standard archetype development approach, in which ontology mapping is deferred until the final stages of modeling. At this late stage, the effort required to map archetype terms to ontology entities is substantial [19], due in part to the large size of the ontologies [20]. Furthermore, designing clinical archetypes separately from ontologies may lead to major discrepancies in the meaning of the clinical statements. As a result, ontology mappings, which are key to achieving semantic interoperability among different data sources, are not common in openly accessible archetypes.
Exploiting reasoning on clinical archetypes
Exploiting reasoning on clinical archetypes is another challenge. The Guideline Definition Language (GDL) is a formalism for expressing decision support logic through a declarative, rule-based strategy. Some researchers have found GDL reliable for checking guideline compliance in acute stroke and chronic kidney disease [21, 22]. However, GDL provides little support for ontologies and related reasoning. An alternative is to transform clinical archetypes/models into OWL-DL (the Description Logic sublanguage of the Web Ontology Language) [23–26]. Following this approach, ontology reasoners such as Pellet, HermiT or FaCT++ can be used to check the consistency of OWL-based archetypes [24] and to perform automated reasoning over the OWL dataset. This representation also supports the interoperability of rule-based mechanisms [24, 27, 28]. However, having two independent versions of the same standard model, one in the language of the model itself (ADL, the Archetype Definition Language) and the other in OWL, makes maintenance more difficult. Furthermore, procedural knowledge such as the sum of the scores in a scale (or any complex calculation function) cannot be represented straightforwardly in OWL. An interesting alternative, proposed by Mugzach et al. [28] to perform a particular kind of counting in OWL (named k-of-N counting by the authors), was to develop a plug-in meeting the specific requirements. However, each different calculation function would then require its own plug-in. Other researchers defined knowledge-intensive mappings from the data sources to openEHR archetypes [29, 30]. They distinguished between data-level and knowledge-level processing tasks. The former included calculation functions specified in the mappings and run directly on archetype data. The latter covered classification tasks defined using OWL classes with sufficient conditions. An integrated Personal Health Record is another option for simplifying data integration and clinical decision-making [31].
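To make the distinction between data-level and knowledge-level tasks concrete, the sketch below performs a k-of-N count procedurally (the kind of calculation that OWL cannot express without a dedicated plug-in) and returns a flag that a knowledge-level classifier could then consume. It is an illustration in plain Python, not the plug-in of Mugzach et al.; the functions `k_of_n` and `at_least` and the limb-item criteria with their threshold of 2 are hypothetical and carry no clinical meaning.

```python
from typing import Callable, Mapping, Sequence

Record = Mapping[str, float]
Criterion = Callable[[Record], bool]


def k_of_n(record: Record, criteria: Sequence[Criterion], k: int) -> bool:
    """Data-level step: True when at least k of the N criteria hold."""
    if not 0 <= k <= len(criteria):
        raise ValueError("k must lie between 0 and N")
    met = sum(1 for criterion in criteria if criterion(record))
    return met >= k


def at_least(item: str, threshold: float) -> Criterion:
    """Build a criterion 'item score >= threshold' (hypothetical helper).

    A factory function is used so each lambda captures its own item name.
    """
    return lambda record: record[item] >= threshold


# Hypothetical criteria over the bilateral-item means listed in Table 1.
LIMB_CRITERIA = [at_least(item, 2) for item in
                 ("finger_chase", "nose_finger", "fast_alternating", "heel_shin")]

patient_2 = {"finger_chase": 3.5, "nose_finger": 3.5,
             "fast_alternating": 3.0, "heel_shin": 2.0}
print(k_of_n(patient_2, LIMB_CRITERIA, k=3))  # all four criteria hold
```

Keeping the count in ordinary code and exposing only its boolean result is one way to respect the split described above: the calculation stays at the data level, and only the derived flag would need to appear in an OWL class definition.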
Scale for the assessment and rating of ataxia
The SARA assesses the severity of cerebellar dysfunction through the evaluation of eight items reflecting motor performance (gait, stance, sitting, speech disturbance, finger-chase test, nose-finger test, fast alternating hand movements and heel-shin test) [32]. For the last four items, upper and lower extremities are evaluated bilaterally; the mean of both sides is calculated for each item and added to the scores of the first four items. The total score ranges from 0 (no ataxia) to 40 (most severe ataxia). The SARA is used in clinical studies for a more accurate evaluation of the patient’s motor performance, both globally and item by item. It is also useful for the quantitative comparison of patients, ataxia types, disease stages and responses to treatment, among other applications. One of the challenges of the SARA is the need to derive, from numerical scores, a qualitative description and a patient classification with diagnostic implications. An automated system to perform this translation would greatly facilitate the use of the SARA (and, by extension, of other scoring systems) by clinicians in both research and routine clinical settings.
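The scoring rule above can be made concrete with a short sketch. It is a minimal illustration rather than the paper’s implementation: the item keys, the `(right, left)` tuple layout and the names `sara_total` and `_check` are hypothetical, while the per-item maxima (gait 0 to 8, stance 0 to 6, sitting 0 to 4, speech 0 to 6, and 0 to 4 per side for each limb item) are those of the published SARA and together give the 0 to 40 range.

```python
from typing import Mapping, Tuple

# Maximum score of each item in the published SARA (they sum to 40).
FIRST_FOUR_MAX = {"gait": 8, "stance": 6, "sitting": 4, "speech": 6}
BILATERAL_MAX = {"finger_chase": 4, "nose_finger": 4,
                 "fast_alternating": 4, "heel_shin": 4}


def _check(item: str, score: float, max_score: int) -> None:
    """Reject scores outside the item's valid range."""
    if not 0 <= score <= max_score:
        raise ValueError(f"{item}: score {score} outside 0..{max_score}")


def sara_total(first_four: Mapping[str, int],
               bilateral: Mapping[str, Tuple[int, int]]) -> float:
    """Sum the first four item scores and the right/left mean of each
    bilateral limb item (hypothetical helper, illustrative only)."""
    total = 0.0
    for item, max_score in FIRST_FOUR_MAX.items():
        _check(item, first_four[item], max_score)
        total += first_four[item]
    for item, max_score in BILATERAL_MAX.items():
        right, left = bilateral[item]
        _check(item, right, max_score)
        _check(item, left, max_score)
        total += (right + left) / 2
    return total


# Patient 3 from Table 1: 11 points on the first four items plus limb means
# 2.5 + 2.5 + 2.5 + 1.5 gives a total of 20.
print(sara_total({"gait": 2, "stance": 2, "sitting": 1, "speech": 6},
                 {"finger_chase": (1, 4), "nose_finger": (1, 4),
                  "fast_alternating": (1, 4), "heel_shin": (0, 3)}))
```

Keeping both sides as separate inputs until the final averaging step matters: the mean alone discards which side is impaired.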
A use case description
Let’s consider patients 1, 2 and 3, who have the same total SARA score of 20 (Table 1). Based on the total score alone, the three patients would be considered to be at a similar clinical stage. However, their functional situations are notably different. While patient 1 scores very high for midline ataxia (as shown by the high values of the first three items) and can barely walk or sit unaided, patients 2 and 3 have compromised speech (item 4) and limb coordination (the last four items). In turn, patients 2 and 3, who have similar sitting, standing and walking performance, show different degrees of speech impairment (mild in patient 2, whereas verbal communication is impossible for patient 3). Neither the total score nor even the partial limb scores help differentiate the actual phenotypes of patients 2 and 3, who perform very differently with their limbs (appendicular ataxia, derived from the last four items). While patient 2 has significantly impaired motor coordination on both sides, patient 3 has a more asymmetrical cerebellar syndrome, with severe impairment on his left side but only very mild involvement of his right side, which may be highly relevant to his functional ability if he is right-handed.
Table 1
An example scenario
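Patient | Gait | Stance | Sitting | Speech | Finger chase (R / L) | Nose-finger (R / L) | Fast alternating hand movements (R / L) | Heel-shin (R / L) | Total
1 | 6 | 6 | 3 | 1 | 0 / 0 (mean 0) | 1 / 1 (mean 1) | 1 / 1 (mean 1) | 2 / 2 (mean 2) | 20
2 | 2 | 2 | 1 | 3 | 3 / 4 (mean 3.5) | 3 / 4 (mean 3.5) | 3 / 3 (mean 3) | 2 / 2 (mean 2) | 20
3 | 2 | 2 | 1 | 6 | 1 / 4 (mean 2.5) | 1 / 4 (mean 2.5) | 1 / 4 (mean 2.5) | 0 / 3 (mean 1.5) | 20
R, right side; L, left side. The total adds the first four item scores to the right/left means of the four bilateral items.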
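The point of the use case can be checked numerically. The following sketch recomputes a per-dimension profile for each patient from the Table 1 values. The grouping into midline (gait, stance, sitting), speech and appendicular (the four bilateral limb items) follows the use-case description; the function name `profile` and the `left_minus_right` asymmetry indicator are hypothetical illustrations, not measures defined by the SARA.

```python
from typing import Dict, Sequence, Tuple

# Table 1 values: (gait, stance, sitting, speech) and (right, left) pairs for
# finger chase, nose-finger, fast alternating movements and heel-shin.
TABLE_1 = {
    1: ((6, 6, 3, 1), ((0, 0), (1, 1), (1, 1), (2, 2))),
    2: ((2, 2, 1, 3), ((3, 4), (3, 4), (3, 3), (2, 2))),
    3: ((2, 2, 1, 6), ((1, 4), (1, 4), (1, 4), (0, 3))),
}


def profile(items: Sequence[int],
            limbs: Sequence[Tuple[int, int]]) -> Dict[str, float]:
    """Split a SARA record into clinical dimensions plus a side indicator."""
    gait, stance, sitting, speech = items
    right = sum(pair[0] for pair in limbs)
    left = sum(pair[1] for pair in limbs)
    appendicular = (right + left) / 2  # equals the sum of the four means
    return {
        "midline": gait + stance + sitting,
        "speech": speech,
        "appendicular": appendicular,
        "total": gait + stance + sitting + speech + appendicular,
        "left_minus_right": left - right,  # > 0: left side more impaired
    }


for patient, (items, limbs) in TABLE_1.items():
    print(patient, profile(items, limbs))
```

All three totals equal 20, whereas the midline scores (15, 5, 5), the speech scores (1, 3, 6) and the left-minus-right differences (0, 2, 12) separate the patients in the way the use case describes.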
Specific contribution
The main goal of our work was to develop an electronic rating scale in the clinical domain of neurology that represents both the content and the interpretation of the SARA, using Electronic Health Record (EHR) standards and taking advantage of semantic web technologies. To facilitate ontology mapping and prevent large semantic discrepancies between clinical archetypes and ontologies, we propose a novel method based on the assumption that archetype design should be supported by ontologies in those clinical situations where the archetype contents are logically organized. Moreover, the interpretation of the SARA required different types of information: the data to be recorded (i.e., the content of the rating scale), knowledge about the meaning of the terms in the scale (i.e., terminological knowledge), procedural knowledge to understand the meaning of scores (which can easily be expressed by guidelines), and ontological knowledge to deduce patient phenotypes. We chose a combination of GDL, openEHR clinical archetypes and ontologies to address the challenges of modeling the rating scale. The research questions addressed in this work are: (1) is the combination of GDL, openEHR clinical archetypes and ontologies suitable for describing all the knowledge covered by the SARA? and (2) is it possible to achieve a reasonable integration of these technologies to efficiently model the rating scale?
Discussion
In this paper, we have presented a mixed method to support the development of an electronic version of the SARA. The method combined openEHR archetypes, guidelines, ontologies and reasoning. The innovation of our method lies in how these approaches were combined to obtain the full benefit of each. We distinguished between a modeling phase and an implementation phase. During the former, we addressed the calculation and assessment tasks by defining and executing GDL rules, and the clinical synopsis task by defining OWL classes and executing a reasoner. However, due to the lack of integration between these frameworks, we first ran the GDL framework and then manually entered the results in Protégé in order to infer the phenotypic abnormalities. During the implementation phase, we addressed the calculation and assessment tasks by rewriting the rules directly in Java, and the clinical synopsis task by integrating the OWL API into the system and using the mappings to create OWL individuals. We designed the approach as an archetype-based stand-alone application, providing a meaningful way of collecting and interpreting healthcare data. The application relieved the local EHR system of the need to integrate the SARA, while providing a standard way of delivering the collected and inferred data. Thus, the main role of this electronic rating scale was to collect the normalized data, execute the decision support logic, and deliver both data and interpretations to the EHR system.
Turning to the research questions of this paper, a few conclusions can be drawn. With respect to question (1) (is the combination of GDL, openEHR clinical archetypes and ontologies suitable for describing all the knowledge covered by the SARA?), we can conclude that a combination of openEHR, GDL and OWL offers a suitable framework for describing the data and knowledge levels of the SARA. At the data level, openEHR provides a formal specification, whereas GDL and ontologies offer formal specifications of different types of knowledge for data interpretation. It should be emphasized, however, that in our particular case the knowledge level could be broken into separate information-processing units interconnected in a simple way through the two defined archetypes. The interpretation of other rating scales may require more complex control mechanisms, demanding greater interoperability between GDL and OWL. Furthermore, the current version of GDL uses archetype data as input and output variables for all rules, but provides no facility for defining auxiliary variables. Such variables are sometimes necessary to model procedural knowledge, such as the counting of scores in rating scales. At present, there are two solutions: 1) adding new elements to the archetype, or 2) defining an auxiliary archetype containing all the needed variables. The advantage of the second option is that it leaves the original archetype intact.
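The second option can be sketched as follows, assuming GDL-like rules are rewritten as plain condition/action pairs (the paper’s system rewrote its rules in Java; this Python version is only illustrative). The observation data stand in for the original archetype and are passed to the rules as a read-only view, while the counter lives in a separate dictionary standing in for the auxiliary archetype. The names `run_rules`, `count_rule` and `items_affected` are hypothetical.

```python
from types import MappingProxyType
from typing import Callable, Dict, List, Mapping, Tuple

Observation = Mapping[str, float]   # read-only view of the original archetype
Auxiliary = Dict[str, float]        # stand-in for the auxiliary archetype
Rule = Tuple[str,
             Callable[[Observation, Auxiliary], bool],
             Callable[[Observation, Auxiliary], None]]


def _increment(aux: Auxiliary, key: str) -> None:
    aux[key] = aux.get(key, 0) + 1


def count_rule(item: str) -> Rule:
    """Rule: if the item scored above 0, increment the auxiliary counter."""
    return (f"count_{item}",
            lambda obs, aux: obs.get(item, 0) > 0,
            lambda obs, aux: _increment(aux, "items_affected"))


def run_rules(observation: Dict[str, float], rules: List[Rule]) -> Auxiliary:
    """Fire each rule once, in order; only the auxiliary store is written."""
    view = MappingProxyType(observation)  # rules cannot modify the original
    aux: Auxiliary = {"items_affected": 0}
    for _name, condition, action in rules:
        if condition(view, aux):
            action(view, aux)
    return aux


ITEMS = ("gait", "stance", "sitting", "speech",
         "finger_chase", "nose_finger", "fast_alternating", "heel_shin")
# Patient 1 from Table 1 (bilateral items entered as right/left means).
patient_1 = {"gait": 6, "stance": 6, "sitting": 3, "speech": 1,
             "finger_chase": 0, "nose_finger": 1, "fast_alternating": 1,
             "heel_shin": 2}
print(run_rules(patient_1, [count_rule(i) for i in ITEMS]))
```

Wrapping the observation in a `MappingProxyType` makes the "leave the original archetype intact" property explicit: any rule that tried to write to it would raise an error instead of silently adding elements.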
The new major version of ADL includes specifications for defining explicit rules as invariant assertions, i.e., expressions that must be satisfied by all instances of an archetype. These assertions cover some calculation functions over one or several items, as well as definitions of items that become mandatory in the presence of specific values of other items. These rules provide the same functionality as some of the GDL rules defined in our system. However, the syntax of these specifications is not yet stable, and tools that support the automatic handling of invariants on archetype instances are still needed. Similarly, the new ADL specification includes a section for mapping to external terminologies, which has been improved with richer mappings. Specifically, it is possible to map post-coordinated archetype codes to pre-coordinated ontology classes. This facility can be very relevant when ontology mapping is carried out in the final stages of the modeling process.
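As an illustration of what such invariants express, the sketch below checks two of them over a flattened SARA instance: a calculation invariant (the recorded total must equal the first four items plus the bilateral means) and a presence invariant (a bilateral item recorded on one side must also be recorded on the other). It emulates the idea in Python rather than using ADL 2 rule syntax; the key naming scheme, the function names and the second invariant are hypothetical.

```python
from typing import Callable, List, Mapping, Optional, Tuple

Instance = Mapping[str, Optional[float]]
FIRST_FOUR = ("gait", "stance", "sitting", "speech")
BILATERAL = ("finger_chase", "nose_finger", "fast_alternating", "heel_shin")


def total_matches_items(inst: Instance) -> bool:
    """Calculation invariant: total == first four items + bilateral means."""
    sides = [(inst.get(f"{k}_right"), inst.get(f"{k}_left")) for k in BILATERAL]
    values = [inst.get(k) for k in FIRST_FOUR] + [v for pair in sides for v in pair]
    if inst.get("total") is None or any(v is None for v in values):
        return False  # an incomplete instance cannot satisfy it (assumption)
    expected = (sum(inst[k] for k in FIRST_FOUR)
                + sum((right + left) / 2 for right, left in sides))
    return abs(inst["total"] - expected) < 1e-9


def both_sides_recorded(inst: Instance) -> bool:
    """Presence invariant: each bilateral item has both sides or neither."""
    return all((inst.get(f"{k}_right") is None) == (inst.get(f"{k}_left") is None)
               for k in BILATERAL)


INVARIANTS: List[Tuple[str, Callable[[Instance], bool]]] = [
    ("total equals items plus bilateral means", total_matches_items),
    ("bilateral items recorded on both sides", both_sides_recorded),
]


def violations(inst: Instance) -> List[str]:
    """Return the descriptions of all invariants the instance breaks."""
    return [text for text, holds in INVARIANTS if not holds(inst)]


# Patient 2 from Table 1, recorded with a correct total of 20.
patient_2 = {"gait": 2, "stance": 2, "sitting": 1, "speech": 3,
             "finger_chase_right": 3, "finger_chase_left": 4,
             "nose_finger_right": 3, "nose_finger_left": 4,
             "fast_alternating_right": 3, "fast_alternating_left": 3,
             "heel_shin_right": 2, "heel_shin_left": 2, "total": 20}
print(violations(patient_2))                   # no violations
print(violations({**patient_2, "total": 19}))  # calculation invariant fails
```

Treating each invariant as a named predicate keeps the checks declarative, in the spirit of ADL 2 rules, while the checking loop stays generic.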
With respect to question (2) (is it possible to achieve a reasonable integration of these technologies to efficiently model the rating scale?), we showed that a full integration of these technologies to model the rating scale is not possible at the moment. In the modeling stage, the use of GDL facilitated the development and interconnection of most processing units without resorting to external resources, while encouraging knowledge sharing. We could verify and validate the SARA model by testing use cases in the GDL editor. It should be borne in mind that this tool automatically generates entry forms based on the defined archetypes. The forms are used to collect data from the user, run the engine and display the outcomes. However, the editor does not supply any other facility for delivering the outcomes. For example, the generation of XML instances of the archetypes would be a remarkable advance, as it would allow the tool to be combined with other inference engines, such as description logic reasoners. Regarding ontology reasoning, testing use cases based on archetypes and OWL requires tools that automate the process of converting archetype instances to OWL individuals, running the reasoner on the OWL dataset, and delivering the outcomes as instances of the archetypes.
In the implementation stage, the version of GDL we used did not provide any utility to translate the modeled rules into an execution engine (e.g., Drools or CLIPS), as mentioned previously. Thus, this part of the implementation required substantial effort. To reduce the time devoted to implementation, we parsed the ADL archetypes and developed a database model based on the archetype structure. Regarding ontology reasoning, the archetype mappings facilitated the translation of the archetype instances into OWL individuals.
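A minimal sketch of mapping-driven translation, assuming each recorded archetype element carries a mapping to an ontology class. The paper’s system used the OWL API from Java; to stay dependency-free, this sketch simply emits N-Triples lines that declare one OWL named individual per mapped element, type it with the mapped class and attach its score. The `http://example.org/sara#` namespace, the class names and the `hasScore` property are placeholders, not the actual HPO identifiers or archetype codes; only the RDF, OWL and XSD vocabulary IRIs are standard.

```python
from typing import List, Mapping

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_NAMED_INDIVIDUAL = "http://www.w3.org/2002/07/owl#NamedIndividual"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/sara#"  # placeholder namespace (hypothetical)

# Hypothetical archetype-element -> ontology-class mappings.
MAPPINGS = {
    "gait": EX + "GaitAssessment",
    "stance": EX + "StanceAssessment",
    "sitting": EX + "SittingAssessment",
    "speech": EX + "SpeechAssessment",
}


def to_ntriples(patient_id: str, instance: Mapping[str, float]) -> List[str]:
    """Turn the mapped elements of one archetype instance into N-Triples."""
    lines: List[str] = []
    for element, score in instance.items():
        owl_class = MAPPINGS.get(element)
        if owl_class is None:
            continue  # elements without a mapping are not exported
        individual = f"<{EX}{patient_id}_{element}>"
        lines.append(f"{individual} <{RDF_TYPE}> <{OWL_NAMED_INDIVIDUAL}> .")
        lines.append(f"{individual} <{RDF_TYPE}> <{owl_class}> .")
        lines.append(f'{individual} <{EX}hasScore> "{score}"^^<{XSD_DECIMAL}> .')
    return lines


for line in to_ntriples("patient1", {"gait": 6, "stance": 6,
                                     "unmapped_element": 1}):
    print(line)
```

The output could be loaded into any OWL tool for classification; the point of the sketch is that, once mappings exist, the translation is a mechanical walk over the instance.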
The suggested approach focuses not only on the syntactic structure of the SARA but also on leveraging a scaled-down version of the HPO from the earliest stages of archetype modeling. This ontology version was a valuable resource for facilitating 1) the syntactic structure of the rating scale, 2) the terminology mapping, 3) the automated interpretation of the collected data, and 4) the communication among the information-processing units. Regarding the first point, we organized the SARA items in a tree structure (Fig. 5b) using the CLUSTER class provided by openEHR. As mentioned above, this new organization preserved the eight motor-performance items of the original scale. It also differentiated the three main clinical dimensions of the SARA, although these were not assessed quantitatively. According to the openEHR documentation, the CLUSTER class is provided to represent common domain patterns required in many clinical scenarios. The clinical dimensions identified in the SARA can be viewed as common domain patterns that provide a more accurate assessment of the components of the patient’s phenotype, clarifying the interpretation of the results. However, the observation archetype that was uploaded to the CKM, where it is publicly accessible, follows the flat structure of the original rating scale. As the main goal of the CKM is to provide high-quality information models, the CKM consortium considered a flat structure that complied with the original scale structure to be more convenient. Nevertheless, we think the approach presented here remains valid, as rating scales usually grade several clinical dimensions [7] and the proposed structure using CLUSTER classes allows these dimensions to be represented properly. The evaluation archetype, on the other hand, was not uploaded to the CKM, as only archetypes based on some documented international assessment or a very generic requirement are accepted. According to the CKM recommendations, the SARA evaluation archetype is therefore suitable for local use.
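The tree organization can be illustrated with a simplified model of openEHR’s CLUSTER/ELEMENT composition. The sketch below uses minimal dataclasses, not the openEHR reference-model classes; the grouping into midline, speech and appendicular dimensions is an assumption drawn from the use case and may differ in detail from Fig. 5b. Flattening the tree returns the eight items in the order of the original scale, which is the property that lets the clustered and flat archetypes describe the same data.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    """Leaf node holding one scale item (simplified ELEMENT)."""
    name: str


@dataclass
class Cluster:
    """Grouping node (simplified CLUSTER) that may nest clusters or elements."""
    name: str
    items: List[Union["Cluster", Element]] = field(default_factory=list)


def leaves(node: Union[Cluster, Element]) -> List[str]:
    """Depth-first flattening: the flat item list a CKM-style archetype uses."""
    if isinstance(node, Element):
        return [node.name]
    return [name for child in node.items for name in leaves(child)]


# Hypothetical three-dimension grouping of the eight SARA items.
SARA_TREE = Cluster("SARA", [
    Cluster("Midline ataxia", [Element("Gait"), Element("Stance"),
                               Element("Sitting")]),
    Cluster("Speech", [Element("Speech disturbance")]),
    Cluster("Appendicular ataxia", [Element("Finger chase"),
                                    Element("Nose-finger test"),
                                    Element("Fast alternating hand movements"),
                                    Element("Heel-shin slide")]),
])

print(leaves(SARA_TREE))
```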
Regarding the second point, mappings to standard vocabularies are uncommon in the clinical archetypes published in openly accessible repositories. In general, terminologies include a huge number of clinical terms, so manual mapping turns out to be unfeasible in practice. Extracting a scaled-down version of the HPO provided us with a means of performing terminology mapping in the earliest stages of archetype building. As with the clinical archetype, some parts (i.e., classes and relationships) of the scaled-down version of the HPO were reorganized to cover the SARA domain required for the ontology-driven modeling. This approach, known as ontology reuse, is an important design principle for ontologies [44, 45] that facilitates the development of specific applications.
Regarding the third point, the ontology version provided the knowledge required to infer the patient’s phenotypic information from the collected data. For example, from a score of 8 on the item gait, the system inferred that the patient had abasia, and hence gait ataxia. However, exploiting reasoning on both ADL and ontologies is not possible at the moment. In our approach, this reasoning was needed to interpret the presence of the phenotypic abnormalities associated with the clinical dimensions of the scale. As mentioned earlier, a critical success factor for exploiting reasoning is the availability of ontology-based reasoning tools that use data expressed in ADL format and are able to fire GDL rules. Such an integrated editor would reduce the effort at the authoring level. In future work, following the approaches developed in [24, 27, 28], we will transform the clinical archetypes into OWL-DL and use the ontology and rule-based mechanisms provided by Protégé to draw interpretations from the collected data, with the goal of comparing the results with those achieved by the approach developed in this work.
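The gait example can be reproduced with a hand-written subsumption closure, which stands in here for the description logic reasoner used in the paper. Only the rule "gait score 8 implies abasia" and the link from abasia to gait ataxia come from the text; the link from gait ataxia to ataxia and the function names are illustrative assumptions, and a real reasoner would also handle equivalent and defined classes, which this closure ignores.

```python
from typing import Dict, Mapping, Set

# Toy is-a hierarchy (child -> parents); see the assumptions above.
IS_A: Dict[str, Set[str]] = {
    "Abasia": {"Gait ataxia"},
    "Gait ataxia": {"Ataxia"},  # illustrative assumption
    "Ataxia": set(),
}


def ancestors(term: str) -> Set[str]:
    """All superclasses of a term, following is-a links transitively."""
    found: Set[str] = set()
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def asserted_phenotypes(scores: Mapping[str, int]) -> Set[str]:
    """Score-to-phenotype rules; only the gait rule comes from the text."""
    phenotypes: Set[str] = set()
    if scores.get("gait") == 8:
        phenotypes.add("Abasia")
    return phenotypes


def inferred_phenotypes(scores: Mapping[str, int]) -> Set[str]:
    """Asserted phenotypes plus every class that subsumes them."""
    result = set(asserted_phenotypes(scores))
    for term in list(result):
        result |= ancestors(term)
    return result


print(sorted(inferred_phenotypes({"gait": 8})))
```

Splitting the step into a score rule (procedural, GDL-like) and a subsumption closure (ontological) mirrors the division of labor between GDL and OWL described above.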
With regard to the interpretation of the results of our application, Table 5 reflects a very high degree of agreement between the system and the two neurologists, suggesting that the approach can be a good solution for developing electronic rating scales. Even so, these excellent results should be viewed with caution, as the validation was carried out with data from only 28 patients, all of them affected by the same rare disease (SCA36). Additionally, although the two neurologists who carried out the assessment were independent, they work in the same hospital and one of them is in the same research group as MJS, the neurologist involved in the modeling process. It must therefore be assumed that there is some consistency among the judgments of the three neurologists. For this reason, in future work we will evaluate the application with data from a larger number of patients affected by diverse cerebellar ataxias, and with the help of neurologists from different hospitals. If the results remain highly satisfactory, we will develop a simple mobile application for the automatic transmission of the interpretation to the health information system. In contrast to the SMS, which was developed as a local version, the mobile application will be available for download.
Finally, although our approach was designed to implement a prototype for managing the SARA, it is rather generic and hence applicable to modeling other electronic rating scales, possibly in other clinical domains. For example, the approach could be applied to the domain of autism spectrum disorders, which exhibit complex phenotypes affecting variables that are difficult to measure. As a consequence, standardized scales are often used to collect large amounts of phenotypic data. Recently, a phenotype ontology has been developed to identify important behavioral features [46]. The availability of this ontology, together with mappings to the rating scales, would facilitate the implementation of prototypes like the one presented here.