Background
The need to identify pain in patients with dementia
The human observer of pain as a substitute for the self-report of pain: promises and challenges
Promises
Challenges
Method/design
A road map to find a better solution by use of a new, interdisciplinarily developed video-based system
Step of choosing the right assessment criteria
Step of selecting the appropriate training material
Technical steps
-
Step of robust face capturing (black colored boxes of Fig. 1)

We use the “Sophisticated High Speed Object Recognition Engine” (SHORE) [40, 41], developed by Fraunhofer IIS, for the detection of faces. The frontal face detection rate of the SHORE system is 91.5% with 10 false positives when tested on the public CMU+MIT data set (http://vasc.ri.cmu.edu/idb/html/face/frontal_images/index.html), which contains 507 annotated faces in 130 grayscale images. Face detection is based on local census and structure features. For classification, a classifier cascade is used (for more details see [40]); together with a coarse-to-fine grid search, this leads to an efficient real-time face detector. SHORE is also able to detect four basic emotions (anger, happiness, surprise, sadness) as well as valence (the hedonic tone of the feeling, positive vs. negative) [42].

Within our framework, the SHORE system is used to locate the person’s face as well as the positions of the eyes, nose, and mouth corners in each image of the video stream. The face is then normalized with respect to rotation and scaling, so that the normalized image always has the same resolution and pose. In this way, at least some of the variations in the appearance of the face caused by head rotations and movements of the person in front of the video capture device are mitigated, making this approach robust enough for capturing the faces of bedridden patients. If more than one face is present, the face detector selects the most prominent face on the basis of face size in the image. If no face is detected, the frame is not processed further. (A simplified sketch of this face selection and normalization step is given as the first code sketch after this list.)
-
Step of analyzing single facial motions (light grey colored boxes of Fig. 1)

Automatic detection of pain and pain levels from facial expressions is generally performed as a single- or two-level detection process. In the former case, image sequences that can be assumed to be indicative of pain are processed directly to extract characteristic features (e.g. [24, 31, 35]). In the latter case, image sequences are first processed to detect single facial motions and code them in terms of FACS (namely as AUs); in a second step, the detected AUs and their intensities are processed to determine the likelihood of the presence and intensity of pain according to rules of thumb based on the available literature [26, 34].

Color or grayscale image sequences are commonly used as input to pain detection systems [24, 26, 30, 35]. More recently, depth and thermal images have also been used in combination with color images [43]. Numerical features describing the geometric shape or textural appearance of the face are extracted from each image in a sequence; shape and appearance features are often used in combination [30, 31]. To incorporate expression dynamics, features are extracted over multiple images within a certain time interval [35]. In the two-level pain detection process, temporal features are also extracted from AU intensities [26]. The extracted features are then processed using various machine learning methods in order to detect pain.

In contrast to other approaches, our AU detection implements a temporal state model that connects each frame to the next [44]. This creates a logical connection between successive frames, which enables the system to mitigate noise and perform a temporal smoothing of the output. Notably, the visco-elastic properties of facial muscles are taken into account in our state model by an individual mass-spring-damper model per AU (see the second code sketch after this list). For the detection of the intensity of AUs in each frame, two sources of information are used: the geometric displacement of key points of the face (e.g. mouth corners) and texture information (e.g. wrinkles) are fused within the framework to make a final decision on the intensities of a selected set of AUs. During this process, an internal model of the facial morphology of the person is also taken into account. This model of the person’s “neutral” face is determined over time and helps to calibrate the system to the person’s face automatically at runtime [44]. This online calibration is necessary because it is often not possible to acquire a neutral face on demand. Thus, in contrast to other approaches, we do not rely on an explicit calibration phase using a static mean face as the neutral face, since we consider this not precise enough and likely to cause problems in distinguishing subtle expression-related changes in the face from calibration errors.
-
Step of applying knowledge-level diagnosis of pain (dark grey boxes of Fig. 1)

Based on the identification of the temporal sequence of AUs and their intensities, a knowledge-level model can be built for diagnosis (see Fig. 1) – that is, for the decision whether a patient experiences pain during the present video segment. Input to such a model is a pattern of AUs; output is the diagnosis. The diagnosis is performed by means of symbolic rules, which represent patterns of AUs that are indicative of pain. Because the rules are represented symbolically, the diagnostic decision can be explained to a human observer.

Diagnosis can be based either on prototypical, group-specific, or individual patterns of AUs. Although a distinctive pain-indicative set of prototypical facial muscle movements has been identified that is displayed universally during pain [45], there are also substantial variations between individuals. We recently demonstrated that facial expressions of pain are best described as four distinct facial activity patterns, shown reliably by certain groups of individuals, rather than as one single prototypical set of movements [46]. The most stable and most frequent patterns were ‘narrowed eyes’ combined with either (I) ‘wrinkled nose’ and ‘furrowed brows’, (II) ‘furrowed brows’, or (III) ‘opened mouth’ (the fourth pattern was not stable enough for further consideration). We were able to show that the most prominent facial movement, the ‘narrowed eyes’, which is part of each facial activity pattern, encodes the sensory dimension of pain, whereas ‘wrinkled nose’ and ‘furrowed brows’ encode the affective dimension of pain [16]. Given these findings, the knowledge-based model will consider these three distinct facial activity patterns and also take into account whether a facial response might be indicative of pain intensity or pain affect (the third code sketch after this list illustrates such symbolic pattern rules). We hope that incorporating this knowledge in the automatic diagnosis process will improve sensitivity and specificity. By analogy, human observers benefit in their recognition performance from becoming aware of the presence of different facial activity patterns indicative of pain [47].

The knowledge-based model is constructed either by classifier learning or by unsupervised learning (e.g. [48]). In the first case, a training set needs to include AU sequences observed for pain episodes as well as for non-pain episodes (e.g., disgust); the classifier is trained such that the rules have high sensitivity as well as high specificity for pain [49, 50]. In the second case, patterns characteristic of pain episodes are identified [50]. To exploit as much information as possible from the observed AU sequences, a rich representation language that also allows domain-specific knowledge to be included as a background theory is helpful. We are therefore currently investigating the application of inductive logic programming (ILP [51]) to learn diagnostic rules. In this framework, it is possible to learn rules that only include information about the presence (and possibly the intensity) of specific AUs, as well as rules that take into account sequences and the simultaneous occurrence of AUs. A first empirical investigation indicates that human observers take such sequential information into account [52]. At later stages of the process, the knowledge-level diagnosis of pain can be extended by sub-group classification learning.
For example, knowing that facial expressiveness to pain is increased in patients with Alzheimer’s disease (AD) [14] and reduced in patients suffering from Parkinson’s disease (PD) [20], we will at later stages apply sub-group classification learning (separately for patients with AD and PD) to possibly account for these pathological alterations (see the final code sketch after this list).
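To make the face capturing step concrete, the first sketch below shows how the most prominent face can be selected by size and how a face can be normalized for in-plane rotation and scale using the detected eye positions. SHORE itself is proprietary, so the detect_faces() wrapper, the detection dictionary layout, the target eye coordinates, and the use of OpenCV are our own illustrative assumptions, not part of the published pipeline.

import cv2
import numpy as np

# Target eye positions in the normalized image; these values, the
# detection dictionary layout, and the detect_faces() wrapper are
# illustrative assumptions, since SHORE's actual API is proprietary.
NORM_SIZE = (128, 128)
LEFT_EYE_DST = (40.0, 48.0)
RIGHT_EYE_DST = (88.0, 48.0)

def select_most_prominent(faces):
    """Pick the largest detected face; None if no face was found."""
    if not faces:
        return None
    return max(faces, key=lambda f: f["width"] * f["height"])

def normalize_face(frame, left_eye, right_eye):
    """Warp the frame so both eyes land on fixed positions, removing
    in-plane rotation and scale differences between frames."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))          # in-plane head roll
    scale = (RIGHT_EYE_DST[0] - LEFT_EYE_DST[0]) / np.hypot(dx, dy)
    mid = ((left_eye[0] + right_eye[0]) / 2.0,
           (left_eye[1] + right_eye[1]) / 2.0)
    M = cv2.getRotationMatrix2D(mid, angle, scale)  # similarity transform
    # Translate the eye midpoint onto the target midpoint.
    M[0, 2] += (LEFT_EYE_DST[0] + RIGHT_EYE_DST[0]) / 2.0 - mid[0]
    M[1, 2] += (LEFT_EYE_DST[1] + RIGHT_EYE_DST[1]) / 2.0 - mid[1]
    return cv2.warpAffine(frame, M, NORM_SIZE)

def process_frame(frame, detect_faces):
    faces = detect_faces(frame)            # stand-in for the SHORE detector
    face = select_most_prominent(faces)
    if face is None:
        return None                        # frame is not processed further
    return normalize_face(frame, face["left_eye"], face["right_eye"])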
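The second sketch illustrates the idea behind the per-AU temporal state model: each raw per-frame intensity estimate drives a mass-spring-damper system, so the smoothed intensity follows the signal but cannot jump abruptly, mimicking the visco-elastic behavior of facial muscles. All parameter values here are illustrative placeholders; the actual system tunes an individual model per AU [44].

from dataclasses import dataclass

@dataclass
class AUState:
    """Second-order (mass-spring-damper) smoother for one AU intensity.
    The constants are illustrative; near-critical damping avoids both
    oscillation and sluggish tracking."""
    mass: float = 1.0
    stiffness: float = 80.0   # spring constant k
    damping: float = 18.0     # damper constant c
    x: float = 0.0            # smoothed intensity
    v: float = 0.0            # rate of change

    def update(self, raw_intensity: float, dt: float) -> float:
        # The raw per-frame estimate acts as the spring's rest position,
        # so the state is pulled toward it but cannot jump abruptly.
        accel = (self.stiffness * (raw_intensity - self.x)
                 - self.damping * self.v) / self.mass
        self.v += accel * dt              # semi-implicit Euler step
        self.x += self.v * dt
        return max(0.0, self.x)           # intensities are non-negative

# Usage: one state object per AU, updated once per video frame.
au4 = AUState()
for raw in [0.0, 0.9, 0.1, 0.8, 0.2]:            # noisy raw estimates (made up)
    print(round(au4.update(raw, dt=1 / 25), 3))  # 25 fps video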
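The third sketch shows how the three facial activity patterns could be expressed as symbolic, human-explainable rules over detected AU intensities. The mapping of the verbal descriptors to FACS codes (AU4 for ‘furrowed brows’, AU6/AU7 for ‘narrowed eyes’, AU9/AU10 for ‘wrinkled nose’, AU25-27 for ‘opened mouth’) and the intensity threshold are assumptions made for this illustration, not the rules learned in the project.

# Illustrative symbolic rules for the three stable facial activity
# patterns of pain described in the text. AU-to-descriptor mapping and
# threshold are assumptions for this sketch.
THRESHOLD = 1.0  # minimal AU intensity to count as "present"

NARROWED_EYES = {"AU6", "AU7"}
FURROWED_BROWS = {"AU4"}
WRINKLED_NOSE = {"AU9", "AU10"}
OPENED_MOUTH = {"AU25", "AU26", "AU27"}

def active(aus: dict, group: set) -> bool:
    """A descriptor is active if any of its AUs exceeds the threshold."""
    return any(aus.get(au, 0.0) >= THRESHOLD for au in group)

def diagnose(aus: dict) -> tuple:
    """Return (pain_indicated, matched_pattern) so the decision can be
    explained to a human observer in terms of symbolic rules."""
    if not active(aus, NARROWED_EYES):
        return False, None   # 'narrowed eyes' is part of every pattern
    if active(aus, WRINKLED_NOSE) and active(aus, FURROWED_BROWS):
        return True, "I: narrowed eyes + wrinkled nose + furrowed brows"
    if active(aus, FURROWED_BROWS):
        return True, "II: narrowed eyes + furrowed brows"
    if active(aus, OPENED_MOUTH):
        return True, "III: narrowed eyes + opened mouth"
    return False, None

print(diagnose({"AU6": 2.0, "AU4": 1.5}))  # matches pattern II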
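Finally, a minimal sketch of sub-group classification learning: one classifier is trained per diagnostic group, so that group-specific alterations of expressiveness (increased in AD, reduced in PD) are reflected in group-specific decision boundaries. The shallow decision tree is only a rule-like, explainable stand-in for the ILP learner described above; the data layout and all names are hypothetical.

# Sketch: train one explainable classifier per diagnostic group.
from sklearn.tree import DecisionTreeClassifier

def train_subgroup_models(episodes):
    """episodes: list of (group, au_feature_vector, pain_label) tuples,
    e.g. ("AD", [0.0, 2.1, 1.4, ...], 1). Layout is hypothetical."""
    by_group = {}
    for group, features, label in episodes:
        X, y = by_group.setdefault(group, ([], []))
        X.append(features)
        y.append(label)
    models = {}
    for group, (X, y) in by_group.items():
        # Shallow trees keep the learned decision rules human-readable.
        models[group] = DecisionTreeClassifier(max_depth=3).fit(X, y)
    return models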