Elsevier

Neurocomputing

Volume 285, 12 April 2018, Pages 1-9

Brief papers
Deep learning for classification of normal swallows in adults

https://doi.org/10.1016/j.neucom.2017.12.059

Abstract

Cervical auscultation is a method for assessing swallowing performance. However, its ability to serve as a classification tool for a practical clinical assessment method is not fully understood. In this study, we utilized neural network classification methods in the form of Deep Belief networks in order to classify swallows. We specifically utilized swallows that did not result in clinically significant aspiration and classified them on whether they originated from healthy subjects or unhealthy patients. Dual-axis swallowing vibrations from 1946 discrete swallows were recorded from 55 healthy and 53 unhealthy subjects. The Fourier transforms of both signals were used as inputs to the networks of various sizes. We found that single and multi-layer Deep Belief networks perform nearly identically when analyzing only a single vibration signal. However, multi-layered Deep Belief networks demonstrated approximately a 5–10% greater accuracy and sensitivity when both signals were analyzed concurrently, indicating that higher-order relationships between these vibrations are important for classification and assessment.

Introduction

Dysphagia is a term used to describe swallowing impairment [1]. It is seen as a symptom of many conditions, but most commonly occurs as a result of neurological conditions such as physical trauma or stroke [1], [2]. Though typically not an immediate threat to a patient’s well-being, dysphagia can quickly lead to more serious health complications including pneumonia, malnutrition, dehydration, and even death [2], [3]. The first attempt at identifying this condition in the clinic before these serious complications occur is a bedside assessment of the patient’s actions and behavior while swallowing. Should this prove inconclusive or be deemed insufficient by the administering clinician, more complex instrumental examinations are utilized. Nasopharyngeal flexible endoscopic evaluations involve visualization of the pharynx and upper airway during oral intake, while videofluoroscopic assessment collects dynamic radiographic images of the oral cavity, pharynx, upper airway, and proximal esophagus throughout the entire swallow event [1], [4]. The goal of these assessments is to determine the nature of swallowing pathophysiology and to determine appropriate methods of treatment more accurately than the current bedside assessments allow. However, both of these instrumental examinations require skilled expertise, specialized equipment, and a patient who is able to travel to the site of testing. Previous studies agree that an accurate, simple, non-invasive method of evaluating swallowing function would be a desirable addition to the available tools for assessment.

Multiple different swallowing screening tests have been investigated and implemented in the past. Non-instrumental methods, such as the 3 ounce water challenge [5], the Toronto bedside test [6], or the modified MASA [7] among others, have been widely implemented in the clinical setting. Though they generally have a high sensitivity for detecting aspiration, they have poor specificity and can lead to unnecessary interventions [5], [6], [7], [8]. Instrumental-based screening methods have also produced mixed results, but efforts have been made to improve these methods and allow for their use alongside existing screening techniques. Cervical auscultation, in particular, has been studied in significant detail in recent years [9]. Traditionally, this technique has utilized stethoscopes at the bedside to allow a clinician to listen to a patient swallow a bolus of liquid or food in real time. This non-instrumental screening method has not demonstrated adequate predictive value for swallowing disorders [3] but has given rise to a similar instrumental method in the form of digital microphones and accelerometers. In this digitized form any number of signal processing algorithms, such as those meant to filter noise or quantify statistical features, can be used to process the data [9]. The result is a signal that is much cleaner and easier to analyze accurately and consistently than the human-interpreted signals obtained through non-digital techniques [9].

While methods to classify the results of existing, qualitative swallowing tests are well established, the classification of digital swallowing signals has not been studied in as much detail. This has resulted in multiple studies that demonstrate promising preliminary results, but which still have key experimental deficiencies that call the generalizability of the methods into question. As one example, a recent study by Sarraf-Shirazi and Moussavi [10] sought to differentiate swallowing vibrations that originated from swallows with no aspiration from those that originated from swallows that did result in aspiration. They gathered data from 10 individuals with dysphagia, identified the average spectral power of each swallowing signal over 3 key frequency bands, and used a fuzzy k-means classifier to classify each swallow [10]. Their results demonstrated slightly greater than 80% classification accuracy [10]. A study with similar goals by Nikjoo et al. [11] was published around the same time. This study also sought to differentiate between vibrations from swallows that did or did not result in aspiration, but it instead utilized a support-vector machine classifier with a selection of 8 statistical features as inputs, gathered from 30 participants with dysphagia [11]. They, too, achieved an overall classification accuracy slightly greater than 80% [11]. However, past studies were not limited to these specific aspirating/non-aspirating classes. Das et al. [12] sought to differentiate swallowing vibrations produced by healthy subjects from various artifact signals, as well as to differentiate dysphagic swallowing vibrations from similar artifact signals. They achieved an overall accuracy of 97% for this task by using hybrid fuzzy logic committee neural networks with a limited selection of statistical features as inputs [12]. Conversely, Suryanarayanan et al. [13] attempted to use swallowing vibrations and pressure measurements to classify the severity of aspirated swallows on a 4-point scale. By using simple fuzzy logic, they achieved an overall accuracy of slightly more than 80% on their 22-person data set. These four studies [10], [11], [12], [13] are not the sum of all research into classifying swallowing vibrations, but they are representative of the larger body of work and, by extension, demonstrate certain key flaws of past swallowing research. The first point, which some of these studies have acknowledged themselves, is that the number of subjects used to collect data was limited. In addition to being more susceptible to biases from individual subjects, this has also led researchers to not appropriately separate training and testing data sets. The similarly limited choice of input features is another significant drawback of previous research. By manually preselecting mathematical or physiological features to use as inputs to their classifiers, researchers may have unintentionally biased their results or reduced the maximum potential accuracy of their classification methods. Both the small sample sizes and manual feature selection limit the generalizability of the researchers’ findings and accentuate the need for greater refinement of swallowing classification methods.
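To make the feature-based approach of studies such as [10] concrete, the sketch below computes average spectral power over a small set of frequency bands with NumPy. The band edges, sample rate, and test tones are illustrative assumptions, not values from any of the cited studies, and the fuzzy k-means step that would follow is omitted.

```python
import numpy as np

def band_powers(signal, fs, bands=((0, 100), (100, 300), (300, 750))):
    """Average spectral power of `signal` within each frequency band (Hz).

    The three bands here are illustrative placeholders, not the bands
    used in the cited swallowing studies.
    """
    # Periodogram-style power spectrum from the real FFT.
    spectrum = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].mean()
                     for lo, hi in bands])

# Example: a low-frequency and a high-frequency test tone separate cleanly
# into different bands, which is what a downstream clusterer relies on.
fs = 2000
t = np.arange(fs) / fs
low = band_powers(np.sin(2 * np.pi * 50 * t), fs)    # energy in band 0
high = band_powers(np.sin(2 * np.pi * 500 * t), fs)  # energy in band 2
```

Feature vectors like these would then be passed to a clustering or classification stage (fuzzy k-means in [10]).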

Further investigation of the literature related to swallowing classification reveals that studies utilizing neural network-based classifiers [12], [14], [15], [16], [17] tend to report higher overall classification rates. While the details of their methodologies vary, one trait these studies have in common is that nearly all of them apply user-selected input features of a mathematically complex nature. Lee et al. [15] explored this topic and found that high-order features such as normality and dispersion ratio are only quadratically, rather than linearly, separable. Aboofazeli and Moussavi [17] further support the necessity of such high-level investigation of swallowing vibrations and demonstrate the benefits of both nonlinear analysis techniques and neural networks with multiple hidden layers. While the higher-order analysis of swallowing signals demonstrates clear benefits, these studies acknowledge that they investigate a limited selection of mathematical signal descriptions and that alternate choices may offer benefits to classification. Such trends were also acknowledged by Makeyev et al. [16], who advocated the use of unsupervised learning combined with a highly redundant signal representation in order to avoid the biases of preselecting mathematical features. From these previous attempts at classifying swallowing vibrations, it is clear that the field would benefit from a technique that is both able to analyze higher-order signal features and can self-select features to analyze through unsupervised learning. One relatively new classification technique, deep learning, has not yet been used in swallowing research. However, it does possess these desirable traits and could readily be implemented as a method of classifying swallows, thereby combining much of the past research on the topic into a single method.
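The Deep Belief network idea described above, unsupervised pretraining of stacked restricted Boltzmann machines followed by a supervised output layer, can be sketched with scikit-learn. This is a minimal illustration under assumed layer sizes, learning rates, and synthetic data; it is not the authors' network, and it omits the supervised fine-tuning of the hidden layers that a full DBN would include.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)
X = rng.rand(200, 64)  # stand-in for spectra scaled to [0, 1]
y = (X[:, :32].sum(axis=1) > X[:, 32:].sum(axis=1)).astype(int)

# Each RBM layer is trained without labels (fit_transform ignores y),
# then logistic regression is fit on the learned hidden representation.
dbn = Pipeline([
    ("rbm1", BernoulliRBM(n_components=32, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=16, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("out", LogisticRegression(max_iter=1000)),
])
dbn.fit(X, y)
train_acc = dbn.score(X, y)
```

Because the RBM layers learn their own feature representation from the raw input, no features need to be preselected by hand, which is the trait the paragraph above identifies as desirable.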

In this study, we propose a method that differentiates swallows made by healthy subjects from swallows made by dysphagic subjects that did not result in a significant amount of laryngeal penetration. This will be performed using only cervical auscultation signals recorded in a clinical environment during typical swallowing examination procedures. A previous study supports this possibility, as it asserts that these two classes of swallows produce significantly different cervical auscultation signals [18]. We also propose that our chosen classification technique, a Deep Belief network, will provide more reliable classification than previously implemented techniques. Its ability to classify data in a non-linear manner, based on higher-order relationships than a simple feed-forward neural network can capture, should allow for the best possible swallowing classification.
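The abstract states that the Fourier transforms of the dual-axis vibration signals serve as the network inputs. A minimal sketch of that front end is shown below; the signal length and the max-normalization are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def spectral_input(ap_signal, si_signal):
    """Build one network input vector from the anterior-posterior (A-P)
    and superior-inferior (S-I) swallowing vibration signals.

    Mirrors the paper's use of Fourier transforms as inputs; the
    concatenation order and [0, 1] scaling are illustrative assumptions.
    """
    ap = np.abs(np.fft.rfft(ap_signal))   # magnitude spectrum, A-P axis
    si = np.abs(np.fft.rfft(si_signal))   # magnitude spectrum, S-I axis
    x = np.concatenate([ap, si])
    return x / (x.max() + 1e-12)          # scale into [0, 1] for RBM-style units

# 1024-sample windows yield 513 frequency bins per axis, 1026 inputs total.
x = spectral_input(np.random.randn(1024), np.random.randn(1024))
```

Feeding both axes in one vector, as in the multi-modal networks described here, lets the hidden layers model cross-axis relationships that single-axis networks cannot see.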

Section snippets

Participants

The protocol for the study was approved by the Institutional Review Board at the University of Pittsburgh. The data collected in this study has been utilized by other published studies [18], [19], [20] and the methods used to collect it have been published previously, but will be summarized here for convenience.

A total of 55 healthy participants (28 men, 27 women, mean age 39) were recruited from the neighborhoods surrounding the University of Pittsburgh campus. Each confirmed that they had no

Results

Tables 4 and 5 provide the results of our tests. Table 4 presents the results for networks which utilized data from a single vibration axis as input, while Table 5 presents the corresponding results for our multi-modal networks. The number of correctly classified healthy and unhealthy swallows (of the 123 presented for each category) is given as the average of our ten-fold cross validation procedure. Sensitivity is defined as the percentage of swallows from patients with dysphagia that were
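The fold-averaged metrics described above reduce to simple arithmetic over per-fold confusion counts. The sketch below uses made-up fold counts (not the paper's results) with "positive" taken as the dysphagia class, so sensitivity is the fraction of the 123 unhealthy swallows classified correctly.

```python
import numpy as np

# Illustrative per-fold counts of correctly classified swallows, out of
# 123 per class per fold; these numbers are invented for the example.
n_per_class = 123
correct_unhealthy = np.array([100, 98, 103, 99, 101, 97, 102, 100, 98, 101])
correct_healthy = np.array([105, 107, 103, 106, 104, 108, 105, 106, 107, 104])

# Average over the ten folds, then normalize by the class size.
sensitivity = correct_unhealthy.mean() / n_per_class   # true-positive rate
specificity = correct_healthy.mean() / n_per_class     # true-negative rate
accuracy = (correct_unhealthy.mean()
            + correct_healthy.mean()) / (2 * n_per_class)
```

Because both classes contribute 123 swallows per fold, overall accuracy is simply the mean of sensitivity and specificity.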

Discussion

Our study varied from past research on swallowing classification in a number of ways. Notably, we included a larger number of participants and swallow events as well as a much wider array of boluses and swallowing techniques. Despite this we see that our networks, particularly our single layer, single-modal networks, provide similar swallow classification accuracy to that reported by several other studies [13], [17], [37], [38], [39]. In addition, our networks demonstrated only a minimal amount

Conclusion

In this study, we sought to differentiate swallows made by healthy subjects from those made by patients with dysphagia using only cervical auscultation signals. To do this, we used the frequency spectrums of anterior–posterior and superior–inferior swallowing vibrations as inputs to a variety of single and multi-layer Deep Belief networks. We found that single layer networks provided the greatest overall accuracy when analyzing vibrations from a single axis. However, when incorporating

Acknowledgments

Research reported in this publication was supported by the Eunice Kennedy Shriver National Institute Of Child Health and Human Development of the National Institutes of Health under Award number R01HD074819 while some data utilized in this study was gathered with the assistance of grant number UL1 TR000005. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Joshua M. Dudik received his Bachelor’s degree in biomedical engineering from Case Western Reserve University, OH, in 2011 and his Master’s in bioengineering from the University of Pittsburgh, PA, in 2013. In 2015, he completed a Ph.D. degree in electrical engineering at the University of Pittsburgh, PA. His current work is focused on cervical auscultation and the use of signal processing techniques to assess the swallowing performance of impaired individuals.

References (39)

  • J.M. Dudik et al.

    Dysphagia screening: contributions of cervical auscultation signals and modern signal processing techniques

    IEEE Trans. Hum.-Mach. Syst.

    (2015)
  • S. Sarraf-Shirazi et al.

    Silent aspiration detection by breath and swallowing sound analysis

    Proceedings of the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society

    (2012)
  • M. Nikjoo et al.

    Automatic discrimination between safe and unsafe swallowing using a reputation-based classifier

    Biomed. Eng. OnLine

    (2011)
  • S. Palreddy et al.

    Neural networks in computer-aided diagnosis classification of dysphagic patients

    Proceedings of the 14th Annual International Conference of the IEEE Engineering in Medicine and Biology Society

    (1992)
  • J. Lee et al.

    A radial basis classifier for the automatic detection of aspiration in children with dysphagia

    J. Neuroeng. Rehabil.

    (2006)
  • O. Makeyev et al.

    Recognition of swallowing sounds using time-frequency decomposition and limited receptive area neural classifier

    (2009)
  • M. Aboofazeli et al.

    Analysis and classification of swallowing sounds using reconstructed phase space features

    Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing

    (2005)
  • J.M. Dudik et al.

    Dysphagia and its effects on swallowing sounds and vibrations in adults

    IEEE Trans. Neural Syst. Rehabil. Eng.

    (2016)
  • I. Jestrović et al.

    The effects of increased fluid viscosity on swallowing sounds in healthy adults

    Biomed. Eng. OnLine

    (2013)

    James L. Coyle received his Ph.D. in Rehabilitation Science from the University of Pittsburgh in 2008 with a focus in neuroscience. He is currently a Professor of Communication Sciences and Disorders in the School of Health and Rehabilitation Sciences (SHRS), and professor of Otolaryngology in the School of Medicine, University of Pittsburgh. He is Board Certified by the American Board of Swallowing and Swallowing Disorders and maintains an active clinical practice in the Department of Otolaryngology, Head and Neck Surgery and the Speech Language Pathology Service of the University of Pittsburgh Medical Center. He is a Fellow of the American Speech Language and Hearing Association.

    Amro El-Jaroudi received the B.S. degree in 1984, the M.S. degree in 1984, and the PhD. degree in 1988, all in electrical engineering from Northeastern University, Boston, MA, USA. Currently, he is an Associate Professor at the Department of Electrical and Computer Engineering at the University of Pittsburgh. His research interests include speech processing, spectral estimation, and digital signal processing algorithms.

    Zhi-Hong Mao received the dual B.S. degrees in automatic control and mathematics and the M.Eng. degree in intelligent control and pattern recognition from Tsinghua University, Beijing, China, in 1995 and 1998, respectively, the S.M. degree in aeronautics and astronautics from Massachusetts Institute of Technology, Cambridge, MA, USA, in 2000, and the Ph.D. degree in electrical and medical engineering from the Harvard-MIT Division of Health Sciences and Technology, Cambridge, in 2005. He joined the University of Pittsburgh, Pittsburgh, PA, USA, as an Assistant Professor in 2005 and became an Associate Professor of electrical engineering and bioengineering in 2011 and William Kepler Whiteford Faculty Fellow in 2012. His research interests include human-in-the-loop control systems, networked control, and neural control and learning. Dr. Mao received the Faculty Early Career Development (CAREER) Award of National Science Foundation and the Andrew P. Sage Best Transactions Paper Award of the IEEE Systems, Man and Cybernetics Society in 2010.

    Mingui Sun received the B.S. degree in instrumental and industrial automation from Shenyang Chemical Engineering Institute, Shenyang, China, in 1982, and the M.S. and Ph.D. degrees in electrical engineering from the University of Pittsburgh, Pittsburgh, PA, USA, in 1986 and 1989, respectively. In 1991, he joined the University of Pittsburgh, where he is currently a Professor of neurosurgery, electrical and computer engineering, and bioengineering. He is also the Director of Research at Computational Diagnostics, Inc., Pittsburgh. His current research interests include advanced biomedical electronic devices, biomedical signal and image processing, sensors and transducers, biomedical instruments, artificial neural networks, wavelet transforms, time-frequency analysis, and the inverse problem of neurophysiological signals. He has authored or coauthored more than 350 publications. Dr. Sun is an elected Fellow of the American Institute for Medical and Biological Engineering. He received the Novel Smart Engineering System Design Award at the International Conference on Artificial Neural Networks in Engineering in 1999, the Best Paper Award at the International Symposium on Uncertainty Modeling and Analysis in 2003, the Chancellor’s Innovation Award of the University of Pittsburgh in 2007, 2008, 2010, and 2011, respectively, and the Distinguished Lectureship of the IEEE Circuits and Systems Society in 2012.

    Ervin Sejdić received the B.E.Sc. and Ph.D. degrees in electrical engineering from the University of Western Ontario, London, ON, Canada, in 2002 and 2008, respectively. He was a Postdoctoral Fellow at Holland Bloorview Kids Rehabilitation Hospital/University of Toronto and a Research Fellow in Medicine at Beth Israel Deaconess Medical Center/Harvard Medical School. He is currently an Associate Professor at the Department of Electrical and Computer Engineering (Swanson School of Engineering), the Department of Bioengineering (Swanson School of Engineering), the Department of Biomedical Informatics (School of Medicine) and the Intelligent Systems Program (School of Computing and Information) at the University of Pittsburgh, PA. His research interests include biomedical and theoretical signal processing, swallowing difficulties, gait and balance, assistive technologies, rehabilitation engineering, anticipatory medical devices, and advanced information systems in medicine. Dr. Sejdić received prestigious research scholarships from the Natural Sciences and Engineering Research Council of Canada in 2003 and 2005. He also received the Melvin First Young Investigator’s Award from the Institute for Aging Research at Hebrew Senior Life in Boston, MA. In 2016, President Obama named Prof. Sejdić as a recipient of the Presidential Early Career Award for Scientists and Engineers, the highest honor bestowed by the United States Government on science and engineering professionals in the early stages of their independent research careers. In 2017, Prof. Sejdić was awarded the National Science Foundation CAREER Award, which is the National Science Foundation's most prestigious award in support of career-development activities of those scholars who most effectively integrate research and education within the context of the mission of their organization.
