Brief papers
Deep learning for classification of normal swallows in adults
Introduction
Dysphagia is a term used to describe swallowing impairment [1]. It is a symptom of many conditions, but it most commonly occurs as a result of neurological conditions such as physical trauma or stroke [1], [2]. Though typically not an immediate threat to a patient’s well-being, dysphagia can quickly lead to more serious health complications, including pneumonia, malnutrition, dehydration, and even death [2], [3]. The first attempt at identifying this condition in the clinic, before these serious complications occur, is a bedside assessment of the patient’s actions and behavior while swallowing. Should this assessment prove inconclusive or be deemed insufficient by the administering clinician, more complex instrumental examinations are used. Nasopharyngeal flexible endoscopic evaluations involve visualization of the pharynx and upper airway during oral intake, while videofluoroscopic assessments collect dynamic radiographic images of the oral cavity, pharynx, upper airway, and proximal esophagus throughout the entire swallow event [1], [4]. The goal of these assessments is to determine the nature of the swallowing pathophysiology and to select appropriate treatments more accurately than bedside assessments allow. However, both of these instrumental examinations require skilled expertise, specialized equipment, and a patient who is able to travel to the testing site. Previous studies agree that an accurate, simple, non-invasive method of evaluating swallowing function would be a desirable addition to the available assessment tools.
Multiple swallowing screening tests have been investigated and implemented in the past. Non-instrumental methods, such as the 3-ounce water swallow challenge [5], the Toronto bedside swallowing screening test [6], and the modified Mann Assessment of Swallowing Ability (MASA) [7], among others, have been widely implemented in the clinical setting. Though they generally have high sensitivity for detecting aspiration, they have poor specificity and can lead to unnecessary interventions [5], [6], [7], [8]. Instrument-based screening methods have also produced mixed results, but efforts have been made to improve them so that they can be used alongside existing screening techniques. Cervical auscultation, in particular, has been studied in considerable detail in recent years [9]. Traditionally, this technique has used a stethoscope at the bedside, allowing a clinician to listen in real time as a patient swallows a bolus of liquid or food. This non-instrumental screening method has not demonstrated adequate predictive value for swallowing disorders [3], but it has given rise to a similar instrumental method based on digital microphones and accelerometers. In this digitized form, any number of signal processing algorithms, such as those that filter noise or quantify statistical features, can be applied to the data [9]. The result is a signal that is much cleaner, and easier to analyze accurately and consistently, than the human-interpreted signals obtained through non-digital techniques [9].
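As a minimal sketch of the kind of digital processing described above, the example below band-pass filters a toy vibration signal and then summarizes it with a few statistical features using NumPy and SciPy. The 4 kHz sampling rate, the cutoff frequencies, the feature set, and the synthetic signal are all assumptions chosen for illustration; none of them are taken from the cited studies.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt
from scipy.stats import kurtosis, skew

FS = 4000.0  # hypothetical sampling rate in Hz (an assumption, not from the cited studies)


def bandpass(signal, low_hz=1.0, high_hz=1500.0, fs=FS, order=4):
    """Zero-phase Butterworth band-pass filter: suppresses slow drift below
    low_hz and high-frequency noise above high_hz. Cutoffs are illustrative."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)


def statistical_features(segment):
    """A few simple statistical descriptors of one swallow segment."""
    return {
        "mean": float(np.mean(segment)),
        "std": float(np.std(segment)),
        "skewness": float(skew(segment)),
        "kurtosis": float(kurtosis(segment)),  # Fisher (excess) kurtosis: 0 for a Gaussian
    }


# Toy stand-in for one vibration recording: a 50 Hz tone, white noise, and a DC offset.
rng = np.random.default_rng(0)
t = np.arange(0.0, 1.0, 1.0 / FS)
raw = np.sin(2 * np.pi * 50 * t) + 0.3 * rng.standard_normal(t.size) + 0.5
clean = bandpass(raw)
features = statistical_features(clean)
print(features)
```

Zero-phase filtering (`sosfiltfilt`) is used so that the filter does not shift swallow events in time, and the second-order-sections form is generally more numerically robust than transfer-function coefficients when cutoffs are low relative to the sampling rate.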
While methods to classify the results of existing, qualitative swallowing tests are well established, the classification of digital swallowing signals has not been studied in as much detail. This has resulted in multiple studies that demonstrate promising preliminary results but still have key experimental deficiencies that call the generalizability of their methods into question. As one example, a recent study by Sarraf-Shirazi and Moussavi [10] sought to differentiate swallowing vibrations originating from swallows with no aspiration from those originating from swallows that did result in aspiration. They gathered data from 10 individuals with dysphagia, computed the average spectral power of each swallowing signal over 3 key frequency bands, and used a fuzzy k-means classifier to classify each swallow [10]. Their results demonstrated slightly greater than 80% classification accuracy [10]. A study with similar goals by Nikjoo et al. [11] was published at about the same time. It also sought to differentiate between vibrations from swallows that did or did not result in aspiration, but instead used a support vector machine classifier with a selection of 8 statistical features as inputs, gathered from 30 participants with dysphagia [11]. It, too, achieved an overall classification accuracy slightly greater than 80% [11]. However, past studies were not limited to these aspirating/non-aspirating classes. Das et al. [12] sought to differentiate swallowing vibrations produced by healthy subjects from various artifact signals, as well as dysphagic swallowing vibrations from similar artifact signals. They achieved an overall accuracy of 97% on this task by using hybrid fuzzy logic committee neural networks with a limited selection of statistical features as inputs [12]. Conversely, Suryanarayanan et al. [13] used swallowing vibrations and pressure measurements to classify the severity of aspirated swallows on a 4-point scale. Using simple fuzzy logic, they achieved an overall accuracy of slightly more than 80% on their 22-person data set.
These four studies [10], [11], [12], [13] are not the sum of all research into classifying swallowing vibrations, but they are representative of the larger body of work and, by extension, demonstrate certain key flaws of past swallowing research. The first flaw, which some of these studies have acknowledged themselves, is the limited number of subjects from whom data were collected. In addition to making results more susceptible to biases from individual subjects, this has also led some researchers to not properly separate training and testing data sets. The similarly limited choice of input features is another significant drawback of previous research. By manually preselecting mathematical or physiological features as classifier inputs, researchers may have unintentionally biased their results or reduced the maximum achievable accuracy of their classification methods. Both the small sample sizes and the manual feature selection limit the generalizability of these findings and underscore the need for more refined swallowing classification methods.
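The training/testing leakage noted above can be avoided by assigning data to folds by subject rather than by individual swallow, so that no participant contributes swallows to both the training and the test set. The NumPy sketch below is hypothetical: the function name `subject_wise_folds`, the fold count, and the toy subject IDs are illustrative and are not taken from any cited study.

```python
import numpy as np


def subject_wise_folds(subject_ids, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs in which every subject's swallows
    fall entirely in either the training set or the test set, never both."""
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(subjects)
    # Distribute whole subjects (not individual swallows) across folds.
    fold_of_subject = {s: i % n_folds for i, s in enumerate(subjects)}
    fold_of_sample = np.array([fold_of_subject[s] for s in subject_ids])
    for k in range(n_folds):
        test_mask = fold_of_sample == k
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Toy example: 6 hypothetical subjects with 4 swallows each.
subject_ids = np.repeat(np.arange(6), 4)
for train_idx, test_idx in subject_wise_folds(subject_ids, n_folds=3):
    overlap = set(subject_ids[train_idx]) & set(subject_ids[test_idx])
    print(sorted(set(subject_ids[test_idx])), "overlap:", overlap)
```

If scikit-learn is available, `GroupKFold` in `sklearn.model_selection` implements the same grouping idea.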
Further review of the literature on swallowing classification reveals that studies using neural network-based classifiers [12], [14], [15], [16], [17] tend to report higher overall classification rates. While the details of their methodologies vary, nearly all of them use user-selected input features of a mathematically complex nature. Lee et al. [15] explored this topic and found that high-order features such as normality and dispersion ratio are quadratically, but not linearly, separable. Aboofazeli and Moussavi [17] further support the need for such high-level investigation of swallowing vibrations and demonstrate the benefits of both nonlinear analysis techniques and neural networks with multiple hidden layers. While higher-order analysis of swallowing signals demonstrates clear benefits, these studies acknowledge that they investigate a limited selection of mathematical signal descriptions and that alternative choices may benefit classification. Makeyev et al. [16] also acknowledged these trends and advocate combining unsupervised learning with a highly redundant signal representation in order to avoid the biases of preselecting mathematical features. These previous attempts at classifying swallowing vibrations make clear that the field would benefit from a technique that can both analyze higher-order signal features and select its own features through unsupervised learning. One relatively new classification technique, deep learning, has not yet been used in swallowing research. However, it possesses these desirable traits and could readily be implemented as a method of classifying swallows, thereby combining much of the past research on the topic into a single method.
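To make the distinction between linear and quadratic separability concrete, the toy NumPy example below uses synthetic two-feature data (not swallowing features) whose class depends on the product of the two features, an XOR-like layout. A linear least-squares classifier, chosen here only for simplicity, performs near chance on the raw features but separates the classes once the product term is added as a third feature. This illustrates the concept only and is not a reproduction of the analysis by Lee et al. [15].

```python
import numpy as np


def least_squares_accuracy(features, labels):
    """Fit a linear least-squares classifier (with bias) and return its training accuracy."""
    A = np.column_stack([features, np.ones(len(features))])
    w, *_ = np.linalg.lstsq(A, labels, rcond=None)
    return float(np.mean(np.sign(A @ w) == labels))


# Synthetic data whose class is the sign of x1 * x2; points very close to the
# axes are dropped so that a small margin separates the classes.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(1000, 2))
X = X[np.abs(X[:, 0] * X[:, 1]) > 0.05]
y = np.sign(X[:, 0] * X[:, 1])

linear_acc = least_squares_accuracy(X, y)
# Adding the quadratic term x1 * x2 makes the classes linearly separable in the new space.
quad_features = np.column_stack([X, X[:, 0] * X[:, 1]])
quadratic_acc = least_squares_accuracy(quad_features, y)
print(f"linear: {linear_acc:.2f}, with quadratic term: {quadratic_acc:.2f}")
```

A model with nonlinear hidden layers can, in principle, learn such product-like terms itself rather than requiring them to be specified by hand, which is the motivation given above for moving away from manually preselected features.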
In this study, we propose a method to differentiate swallows made by healthy subjects from swallows made by dysphagic subjects that did not result in a significant amount of laryngeal penetration. This differentiation is performed using only cervical auscultation signals recorded in a clinical environment during typical swallowing examination procedures. A previous study supports this possibility, as it reports that these two types of swallow produce significantly different cervical auscultation signals [18]. We also propose that our chosen classification technique, a Deep Belief network, will provide more reliable classification than previously implemented techniques. Because it can classify data nonlinearly, based on higher-order relationships than a simple feed-forward neural network can capture, it should allow for the best possible swallowing classification.
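A Deep Belief network is commonly built as a stack of restricted Boltzmann machines (RBMs) that are pretrained one layer at a time without labels and then fine-tuned. The NumPy sketch below illustrates only that unsupervised pretraining stage, using one-step contrastive divergence (CD-1). It is not the authors' implementation: the class and function names, layer sizes, learning rate, and toy binary data are hypothetical. Bernoulli visible units assume inputs scaled to [0, 1], so real-valued spectra would need such scaling (or Gaussian visible units), and the supervised fine-tuning step that turns the stack into a classifier is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class BernoulliRBM:
    """Hypothetical RBM with binary visible and hidden units, trained with CD-1."""

    def __init__(self, n_visible, n_hidden, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)
        self.b_hid = np.zeros(n_hidden)
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_step(self, v0):
        """One CD-1 update on a mini-batch; returns its reconstruction error."""
        # Positive phase: hidden probabilities driven by the data, then a binary sample.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step down to the visible layer and back up.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_vis += self.lr * (v0 - v1).mean(axis=0)
        self.b_hid += self.lr * (h0 - h1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))

    def fit(self, data, epochs=50, batch_size=16):
        """Train on rows of `data` (values in [0, 1]); returns per-epoch mean error."""
        history = []
        for _ in range(epochs):
            order = self.rng.permutation(len(data))
            errors = [self.cd1_step(data[order[i:i + batch_size]])
                      for i in range(0, len(data), batch_size)]
            history.append(float(np.mean(errors)))
        return history


def pretrain_dbn(data, layer_sizes, epochs=50):
    """Greedy layer-wise pretraining: each RBM is trained on the hidden
    probabilities produced by the RBM below it."""
    rbms, layer_input = [], data
    for i, n_hidden in enumerate(layer_sizes):
        rbm = BernoulliRBM(layer_input.shape[1], n_hidden, seed=i)
        rbm.fit(layer_input, epochs=epochs)
        rbms.append(rbm)
        layer_input = rbm.hidden_probs(layer_input)
    return rbms, layer_input


# Toy binary data: noisy copies of two 8-element prototype patterns.
rng = np.random.default_rng(42)
prototypes = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                       [0, 0, 0, 0, 1, 1, 1, 1]], dtype=bool)
labels = rng.integers(0, 2, size=200)
flips = rng.random((200, 8)) < 0.05
data = np.logical_xor(prototypes[labels], flips).astype(float)

history = BernoulliRBM(8, 4).fit(data, epochs=50)
rbms, top_features = pretrain_dbn(data, layer_sizes=[6, 3], epochs=50)
print(f"reconstruction error: {history[0]:.3f} -> {history[-1]:.3f}")
print(top_features.shape)  # (200, 3)
```

The top-layer hidden probabilities (`top_features`) are the learned representation that a supervised output layer would then be trained on.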
Participants
The study protocol was approved by the Institutional Review Board at the University of Pittsburgh. The data collected in this study have been used in other published studies [18], [19], [20], and the methods used to collect them have been published previously; they are summarized here for convenience.
A total of 55 healthy participants (28 men, 27 women, mean age 39) were recruited from the neighborhoods surrounding the University of Pittsburgh campus. Each confirmed that they had no
Results
Tables 4 and 5 provide the results of our tests. Table 4 presents the results for networks that used data from a single vibration axis as input, while Table 5 presents the corresponding results for our multi-modal networks. The number of correctly classified healthy and unhealthy swallows (out of the 123 presented for each category) is given as the average over our ten-fold cross-validation procedure. Sensitivity is defined as the percentage of swallows from patients with dysphagia that were
Discussion
Our study differed from past research on swallowing classification in a number of ways. Notably, we included a larger number of participants and swallow events, as well as a much wider array of boluses and swallowing techniques. Despite this, our networks, particularly our single-layer, single-modal networks, provide swallow classification accuracy similar to that reported by several other studies [13], [17], [37], [38], [39]. In addition, our networks demonstrated only a minimal amount
Conclusion
In this study, we sought to differentiate swallows made by healthy subjects from those made by patients with dysphagia using only cervical auscultation signals. To do this, we used the frequency spectra of anterior–posterior and superior–inferior swallowing vibrations as inputs to a variety of single- and multi-layer Deep Belief networks. We found that single-layer networks provided the greatest overall accuracy when analyzing vibrations from a single axis. However, when incorporating
Acknowledgments
Research reported in this publication was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health under Award number R01HD074819, while some data used in this study were gathered with the assistance of grant number UL1 TR000005. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
References (39)
- et al., Oropharyngeal dysphagia assessment and treatment efficacy: setting the record straight (response to Campbell–Taylor), J. Am. Med. Dir. Assoc. (2009).
- et al., Analysis of a physician tool for evaluating dysphagia on an inpatient stroke unit: the modified Mann assessment of swallowing ability, J. Stroke Cerebrovasc. Dis. (2010).
- et al., Hybrid fuzzy logic committee neural networks for recognition of swallow acceleration signals, Comput. Methods Programs Biomed. (2001).
- et al., A fuzzy logic diagnosis system for classification of pharyngeal dysphagia, Int. J. Biomed. Comput. (1995).
- Evaluating dysphagia, Am. Family Physician (2000).
- et al., Complications and outcome after acute stroke. Does dysphagia matter?, Stroke (1996).
- et al., Cervical auscultation synchronized with images from endoscopy swallow evaluations, Dysphagia (2007).
- et al., Clinical utility of the 3-ounce water swallow test, Dysphagia (2008).
- et al., The Toronto bedside swallowing screening test (TOR-BSST): development and validation of a dysphagia screening tool for patients with stroke, Stroke (2009).
- et al., Initiating safe oral feeding in critically ill intensive care and step-down unit patients based on passing a 3-ounce (90 milliliters) water swallow challenge, J. Trauma-Injury Infect. Crit. Care (2011).
- Dysphagia screening: contributions of cervical auscultation signals and modern signal processing techniques, IEEE Trans. Hum.-Mach. Syst.
- Silent aspiration detection by breath and swallowing sound analysis, Proceedings of the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society.
- Automatic discrimination between safe and unsafe swallowing using a reputation-based classifier, Biomed. Eng. OnLine.
- Neural networks in computer-aided diagnosis classification of dysphagic patients, Proceedings of the 14th Annual International Conference of the IEEE Engineering in Medicine and Biology Society.
- A radial basis classifier for the automatic detection of aspiration in children with dysphagia, J. Neuroeng. Rehabil.
- Recognition of swallowing sounds using time-frequency decomposition and limited receptive area neural classifier.
- Analysis and classification of swallowing sounds using reconstructed phase space features, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing.
- Dysphagia and its effects on swallowing sounds and vibrations in adults, IEEE Trans. Neural Syst. Rehabil. Eng.
- The effects of increased fluid viscosity on swallowing sounds in healthy adults, Biomed. Eng. OnLine.
Joshua M. Dudik received his Bachelor’s degree in biomedical engineering from Case Western Reserve University, OH in 2011 whereas he earned his Master’s in bioengineering from the University of Pittsburgh, PA in 2013. In 2015, he completed a Ph.D. degree in electrical engineering at the University of Pittsburgh, PA. His current work is focused on cervical auscultation and the use of signal processing techniques to assess the swallowing performance of impaired individuals.
James L. Coyle received his Ph.D. in Rehabilitation Science from the University of Pittsburgh in 2008 with a focus in neuroscience. He is currently a Professor of Communication Sciences and Disorders in the School of Health and Rehabilitation Sciences (SHRS) and Professor of Otolaryngology in the School of Medicine, University of Pittsburgh. He is Board Certified by the American Board of Swallowing and Swallowing Disorders and maintains an active clinical practice in the Department of Otolaryngology, Head and Neck Surgery and the Speech Language Pathology Service of the University of Pittsburgh Medical Center. He is a Fellow of the American Speech-Language-Hearing Association.
Amro El-Jaroudi received the B.S. degree in 1984, the M.S. degree in 1984, and the Ph.D. degree in 1988, all in electrical engineering from Northeastern University, Boston, MA, USA. Currently, he is an Associate Professor in the Department of Electrical and Computer Engineering at the University of Pittsburgh. His research interests include speech processing, spectral estimation, and digital signal processing algorithms.
Zhi-Hong Mao received the dual B.S. degrees in automatic control and mathematics and the M.Eng. degree in intelligent control and pattern recognition from Tsinghua University, Beijing, China, in 1995 and 1998, respectively, the S.M. degree in aeronautics and astronautics from Massachusetts Institute of Technology, Cambridge, MA, USA, in 2000, and the Ph.D. degree in electrical and medical engineering from the Harvard-MIT Division of Health Sciences and Technology, Cambridge, in 2005. He joined the University of Pittsburgh, Pittsburgh, PA, USA, as an Assistant Professor in 2005 and became an Associate Professor of electrical engineering and bioengineering in 2011 and William Kepler Whiteford Faculty Fellow in 2012. His research interests include human-in-the-loop control systems, networked control, and neural control and learning. Dr. Mao received the Faculty Early Career Development (CAREER) Award of National Science Foundation and the Andrew P. Sage Best Transactions Paper Award of the IEEE Systems, Man and Cybernetics Society in 2010.
Mingui Sun received the B.S. degree in instrumental and industrial automation from Shenyang Chemical Engineering Institute, Shenyang, China, in 1982, and the M.S. and Ph.D. degrees in electrical engineering from the University of Pittsburgh, Pittsburgh, PA, USA, in 1986 and 1989, respectively. In 1991, he joined the University of Pittsburgh, where he is currently a Professor of neurosurgery, electrical and computer engineering, and bioengineering. He is also the Director of Research at Computational Diagnostics, Inc., Pittsburgh. His current research interests include advanced biomedical electronic devices, biomedical signal and image processing, sensors and transducers, biomedical instruments, artificial neural networks, wavelet transforms, time-frequency analysis, and the inverse problem of neurophysiological signals. He has authored or coauthored more than 350 publications. Dr. Sun is an elected Fellow of the American Institute for Medical and Biological Engineering. He received the Novel Smart Engineering System Design Award at the International Conference on Artificial Neural Networks in Engineering in 1999, the Best Paper Award at the International Symposium on Uncertainty Modeling and Analysis in 2003, the Chancellor’s Innovation Award of the University of Pittsburgh in 2007, 2008, 2010, and 2011, and the Distinguished Lectureship of the IEEE Circuits and Systems Society in 2012.
Ervin Sejdić received the B.E.Sc. and Ph.D. degrees in electrical engineering from the University of Western Ontario, London, ON, Canada, in 2002 and 2008, respectively. He was a Postdoctoral Fellow at Holland Bloorview Kids Rehabilitation Hospital/University of Toronto and a Research Fellow in Medicine at Beth Israel Deaconess Medical Center/Harvard Medical School. He is currently an Associate Professor at the Department of Electrical and Computer Engineering (Swanson School of Engineering), the Department of Bioengineering (Swanson School of Engineering), the Department of Biomedical Informatics (School of Medicine) and the Intelligent Systems Program (School of Computing and Information) at the University of Pittsburgh, PA. His research interests include biomedical and theoretical signal processing, swallowing difficulties, gait and balance, assistive technologies, rehabilitation engineering, anticipatory medical devices, and advanced information systems in medicine. Dr. Sejdić received prestigious research scholarships from the Natural Sciences and Engineering Research Council of Canada in 2003 and 2005. He also received the Melvin First Young Investigator’s Award from the Institute for Aging Research at Hebrew Senior Life in Boston, MA. In 2016, President Obama named Prof. Sejdić as a recipient of the Presidential Early Career Award for Scientists and Engineers, the highest honor bestowed by the United States Government on science and engineering professionals in the early stages of their independent research careers. In 2017, Prof. Sejdić was awarded the National Science Foundation CAREER Award, which is the National Science Foundation's most prestigious award in support of career-development activities of those scholars who most effectively integrate research and education within the context of the mission of their organization.