Background
Introduction
Aim
Methods
Data sources (Search strategy)
OVID Embase
| # | Search statement | Results |
|---|---|---|
| 1 | automatic speech recognition/ | 469 |
| 2 | ((voice or speech) adj (recogni* or respon*)).tw. | 2516 |
| 3 | or/1-2 | 27490 |
| 4 | exp research/ | 380483 |
| 5 | (qualitative* or quantitative* or mixed method* or descriptive* or research*).tw. | 1194784 |
| 6 | or/4-5 | 14148120 |
| 7 | 3 and 6 | 483 |
| 8 | limit 7 to yr="2000-Current" | 433 |
OVID Medline
| # | Search statement | Results |
|---|---|---|
| 1 | Speech Recognition Software/ | 416 |
| 2 | ((voice or speech) adj (recogni* or respon*)).tw. | 2081 |
| 3 | or/1-2 | 2263 |
| 4 | exp Research/ | 224487 |
| 5 | (qualitative* or quantitative* or mixed method* or descriptive* or research*).tw. | 840821 |
| 6 | or/4-5 | 971456 |
| 7 | 3 and 6 | 360 |
| 8 | limit 7 to yr="2000-Current" | 319 |
OVID PreMedline
| # | Search statement | Results |
|---|---|---|
| 1 | ((voice or speech) adj (recogni* or respon*)).tw. | 140 |
| 2 | (qualitative* or quantitative* or mixed method* or descriptive* or research*).tw. | 94513 |
| 3 | 1 and 2 | 20 |
| 4 | limit 3 to yr="2000-Current" | 19 |
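For readers unfamiliar with Ovid syntax, the core free-text statement used in all three databases, `((voice or speech) adj (recogni* or respon*)).tw.`, combines adjacency (`adj`, terms next to each other in either order), truncation (`*`), and the title/abstract field (`.tw.`). A minimal sketch, assuming a simple regex approximation of that one statement (not part of the review's method):

```python
import re

# Approximates Ovid search line 2: "voice" or "speech" adjacent (either
# order, as Ovid's adj allows) to a word starting with "recogni" or
# "respon", matched case-insensitively as it would be in title/abstract.
PATTERN = re.compile(
    r"\b(?:(?:voice|speech)\s+(?:recogni|respon)\w*"
    r"|(?:recogni|respon)\w*\s+(?:voice|speech))\b",
    re.IGNORECASE,
)

def matches_line2(title_abstract: str) -> bool:
    """Return True if the text would be retrieved by the free-text line."""
    return bool(PATTERN.search(title_abstract))

print(matches_line2("Continuous speech recognition for radiology reports"))  # True
print(matches_line2("Voice quality after thyroidectomy"))  # False
```

This ignores Ovid's stemming and stopword handling; it is only meant to make the operators concrete.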
Selection of studies
| Criteria | Al-Aynati 2003 [18] | Alapetite 2008 [30] | Alapetite 2009 [31] | Callaway 2002 [20] | Derman 2010 [32] | Devine 2000 [33] | Irwin 2007 [34] | Kanal 2001 [35] | Koivikko 2008 [36] | Langer 2002 [37] | Mohr 2003 [22] | NSLHD 2012 [29] | Singh 2011 [23] | Zick 2001 [38] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Screening questions | | | | | | | | | | | | | | |
| Clear research questions | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Appropriate data collected | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| 1. Qualitative | | | | | | | | | | | | | | |
| Appropriate qualitative data sources | | | | | | | | | | | | | | |
| Appropriate qualitative method | | | | | | | | | | | | | | |
| Description of the context | | | | | | | | | | | | | | |
| Discussion of researchers' reflexivity | | | | | | | | | | | | | | |
| 2. Randomized controlled | | | | | | | | | | | | | | |
| Appropriate randomization | | | | | | | | | | | Yes | No | | |
| Allocation concealment and/or blinding | | | | | | | | | | | Yes | No | | |
| Complete outcome data | | | | | | | | | | | Yes | Yes | | |
| Low withdrawal/drop out | | | | | | | | | | | Yes | Yes | | |
| 3. Non-randomized | | | | | | | | | | | | | | |
| Recruitment minimized bias | No | | | | | | | | | | | | | |
| Appropriate outcome measures | Yes | | | | | | | | | | | | | |
| Intervention & control group comparable | Yes | | | | | | | | | | | | | |
| Complete outcome data/acceptable response rate | Yes | | | | | | | | | | | | | |
| 4. Quantitative descriptive | | | | | | | | | | | | | | |
| Appropriate sampling¹ | | No | Yes | Yes | No | Yes | Yes | No | Yes | Yes | | | Yes | No |
| Appropriate sample² | | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | | | Yes | Yes |
| Appropriate measurement (valid/standard) | | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | | | Yes | Yes |
| Acceptable response rate | | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | No | | | Yes | Yes |
| Total score³ (Yes = 1, No = 0) | 5 | 4 | 6 | 6 | 5 | 6 | 5 | 5 | 6 | 5 | 6 | 4 | 6 | 5 |
Description and methodological quality of included studies
Outcomes of the studies
| Author, year, country, design | Aim | Setting, sample, speech technology (ST) | Outcome measures | Results |
|---|---|---|---|---|
| Al-Aynati and Chorneyko 2003 [18]<br>Canada<br>Experimental | To compare SR software with HT for generating pathology reports | Setting: surgical pathology<br>Sample: 206 pathology reports<br>ST: IBM ViaVoice Pro version 8 with pathology vocabulary dictionary | 1. Accuracy rate<br>2. Recognition/transcription errors | Accuracy rate (mean %): SR 93.6, HT 99.6<br>Mean recognition errors: SR 6.7, HT 0.4 |
| Mohr et al. 2003 [22]<br>USA<br>Experimental | To compare SR software with HT for clinical notes | Setting: endocrinology and psychiatry<br>Sample: 2,354 reports<br>ST: Linguistic Technology Systems LTI with clinical notes application | 1. Dictation/recording time + transcription time (minutes) = report turnaround time (RTT) | RTT (min):<br>Endocrinology: SR (recording + transcription) 23.7, HT (dictation + transcription) 25.4; SR 87.3% (CI 83.3, 92.3) productive compared to HT<br>Psychiatry transcriptionists: SR 65.2, HT 38.1; SR 63.3% (CI 54.0, 74.0) productive compared to HT<br>Psychiatry secretaries: SR 36.5, HT 30.5; SR 55.8% (CI 44.6, 68.0) productive compared to HT<br>Author, secretary and type of notes were predictors of productivity (p < 0.05) |
| NSLHD 2012 [29]<br>Australia<br>Experimental | To compare accuracy and time between SR software and HT to produce emergency department reports | Setting: emergency department<br>Sample: 12 reports<br>ST: Nuance Dragon Voice Recognition | 1. RTT | RTT, mean (range), min: SR 1.07 (46 s, 1.32), HT 3.32 (2.45, 4.35)<br>HT: spelling and punctuation errors; SR: occasional misplaced words |
| Alapetite 2008 [30]<br>Denmark<br>Non-experimental | To evaluate the impact of background noise (sounds of alarms, aspiration, metal, people talking, scratch, silence, ventilators) and other factors affecting SR accuracy when used in operating rooms | Setting: simulation laboratory<br>Sample: 3,600 short anaesthesia commands<br>ST: Philips SpeechMagic 5.1.529 SP3 and SpeechMagic InterActive, Danish language, Danish medical dictation adapted by Max Manus | 1. Word recognition rate (WRR) | WRR by microphone: headset 83.2%, handset 73.9%<br>Recognition mode: command 81.6%, free text 77.1%<br>Background noise: scratch 66.4%, silence 86.8%<br>Gender: male 76.8%, female 80.3% |
| Alapetite et al. 2009 [31]<br>Denmark<br>Non-experimental | To identify physicians' perceptions, attitudes and expectations of SR technology | Setting: hospital (various clinical settings)<br>Sample: 186 physicians | 1. Users' expectations and experience (predominant response noted) | Overall: Q1 expectation positive 44%, Q1 experience negative 46%<br>Performance: Q8 expectation negative 64%, Q8 experience negative 77%<br>Time: Q14 expectation negative 85%, Q14 experience negative 95%<br>Social influence: Q6 expectation negative 54%, Q6 experience negative 59% |
| Callaway et al. 2002 [20]<br>USA<br>Non-experimental | To compare off-the-shelf SR software with manual transcription services for radiology reports | Setting: 3 military medical facilities<br>Sample: Facility 1: 2,042 reports; Facility 2: 26,600 reports; Facility 3: 5,109 reports<br>ST: Dragon Medical Professional 4.0 | 1. RTT (referred to as TAT)<br>2. Costs | RTT: Facility 1: decreased from 15.7 h (HT) to 4.7 h (SR); completed in < 8 h: SR 25%, HT 6.8%; Facility 2: decreased from 89 h (HT) to 19 h (SR)<br>Cost: Facility 2: $42,000 saved; Facility 3: $10,650 saved |
| Derman et al. 2010 [32]<br>Canada<br>Non-experimental | To compare SR with existing methods of data entry for the creation of electronic progress notes | Setting: mental health hospital<br>Sample: 12 mental health physicians<br>ST: details not provided | 1. Perceived usability<br>2. Perceived time savings<br>3. Perceived impact | Usability: 50% prefer SR<br>Time savings: no significant difference (p = 0.19)<br>Impact: quality of care, no significant difference (p = 0.086); documentation, no significant difference (p = 0.375); workflow, no significant improvement (p = 0.59) |
| Devine et al. 2000 [33]<br>USA<br>Non-experimental | To compare 'out-of-box' performance of 3 continuous SR software packages for the generation of medical reports | Sample: 12 physicians from Veterans Affairs facilities, New England<br>ST: System 1 (S1) IBM ViaVoice 98, General Medicine Vocabulary; System 2 (S2) Dragon NaturallySpeaking Medical Suite, v3.0; System 3 (S3) L&H Voice Xpress for Medicine, General Medicine Edition, v1.2 | 1. Recognition errors (mean error rate)<br>2. Dictation time<br>3. Completion time<br>4. Ranking<br>5. Preference | Recognition errors (mean %, by vocabulary): S1 7.0-9.1, S3 13.4-15.1, S2 14.1-15.2; S1 best with general English and medical abbreviations<br>Dictation time: no significant difference (P < 0.336)<br>Completion time (mean): S2 12.2 min, S1 14.7 min, S3 16.1 min<br>Ranking: 1 S1, 2 S2, 3 S3 |
| Irwin et al. 2007 [34]<br>USA<br>Non-experimental | To compare SR features and functionality of 4 dental software application systems | Setting: simulated dental<br>Sample: 4 participants (3 students, 1 faculty member)<br>ST: System 1 (S1) Microsoft SR with Dragon NaturallySpeaking; System 2 (S2) Microsoft SR; Systems 3 and 4 (S3, S4) default speech engine | 1. Training time<br>2. Charting time<br>3. Completion<br>4. Ranking | Training time: S1 11 min 8 s, S2 9 min 1 s (no data reported for S3 and S4)<br>Charting time: S1 5 min 20 s, S2 9 min 13 s (no data reported for S3 and S4)<br>Completion (%): S1 100, S2 93, S3 90, S4 82<br>Ranking: 1 S1 (104/189), 2 S2 (77/189) |
| Kanal et al. 2001 [35]<br>USA<br>Non-experimental | To determine the accuracy of continuous SR for transcribing radiology reports | Setting: radiology department<br>Sample: 72 radiology reports, 6 participants<br>ST: IBM MedSpeak/Radiology software version 1.1 | 1. Error rates | Error rates (mean ± SD, %): overall 10.3 ± 33; significant errors 7.8 ± 3.4; subtle significant errors 1.2 ± 1.6 |
| Koivikko et al. 2008 [36]<br>Finland<br>Non-experimental | To evaluate the effect of speech recognition on radiology workflow systems over a period of 2 years | Setting: radiology department<br>Sample: > 20,000 reports; 14 radiologists<br>ST: Finnish Radiology Speech Recognition System (Philips Electronics); HT: cassette-based reporting; SR1: SR in 2006; SR2: SR in 2007; Training: 10-15 min training in SR | 1. RTT (referred to as TAT) at 3 collection points: HT 2005 (n = 6,037), SR1 2006 (n = 6,486), SR2 2007 (n = 9,072)<br>2. Reports completed ≤ 1 hour | RTT (mean ± SD), min: HT 1,486 ± 4,591; SR1 323 ± 1,662; SR2 280 ± 763<br>Reports ≤ 1 hour (%): HT 26, SR1 58 |
| Langer 2002 [37]<br>USA<br>Non-experimental | To compare the impact of SR on radiologist productivity across 4 workflow systems | Setting: radiology departments<br>Sample: over 40 radiology sites<br>Systems: System 1, film, report dictated, HT; System 2, film, report dictated, SR; System 3, picture archiving and communication system (PACS) + HT; System 4, PACS + SR | 1. RTT (referred to as TAT)<br>2. Report productivity (RP), number of reports per day | RTT (mean ± SD%) in hours, and RP:<br>System 1: RTT 48.2 ± 50, RP 240<br>System 2: RTT 15.5 ± 93, RP 311<br>System 3: RTT 13.3 ± 119 (t value at 10%), RP 248<br>System 4: RTT 15.7 ± 98 (t value at 10%), RP 310 |
| Singh et al. 2011 [23]<br>USA<br>Non-experimental | To compare accuracy and turnaround times between SR software and a traditional transcription service (TS) when used for generating surgical pathology reports | Setting: surgical pathology<br>Sample: 5,011 pathology reports<br>ST: VoiceOver (version 4.1), Dragon NaturallySpeaking software (version 10); Phase 0: 3 years prior to SR; Phase 1: first 35 months of SR use, gross descriptions; Phases 2-4: SR used for gross descriptions and final diagnosis | 1. RTT (referred to as TAT)<br>2. Reports completed ≤ 1 day<br>3. Reports completed ≤ 2 days | RTT (days): Phase 0: 4; Phase 1: 4; Phases 2-4: 3<br>Reports ≤ 1 day (%): Phase 0: 22; Phase 1: 24; Phases 2-4: 36<br>Reports ≤ 2 days (%): Phase 0: 54; Phase 1: 60; Phases 2-4: 67 |
| Zick et al. 2001 [38]<br>USA<br>Non-experimental | To compare accuracy and RTT between SR software and a traditional transcription service (TS) when used for recording in patients' charts in the ED | Setting: emergency department<br>Sample: 2 physicians, 47 patients' charts<br>ST: Dragon NaturallySpeaking Medical Suite version 4 | 1. RTT (referred to as TAT)<br>2. Accuracy<br>3. Errors per chart<br>4. Dictation and editing time<br>5. Throughput | RTT (min): SR 3.55, TS 39.6<br>Accuracy, % (mean and range): SR 98.5 (98.2-98.9), TS 99.7 (99.6-99.8)<br>Average errors per chart: SR 2.5 (2-3), TS 1.2 (0.9-1.5)<br>Average dictation time, min (mean and range): SR 3.65 (3.35-3.95), TS 3.77 (3.43-4.10)<br>Throughput (words/min): SR 54.5 (49.6-59.4), TS 14.1 (11.1-17.2) |
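The accuracy and word recognition rate figures reported above all reduce to a word-level comparison between the recognized text and a reference transcript. A minimal sketch of such a comparison, under the simplifying assumption of position-wise matching (the included studies additionally counted substitutions, insertions and deletions):

```python
def word_recognition_rate(reference: str, recognized: str) -> float:
    """Percentage of reference words reproduced exactly, position by position.

    Simplification for illustration only: insertions and deletions that
    shift word positions are not realigned as a full edit-distance
    scorer would do.
    """
    ref = reference.lower().split()
    hyp = recognized.lower().split()
    hits = sum(r == h for r, h in zip(ref, hyp))
    return 100.0 * hits / len(ref)

# Hypothetical example command (not from Alapetite 2008's corpus):
rate = word_recognition_rate(
    "start infusion of saline now",
    "start infusion of sailing now",
)
print(f"{rate:.1f}%")  # 80.0%
```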