Google AI researchers working with the ALS Therapy Development Institute today shared details about Project Euphonia, a speech-to-text transcription service for people with speaking impairments. They also say their approach can improve automatic speech recognition for people with non-native English accents.
People with amyotrophic lateral sclerosis (ALS) often have slurred speech, but existing AI systems are typically trained on voice data from speakers without speech impairments or accents.
The new approach succeeds primarily because it introduces small amounts of training data from people with accents and people with ALS.
“We show that 71% of the improvement comes from only 5 minutes of training data,” according to a paper published on arXiv July 31 titled “Personalizing ASR for Dysarthric and Accented Speech with Limited Data.”
Personalized models were able to achieve 62% and 35% relative word error rate (WER) improvement for ALS and accents respectively.
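For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words, and a relative improvement is the drop in WER as a fraction of the baseline. The following is a minimal sketch of both calculations; the function names and the sample sentences and WER values are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline error rate removed by the new model."""
    return (baseline_wer - new_wer) / baseline_wer


# Made-up example: one dropped word and one wrong word out of five.
wer = word_error_rate("turn on the kitchen lights", "turn on kitchen light")

# Made-up WER values chosen only to show what a 62% relative gain looks like.
gain = relative_improvement(0.50, 0.19)
print(f"WER: {wer:.2f}, relative improvement: {gain:.0%}")
```

Under this definition, a 62% relative improvement means the personalized model makes 62% fewer word errors than the baseline, not that its WER falls by 62 percentage points.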
The ALS speech data set consists of 36 hours of audio from 67 people with ALS, collected in partnership with the ALS Therapy Development Institute.
The non-native English speaker data set is called L2 Arctic and has 20 recordings of utterances, each lasting one hour.
Project Euphonia also utilizes techniques from Parrotron, an AI tool for people with speech impediments introduced in July, as well as fine-tuning techniques.
Written by 12 coauthors, the work is being presented at Interspeech 2019, the annual conference of the International Speech Communication Association, which takes place September 15-19 in Graz, Austria.
“This paper’s approach overcomes data scarcity by beginning with a base model trained on thousands of hours of standard speech. It gets around sub-group heterogeneity by training personalized models,” the paper reads.
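The pattern the paper describes, starting from a model trained on plentiful standard data and then adapting it with a handful of examples from one speaker, can be illustrated with a toy. The sketch below is not the paper's speech model: it uses a one-variable linear model and synthetic data, and every name and number in it is an assumption made for illustration.

```python
import random


def mse(w, b, data):
    """Mean squared error of the line y = w * x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(w, b, data, steps, lr=0.1):
    """Full-batch gradient descent on mean squared error, starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


rng = random.Random(0)

# Plentiful synthetic "standard" data (stand-in for thousands of hours of speech).
standard = [(x, 2 * x + 1) for x in (rng.random() for _ in range(1000))]

# Scarce synthetic "personal" data: five examples that follow a shifted pattern.
personal = [(x, 2 * x + 3) for x in (rng.random() for _ in range(5))]

# Step 1: train the base model on the large standard set.
base_w, base_b = train(0.0, 0.0, standard, steps=500)

# Step 2: personalize by continuing training from the base weights
# on the five personal examples only.
tuned_w, tuned_b = train(base_w, base_b, personal, steps=50)

print("base model error on personal data: ", mse(base_w, base_b, personal))
print("tuned model error on personal data:", mse(tuned_w, tuned_b, personal))
```

The point of the toy is only that a model that already fits the common pattern needs very little extra data to adapt to one individual, which is the intuition behind training a separate personalized model per speaker.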
The research, which a Google AI blog post highlighted today, follows the introduction of Project Euphonia and other initiatives in May, such as Live Relay, a feature to make phone calls easier for deaf people, and Project Diva, an effort to make Google Assistant accessible for nonverbal people.
Google is soliciting data from people with ALS to improve its model’s accuracy and is working on next steps for Project Euphonia, such as using phoneme mistakes to reduce word error rates.