Artificial intelligence - especially machine learning - is at its best when it is working with a large, analyzable data set, like text. But much of the data in the world isn’t in text form. Instead, it’s in the form of spoken words on video and audio recordings or even live events. This makes reliable voice transcription an important goal for artificial intelligence.
In addition, voice-to-text transcription has long been an important business on its own merits in the medical, legal, and media fields, to name a few, and has traditionally been done by teams of human transcribers who charge rates of $3 or $4 per minute. According to the Bureau of Labor Statistics, there were 57,400 medical scribes and 19,600 court reporters (a figure that includes closed captioners for television and other media) in the United States in 2016. Although it’s hard to find statistics for transcription specifically, Grand View Research projects that the global voice recognition market overall will hit $127.58 billion by 2024.
While it might seem like a straightforward project for AIs, since it can be seen as simply converting one kind of data (sound) into another (text), in fact, various factors make voice transcription a significant computing problem. Accuracy is especially important when transcriptions affect the accuracy of quotes in news stories, the outcomes of expensive and important trials, or even the lives of patients. Most people who use Siri or Alexa would agree that, while those tools do an admirable job of understanding a user most of the time, they wouldn’t trust them with their lives.
Even when an AI system achieves high accuracy in the best case - for instance, a single clear-voiced speaker in a quiet room - maintaining that accuracy for multiple voices, multiple languages, heavy accents, background noises, crowded rooms, and more becomes very complicated.
And while voice recognition for vocal commands and voice recognition for transcription seem like similar problems, the latter is actually much more challenging. A voice assistant like Alexa only needs to determine which, if any, of a predetermined list of vocal commands is being uttered, whereas a transcription program needs to listen for, and capture, any utterance at all. This wider variety of possible inputs and outputs makes it a harder task for AI.
Because of these challenges, AI experts lack consensus on how quickly, and even whether, computers will completely replace human transcribers. Gerald Friedland, the director of the Audio and Multimedia lab at the International Computer Science Institute at UC Berkeley, told WIRED Magazine in 2016 that “depending who you ask, speech recognition is either solved or impossible.”