Microsoft claims to have created a technology that, for the first time ever, can recognise the words in a human conversation as well as a person does.
In a paper published on 18 October, researchers from Microsoft’s Artificial Intelligence and Research team reported that, using AI, they had created a speech recognition system that made the same number of errors as a professional transcriptionist, or fewer.
According to the team, the technology is able to perform at a word error rate (WER) of 5.9 per cent, down from the 6.3 per cent that the team was able to achieve just last month.
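For readers wondering how a figure like 5.9 per cent is arrived at, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below shows that standard calculation. It is only an illustration, not Microsoft's own scoring code: the function name `word_error_rate` and the simple lower-casing and whitespace splitting are assumptions made here for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Illustrative only: real evaluations apply more careful text
    normalisation than the lower-casing and whitespace split used here.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis (deletion)
                curr[j - 1] + 1,     # extra word in hypothesis (insertion)
                prev[j - 1] + cost,  # match, or one word swapped for another
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    rate = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {rate:.1%}")
```

In the usage example, one wrong word out of six gives a rate of one in six. On the same scale, a rate of 5.9 per cent works out at roughly six errors in every 100 words.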
Xuedong Huang, the company’s chief speech scientist, confirmed the results, saying: “We’ve reached human parity. This is an historic achievement.”
The results have well exceeded not only the goal the team set itself for the end of the year, but also the expectations of many in the industry.
Harry Shum, the executive vice president who heads the Microsoft Artificial Intelligence and Research group, said: “Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible.”
Speech recognition software is an integral part of our daily lives, from services like phone banking to translation tools that can understand and translate a person’s speech in real time.
What’s really interesting about this achievement is that it’s not even close to perfection, and that’s absolutely fine.
You see, humans also have a pretty significant error rate. What we’re capable of doing, however, is filling in the gaps really well.
To create a piece of software that could do the same, the team used something called a ‘deep neural network’. This involves taking vast quantities of data, called training sets, and using them to teach the computer to tell correct patterns from incorrect ones.
In this case, that means hearing sounds and working out which words they represent.
“This accomplishment is the culmination of over twenty years of effort,” said Geoffrey Zweig, who manages the Speech & Dialog research group.