Scientists from Oxford have developed a new artificial intelligence system that can lip-read, and they claim it is better at the task than professional lip-readers. Working in collaboration with Google’s DeepMind AI division, the researchers trained the system on thousands of hours of news programs.
The system is named “Watch, Attend and Spell”. When tested, it achieved an accuracy of 50%. That might not sound impressive, but when the same clips were given to professional lip-readers, their accuracy was only 12%. In other words, on these clips the computer read lips roughly four times as accurately as the human experts.
The task is genuinely difficult. Words that are produced with the same mouth movement, such as “mat”, “pat” and “bat”, can be very hard to tell apart. Moreover, speaking style varies from person to person, so lip-reading different people adds to the difficulty.
This system, however, works on a smart concept: it does not only learn mouth shapes to predict the word being spoken, it also predicts what is likely to be said next. For example, “European” is usually followed by “Union”, and “Prime” is often followed by “Minister”. However, since the AI learned from news clips, its predictions are limited to the phrases and sentences usually spoken in news.
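The idea of using context to choose between look-alike words can be illustrated with a toy example. The Python sketch below is not the method used by Watch, Attend and Spell; it is a simple word-pair counter, and the names `train_bigrams`, `pick_word`, `corpus` and `counts`, as well as the three sample sentences, are made up for illustration. It counts which word follows which in a small news-like text, then uses those counts to pick the most likely of several candidates that look the same on the lips.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def pick_word(counts, previous_word, candidates):
    """Pick the candidate seen most often after previous_word.

    If no candidate was ever seen after previous_word, the first
    candidate is returned.
    """
    following = counts.get(previous_word.lower(), Counter())
    return max(candidates, key=lambda word: following[word.lower()])


# Hypothetical, news-like training sentences (an assumption for this sketch).
corpus = [
    "the european union met today",
    "the prime minister spoke to the european union",
    "the prime minister sat on the mat",
]
counts = train_bigrams(corpus)

# "pat", "bat" and "mat" look alike on the lips; context breaks the tie.
print(pick_word(counts, "the", ["pat", "bat", "mat"]))
print(pick_word(counts, "prime", ["mister", "minister"]))
```

Because the counts come only from the training sentences, a word pair that never appeared there cannot be preferred, which mirrors the limitation of a system trained only on news clips.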
Speech in other settings can differ considerably from news broadcasts, so the same AI model might not be as accurate elsewhere.
A mature lip-reading technology would have numerous benefits. One of the most obvious is more efficient speech-to-text conversion, which would make it easier to add accurate subtitles to videos.
Moreover, people with hearing impairments could benefit from this technology. Once it is fully developed, plenty of gadgets and equipment might emerge to make their lives a bit easier.