The Google Speech Team has announced improved neural network acoustic models that will vastly improve Google Voice Search. To achieve this, the team is using Connectionist Temporal Classification (CTC) and sequence discriminative training techniques, which are more accurate, especially in noisy environments, and blazingly fast.
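CTC lets the network emit a label (or a special "blank" symbol) for every audio frame and then collapses the frame-wise output into a transcript. A minimal sketch of that collapsing step, using greedy decoding; the symbols and example string are illustrative, not Google's:

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse repeated labels, then drop blanks (greedy CTC decoding)."""
    out = []
    prev = None
    for lab in frame_labels:
        # A label counts only when it differs from the previous frame's
        # label and is not the blank; the blank also separates true repeats.
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return "".join(out)

print(ctc_greedy_decode(list("hh-e-ll-ll-oo")))  # → hello
```

Note how the blank between the two "ll" runs is what allows a genuine double letter to survive the collapsing.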
This new technology developed by the Google Speech Team uses the entire sentence you speak, instead of relying on individual word fragments, to identify what you are saying. Using Recurrent Neural Network (RNN) technology, Google is able to hear the sounds your voice makes in context.
As the team explains, the improved acoustic models rely on Recurrent Neural Networks (RNNs). RNNs have feedback loops in their topology, allowing them to model temporal dependencies: when the user speaks the /u/ in "museum", their articulatory apparatus is coming from a /j/ sound and an /m/ sound before that. Try saying it out loud – "museum" – it flows very naturally in one breath, and RNNs can capture that. The type of RNN used here is a Long Short-Term Memory (LSTM) RNN which, through memory cells and a sophisticated gating mechanism, memorizes information better than other RNNs. Adopting such models already improved the quality of Google's recognizer significantly.
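To illustrate the memory cells and gating mechanism mentioned above, here is a toy single-step LSTM cell in NumPy. This is a textbook sketch, not Google's production model; the dimensions and weights are made up for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step: gates decide what to forget, store, and emit.

    x: input vector; h_prev, c_prev: previous hidden and cell state;
    W: weights of shape (4*hidden, input+hidden); b: bias of shape (4*hidden,).
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])   # forget gate
    i = sigmoid(z[1 * hidden:2 * hidden])   # input gate
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate
    g = np.tanh(z[3 * hidden:4 * hidden])   # candidate cell update
    c = f * c_prev + i * g                  # memory cell carries long-term info
    h = o * np.tanh(c)                      # hidden state fed back next step
    return h, c

# Run a toy sequence of "acoustic frames" through the cell.
rng = np.random.default_rng(0)
input_dim, hidden = 3, 4
W = rng.standard_normal((4 * hidden, input_dim + hidden)) * 0.1
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for frame in rng.standard_normal((5, input_dim)):
    h, c = lstm_step(frame, h, c, W, b)
print(h.shape)  # → (4,)
```

The cell state `c` is what lets the network remember that an /m/ and a /j/ came before the current sound, instead of judging each frame in isolation.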
To reduce computation, Google also trained the models to take in audio in larger chunks, so predictions are made less often. The researchers said that, to realize these gains, the speech team had to tweak the models to find an optimal balance between improved predictions and latency. The result is a faster and more accurate acoustic model that can be used on real voice traffic.
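Taking in audio in larger chunks is commonly done by stacking consecutive feature frames and subsampling, so the network runs fewer, wider steps. A sketch of that idea; the stack and stride values here are illustrative, not Google's:

```python
import numpy as np

def stack_frames(frames, stack=3, stride=3):
    """Stack consecutive feature frames and subsample, so the model
    consumes larger chunks of audio at a lower frame rate."""
    out = []
    for t in range(0, len(frames) - stack + 1, stride):
        # Concatenate `stack` adjacent frames into one wider frame.
        out.append(np.concatenate(frames[t:t + stack]))
    return np.array(out)

# 100 frames of 40-dim features -> 33 stacked frames of 120 dims each.
feats = np.random.default_rng(2).standard_normal((100, 40))
chunks = stack_frames(feats, stack=3, stride=3)
print(chunks.shape)  # → (33, 120)
```

With a stride of 3, the model only has to produce a prediction for every third original frame, which is where the computational savings come from.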
Google also mentions that it included artificial noise and reverberation in the training data, which helps with voice recognition in noisy environments. In addition to being more accurate and quicker to respond, Google's newer technology requires far fewer computational resources. It is pretty technical, but if you want, you can read the entire post on the Google Research Blog.
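Adding artificial noise to training data usually means mixing a noise signal into the clean waveform at a chosen signal-to-noise ratio (SNR). A minimal sketch of that mixing step; the sine-wave "speech", sample rate, and SNR value are made-up stand-ins:

```python
import numpy as np

def add_noise(clean, noise, snr_db):
    """Mix noise into a clean waveform at a target SNR in decibels."""
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so the mix hits the requested SNR.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(1)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of 440 Hz "speech"
noise = rng.standard_normal(16000)
noisy = add_noise(clean, noise, snr_db=10)
```

Training on such corrupted copies forces the model to learn features that survive background noise, which is why recognition in noisy places improves.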
Google’s new acoustic models are already working in voice search for both Android and iOS, so feel free to try it out if you have not said “OK Google” in a while.
Source: Google Research Blog