Version 0.4 of Simon, an open source speech recognition system, has been released. It features a whole new recognition layer, context-awareness for improved accuracy and performance, a dialog system able to hold whole conversations with the user, and more.
The company says that a lot of work has gone into making Simon easier to use for both existing and new users. Most visibly, the main window of Simon has been reorganized to bring the most important options together on one screen.
The newly introduced Simon base model format (.sbm) and the integration of a GHNS online repository of base models have removed the last big hurdle of the initial configuration. Users can now easily go from a fresh installation to a working setup in less than five minutes without any preparation.
One of the major internal changes in Simon 0.4 is added support for the BSD-licensed CMU SPHINX. While the company also maintains full support for HTK and Julius, new models compiled with Simon will default to the SPHINX backend, and the (proprietary) HTK is no longer required to build user-generated models. Simon selects the correct backend for a configuration transparently and automatically.
A major problem of open source speech recognition has always been the lack of freely available, high-quality speech models, according to the company. The Voxforge project has been working for years towards GPL-licensed acoustic models for a variety of languages. While its models are not yet perfect, they offer a promising starting point.
The English Voxforge model is available as a Simon base model and can be downloaded and imported with Simon.
Additionally, starting with Simon 0.4, users will also have the option to contribute their gathered Simon training samples directly to the Voxforge server. These recordings will then be used to train and improve the general acoustic models.
There is a simple rule of thumb in speech recognition: the smaller the application domain, the better the recognition accuracy, the company says. This has always been one of the core principles of Simon. Simon 0.4, however, goes one step further: it can now reconfigure itself on the fly as the current situation changes. Through so-called “context conditions”, Simon 0.4 can automatically activate and deactivate selected scenarios, microphones and even parts of a training corpus.
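To illustrate the idea behind context conditions, here is a minimal Python sketch. It is not Simon's actual implementation or API: the `Scenario` class, the `active_vocabulary` function, and the `active_window` context key are all hypothetical names chosen for this example. It only shows how attaching a condition to each scenario could shrink the active vocabulary to what is relevant in the current situation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical type: a context is a plain mapping such as
# {"active_window": "firefox"}.
Context = Dict[str, str]


@dataclass
class Scenario:
    """A group of voice commands plus the condition that activates it."""
    name: str
    words: Set[str]
    condition: Callable[[Context], bool]


def active_vocabulary(scenarios: List[Scenario], context: Context) -> Set[str]:
    """Return the union of words from all scenarios whose condition holds."""
    vocabulary: Set[str] = set()
    for scenario in scenarios:
        if scenario.condition(context):
            vocabulary |= scenario.words
    return vocabulary


scenarios = [
    Scenario("browser", {"back", "forward"},
             lambda ctx: ctx.get("active_window") == "firefox"),
    Scenario("media", {"play", "pause"},
             lambda ctx: ctx.get("active_window") == "vlc"),
    # A scenario without restrictions is always active.
    Scenario("general", {"ok", "cancel"}, lambda ctx: True),
]

# With a browser in focus, the media commands are left out, so the
# recognizer would have fewer candidate words to distinguish between.
print(sorted(active_vocabulary(scenarios, {"active_window": "firefox"})))
```

In this sketch the vocabulary is recomputed whenever the context changes, which mirrors the rule of thumb above: fewer active words means a smaller application domain at any given moment.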