Text-to-speech technology has come a long way from the robotic-sounding voices of yesteryear. In fact, the voices have become so good that many consider it a solved problem.
In any case, we at Ozonetel thought it would be good to have our own Indian voice, so we started working on one. We made sure to do the work once and build a pipeline that would work for any Indian language.
We ran the pipeline for the first time yesterday on Hindi text, and we are happy to say that the result was a success. There is still a long way to go before we release the voices, but for now, check out the samples below.
Experiment:
We took the Hindi text of Modi’s Mann Ki Baat and ran it through four engines: Google WaveNet, Google Standard, Microsoft, and Ozonetel TTS. The results are below. To keep the test blind, I will not say which sample comes from which engine. Please listen and comment on which voice you think sounds best, and I will reveal the results and the engine behind each voice tomorrow on this page :)
TTS 1: https://soundcloud.com/nutanc/tts1-1
TTS 2: https://soundcloud.com/nutanc/tts2-1
TTS 3: https://soundcloud.com/nutanc/tts3-1
TTS 4: https://soundcloud.com/nutanc/tts4-1
So go ahead, please vote in the comments or share any of your thoughts. We will soon share our thought process and approach in building this.
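For anyone who wants to run a similar blind comparison, the labelling step can be sketched in a few lines of Python. This is only an illustration, not how the samples above were prepared: the function name `blind_labels` and the `seed` parameter are made up for this example, and the engine names are simply the four mentioned in this post.

```python
import random


def blind_labels(engines, seed=None):
    """Assign neutral labels (TTS 1, TTS 2, ...) to engines in shuffled order.

    Hypothetical helper for illustration. The returned dict is the private
    answer key; only its keys should be shown to listeners.
    """
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    shuffled = list(engines)    # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return {f"TTS {i}": name for i, name in enumerate(shuffled, start=1)}


# Engine names as used in this post.
engines = ["Google WaveNet", "Google Standard", "Microsoft", "Ozonetel"]

answer_key = blind_labels(engines, seed=2019)  # keep private until the reveal
public_labels = sorted(answer_key)             # what listeners get to see
```

Publishing only `public_labels` alongside the audio and keeping `answer_key` aside until voting closes is what keeps the test blind.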
Update (4th June 2019):
Scroll down for the reveal of the TTS names...
TTS1 is Google Standard.
TTS2 is Google WaveNet.
TTS3 is Microsoft.
TTS4 is Ozonetel.
I think a majority of blind testers found TTS3 to be the better-sounding voice. So congrats, Microsoft :)