Ask HN: Hindi Conversational Text-to-Speech?
9 points by kamathsutra on April 4, 2024 | 11 comments
We were developing Sapien (https://smallest.ai/sapien), primarily focused on the Indian market, and we realized that there is no text-to-speech for conversational Hindi. US accents from ElevenLabs and PlayHT are pretty good, but Indian accents are narrative and not conversational in nature.

We are therefore planning to develop this in-house. However, if someone out there already provides low-latency text-to-speech in conversational Hindi, we would much rather use their solution.

Do you guys know anyone who does? Or do you think we should go after this ourselves?




Microsoft has done a lot of work in speech, and Sarvam is now going into it as well. But Sarvam has yet to produce a decent LLM for Hindi, and now, on top of that, they are venturing into speech, so I am not sure how that will pan out. My guess is that Sarvam would be more focused on increasing text-to-speech coverage across all Indian languages than on making the model indistinguishable from humans for the top 4-5 Indian languages. They have some tie-ups with the government for the same.


Vosk has a Hindi model, but for some reason it's not baked into the demo

https://alphacephei.com/vosk/models

https://ccoreilly.github.io/vosk-browser/


This is speech-to-text, right?


My friend was a contractor for Hindi TTS at Google https://cloud.google.com/text-to-speech


It's quite robotic.


Actually I noticed that they have some better models now, still narrative in nature, but much better than before.


Someone is trying to train a WhisperSpeech model for Hindi but I haven’t heard any news about it yet, so it might not be ready.

I’d also check out https://github.com/dubverse-ai/MahaTTS


The WhisperSpeech one was pretty bad/uncontrollable even in English last I checked, but maybe it can be improved. MahaTTS is quite bad for now. It is trained on IITM data, and that data, in my opinion, is not great for realistic conversational Hindi TTS. But I need to take a second look.


> Indian accents are narrative and not conversational in nature

What are some examples of how these things differ? I've been exploring Hindi recently, but find that I'm learning some pretty stuffy speech from Snell's books.


The way I think about realistic conversational speech is that if you get a phone call, you should not be able to tell whether it is an AI or a human just based on the voice. For English and even some Asian languages like Chinese, this has already happened.

If you are a non-Hindi speaker and want to understand the difference, then I might find it difficult to explain :P But whatever you are learning, if you start practicing with a native speaker, I am sure you will easily surpass the SoTA Hindi TTS models.

Non-conversational example: https://www.youtube.com/watch?v=ayYk3XkP0ts&t=22s&ab_channel...

You can listen to this and easily tell that it's AI-generated speech. However, it works very well for dubbing etc.



