
I've been experimenting with something similar to this approach recently. IndexTTS2 takes emotion vectors as an input, so I run an external emotion classification model on the LLM output and use its result to modulate the TTS emotion vectors. You need to manage the state of the current affect with a bit of care or it sounds unhinged, but it's worked surprisingly well so far. I wired it together using Cats Effect.

As you'd expect, latency isn't great, but I think it can be improved.
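
On the affect-state point: one simple way to keep the voice from lurching between emotions is to smooth the classifier output over time and cap how far it can move per utterance. The Python below is a rough sketch of that idea, not the actual Cats Effect wiring. The `AffectState` class, the vector length of 8, and the `alpha` and `max_step` values are all assumptions for illustration, and nothing here calls IndexTTS2 or any classifier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AffectState:
    """Smoothed emotion vector carried across utterances (hypothetical helper)."""

    dims: int = 8            # assumed vector length; check what your TTS model expects
    alpha: float = 0.3       # assumed smoothing factor; higher reacts faster
    max_step: float = 0.25   # assumed cap on change per dimension per utterance
    current: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start from a neutral (all-zero) vector if none was supplied.
        if not self.current:
            self.current = [0.0] * self.dims

    def update(self, target: List[float]) -> List[float]:
        """Move the current vector part of the way toward the classifier output."""
        if len(target) != self.dims:
            raise ValueError(f"expected {self.dims} values, got {len(target)}")
        updated = []
        for cur, tgt in zip(self.current, target):
            # Exponential smoothing step, clamped so one utterance cannot
            # swing the emotion by more than max_step.
            step = self.alpha * (tgt - cur)
            step = max(-self.max_step, min(self.max_step, step))
            # Keep each component inside [0, 1].
            updated.append(min(1.0, max(0.0, cur + step)))
        self.current = updated
        return updated
```

You'd call `update` once per utterance with the classifier's scores and pass the returned vector to the TTS step. The step cap is what stops a single misclassified sentence from flipping the voice from calm to furious.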




