
This makes me wonder: what's the equivalent of ollama for Whisper/SOTA open-source TTS models? I'm really happy with ollama for running open-source LLMs locally, but I don't know of any project that makes it that simple to set up Whisper locally.


For SRT, here are some front-ends: https://www.reddit.com/r/OpenAI/comments/163hzhe/recommended...

Also I saw this thing called WhisperScript that looks pretty slick: https://github.com/openai/whisper/discussions/1028

That being said, WhisperX isn't that hard to set up. My step-by-step from a couple of months ago: https://llm-tracker.info/books/logbook/page/transcription-te...


I've been using MacWhisper, a macOS app, to run Whisper transcription jobs for a few months, and I really like it.

https://goodsnooze.gumroad.com/l/macwhisper


McWhisper sounds like a diet burger. Does it come with fries?


Whisper is an STT model. You can use WhisperX to transcribe audio locally via the CLI, or whisper-turbo.com, which runs in the browser.
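For anyone who prefers a script to the CLI, here is a rough sketch of local transcription to SRT subtitles. Note that it uses the reference `openai-whisper` Python package rather than WhisperX, and it assumes that package and ffmpeg are installed. The helper names (`format_timestamp`, `segments_to_srt`, `transcribe_to_srt`) and the default model size are my own choices for illustration, not anything from the projects mentioned here.

```python
def format_timestamp(seconds):
    """Convert seconds (float) to an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render Whisper-style segments (dicts with start, end, text) as SRT."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}")
    return "\n\n".join(blocks) + "\n"


def transcribe_to_srt(audio_path, model_name="base"):
    """Transcribe a local audio file and return SRT text.

    Assumes `pip install openai-whisper` and ffmpeg on the PATH.
    The import is kept local so the helpers above work without it.
    """
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_srt(result["segments"])
```

Calling `transcribe_to_srt("talk.mp3")` (a hypothetical file name) would download the model on first use and return the subtitle text, which you can then write to a `.srt` file.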

For TTS, Coqui has the best UX and models for a lot of languages, although quality is not on par with commercial TTS providers.


I've just been looking for SOTA TTS. I found coqui.ai and elevenlabs.io (and a bunch of others). They're good (and better than older TTS), but I am not fooled by any of them. Do you have recommendations?


Gemelo was the other one listed. I doubt you'll get anything sounding more natural than ElevenLabs with the following settings:

* Model: Multilingual v2

* All options and sliders to boost similarity: set to max/yes

* Stability slider: experimentally set to a value where the model sounds natural enough without destabilising sound output
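If you drive this through the API instead of the web UI, the settings above might map to a request roughly like the sketch below. Treat the details as assumptions on my part: the endpoint path, the `xi-api-key` header, the `eleven_multilingual_v2` model id, and the `voice_settings` field names are from my reading of the ElevenLabs API, so check the current docs before relying on them. The `build_tts_request` helper is hypothetical, and the default stability of 0.5 is only a placeholder to tune by ear. The code only builds the request and does not send it.

```python
import json

# Assumed base URL for the ElevenLabs text-to-speech endpoint.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text, voice_id, api_key, stability=0.5):
    """Build (but do not send) a TTS request with similarity maxed out.

    stability is the one value to tune experimentally; 0.5 is a placeholder.
    """
    if not 0.0 <= stability <= 1.0:
        raise ValueError("stability must be between 0.0 and 1.0")
    body = {
        "text": text,
        "model_id": "eleven_multilingual_v2",  # assumed id for Multilingual v2
        "voice_settings": {
            "stability": stability,
            "similarity_boost": 1.0,      # similarity slider set to max
            "use_speaker_boost": True,    # similarity-related toggle set to yes
        },
    }
    return {
        "url": f"{API_BASE}/{voice_id}",
        "headers": {
            "xi-api-key": api_key,
            "Content-Type": "application/json",
        },
        "body": json.dumps(body),
    }
```

You would POST `body` to `url` with those headers using whatever HTTP client you like, then nudge `stability` up or down until the voice sounds natural without the output becoming unstable.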



