
Whisper is an encoder-decoder transformer: the input is an audio spectrogram and the output is text tokens. It improves on old-school transcription methods because it's trained on audio paired with transcripts, so it makes contextually plausible predictions.
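Here's a toy sketch of that encoder-decoder loop. The `encode`/`decode` callables are hypothetical stand-ins for real transformer stacks, not Whisper's actual API. The encoder runs once over the spectrogram. The decoder then emits one token at a time, conditioned on both the audio features and the tokens it has already produced. That second input is the same autoregressive setup a GPT-style text decoder uses.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins: a real encoder is a transformer over a log-mel
# spectrogram, and a real decoder is a transformer that cross-attends to the
# encoder output. Plain functions keep the control flow visible.
Encoder = Callable[[Sequence[Sequence[float]]], List[float]]
Decoder = Callable[[List[float], List[int]], List[float]]  # returns logits over the vocab

START, END = 0, 1  # toy special tokens (assumed ids, not Whisper's)


def greedy_transcribe(spectrogram, encode: Encoder, decode: Decoder,
                      start_token: int = START, end_token: int = END,
                      max_len: int = 50) -> List[int]:
    audio_features = encode(spectrogram)  # encoder runs once per audio chunk
    tokens = [start_token]
    for _ in range(max_len):
        # The decoder sees the audio features AND every token emitted so far.
        logits = decode(audio_features, tokens)
        next_token = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        if next_token == end_token:
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the start token


def toy_encode(spectrogram):
    # One "feature" per frame: its mean value.
    return [sum(frame) / len(frame) for frame in spectrogram]


def toy_decode(features, tokens):
    # Vocab: 0=start, 1=end, 2="loud", 3="quiet". Emit one word per frame, then end.
    step = len(tokens) - 1
    logits = [0.0] * 4
    if step >= len(features):
        logits[END] = 1.0
    elif features[step] > 0.5:
        logits[2] = 1.0
    else:
        logits[3] = 1.0
    return logits


print(greedy_transcribe([[0.9, 0.8], [0.1, 0.2]], toy_encode, toy_decode))  # [2, 3]
```

Real systems swap the greedy `max` for beam search or sampling, but the shape of the loop is the same.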

Idk what the definition of an LLM is, but it's indisputable that the technology behind Whisper is a close cousin of text decoders like GPT. IMO the more important question is how these things are used in the UX. Decoders don't have to be annoying; that is a product choice.


