Hacker News

I worked on building exactly this earlier this year. I was hanging out in Taiwan for a few months and thought, surely the Babel Fish should exist by now.

I did several experiments recording from all the microphones I could on my iPhone and AirPods while out in the wild. My conclusion: it's impossible right now for that hardware given the microphones we have and what they pick up.

So much of what's spoken comes with some combination of (a) long distance, (b) low volume, and (c) background noise. Something that was clear as day to my ears would barely register on the mics. While context is of course an issue, the raw audio didn't carry enough signal to even attempt a translation.
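To put a rough number on the distance part: a minimal sketch, assuming an idealized point source in a free field (no walls, no reflections), where sound level falls by about 6 dB for every doubling of distance. The function name and the example distances are made up for illustration; real rooms and streets will behave differently.

```python
import math

def level_drop_db(near_m, far_m):
    """Drop in sound pressure level (dB) when moving from near_m to far_m.

    Assumes the free-field inverse-square law for a point source,
    which ignores reflections, obstacles, and directivity.
    """
    return 20 * math.log10(far_m / near_m)

# Hypothetical comparison: a phone held 0.5 m from a talker versus 4 m away.
drop = level_drop_db(0.5, 4.0)
print(f"{drop:.1f} dB quieter at 4 m than at 0.5 m")
```

Under that assumption the speech reaches the far microphone roughly 18 dB quieter, while the background noise around the microphone stays about the same, so the signal-to-noise ratio falls by a similar amount.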

The one caveat is that there might be low-level (i.e., Apple-only) access to the headphone microphones that capture the environment for noise cancellation. I'm not sure, though; I couldn't find them in any API.

For cases where you do have clear audio, existing apps (e.g., Google Translate) are so close to achieving this, but they don't let you specify audio outputs with enough fine-grained control. By default, they will start screaming out of your phone what you were attempting to silently translate.



Also a lot of spoken language involves context that AI is nowhere near understanding yet, let alone all the cultural baggage necessary to accurately translate/localize a lot of utterances.

"Can you stand up?" would be translated differently into Japanese depending on whether you're implying you need them to move their butt off your cell phone versus directly inquiring as to the function of their legs after a car accident. If you speak English and hear it as a background without the rest of the context being picked up, your brain instinctively knows it can interpret it either way, no problem.

But if you're Japanese and the AI picks a specific way to translate it, then you are completely unaware of the ambiguity because the AI resolved it with a 50% chance of being wrong.


>"Can you stand up?" would be translated differently into Japanese depending on whether you're implying

nitpicky, but is it though? not really. and it's as much 'difference depending on what you're implying' as there would be in english comparing just saying 'can you stand up' or specifying 'from the seat/at all'.


Probably not the strongest example but there are definitely phrases that are specific in one language but ambiguous in another.


There are certainly nuances, even when 'understood'

Google: "A bit sticky, things are pretty sticky down there."


I'm on mobile so can't find the link but years ago there was a DARPA (iirc) program trying to solve this problem in the context of surveillance in a loud crowded room. Their conclusion was that there needed to be n+1 microphones in the room to be able to cleanly differentiate all of the noise, where n is the number of noise sources, which in their case was number of conversations going on in the room (assuming no other loud sources of noise like music).

I think it's totally doable but you'd need many more microphones in order to deal with real world noise. As MEMS microphone quality improves, this should eventually be possible with a combination of smartphone/headphone/some other device like something around your neck.
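A toy illustration of why microphone count matters, under strong assumptions that do not hold in a real room: each microphone hears an instantaneous weighted sum of the sources, there is no noise or reverberation, and the mixing gains are known in advance. All names and numbers below are hypothetical. With two microphones and two talkers the mixture can be inverted exactly; with one microphone, different source pairs produce the same recording, so no algorithm can tell them apart from that signal alone.

```python
import math

def mix(sources, gains):
    """Each microphone hears a weighted sum of every source (one gain row per mic)."""
    n_samples = len(sources[0])
    return [
        [sum(g * s[t] for g, s in zip(row, sources)) for t in range(n_samples)]
        for row in gains
    ]

def unmix_2x2(mics, gains):
    """Recover two sources from two mics by inverting the known 2x2 gain matrix."""
    (a, b), (c, d) = gains
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("microphones are not independent enough to separate")
    m1, m2 = mics
    s1 = [(d * x - b * y) / det for x, y in zip(m1, m2)]
    s2 = [(a * y - c * x) / det for x, y in zip(m1, m2)]
    return s1, s2

# Two hypothetical talkers, modeled as pure tones sampled at 8 kHz.
times = [i / 8000 for i in range(80)]
talker_a = [math.sin(2 * math.pi * 220 * x) for x in times]
talker_b = [math.sin(2 * math.pi * 330 * x) for x in times]

# Two mics, two sources: each mic is closer to a different talker.
gains = [[1.0, 0.4], [0.3, 1.0]]
mics = mix([talker_a, talker_b], gains)
rec_a, rec_b = unmix_2x2(mics, gains)
max_err = max(abs(x - y) for x, y in zip(rec_a + rec_b, talker_a + talker_b))
print("max reconstruction error with 2 mics:", max_err)

# One mic, two sources: a different pair of sources gives the same recording.
one_mic = mix([talker_a, talker_b], [[1.0, 0.4]])[0]
alt_a = [x + 0.4 * y for x, y in zip(talker_a, talker_b)]
alt_b = [0.0] * len(times)
one_mic_alt = mix([alt_a, alt_b], [[1.0, 0.4]])[0]
ambiguous = all(abs(p - q) < 1e-12 for p, q in zip(one_mic, one_mic_alt))
print("single-mic recording is ambiguous:", ambiguous)
```

In practice the gains are unknown and have to be estimated blindly, and echoes turn each gain into a filter, which is a large part of why the real problem is so much harder than this sketch suggests.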


Apart from the dynamic range challenges for sensing, source separation is hard. There's been a pretty long line of research into the area - see "cocktail-party problem". AFAIK it's still a mostly unsolved problem.


There's also some magic to the Universal Translator and Babel Fish: they perform zero-shot real-time translation.

That is, they are able to translate (in all directions) novel languages that were not previously heard[0]. It is an open question, likely with a negative answer, whether there is a universal grammar even among humans[1] (the definition itself is vague, but even the most abstract version is suspect and highly unlikely to be universal across species). I think no one will be surprised if it turns out to be impossible to interpret an entire language based on only a few words (let alone do it in real time).

This isn't a knock on the idea, because even a trained device is insanely useful; it's just a note about limitations and triage. This is awesome stuff and I can't wait for the day we have translation headphones. It's an incredibly complex problem that I'm sure is not short of surprises.

[0] There are a few exceptions such as Star Trek TNG's episode Darmok, S5E2, where the Tamarians' language is unable to be translated due to its reliance on cultural references (the literal words are translated but the semantic meanings are not). It's a well known episode and if you hear anyone saying "Shaka, when the walls fell" (translates to "Failure") they are referencing this episode (often not using the language accurately but who cares (nerds. The answer is nerds)).

[1] https://en.wikipedia.org/wiki/Universal_grammar


Can’t speak for ST, but did they ever say the babel fish understood languages it never heard before? I thought the galaxy was just exceptionally well-cataloged, given the HHG itself, and humans were hardly unknown.


The babel fish translated via brainwave energy and a telepathic matrix:

> The Babel fish is small, yellow and leech-like, and probably the oddest thing in the Universe. It feeds on brainwave energy received not from its own carrier but from those around it. It absorbs all unconscious mental frequencies from this brainwave energy to nourish itself with. It then excretes into the mind of its carrier a telepathic matrix formed by combining the conscious thought frequencies with the nerve signals picked up from the speech centres of the brain which has supplied them. The practical upshot of all this is that if you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language. The speech patterns you actually hear decode the brainwave matrix which has been fed into your mind by your Babel fish.


“Now it is such a bizarrely improbable coincidence that anything so mind-bogglingly useful could have evolved purely by chance that some thinkers have chosen to see it as a final and clinching proof of the nonexistence of God.

“The argument goes something like this: ‘I refuse to prove that I exist,’ says God, ‘for proof denies faith, and without faith I am nothing.’

“‘But,’ says Man, ‘the Babel fish is a dead giveaway, isn’t it? It could not have evolved by chance. It proves you exist, and so therefore, by your own arguments, you don’t. QED.’

“‘Oh dear,’ says God, ‘I hadn’t thought of that,’ and promptly vanishes in a puff of logic.

“‘Oh, that was easy,’ says Man, and for an encore goes on to prove that black is white and gets himself killed on the next zebra crossing.

“Most leading theologians claim that this argument is a load of dingo’s kidneys, but that didn’t stop Oolon Colluphid making a small fortune when he used it as the central theme of his best-selling book, Well That about Wraps It Up for God.

“Meanwhile, the poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation.”


I couldn't help but hear this in my mind as it was read in the voice of the narrator from the old BBC "Hitchhiker's Guide" mini-series.


I think the idea of the Babel Fish might run up against computational complexity limits in some sense. Imagine a future "Theory of Everything" book written in an alien language. The book has a total of 1 million characters across its pages, where each character is distinct. Must the Babel Fish be able to "translate" such a language into English, given its oracle-like powers? Can it do the job?


While Arthur Dent does read some stuff throughout the series that couldn't possibly be in English like signs on an alien planet, the full nature of the babel fish is rather vague and we don't know if it would work that way. As far as I can tell, all the written text it translates for Dent is in the context of living civilizations so the babel fish has brainwave energy to feed into the telepathic matrix - presumably telepathically using the knowledge of nearby persons for the translation.

That said, given the Heart of Gold's improbability drive, I don't think information-theoretic violations are your biggest problem.


Well, then. Magic indeed!



