Hacker News | hadrianpaulo's comments

This is amazing work from Meta and Peng-Jen Chen.

As a person who grew up having Hokkien as my first language, I've feared that Hokkien might go extinct over time. Having a formal English and Mandarin Chinese education had pretty much started to erase my Hokkien knowledge. I'm glad that AI technologies can be used to preserve this language.


Yeah language attrition is very real. I used to be a near native speaker of Hokkien and Cantonese, now I can barely understand a word after more than a decade of not speaking them.

Going extinct would be sad, sure, but I'm not sure that the effort required to not make it so would be worth it. Putting children through thousands of hours of education in another language, for what exactly? If I have kids, I'm not sure I'd be willing to put them through all that work for a nearly dead language. They're not gonna use it to speak with their peers, or to open up communication with another group of people, it's just preservation for the sake of preservation.


We're making the opposite choice, my wife speaks Cantonese to our kid because we believe it's worth preserving (of course, the fact that there is a lot more media in Cantonese than in Hokkien makes it easier to preserve).

Of course, Mandarin is more "useful" in a purely pragmatic way, but languages are social things, they're tied to culture, they help create relationships with other people who speak those languages (I have plenty of Teo Chew friends who have made very good friends with others from different countries because they are kakinang) and speaking multiple languages is always worth keeping.

Plus even besides this, having a multilingual home, regardless of the usefulness of the languages that are spoken, is associated with a lot of cognitive benefits for children so the thousands of hours learning a different language are useful.


Perhaps the old reasons will do just as well. Hokkien is an interesting case, being involved in a potentially fractious political situation: one can imagine the people of Taiwan promoting the use of Hokkien to further distinguish themselves from the Chinese mainland. Not dissimilar to, say, the bilingual laws in Canada.

After all, the nineteenth century saw the consolidation of national languages and elimination of regional dialects in the name of fostering nationalism, which continued in the twentieth but also saw the revival of languages like Irish or Hebrew for new nationalisms.


I doubt that desire for cultural differentiation is going to change much about the diglossic situation in Taiwan, since the continued use of Traditional Chinese characters is already a pretty big differentiator. The government is making some effort to promote the Tai-lô writing system (e.g. https://tailo.moe.edu.tw/index.php ), but overall there's very little incentive for Hokkien speakers to become literate in it. If you look at Wikipedia visits from Taiwan, 92% go to the Mandarin version, 6% to English, 1% to Japanese, and Hokkien is in the tail of small languages behind even Cantonese and Classical Chinese. Of these, I think English is most likely to grow in the future.

Also, the differentiation potential is somewhat limited due to the fact that the majority of Hokkien speakers live in Fujian province on the mainland (Hokkien = Fujian) and there's some preservation work going on there as well. E.g. Xiamen University's Speech Lab had a working demo of spoken-Hokkien-to-written-Mandarin translation in 2018, although the link shortener they used has since suffered from link rot. https://speech.xmu.edu.cn/2018/1215/c18169a359542/page.htm


You’re not wrong, not to mention the problematic nature of conflating Hokkien with “Taiwanese” identity, as that then omits the Hakka, never mind the aboriginal languages. But one could see a Benshengren revival of Taigi nonetheless, even if only as a set of quixotic pan-Green government initiatives.

Also, Hokkien interest is present even outside of that geopolitical flashpoint:

https://www.todayonline.com/singapore/stories-behind-tiktok-...


Preservation of a cosmovision and a form of intelligence, not just a language. Furthermore, I wonder whether kids lose intelligence when adults shut their mother tongues off.


I like that we're preserving dying languages by maintaining the ability to understand and translate them as archives of humanity on Earth's past, but I'm hoping in the next century or two language attrition will whittle all the world's languages down to just a handful.

The day every person on Earth speaks some shared common language (ideally one so straightforward that it can be learned by children within a year or so) will be a day I'd celebrate as a monumental milestone in the development of our species.

It's fine if people know other languages too, but having that shared global one is vital.

I'm happy to lose innumerable untranslatable phrases and cultural understandings in service of this.


I disagree. There is no possible benefit to a monoculture of language that would justify the immense loss of culture and of different ways to see the world.

Cultural spheres have always managed to come up with a lingua franca that enabled them to exchange ideas. English fills that role currently and will probably continue to dominate. Even if something goes monumentally wrong with the Anglosphere, it will endure until another language manages to step up to that role.

Children are perfectly able to learn multiple languages within a few years by pure immersion. I can't see what further optimization here would achieve.

Still, judging by the events of the past, languages that are not sanctioned by some state will probably all die out by the end of this century. Further erosion is very unlikely though. The language of any country with at least, say, 50 million speakers is probably safe.


The language will be “preserved” in that 300 years from now, a tape drive in Harvard’s Widener Library will have a Hokkien model on file for historians writing a tenure book designed to go right back into those dusty shelves.


In the same way all verbal and symbolic language will no longer exist following the invention of the neuraljack, yes.


Most people use their spoken[1] language as a major part of their conscious minds. Do you expect that to change?

[1] (ASL speakers claim to 'think' in sign)


Some researchers think that this is a sort of limitation of our current language structures, and that there exist more general encodings of our ideas ("engrams") that we could express if we had perfect telepathy and perfect comprehension.

In this sense, we would exchange information that was of a higher order than language. For example, a complex idea like "the location where I will meet you for lunch this afternoon" would be independent of the verbal language we use to express it.

We can see a similar (albeit more limited) exchange of language-independent structural ideas in mathematics. We distinguish between "numbers" (the idea of a quantity) and "numerals" (the symbols we use to describe that quantity). For example, `III`, `three`, `3`, `5 - 2`, and `the number of complete revolutions made by rotating 6*pi radians` are all ways of defining and representing the same common idea — "the number three". Going back to the previous example, imagine if an idea "where I'll meet you for lunch today" could be universally and unambiguously shared in a similar way, and you'll get a rough approximation of what perfect telepathy and perfect comprehension would be like.
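A rough Python sketch of the number/numeral distinction described above: five different notations are parsed or evaluated, and all land on the same value. The helper `roman_to_int` and the `WORDS` table are made-up names for illustration only, and the word lookup covers just the one word used here.

```python
import math

def roman_to_int(numeral: str) -> int:
    """Parse a Roman numeral written in standard subtractive notation."""
    values = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
    total = 0
    for i, ch in enumerate(numeral):
        v = values[ch]
        # A smaller symbol placed before a larger one is subtracted (IV = 4).
        if i + 1 < len(numeral) and values[numeral[i + 1]] > v:
            total -= v
        else:
            total += v
    return total

# Illustrative lookup for English number words; only "three" is needed here.
WORDS = {"three": 3}

# Five numerals (representations), one number (idea).
representations = [
    roman_to_int("III"),
    WORDS["three"],
    int("3"),
    5 - 2,
    # Revolutions in 6*pi radians; round() guards against float error.
    round((6 * math.pi) / (2 * math.pi)),
]

all_same = len(set(representations)) == 1
print(representations, all_same)
```

The notations differ, but once each one is decoded the comparison happens on the shared value, which is roughly what the "engram" idea asks for with richer concepts.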


I don't think it's just researchers that think this.

Every time that I have a word on the tip of my tongue I have an idea in mind that I wish to express, but I simply cannot remember the word to convey the concept. That is to say that I've already thought of the idea and am now merely looking for a way to express it.

However, I think that "thinking out loud" in your head has benefits. It makes the idea you wish to express more concrete and lets you iterate on it more easily. Think of it like writing down an equation when doing math: it gets easier to reason about it when it's expressed externally. I think ideas and thinking verbally in your head work the same way.


I asked in part because I don't have an internal monologue. I was quite surprised to learn that many other people have one and people who do usually seem surprised that monologueless people can think at all.

So the idea of thinking without directly using language is less foreign to me than to many. Nevertheless, I'll still sometimes use 'words' internally to reason things through which are too complex to solve intuitively, or when enacting a procedure I learned from someone else.

It's not clear to me that the engrams you imagine wouldn't just be a symbolic language under another name. I suppose if they were continuous in some vast highly dimensional space ... but even then one could discretize them at the resolution of distinguishable ideas and it's a symbolic language again. :)


It's still a language, even if it is not a verbal one. Words and voice are just a way to encode it.


Regarding the exact nature of the gestalt transhuman super-intelligences that will arise post-Singularity, who can say?


It's spoken by many people all around me. There's more to preserving a language than translating it to a foreign language.


Also spoken heavily where I used to live. However, a lot of the time it was used by non-Hokkien speakers for its rich collection of curse and swear words.


They won't. It will only help accelerate the move towards concentration into national languages (Standard Chinese in Taiwan and China, French in France, etc.). Why would anyone put in the effort to learn a "small" language when it can be translated automatically for understanding?

The way to preserve a language is putting humans in the loop. Creating content in that language; interestingly more and more shows are produced in Taiwan in or using non-Mandarin languages as a political way to mark a difference with the big neighbour. And having government support, notably at school (at young age) by allowing partial or total teaching in the language to be preserved.


It's anthropocentric to say a language can only be preserved by live humans rather than AI natural language models and digital corpora. No one uses Latin any more, but we can still figure out what Roman texts meant.

It's also counterproductive to let humans learn a language of limited content resources and use cases.

Taiwan people are highly educated and urbanized. It's much harder to use Taiwanese in Taiwan compared to High German in a Pennsylvanian Amish village.

I don't know how to express clearly in Taiwanese "GPS in my neighborhood has a 100x lower accuracy because of radio interference" or "move this MOSFET up by 15mm to balance the PCB thermal stress". If you still have to switch to Chinese or English from time to time, why not just use the popular languages?

Even Japanese, a language used by 125 million, has similar issues, my Japanese coworkers frequently switch to English during technical discussions.


> Even Japanese, a language used by 125 million, has similar issues, my Japanese coworkers frequently switch to English during technical discussions.

This is really not common and if anything it's something unheard of to me. I work in an English speaking company in Japan and most of my coworkers (who are fluent enough to speak English in technical conversations) would instantly switch back to Japanese to talk about technical things between them if there's no foreigner involved in the conversation. I've seen the same thing happen in my wife's company and other companies too. This is on top of the fact that the level of English education in Japan is very low (unfortunately) and these people who work in English-speaking companies are very much the exception. I don't think I've ever seen a single Japanese person favor using English over Japanese for technical discussions if they were ever given the choice.


^ this. I did a lot of integration work with Japan over the years and all documentation is in Japanese. All communication via email and docs is in Japanese. Everything technical is in Japanese.

I struggled with Google Translate because it’s like 40% accurate at translating technical stuff.


Your Japanese coworkers are the exception, not the rule. The vast majority of Japanese speakers in Japan, including those in technical fields, do all of their work communication in Japanese. They may use technical vocabulary borrowed from English and other languages, but those words are used with Japanese pronunciation in Japanese sentences.


In Masahiro Sakurai[0]'s series of YouTube talks[1] on game development, he specifically mentions how he has to tell Japanese developers to always name files and source code functions in English, just in case they need to work with an overseas team.

Yes, it is possible to do everything 100% in Japanese, with the only English being the keywords of whatever programming system you are working with. However, that is more the exception than the rule, especially in larger teams that need to work overseas.

[0] Creator of Super Smash Bros. and Kirby

[1] "Masahiro Sakurai on Creating Games"


Even teams that work with overseas teams tend to funnel that through several people or the like (this is universal). Even "English-only" companies in Tokyo will still just have a bunch of docs/convos written in Japanese when the team compositions are not uniformly mixed.

There are of course aesthetic/logistics reasons behind "code the stuff in English" (if only cuz your code is going into an ASCII codepage, and ... yeah, sharing). But the language used in teams is pretty company-culture dependent, and "we do all of our work in English" lands in a very restricted set of companies. Probably Finance is the one where that culture is there, but most tech companies.... there are higher-than-average English language levels in these places, but if there's no other English-preferer (foreigners, but also returnees or people who just like English a lot) in the room? Not happening.


I asked them why. They said it's possible to say anything in Japanese, but English has the brevity for some topics.


> Even Japanese, a language used by 125 million, has similar issues, my Japanese coworkers frequently switch to English during technical discussions.

Are you Japanese yourself? If not, I don't think it's strange that they would adapt their way of speaking with a foreigner, especially since most technical words in IT come from English anyway. In other fields, health for instance, it's totally possible to never hear English for months or years. Japanese is alive and well, and English is more a social marker than anything else. Most Japanese have a very poor command of English, if any, and can live their whole lives never using it.


> Even Japanese, a language used by 125 million, has similar issues, my Japanese coworkers frequently switch to English during technical discussions.

You must live in a kind of weird cocoon, because in Japan nobody switches to English for technical discussions: they can't even speak English in the first place.


> It's much harder to use Taiwanese in Taiwan compared to High German in a Pennsylvanian Amish village.

Your example for difficult use is particularly apt in that you’re choosing to focus on specialized technical examples, which often default to international lingua franca anyway, often English.

I doubt that day to day use of Taigi in Taiwanese communities is as rare and difficult as you say. Maybe in highly educated and urbanized Taipei, but have you even been to the southern countryside?


You can easily say these two sentences in Spanish that every Spanish speaker has no difficulty understanding.

El GPS en mi barrio tiene precisión cien veces peor debido a la interferencia de radio. mueve este MOSFET 15mm hacia arriba para balancia de estrés térmico a la placa.

I believe you can "invent" a Taiwanese sentence to mean the above, but there is no consensus among Taiwanese speakers on how to say them, so they would need your explanations of what your chosen words mean. If you borrow Chinese words, your sentence will be not much different from the Chinese sentence.

For your question -- yes, you can scrape by with only Taiwanese, just like you can live in some areas in the US speaking only Spanish. But to do anything more, like riding a train to another Spanish-speaking area, you could meet a conductor who has to open the translator app for you.


For highly technical terms you can use the exact same strategy that Spanish and Mandarin speakers use, just use the English term like you did in your example sentence. A random Taiwanese speaker will not understand MOSFET, but neither will a Spanish speaker, unless they have that technical knowledge.


Yes and even in most of northern Taiwan, there's a ton of Taiwanese everywhere. One of my old friends was born in Neihu and moved to the US around 3rd grade or so. When he visited me in Taipei as an adult, he spoke fluent Taiwanese and much more limited Mandarin. His situation was a bit comical to younger people, but not a real obstacle.

I also lived in the Guishan/Linkou area for a year and heard a lot more Taiwanese than Mandarin in day to day life.


> It's also counterproductive to let humans learn a language of limited content resources and use cases.

Taiwanese/Hokkien has limited content resources because of conscious decisions of previous governing bodies in China, Singapore, Taiwan (not sure about Malaysia). If it had been allowed to flourish and actively promoted, and native speakers had been taught how to write, it would be a lot different today. The premise of the article is false. Taiwanese/Hokkien has had a standardized written form since the 19th century. The problem is most native speakers do not know it, so really what it suffers from is a low literacy rate. I communicate in written Taiwanese via text quite often.

Over the past ten years or so, the government in Taiwan has been trying to promote native language literacy, but with so-so results due to what I see as poor pedagogy.

> I don't know how to express clearly in Taiwanese

Many heritage or home speakers probably feel the same way. There are strong social stigmas against using Taiwanese in academia or people seeing it as a crude language. But if you are a native speaker of both Mandarin and Taiwanese, it isn't too hard to learn the written form if you want to.

I'm not a native speaker of Taiwanese, but I can make somewhat intelligible translations of them. A native speaker who has learned to write Taiwanese and knows English could do a better job.

> GPS in my neighborhood has a 100x lower accuracy because of radio interference

Tī goán chhù-piⁿ in-ūi tiān-chû kau-jiáu só͘-í GPS ê cheng-chún-tō͘ khah pháiⁿ chi̍t-pah pē.

> move this MOSFET up by 15mm to balance the PCB thermal stress

Chit MOSFET ūi-tio̍h chè-ap PCB ê on-tō͘ ài khǹg kòe-khì siōng-chē 15mm.

> Even Japanese, a language used by 125 million, has similar issues, my Japanese coworkers frequently switch to English during technical discussions.

That's interesting. I have only experienced preference for English technical vocabulary, but never switching of languages. Even for native English speakers they need to have familiarity with the subject or those technical words are unintelligible to them.


> Taiwan people are highly educated and urbanized. It's much harder to use Taiwanese in Taiwan compared to High German in a Pennsylvanian Amish village.

This isn't true at all. Taiwan is highly educated and urbanized and nearly everyone can understand Taiwanese. Millions of people speak it natively and everyone else has encountered it in media, day-to-day life and also in school in the past 15 or so years.


>Even Japanese, a language used by 125 million, has similar issues, my Japanese coworkers frequently switch to English during technical discussions.

It's a chicken and egg problem. If you don't do technical discussions in Japanese now then you won't do them in the future either. You have to consciously start doing that and then eventually you'll do it that way as a preference.

>why not just use the popular languages?

Because you will then take over their cultural baggage. Look at English and the internet. Americans are outnumbered and yet it's expected that people follow the norms of American culture online.

If you speak German on the same websites then those norms more or less disappear.


> You have to consciously start doing that and then eventually you'll do it that way as a preference.

For Japanese, it's the opposite. People aren't good at English, so Japanese tech writing/talk is everyone's preference. Also, katakana helps a lot with importing foreign words. Top people learned English, imported words, wrote their texts, and talked in Japanese.

Now Japanese texts are becoming less popular (because they're not profitable), and many English texts are easily accessible thanks to the internet and its lingua franca. So many Japanese are finally going to learn English to catch up.


Latin is still very much in use around the world for religious purposes


The better translation technology is the less likely it is I’ll ever learn a language.

To put it another way, in 10 years when I visit China I could probably have an in-depth conversation without ever knowing Chinese.

If the Chinese can come to a university and never need to learn English, they won’t. Your parents and community will teach you to speak and that’ll be the end of it.

It’ll be closer to the Tower of Babel scenario. Every community will have their own language and dialect and the AI will just adapt. If a solar flare takes out electronics we won’t be able to understand each other.

Alternatively, the AI will teach us to speak or an implant will teach us similar to the matrix.

Those are the scenarios in the next 25-30 years I see.


> 10 years when I visit China I could probably have an in-depth conversation without ever knowing Chinese

Doubt it. When you learn a language you learn a lot more than the translation: you learn ways of thinking and connotations that require a broader understanding and never get captured regardless of the translation quality, because they require knowing history and seeing previous uses in context.

"Good" translation will do for language what Wikipedia has done for knowledge. Everyone will get a superficial understanding, some will pretend that reading off a definition is the same as actually knowing something, and the world will get a lot more shallow, empty conversations.


None of that matters for just communicating. Even for the languages where Google Translate is terrible, it's already equivalent to a B2 level, which is enough for communicating about everything.

In some languages, DeepL has near perfect translation to the point you cannot really tell if it was translated.


The Tower of Babel scenario isn't so bad is it? Instead of minority language speakers being under immense pressure to adapt to and be replaced by dominant languages, communities can continue to exist as they are in their own language.


> To put it another way, in 10 years when I visit China I could probably have an in-depth conversation without ever knowing Chinese.

Preposterous. Having an in-depth discussion with a foreigner requires not only the language but also an understanding of the cultural context.


I don’t need cultural context to order a cheese burger. Presumably, future AI systems would be able to translate this context as well. They do to some extent [poorly] today.


You moved the goalpost from "in-depth conversation" to "order a cheese burger".


Hmm, you can really see how biased the datasets are towards certain groups.

For example, type in "Filipino" and you get NSFW photos.

Filipino is what we call citizens of the Philippines, btw.


Are there techniques for contrastive learning that are also applicable to tabular data?


I don't see any available openings for Japan on the Greenhouse page, either.


https://jobs.lever.co/woven-planet?

Currently we are using 2 ATS's - this will be consolidated at some point in the near future. Level 5 is all on Greenhouse.


Interesting but how is this as an alternative to Apache Flink's stream processing model?


A big difference is the removal of windowing: Flink lets you aggregate or join events only so long as they arrive in the same temporally-bound window. You're required to have a window, and it's core to the semantics of your workflow.

Flow's model doesn't use windows, and allows for long-distance (in time) joins and aggregations. There's no concept of "late" data in Flow: it just keeps on updating the desired aggregate.
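To make the distinction in this comment concrete, here is a toy Python sketch. It is not Flink's or Flow's actual API: `windowed_sum` and `continuous_sum` are hypothetical helpers, and the "a window closes as soon as an event from a later window is seen" rule is a simplifying assumption standing in for real watermark logic.

```python
from collections import defaultdict

# Hypothetical events: (key, event_time_seconds, value).
# The last event arrives after its 10-second window has already closed.
events = [("a", 1, 5), ("a", 4, 2), ("b", 12, 7), ("a", 3, 10)]

def windowed_sum(events, window_size=10):
    """Tumbling-window sum per (key, window); events for closed windows are dropped."""
    sums = defaultdict(int)
    max_window_seen = -1
    dropped = []
    for key, ts, value in events:
        window = ts // window_size
        if window < max_window_seen:
            # Simplified lateness rule: an earlier window is treated as closed.
            dropped.append((key, ts, value))
            continue
        max_window_seen = max(max_window_seen, window)
        sums[(key, window)] += value
    return dict(sums), dropped

def continuous_sum(events):
    """Unwindowed sum per key: every event updates the running aggregate."""
    sums = defaultdict(int)
    for key, _ts, value in events:
        sums[key] += value
    return dict(sums)

window_result, late = windowed_sum(events)
running_result = continuous_sum(events)
print(window_result, late)   # {('a', 0): 7, ('b', 1): 7} [('a', 3, 10)]
print(running_result)        # {'a': 17, 'b': 7}
```

In the windowed version the out-of-order event is lost to the aggregate; in the unwindowed version there is simply no notion of "late", at the cost of keeping per-key state around indefinitely.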


You do not need windowing. You can do everything you want with regular KeyedStream. You can join not-windowed streams using IntervalJoin.

This is if you want to use high level API. If you use lower-level ProcessFunction you have even more flexibility.
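For readers unfamiliar with the idea, an interval join pairs events that share a key and whose timestamps fall within a bounded offset of each other, with no window assignment involved. The snippet below is a plain-Python illustration of those semantics only, assuming inclusive bounds; it is not Flink code, and `interval_join`, `orders`, and `payments` are invented for the example.

```python
def interval_join(left, right, lower, upper):
    """Pair same-key events where right_ts lies in [left_ts + lower, left_ts + upper]."""
    out = []
    for lkey, lts, lval in left:
        for rkey, rts, rval in right:
            if lkey == rkey and lts + lower <= rts <= lts + upper:
                out.append((lkey, lval, rval))
    return out

# Hypothetical streams of (key, timestamp, payload).
orders = [("u1", 100, "order-1"), ("u2", 200, "order-2")]
payments = [("u1", 103, "pay-1"), ("u2", 260, "pay-2"), ("u1", 99, "pay-0")]

# Match payments from 2 time units before to 5 time units after each order.
joined = interval_join(orders, payments, lower=-2, upper=5)
print(joined)  # [('u1', 'order-1', 'pay-1'), ('u1', 'order-1', 'pay-0')]
```

A real streaming engine would keep per-key buffers and expire state once the interval has passed rather than scanning everything, but the matching condition is the same.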


Dataflow things like Flink (or even better differential dataflow [0]) are far more flexible and subsume map-reduce. This article feels like hyping up the durability of the Model T.

[0] https://github.com/TimelyDataflow/differential-dataflow


IANAE on Flink, especially when it comes to the internals. But I think that the decomposition of computations into distinct map and reduce functions seems to afford a bit more flexibility, since it can be useful to apply reductions separately from map functions, and vice versa. For example, you could roll up updates to entities over time just with a reduce function, and you could easily do so eagerly (when the data is ingested) or lazily (when the data is materialized into an external system). That type of flexibility is important when you want a realtime data platform that needs to serve a broad range of use cases.
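A minimal Python sketch of the eager-versus-lazy point above, under the assumption that the reduce function is an associative merge. `map_event`, `reduce_state`, and the event shape are hypothetical and not taken from any particular platform.

```python
from functools import reduce

def map_event(raw):
    """Map step: turn a raw event into (entity_id, partial_state)."""
    return raw["id"], {"count": 1, "total": raw["amount"]}

def reduce_state(left, right):
    """Reduce step: associative merge of two partial states for one entity."""
    return {
        "count": left["count"] + right["count"],
        "total": left["total"] + right["total"],
    }

raw_events = [
    {"id": "u1", "amount": 10},
    {"id": "u2", "amount": 3},
    {"id": "u1", "amount": 5},
]

# Eager: fold each update into the stored state as soon as it is ingested.
eager = {}
for raw in raw_events:
    key, state = map_event(raw)
    eager[key] = reduce_state(eager[key], state) if key in eager else state

# Lazy: store the mapped updates as-is, and reduce only when materializing.
log = {}
for raw in raw_events:
    key, state = map_event(raw)
    log.setdefault(key, []).append(state)
lazy = {key: reduce(reduce_state, states) for key, states in log.items()}

print(eager == lazy, eager["u1"])  # True {'count': 2, 'total': 15}
```

Because the merge is associative, both strategies end at the same state; the choice between them is about where the work and storage costs fall.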


Just read from the site FAQs that it does support SSH.


This is a real problem for my country (Philippines), especially for provinces and agricultural companies that grow and sell Cavendish bananas. Banana production and exports fell by at least 20% in 2015 alone, and this is largely attributed to the Panama Wilt. Sadly, no cure for the disease exists right now, even after years of multinational research.

