
counterpoint:

A complicated program is never an easy win, and English is already spoken in every country in the world.



Sure, spoken — but both Arabic and CJK ideograms are written in far more countries, by far more people, and for far longer in history than the ASCII set. The oldest surviving great works of mathematics were written in Arabic, and some of the oldest surviving great works of poetry were written in Chinese, as just two easy and obvious examples of things worth preserving in "plain text".


So your argument is... it's easier to teach billions of people fluent English... than for software to support UTF-8?

You are aware that a majority of the world's population speaks no English whatsoever?


Playing the devil's advocate here. I am not a native English speaker — I'm a French speaker — but I'm happy that English is kind of the default international language. It's a relatively simple language; I actually make fewer grammar mistakes in English than I do in my native language. I suppose it's probably not a politically correct thing to say, the English being the colonists, the invaders, the oppressors, but eh, maybe it's also kind of a nice thing for world peace if there is one relatively simple language that's accessible to everyone?

Go ahead and make nice libraries that support Unicode effectively, but I think it's fair game, for a small software development shop (or a one-person programming project), to support ASCII only for some basic software projects. Things are of course different when you're talking about governments providing essential services, etc.


English isn't even ASCII anyway.

Some loanwords like façade or café retain their accents.

Units like ° £ € and symbols like © ® × ÷ ½ aren't ASCII.

It doesn't take much to need one of these cases in a project.
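A quick sketch of the point above (plain Python, no third-party libraries): text that looks like everyday English fails a strict ASCII check as soon as one of these symbols appears.

```python
# str.isascii() is True only if every code point is below 128.
for s in ["facade", "façade", "£9.99", "25°C", "© 2024"]:
    print(repr(s), s.isascii())
```

`facade` passes; every other string fails the check on a single character.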


I know almost no one who actually types the accented e, let alone the c with the cedilla. I scarcely ever see the degree symbol typed. Rather, I see facade, cafe, and "degrees".

That aside, the big problem with Unicode is not those characters; they're a simple two-byte extension. They obey a simple bijective mapping of encoded character <-> character on screen. Unicode doesn't. You have to deal with multiple code points representing one on-screen grapheme, which in turn may or may not translate into a single on-screen glyph. Also bi-directional text, or even vertical text (see the recent post about Mongolian script). Unicode is still probably one of the better solutions possible, but there's a reason you don't see it everywhere: it means not just updating to wide chars but having to deal with a text shaper, redoing your interfaces, and tons of other messy stuff. It's very easy for most people to look at that and ask why they'd bother if only a tiny percentage of users use, say, vertical text.
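The "multiple code points, one grapheme" problem above is easy to demonstrate with Python's standard library: the same on-screen "é" can be one precomposed code point or a base letter plus a combining mark, and naive comparisons and length checks disagree until you normalize.

```python
import unicodedata

precomposed = "caf\u00e9"    # 'é' as a single code point, U+00E9
decomposed = "cafe\u0301"    # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed, decomposed)            # both render as "café"
print(len(precomposed), len(decomposed))  # 4 vs 5 code points
print(precomposed == decomposed)          # False: different code points

# Normalizing (NFC here) collapses both spellings to one form.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

And this is only normalization — grapheme-cluster segmentation (emoji with skin-tone modifiers, flags, ZWJ sequences) isn't even in the standard library.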


The first point is just because of the keys on a keyboard.

I see many uses of "pounds" or "GBP" on HN. Anyone with the symbol on the keyboard (British and Irish obviously, plus several other European countries) types £. When people use a phone keyboard, and a long-press or symbol view shows $, £ and €, they can choose £.

Danish people use ½ and § (and £). These keys are labelled on the standard Danish Windows keyboard.

There's plenty of scope for implementing enough Unicode to support most Latin-like languages without going as far as supporting vertical or RTL text.


For some reason people seem to think that the only options are UTF-8 and ASCII. That choice never existed. There are thousands upon thousands of character encodings in use. Before Unicode, every writing system had its own character encoding, incompatible with everything else.
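The incompatibility described above is concrete: the very same byte means a different character under each legacy code page, so text is unreadable without knowing (or guessing) which encoding produced it. A minimal sketch using a handful of encodings shipped with Python:

```python
raw = bytes([0xE4])  # one byte, several readings
for enc in ["latin-1", "cp1251", "koi8-r", "cp437"]:
    print(enc, raw.decode(enc))
# latin-1 -> ä, cp1251 -> д, koi8-r -> Д, cp437 -> Σ
```

Even the two Cyrillic code pages disagree with each other — which is exactly the mess Unicode was created to end.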


You didn't say spoken by every person. Merely spoken in every country. Even the existence of tourists in a country would pass this incredibly low bar...



