
Reminds me of the good old days with EUC-KR, KS C 5601, and all those different encoding schemes I've successfully repressed in my memory for years. Yes, you could probably assert that a given string was either Korean or English but never anything else... because the system was incapable of representing it.
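To make the "incapable of representing it" point concrete, here is a rough Python sketch. The `fits` helper is a made-up name for illustration, not a library function. EUC-KR does actually carry a few extras (kana, Greek, Cyrillic), so the example uses Thai, which it has no room for at all.

```python
# Hypothetical helper (not part of any library): does `text` fit in `encoding`?
def fits(text: str, encoding: str) -> bool:
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        # The legacy code page simply has no slot for some character.
        return False

korean = "한글"
english = "hello"
thai = "สวัสดี"

print(fits(korean, "euc_kr"))   # Korean is what the code page was built for
print(fits(english, "euc_kr"))  # ASCII is a subset
print(fits(thai, "euc_kr"))     # Thai cannot be represented
print(fits(thai, "utf-8"))      # a Unicode encoding can hold all three
```

So the "it's either Korean or English" guarantee comes for free, but only because everything else gets rejected at encode time.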

I'm not exactly sure how a code page is supposed to help us here. Developers already have trouble supporting multiple languages even when they're all in the Unicode Standard. Supporting code pages for languages they've never heard of? Not a chance.



I'd guess a standardized code page marker, something like "start of CP[932]" or "start of CP[1252]", would be needed at every code page switch, but that might just be a necessity. Han unification is a well-known problem for Far Eastern users, but the Unicode normalization problem is basically the same thing.
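A minimal sketch of what such a marker scheme might look like, assuming the stream is already split into (code page, bytes) segments. The `Segment` type and `decode_tagged` function are invented for illustration; no real standard or wire format is implied. It only relies on Python's built-in `cp932` and `cp1252` codecs.

```python
from typing import Iterable, Tuple

# Hypothetical representation: each segment carries its own code page tag.
Segment = Tuple[str, bytes]

def decode_tagged(segments: Iterable[Segment]) -> str:
    # Decode every segment with the code page named by its marker, then join.
    return "".join(raw.decode(codec) for codec, raw in segments)

stream = [
    ("cp1252", "café ".encode("cp1252")),
    ("cp932", "日本語".encode("cp932")),
]
text = decode_tagged(stream)
print(text)

# Without the marker, the same bytes read under the wrong code page are mojibake.
raw = "日本語".encode("cp932")
wrong = raw.decode("cp1252", errors="replace")
print(wrong)
```

The second half is the reason the marker can't be skipped: the bytes alone don't say which code page they belong to.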



