Reminds me of the good old days with EUC-KR, KSC 5601, and all those different e... | Hacker News

yongjik on Dec 31, 2021 | on: Unicode Normalization Forms: When ö ≠ ö

Reminds me of the good old days with EUC-KR, KSC 5601, and all those different encoding schemes I've successfully repressed in my memory for years. Yes, you could probably assert that a piece of string was either Korean or English but never anything else... because the system was incapable of representing it.

I'm not exactly sure how a code page is supposed to help us here. Developers have trouble supporting multiple languages when they're all in the Unicode Standard. Supporting code pages for languages they've never heard of? Not a chance.
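
For anyone who hasn't run into this, here is a minimal Python sketch of the "incapable of representing it" point. It assumes Python's built-in `euc_kr` codec is a fair stand-in for the legacy systems described above; `can_encode` is a hypothetical helper name and the sample strings are only illustrative.

```python
def can_encode(text: str, codec: str = "euc_kr") -> bool:
    """Return True if every character in `text` is representable in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        # The codec has no byte sequence for at least one character.
        return False


korean_english = "한글 and English"  # Hangul plus ASCII
thai = "สวัสดี"                      # Thai script, outside the EUC-KR repertoire

print(can_encode(korean_english))   # True
print(can_encode(thai))             # False
print(can_encode(thai, "utf-8"))    # True
```

The legacy codec rejects the Thai string outright, while UTF-8 accepts both, which is roughly the trade-off being described.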

numpad0 on Jan 1, 2022

I'd guess a standardized codepage marker like a "start of CP[932]" ｉｓ　ｇｏｉｎｇ　ｔｏ　ｂｅ　ｎｅｃｅｓｓａｒｙ CP[1252] at each CP switch, but it might just be a necessity. Han unification is a well-known problem for Far Eastern users, but the Unicode normalization problem is basically the same as that.
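
As a rough illustration of the normalization problem in the thread title, here is a short Python sketch using the standard `unicodedata` module. The helper name `same_text` and the choice of NFC as the comparison form are assumptions made for this example, not something proposed in the thread.

```python
import unicodedata

precomposed = "\u00f6"   # ö as a single code point (LATIN SMALL LETTER O WITH DIAERESIS)
decomposed = "o\u0308"   # o followed by COMBINING DIAERESIS


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(precomposed == decomposed)           # False: different code point sequences
print(same_text(precomposed, decomposed))  # True: equal once both are in NFC
print(len(precomposed), len(decomposed))   # 1 2
```

Both strings render as the same glyph, but a plain comparison sees them as different until they are normalized to a common form.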