
AFAIK UTF-32 is relatively convenient because it is fixed-width, so many operations (indexing to the nth code point, for example) are faster to perform than in UTF-8.
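To make the claim concrete, here is a rough sketch of fetching the nth code point from raw bytes in each encoding. The helper names are made up for illustration, and it assumes well-formed input and UTF-32-LE without a BOM. The UTF-32 version is a single slice; the UTF-8 version has to scan from the start.

```python
def nth_code_point_utf32(data: bytes, n: int) -> str:
    # Hypothetical helper: code point n occupies bytes [4n, 4n + 4).
    if not 0 <= n < len(data) // 4:
        raise IndexError(n)
    return data[4 * n : 4 * n + 4].decode("utf-32-le")


def nth_code_point_utf8(data: bytes, n: int) -> str:
    # Hypothetical helper: walk the bytes, counting lead bytes.
    # Continuation bytes look like 0b10xxxxxx and are skipped.
    count = -1
    start = None
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # lead byte of a new code point
            count += 1
            if count == n:
                start = i
            elif count == n + 1:
                return data[start:i].decode("utf-8")
    if start is None:
        raise IndexError(n)
    return data[start:].decode("utf-8")


text = "a\u00e9\u20ac\U0001F600z"  # 1-, 2-, 3-, 4- and 1-byte sequences in UTF-8
u32 = text.encode("utf-32-le")
u8 = text.encode("utf-8")

print(len(u32), len(u8))  # 20 11
print(nth_code_point_utf32(u32, 3) == nth_code_point_utf8(u8, 3))  # True
```

Note that this only indexes code points, which is exactly the limitation the replies bring up.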


You still have to worry about combining characters.


Exactly. One UTF-32 code unit (a code point) is not necessarily one displayed character (glyph), and it is not necessarily one character for the purposes of search either.
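A small illustration of the point, using only Python's standard unicodedata module. The nfc_contains helper is a hypothetical name, and NFC normalization is just one possible way to make the search behave; it does not handle every combining sequence.

```python
import unicodedata

composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT, two code points


def nfc_contains(haystack: str, needle: str) -> bool:
    # Hypothetical helper: normalize both sides to NFC before searching.
    nfc_needle = unicodedata.normalize("NFC", needle)
    nfc_haystack = unicodedata.normalize("NFC", haystack)
    return nfc_needle in nfc_haystack


# Both render as the same displayed character, but the code point counts differ.
print(len(composed), len(decomposed))        # 1 2
print(len(decomposed.encode("utf-32-le")))   # 8

# A naive code point search misses the decomposed form.
print(composed in "cafe\u0301")              # False
print(nfc_contains("cafe\u0301", composed))  # True
```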

So basically it's not fixed width in any meaningful way.


I largely share sqrt17's grief.

There is no such thing as a "UTF-32 character".

An abstract character is not a code point.

UTF-32 _is_ fixed width because it's defined on code points, not glyphs, not characters.


Fixed width in what way? It's not fixed width for display or search, so what's fixed about it?


You might want to use UCS-4 as an internal format, or some other convenient format (for example ASCII if you know everything is 7/8 bit). But UTF-8 should still be the only thing seen externally.
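As a minimal sketch of that boundary, assuming a plain list of integers stands in for a UCS-4 buffer (the function names to_internal and to_external are invented for this example): decode UTF-8 once on the way in, work on code points internally, and encode back to UTF-8 on the way out.

```python
def to_internal(raw: bytes) -> list:
    # Decode external UTF-8 into a list of code points (a stand-in for UCS-4).
    # Raises UnicodeDecodeError if the input is not valid UTF-8.
    return [ord(ch) for ch in raw.decode("utf-8")]


def to_external(code_points: list) -> bytes:
    # Encode internal code points back to UTF-8 for anything leaving the program.
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


raw = "na\u00efve \u20ac".encode("utf-8")
internal = to_internal(raw)

print(len(raw), len(internal))       # 10 7
print(to_external(internal) == raw)  # True
```

In Python specifically, str already plays the role of the internal format, so the list here is only to make the code points visible.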



