Afaik UTF-32 is relatively convenient as it is fixed-width, and thus many operat...

ruediger · on Feb 6, 2012

You still have to worry about combining characters.

ars · on Feb 7, 2012

Exactly. One UTF-32 character (code-point) is not one displayed character (glyph), and one UTF-32 character is not one character to search.

So basically it's not fixed width in any meaningful way.
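
To make the combining-character point concrete, here is a minimal Python sketch. The sample strings and the helper name `utf32_units` are illustrative assumptions, not anything from the thread. It shows the same displayed character occupying one or two UTF-32 code units depending on how it is composed, and a naive comparison failing until both sides are normalized.

```python
import unicodedata

# Two spellings of the same displayed character "é" (illustrative samples).
decomposed = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE


def utf32_units(s):
    """Number of 4-byte UTF-32 code units (i.e. code points) in s."""
    return len(s.encode("utf-32-le")) // 4


# One glyph on screen, but a different number of fixed-width units.
units_decomposed = utf32_units(decomposed)    # 2
units_precomposed = utf32_units(precomposed)  # 1

# A code-point-wise comparison treats the two spellings as different...
naive_equal = decomposed == precomposed       # False

# ...until both are brought to the same normalization form (NFC here).
normalized_equal = unicodedata.normalize("NFC", decomposed) == precomposed  # True

# A plain substring search also finds a bare "e" inside the decomposed "é".
finds_bare_e = "e" in decomposed              # True
```

So indexing by code point is cheap in UTF-32, but display width and search still need normalization or grapheme-cluster logic on top.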

adobriyan · on Feb 7, 2012

I largely share sqrt17's grief.

There is no such thing as a "UTF-32 character".

An abstract character is not a code point.

UTF-32 _is_ fixed width because it's defined on code points, not glyphs, not characters.
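
A small Python sketch of what "fixed width on code points" means in practice. The sample characters and the helper name `code_point_at` are assumptions made for illustration. Each code point takes exactly 4 bytes in UTF-32, so the i-th code point sits at byte offset 4*i, whereas UTF-8 uses 1 to 4 bytes per code point.

```python
# Sample code points that need 1, 2, 3 and 4 bytes in UTF-8 (illustrative).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

utf8_widths = [len(c.encode("utf-8")) for c in samples]       # [1, 2, 3, 4]
utf32_widths = [len(c.encode("utf-32-le")) for c in samples]  # [4, 4, 4, 4]


def code_point_at(buf, i):
    """Return the i-th code point of a UTF-32-LE buffer by direct offset."""
    return int.from_bytes(buf[4 * i:4 * i + 4], "little")


buf = "".join(samples).encode("utf-32-le")

# Constant-time access to a code point, with no scanning from the start.
last = code_point_at(buf, 3)  # 0x1F600
```

This is the only sense in which the width is fixed: it says nothing about how many code points make up one displayed character.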

ars · on Feb 7, 2012

Fixed width in what way? It's not fixed width for display or search, so what's fixed about it?

rwmj · on Feb 6, 2012

You might want to use UCS-4 as an internal format, or some other convenient format (for example ASCII if you know everything is 7/8 bit). But UTF-8 should still be the only thing seen externally.
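
A rough sketch of that split in Python, assuming a list of integer code points as a stand-in for a UCS-4 internal buffer. The function names `read_external` and `write_external` are hypothetical. Bytes are decoded from UTF-8 once at the boundary, processed internally as code points, and encoded back to UTF-8 on the way out.

```python
def read_external(raw):
    """Decode UTF-8 bytes from the outside world into a list of code points."""
    return [ord(ch) for ch in raw.decode("utf-8")]


def write_external(code_points):
    """Encode internal code points back to UTF-8 for anything external."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


# Illustrative input: "héllo" as it would arrive over the wire in UTF-8.
incoming = "h\u00e9llo".encode("utf-8")

internal = read_external(incoming)           # fixed-width units, one per code point
internal_reversed = list(reversed(internal)) # cheap code-point-level manipulation
outgoing = write_external(internal_reversed) # UTF-8 again at the boundary
```

Note that reversing by code point is only safe here because the sample contains no combining sequences, which is the caveat raised elsewhere in this thread.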