Thanks for the correction. I don't know where I originally got the "up to six bytes" number from, but I've been using it for a while. Apparently it's out of date: according to https://en.wikipedia.org/wiki/UTF-8#Description , the original proposal expected some code points to need 6 bytes, but as you say, that's no longer true.
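
For anyone else who picked up the same number, here is a rough Python sketch of the difference. It assumes the original design covered 31-bit values with the usual length thresholds, and that current UTF-8 stops at U+10FFFF (as in RFC 3629). The helper `legacy_utf8_length` is a made-up name for illustration, not something from a library.

```python
def legacy_utf8_length(code_point: int) -> int:
    """Byte count under the original 31-bit design (hypothetical helper)."""
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("outside the original 31-bit range")
    # Largest value representable with 1, 2, 3, 4, 5 and 6 bytes respectively.
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    for length, limit in enumerate(limits, start=1):
        if code_point <= limit:
            return length
    raise AssertionError("unreachable: range was checked above")


# Current UTF-8: ask Python's own encoder how many bytes each code point takes.
for cp in [0x41, 0xE9, 0x20AC, 0x1F600, 0x10FFFF]:
    encoded = chr(cp).encode("utf-8")
    print(f"U+{cp:04X}: {len(encoded)} byte(s)")

# Values above U+10FFFF only exist in the old scheme; chr() rejects them today.
for cp in [0x200000, 0x7FFFFFFF]:
    print(f"0x{cp:X}: {legacy_utf8_length(cp)} byte(s) in the old scheme")
```

The highest code point Python will accept, U+10FFFF, encodes to 4 bytes, so the 5- and 6-byte forms are only reachable for values that are no longer valid code points.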