
But that is ASCII. That part of Unicode is a literal copy of ASCII. In any case, just putting an "if out of range, return 0" clause wouldn't hurt performance noticeably given all the indirection already present. If used in a loop, the CPU will predict that branch perfectly every time if your data is correct. There is no reason to just crash.
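
A minimal sketch of that kind of guard, assuming a wrapper around the standard isspace(); the name safe_isspace is made up for illustration and is not part of any libc:

```c
#include <ctype.h>
#include <limits.h>  /* UCHAR_MAX */
#include <stdio.h>   /* EOF */

/* Hypothetical wrapper (the name is ours): returns 0 for any argument
   that isspace() is not defined for, instead of passing it through. */
static int safe_isspace(int c)
{
    /* The <ctype.h> functions are only defined for EOF and for values
       representable as unsigned char; anything else is undefined
       behavior, which in a table-based implementation can mean an
       out-of-bounds read. */
    if (c != EOF && (c < 0 || c > UCHAR_MAX))
        return 0;
    return isspace(c);
}
```

With well-formed input the extra comparison always goes the same way, so it should be cheap in a loop.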


ASCII goes to 7F, not to FF; it is a 7 bit character code.

Therefore, for instance, isspace(0xA0) might usefully report true if we are in a Unicode locale, otherwise not.

The 0x80-0xFF values are also used in 8-bit extensions of ASCII, such as the ISO 8859-1 and ISO 8859-15 character sets. E.g. 0xE0 is à in ISO 8859-1 (which is, of course, the same as the Unicode U+00E0 but logically distinct).

A totally different 8 bit extension is KOI-8.

The point is valid that if you don't support any "weird" extensions to ASCII (just ISO Latin) or non-ASCII 8-bit encodings, then there isn't much of a need for run-time table indirection. The cases that may arise can be handled ad hoc, along the lines of "if we are in an ASCII locale, then report false above 7F, otherwise go through the combined Latin/Unicode table".
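
Roughly what I mean, as a sketch only. The enum and the function name are invented for this example, a real program would derive the charset from the locale, and the Latin-1 branch only knows about 0xA0 (NO-BREAK SPACE), which is an assumption and not a complete table:

```c
#include <stdbool.h>

/* Hypothetical charset selector; not a real libc type. */
enum charset { CHARSET_ASCII, CHARSET_LATIN1 };

/* Ad hoc isspace without any run-time table indirection. */
static bool adhoc_isspace(enum charset cs, int c)
{
    if (c < 0 || c > 0xFF)
        return false;               /* out of range: just say no */
    if (c <= 0x7F)                  /* plain ASCII: space, \t \n \v \f \r */
        return c == ' ' || (c >= '\t' && c <= '\r');
    if (cs == CHARSET_ASCII)
        return false;               /* ASCII locale: nothing above 7F */
    return c == 0xA0;               /* Latin-1 / U+00A0 NO-BREAK SPACE */
}
```

Whether 0xA0 should count as a space at all is a policy choice; the point is only that the decision fits in a few comparisons.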



