
A filesystem accepting only NFD should be filed as a bug. It can normalize names to NFD internally, as Apple's previous filesystem, HFS+, did.
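A minimal sketch of the alternative suggested here: accept names in any normalization form and normalize them to NFD before storing or looking them up, roughly what HFS+ did (HFS+ actually used its own NFD variant). `NormalizingDirectory` is a hypothetical toy in-memory stand-in for a directory, not a real filesystem API; only Python's standard `unicodedata.normalize` is real.

```python
import unicodedata


class NormalizingDirectory:
    """Hypothetical toy directory that stores every name in NFD.

    Callers may pass names in any normalization form; lookups succeed
    as long as the names are canonically equivalent.
    """

    def __init__(self) -> None:
        self._entries: dict[str, bytes] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Normalize on the way in instead of rejecting non-NFD input.
        return unicodedata.normalize("NFD", name)

    def create(self, name: str, data: bytes) -> None:
        self._entries[self._key(name)] = data

    def open(self, name: str) -> bytes:
        return self._entries[self._key(name)]

    def names(self) -> list[str]:
        return list(self._entries)


d = NormalizingDirectory()
d.create("caf\u00e9.txt", b"hello")      # precomposed é (the NFC spelling)
print(d.open("cafe\u0301.txt"))         # e + U+0301 finds the same entry: b'hello'
print(d.names() == ["cafe\u0301.txt"])  # stored internally as NFD: True
```

The point of the design is that normalization happens once, at the boundary, so users never see an "invalid name" error just because their input method produced NFC.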

But even worse than that is Python's use of NFKC for identifiers, which normalizes ℌ to H and so on. The recommended normalizations are NFC for offline normalization (as in compiled languages and databases) and NFD for online normalization, where speed trumps space. unicode.org talking so much about NFKC was a big mistake. NFKC is crazy and doesn't even round-trip. The whole TR31 XID_Start/XID_Continue sets exist mostly because of NFKC issues, not so much for stability. But people bought the stability argument.
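The difference between the compatibility and canonical forms is easy to see with Python's standard `unicodedata` module. In this short sketch, NFKC folds ℌ (U+210C) to a plain H with no way back, while the canonical forms only compose or decompose, so text already in NFC survives a trip through NFD unchanged. The last lines show Python applying NFKC to identifiers at parse time (PEP 3131), so `ℌ` and `H` name the same variable.

```python
import unicodedata

fraktur_h = "\u210c"  # ℌ BLACK-LETTER CAPITAL H

# Compatibility normalization folds the letter to plain ASCII "H" ...
nfkc = unicodedata.normalize("NFKC", fraktur_h)
print(nfkc)                      # H
# ... and there is no way back: the original character is lost.
print(nfkc == fraktur_h)         # False

# Canonical forms only compose/decompose; NFC text survives NFD -> NFC.
composed = "\u00e9"              # é as one code point
decomposed = unicodedata.normalize("NFD", composed)
print(len(composed), len(decomposed))                          # 1 2
print(unicodedata.normalize("NFC", decomposed) == composed)    # True
# Canonical normalization leaves ℌ alone.
print(unicodedata.normalize("NFC", fraktur_h) == fraktur_h)    # True

# Python identifiers are NFKC-normalized at parse time (PEP 3131),
# so assigning to ℌ actually binds the name H.
namespace: dict = {}
exec("\u210c = 1", namespace)
print("H" in namespace, "\u210c" in namespace)                 # True False
```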

I'm currently writing a library and linter for such issues: https://github.com/rurban/libu8ident

Also note that C++23 will most likely enforce NFC-only identifiers. Same problem as with this filesystem. My implementation was to accept all normalization forms and store identifiers, internally and in the object files, as NFC. The C ABI should specify this as well. Currently they care about as much as Linux filesystems do: nada. The result is unidentifiable identifiers.
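A minimal sketch of the "accept every form, store NFC" approach, assuming a compiler front end that emits the NFC spelling, UTF-8 encoded, as the symbol name. `symbol_name` is a hypothetical helper, not part of any real compiler or ABI, and Python's `str.isidentifier` merely stands in for a proper UAX #31 XID_Start/XID_Continue check.

```python
import unicodedata


def symbol_name(identifier: str) -> bytes:
    """Hypothetical: map a source identifier to the name stored in an object file.

    Any normalization form is accepted on input; the stored name is
    always NFC, encoded as UTF-8, so canonically equivalent spellings
    link to the same symbol.
    """
    nfc = unicodedata.normalize("NFC", identifier)
    # Stand-in for a real UAX #31 identifier check.
    if not nfc.isidentifier():
        raise ValueError(f"not an identifier: {identifier!r}")
    return nfc.encode("utf-8")


precomposed = "caf\u00e9"    # é as U+00E9
decomposed = "cafe\u0301"    # e followed by U+0301 COMBINING ACUTE ACCENT
print(precomposed == decomposed)                            # False
print(symbol_name(precomposed) == symbol_name(decomposed))  # True
print(symbol_name(decomposed))                              # b'caf\xc3\xa9'
```

Unlike NFKC, NFC keeps ℌ and H as distinct symbols, so no characters are silently folded together.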


