I'm no expert on character encodings or Unicode itself, but would this be as simple as checking for the byte 1F in the data? Assuming the file is ASCII or UTF-8 encoded (or attempting to confirm this as much as possible as well), it seems like that check would suffice to validate the absence of the code point in the data, but I imagine it's not quite so simple.
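For what it's worth, here is a minimal sketch of that check in Python. The function names are made up for illustration. It assumes the data really is ASCII or UTF-8, in which case every byte of a multi-byte sequence is 0x80 or higher, so a raw 0x1F byte can only be the code point U+001F itself.

```python
US = 0x1F  # ASCII unit separator, U+001F


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (ASCII is a subset)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def contains_unit_separator(data: bytes) -> bool:
    """Return True if the raw byte 0x1F occurs anywhere in data."""
    return US in data


def safe_to_delimit(data: bytes) -> bool:
    """Hypothetical combined check: valid UTF-8 and no U+001F present."""
    return is_valid_utf8(data) and not contains_unit_separator(data)
```

If the encoding turns out to be something else, such as UTF-16, the byte-level check no longer means the same thing, so the validation step matters.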
For text data, it would work fine, but you'd have to do some finagling with binary data; $1F is a perfectly valid byte to have in, say, a 4-byte integer.
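A quick sketch of the integer case, assuming a little-endian 4-byte unsigned integer and a made-up record layout: the value 31 packs to a byte sequence that starts with 0x1F, so splitting on the separator produces an extra field.

```python
import struct

US = b"\x1f"  # unit separator used as the field delimiter

# The integer 31 as a 4-byte little-endian unsigned int.
packed = struct.pack("<I", 31)

# Hypothetical record: a text field, the delimiter, then the raw integer.
record = b"id" + US + packed

# Naive split sees two delimiters, not one.
fields = record.split(US)
print(packed)  # b'\x1f\x00\x00\x00'
print(fields)  # [b'id', b'', b'\x00\x00\x00']
```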
My going assumption is that arbitrary binary data should be in a binary format.
Feel free to correct me, but I figure that as long as each byte of the data can be anything from 0x00 to 0xFF, no format that uses a byte in that range as a delimiter will ever be safe. I'm not a big C developer, but I figure null-terminated strings have the same limitation.
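The null-terminated case can be seen from Python with `ctypes`, as a rough illustration rather than real C code. The payload here is invented: reading the buffer as a C string stops at the first 0x00 byte even though more data follows.

```python
import ctypes

# Arbitrary payload with an embedded NUL byte in the middle.
payload = b"ab\x00cd"

# create_string_buffer copies the bytes and appends a terminating NUL.
buf = ctypes.create_string_buffer(payload)

as_c_string = buf.value  # read as a NUL-terminated string
full_buffer = buf.raw    # every byte in the buffer

print(as_c_string)  # b'ab'
print(full_buffer)  # b'ab\x00cd\x00'
```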
But if it's something entered by keyboard, you should be OK to use control codes.
Personally, I find tab and return to be fine for text-driven stuff. They show up in an editor just like intended.