
> Computers have been more or less universal since the time RAM was used to execute the programs; the fact that files are a convenient way to organize data does not detract from the fact that the internal structure of a file needs to be known if you want another program to make sense out of it.

> In that sense the text file is the most enabling element here, and binary files with unknown structure the least.

Text files haven't always been universal. There used to be different text encodings (ASCII hasn't always been universal) and even different byte sizes (e.g. some systems had 6 or 7 bits to a byte). And even if the two machines you wanted to share data between were both 8-bit ASCII systems, there's no guarantee they would share the same dialect of BASIC, LISP, Pascal or C.



ASCII isn't universal even today (the page you are reading is UNICODE), but text was still much more readable and in general easier to process with ad-hoc filtering or other manipulation than binary formats.

This is what underpins most of the power of UNIX, but at the same time it is something of a weak point: good, error-free text processing is actually quite hard.
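As a small illustration of how easily ad-hoc text processing goes wrong, here's a minimal Python sketch of just one such pitfall (cutting a UTF-8 byte stream at an arbitrary offset; the example text is made up, not something from the comment above):

    # Minimal sketch of one pitfall: a filter that treats text as a plain byte
    # stream can cut a multi-byte UTF-8 character in half.
    data = "naïve café".encode("utf-8")   # 'ï' and 'é' are two bytes each in UTF-8

    try:
        print(data[:3].decode("utf-8"))   # the cut lands in the middle of 'ï'
    except UnicodeDecodeError as exc:
        print("broken cut:", exc)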


> ASCII isn't universal even today (the page you are reading is UNICODE)

That's not really a fair point because Unicode and nearly all of the other extended character sets (if not all of them) still follow ASCII for the lower range. That includes the Windows code pages, ISO-8859 (which itself contains more than a dozen different character sets) and the Unicode encodings too.
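For illustration, here's a minimal Python sketch showing that ASCII-range text comes out byte-for-byte identical under several of these encodings (the codec names are just Python's standard identifiers, nothing from the comment above):

    # Minimal sketch: for characters in the ASCII range (0-127), these common
    # "extended" encodings all produce exactly the same bytes as plain ASCII.
    text = "Hello, world! 123"

    encodings = ["ascii", "utf-8", "latin-1", "cp1252"]
    encoded = {name: text.encode(name) for name in encodings}

    for name, data in encoded.items():
        print(f"{name:>8}: {data}")

    # All four byte sequences are identical for ASCII-range input.
    assert len(set(encoded.values())) == 1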

> but text was still much more readable and in general easier to process with ad-hoc filtering or other manipulation than binary formats.

Text is still a binary format. If you have a different byte size or a significantly different base character set then you will still end up with gibberish. This is something we've come to take for granted in the "modern" era of ASCII, but back in the day "copying text files" between incompatible systems would produce far worse garbage than a few badly rendered characters or the carriage-return issues you get when switching between UNIX and Windows.


So, essentially you are trying to make the point that even ASCII has its problems and that all data has to be encoded somehow before processing can be done on it. The latter seems to be self-evident and UNICODE is a response to the former.


That's not what I'm saying at all. ASCII is just a standard for how text is encoded into binary. However it hasn't been around forever, and before it there were lots of different -incompatible- standards. It was so bad that some computers would have their own proprietary character set. Some computers also didn't even have 8 bits to a byte, and since a byte was the unit for each character (ASCII is technically 7-bit, but let's not get into that here), it meant systems with 6 or 7 bits to a byte would be massively incompatible: you're off by one or two bits on each character, and the misalignment compounds with every subsequent character. This meant that text files were often fundamentally incompatible across different systems (I'm not talking about weird character rendering; I'm talking about the file looking like random binary noise).
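To make the "off by one or two bits" point concrete, here's a hypothetical Python sketch (the 6-bit code table is made up for illustration) that packs a message as 6-bit characters the way a fictional 6-bit-byte machine might, then reads the same bit stream back as 8-bit bytes:

    # Hypothetical sketch: pack a message as 6-bit character codes, then
    # reinterpret the same bit stream as 8-bit bytes. The misalignment
    # compounds with every character, so the result looks like binary noise
    # rather than slightly garbled text.
    ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"  # made-up 6-bit code table (all codes < 64)

    def pack_6bit(message):
        bits = "".join(format(ALPHABET.index(ch), "06b") for ch in message)
        bits += "0" * (-len(bits) % 8)  # pad to a whole number of 8-bit bytes
        return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

    packed = pack_6bit("HELLO WORLD")
    print(packed)                                    # unreadable as 8-bit text
    print(packed.decode("ascii", errors="replace"))  # garbage, not just odd rendering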

ASCII changed a lot of that and did so with what was, in my opinion at least, a beautiful piece of design.

I did a quick bit of googling around to find a supporting article about the history of ASCII: http://ascii-world.wikidot.com/history

There were also some good discussions on HN a little while ago about the design of ASCII; I think this was one of the referenced articles: https://garbagecollected.org/2017/01/31/four-column-ascii/

The history of ASCII is really quite interesting if you're into old hardware.


ASCII wasn't nearly as universal as you think it was.

And a byte never meant 8 bits; that's an octet.

ASCII definitely was - and is - a very useful standard, but it does not have the place in history that you assign to it.

In the world of micro-computing it was generally the standard (UNIX, the PC and the various minicomputers also really helped). But its limitations were apparent very soon after its introduction and almost every manufacturer had their own uses for higher order and some control characters.

Systems with 6 or 7 bits to a byte would not be 'massively incompatible'; they functioned quite well with their own software and data encodings. That those were non-standard didn't matter much until you tried to import data from, or export data to, a computer made by a different manufacturer.

Initially, manufacturers would use this as a kind of lock-in mechanism, but eventually they realized standardization was useful.

Even today such lock-in is still very much present in the world of text processing: in spite of all the attempts at getting characters to be portable across programs running on the same system and between various systems, formatting and special characters are easily lost in translation if you're not extra careful.

Ironically, the only thing you can - even today - rely on is ASCII7.

Finally we've reached the point where we can drop ASCII with all its warts and move to UNICODE. As much as ASCII was a 'beautiful piece of design', it was also very much English-centric, to the exclusion of much of the rest of the world (a neat reflection of both the balance of power and the location of the vast majority of computing infrastructure for a long time). If you lived in a non-English speaking country, ASCII was something you had to work with, but probably not something that you thought of as elegant or beautiful.


With the greatest of respect, you don't seem to be paying much attention to the points I'm trying to raise. I don't know if that is down to a language barrier, me explaining things poorly, or you just being out to argue for the hell of it. But I'll bite...

> ASCII wasn't nearly as universal as you think it was.

I didn't say it was universal. It is now, obviously, but I was talking about _before_ it was even commonplace.

> And a byte never meant 8 bits; that's an octet.

I know. I was the one who raised the point about differing byte sizes. ;)

> ASCII definitely was - and is - a very useful standard, but it does not have the place in history that you assign to it.

On that we'll have to agree to disagree. But from what I do remember of early computing systems, it was a bitch working with systems that weren't ASCII-compatible. So I'm immensely grateful regardless of its place in history. However, your experience might differ.

> In the world of micro-computing it was generally the standard (UNIX, the PC and the various minicomputers also really helped). But its limitations were apparent very soon after its introduction and almost every manufacturer had their own uses for higher order and some control characters.

Indeed, but most of them were still ASCII-compatible. Without ASCII there wouldn't even have been a compatible way to share text.

> Systems with 6 or 7 bits to a byte would not be 'massively incompatible'; they functioned quite well with their own software and data encodings. That those were non-standard didn't matter much until you tried to import data from, or export data to, a computer made by a different manufacturer.

That's oxymoronic. You literally just argued that differing bit sizes wouldn't make systems incompatible with each other because they work fine on their own; they just wouldn't be compatible with other systems. The latter is literally the definition of "incompatible".

> Initially, manufacturers would use this as a kind of lock-in mechanism, but eventually they realized standardization was useful.

It wasn't really much to do with lock-in mechanisms - or at least not on the systems I used. It was just that the whole industry was pretty young, so there was a lot of experimentation going on and different engineers had differing ideas about how to build hardware and write software. Plus the internet didn't even exist back then - not even ARPANET. So sharing data wasn't something that needed to happen commonly. From what I recall, the biggest issues with character encodings back then were hardware-related (e.g. teletypes), but the longevity of some of those computers is what led to my exposure to them.

> Finally we've reached the point where we can drop ASCII with all its warts and move to UNICODE. As much as ASCII was a 'beautiful piece of design', it was also very much English-centric, to the exclusion of much of the rest of the world (a neat reflection of both the balance of power and the location of the vast majority of computing infrastructure for a long time). If you lived in a non-English speaking country, ASCII was something you had to work with, but probably not something that you thought of as elegant or beautiful.

I use Unicode exclusively these days. With Unicode you have the best of both worlds - ASCII support for interacting with any legacy systems (ASCII character codes are still used heavily on Linux, by the way, since the terminal is just a pseudo-teletype) while having the extended characters for international support. Though I don't agree with all of the characters that have been added to Unicode, I do agree with your point that ASCII wasn't nearly enough to meet the needs of non-English users. Given the era, though, it was still an impressive and much-needed standard.
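A small illustration of the pseudo-teletype point, in Python; it only writes plain ASCII control codes and assumes an ANSI-capable terminal:

    # Minimal sketch: the ASCII control range (0x00-0x1F) still drives a Linux
    # terminal today; none of this needs anything beyond 7-bit ASCII.
    import sys, time

    for i in range(1, 4):
        sys.stdout.write(f"progress {i}/3\r")   # 0x0D carriage return: back to column 0
        sys.stdout.flush()
        time.sleep(0.2)
    sys.stdout.write("\n")                      # 0x0A line feed: move to the next line
    sys.stdout.write("\x07")                    # 0x07 BEL: ring the terminal bell
    sys.stdout.write("\x1b[1mbold\x1b[0m\n")    # 0x1B ESC starts an ANSI escape sequence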

A side question: is there a reason you capitalise Unicode?



