This is a recurring theme that I find strange. Why do people believe that if something can be browsed with less/Notepad, it is somehow more "readable" than a binary representation? I've seen and worked with tons of XML, and most of it was utterly useless without accompanying documentation, just as a binary format would be. The fact that I could "see" the XML on my screen did not make the job easier.
The problem with formats "readable without any software" is that you usually end up with everything represented as strings, with no knowledge of actual types and constraints. So, you still need that documentation, even though the format is "readable without any software".
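To make the "everything is a string" point concrete, here is a minimal sketch using only the Python standard library. The `item` record, its fields, and the `CHECK` constraint are invented for illustration; they do not come from any real format. The same record is read back once from XML and once from an in-memory SQLite table.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical record, invented for this example.
xml_doc = "<item><name>widget</name><count>3</count><price>9.50</price></item>"

# Plain XML parsing hands back text for every field; the reader has to know
# from documentation (or a schema) that count is an integer and price a number.
item = ET.fromstring(xml_doc)
xml_values = {child.tag: child.text for child in item}
xml_types = [type(v).__name__ for v in xml_values.values()]

# The same record in SQLite, with declared column types and one constraint.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (name TEXT, count INTEGER CHECK (count >= 0), price REAL)"
)
conn.execute("INSERT INTO item VALUES (?, ?, ?)", ("widget", 3, 9.5))
row = conn.execute("SELECT name, count, price FROM item").fetchone()
sqlite_types = [type(v).__name__ for v in row]

# The constraint is enforced by the storage layer, not by a separate document.
try:
    conn.execute("INSERT INTO item VALUES (?, ?, ?)", ("gadget", -1, 1.0))
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True

print(xml_types)     # all three values come back as str
print(sqlite_types)  # str, int, float
```

An XML schema could carry the same type information, but that is a second artifact you have to obtain and apply; the text file alone does not tell you.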
It's not about human readability; it's about separating implementation from representation, so "readable without any software" is a good test for whether it's "readable with any software".
If your document editor represents documents as a stack of edit-events (for undo/redo or whatever) and you document every nook and cranny of this "document format", standardize it as "binary event-sourced document format, .esd" then your binary spec will be very close to a full description of your document editor. Everyone who wants to use your format now has to reimplement your editor.
Anyone can leak implementation details into any format, and it is to some extent unavoidable, but if you can make it readable (or semi-readable), some thought has at least been put into the representation.
>It's not about human readability, it's about separating implementation from representation
No, it's not. An sqlite-based document can have as many representations as an XML-based document -- nothing in its implementation prevents this.
>so "readable without any software" is a good test for whether it's "readable with any software"
The OS, filesystem, shell, editor etc needed to read the XML file are still software.
As is the XML parser needed to do anything useful with it in the data realm.
>If your document editor represents documents as a stack of edit-events (for undo/redo or whatever) and you document every nook and cranny of this "document format", standardize it as "binary event-sourced document format, .esd" then your binary spec will be very close to a full description of your document editor. Everyone who wants to use your format now has to reimplement your editor.
You do understand that the XML file can also just contain a stack of edit-events, right?
This is totally orthogonal to the underlying storage (XML or sqlite or whatever).
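A small sketch of that orthogonality, using only the Python standard library. The event tuples and the element, attribute, table, and column names are all made up for illustration. The same stack of edit-events is written once as XML and once into SQLite, and both read back identically.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical edit-events as (operation, position, text); invented for this example.
events = [("insert", 0, "Hello"), ("insert", 5, " world"), ("delete", 0, "H")]

# Storage 1: XML, one <event> element per edit.
root = ET.Element("events")
for op, pos, text in events:
    ET.SubElement(root, "event", op=op, pos=str(pos)).text = text
xml_bytes = ET.tostring(root)

# Storage 2: SQLite, one row per edit; seq preserves the order of the stack.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (seq INTEGER PRIMARY KEY, op TEXT, pos INTEGER, text TEXT)"
)
conn.executemany("INSERT INTO events (op, pos, text) VALUES (?, ?, ?)", events)

# Reading either one back yields the same implementation-shaped data.
from_xml = [
    (e.get("op"), int(e.get("pos")), e.text) for e in ET.fromstring(xml_bytes)
]
from_sql = conn.execute("SELECT op, pos, text FROM events ORDER BY seq").fetchall()

print(from_xml == from_sql == events)
```

Whether the editor's internals leak into the file is decided by what you choose to store, not by which container you store it in.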
> so "readable without any software" is a good test for whether it's "readable with any software"
What an extraordinarily stupid notion. Your filesystem is not readable without a filesystem driver for the specific binary format used to represent it, so you cannot even read your text file without having something in your stack that understands a more structured data organization scheme.
> Your filesystem is not readable without a filesystem driver for the specific binary format used to represent it, so you cannot even read your text file without having something in your stack that understands a more structured data organization scheme.
This argument doesn't hold water, since file content doesn't depend on which filesystem it's on.
An archiver may choose to store data in some very particular way, like some specific tape driver. They're free to do that without corrupting the content of any files, since the files don't depend on the filesystem.
On the other hand, if an archiver chose to store, say, unzipped versions of their opendocument files, then they've corrupted the data: opendocument files are zips, unzipped data is not opendocument. The format does depend on the file content.
Yes, but that comes with the operating system that comes with the computer I bought and use in the house, which comes with electricity from the electrical grid in the city that lies in an aerated valley (luckily!) on the third planet from the sun. I should have added a disclaimer, I guess.
> Why do people believe that if something can be browsed with less/Notepad, it is somehow more "readable" than a binary representation? I've seen and worked with tons of XML, and most of it was utterly useless without accompanying documentation, just as a binary format would be.
Binary formats start out mostly unreadable by humans [1]. XML and other textual formats at least have the possibility of being made readable by laypersons.
[1] I'll note that after enough immersion, I've seen people read binary core dumps, etc., but that takes much more time and practice than with XML.
Some formats don't need to be read as plaintext by the average person.
Would the average person edit a .svg by hand? No, he'd use Adobe Illustrator or anything else.
Would he edit a .docx file by treating it as a zip archive and editing word/document.xml? No, he'd use MS Word/LibreOffice Writer...
Suppose SQLite used XML/JSON instead of binary files: would you modify them with Notepad? No, you'd probably use SQLite browser software (or an application more tailored to the domain).
> Some formats don't need to be read as plaintext by the average person.
Fair point, and in fact, I'd go further: most of the time, most people do not need to directly edit or view most files in most formats. Even if you took 'most' to mean 99% or higher, I'd be comfortable with that statement.
Where I differ is that I think the essence of the argument is really whether or not a binary file format offers enough value to be worth entirely eliminating the direct edit/view possibility for everybody all the time. Even if it's not a common case, it can be game-changingly useful when you need it.
Just to illustrate, you give three examples, and I have counterexamples for each:
> Would the average person edit a .svg by hand? No, he'd use Adobe Illustrator or anything else.
I've modified bounding boxes by directly editing SVG, as well as read SVG directly to analyze some plots a library was generating for me.
> Would he edit a .docx file by treating it as a zip archive and editing word/document.xml? No, he'd use MS Word/LibreOffice Writer...
I've done this recently to extract embedded documents on OS X (where the native versions of Office do not directly support this).
> Suppose SQLite used XML/JSON instead of binary files: would you modify them with Notepad?
HSQLDB can use text based SQL scripts to store data, and I've modified and edited them directly for several reasons.
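For the .docx case, the kind of thing I mean looks roughly like the sketch below. It assumes embedded objects sit under a `word/embeddings/` folder inside the archive, which matches the files I have looked at but is an assumption here; `list_embedded` is a hypothetical helper, and the sample archive is a stand-in built in memory, not a valid Word document.

```python
import io
import zipfile


def list_embedded(container: bytes, prefix: str = "word/embeddings/") -> list:
    """Return the names of archive members stored under the given prefix."""
    with zipfile.ZipFile(io.BytesIO(container)) as zf:
        return [name for name in zf.namelist() if name.startswith(prefix)]


# Build a stand-in archive so the example runs without a real document.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", "<document/>")
    zf.writestr("word/embeddings/sheet1.xlsx", b"placeholder bytes")
sample = buf.getvalue()

# Any zip tool can get at the embedded file, no word processor required.
names = list_embedded(sample)
with zipfile.ZipFile(io.BytesIO(sample)) as zf:
    payload = zf.read(names[0])

print(names)
```

With a real file you would pass `open(path, "rb").read()` instead of `sample` and write each payload out to disk.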
Well, yes, this applies to reasonably well-designed, expressive XML (which OpenDocument is). You can definitely write XML that is perfectly incomprehensible without extensive references.
XML is potentially readable, but it's always possible to make the semantics so twisted it's as bad as a binary encoding. A binary encoding is obfuscated by nature.
I despise XML quite a bit, but I have to admit it's still potentially better.