There are so many layers of abstraction in the Windows API
that it’s a miracle that anyone could maintain it - which probably
explains the increasing level of bloat in new versions of Windows.
Are there fewer layers of abstraction in other popular OSes?
Yes – it's a product of the way Windows was actually two operating systems, the Windows 3/95/98/ME lineage and the newer OS/2 / NT lineage.
The NT side had the advantage of being designed up front rather than grown organically by overworked developers hacking in whatever they needed right now, and it started from a number of assumptions (e.g. not starting with the 16-bit real-mode API model) which avoided some gnarly hacks.
The problem was compatibility: most apps had been developed on 16-bit Windows 3 or, later, Win95, and at the time Microsoft's dominance was far from a given, so they were pathologically afraid of breaking compatibility. That meant the Windows “platform” included a lot of weird semi-documented or undocumented corners designed to avoid breaking specific apps, and the NT side had to reimplement bugwards-compatible versions of most of them to avoid breaking shipped apps.
(This might seem excessive – and I would generally agree – but it's important to remember that much of the damage was done in the pre-internet era when shipping updates to software meant putting a box in the mail with a pile of floppies or, for the really rich people, CDs. Getting someone to upgrade to a version of an app which didn't rely on an implementation quirk could take many years.)
Raymond Chen has written about this extensively at http://blogs.msdn.com/b/oldnewthing/ and one of my favorite examples is the Shell Folders registry key:
The closest you come to this on another mainstream OS is OS X, where they maintained Carbon (i.e. the supported subset of the classic Mac OS APIs) on top of the modern core but that was both more limited and was rapidly deprecated because Apple is far less concerned with breaking backwards compatibility than Microsoft used to be.
Given that Microsoft are going to "continuous deployment" with Win10, I think they ought to have taken the opportunity to do one last breaking change, possibly behind an install-time option or compatibility layer: ditch the CP/M legacy.
That is, change the path separator to '/', ditch the drive letters, and the magic file names like CON, and the 8.3 names.
I believe that the Windows API supports '/' as a path separator and that it's user mode apps that tend not to. I can't find a good citation for that, though.
Real bummer that you have to specify the drive letter. (At least on Win7, where I just tried it.)
The behavior is kind of interesting for 'dir' at least:
E:>dir "/"
Directory of \\
File Not Found
E:>dir \\
The filename, directory name, or volume label syntax is incorrect.
Same result for «dir "//"».
(For those not experienced with the Windows command prompt, seeing "Directory of \\" is unexpected, and seeing it means that something interesting must be going on.)
I vaguely remember something about MS coders already using / to delineate arguments and \ for file paths; when they realized UNIX used / for paths and they'd have to interoperate over a network, it was already too late.
MS-DOS 1.0 (which did not support directories at all) was already
using / to introduce command-line options. It took this usage
of / from CP/M, which took it from DEC operating systems such as TOPS-10.
“File I/O functions in the Windows API convert "/" to "\" as part of converting the name to an NT-style name, except when using the "\\?\" prefix as detailed in the following sections.”
I was thinking of the special device names, Unicode and length handling depending on whether you use \\?\ – that's the best way to write safe code except when it isn't:
“Many but not all file I/O APIs support "\\?\"; you should look at the reference topic for each API to be sure.”
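To make the quoted rule about "/" concrete, here is a minimal sketch of the behavior it describes. The helper name `normalize_separators` is hypothetical, and this only imitates the documented rule in plain Python; the real conversion happens inside Windows and does much more than swap separators.

```python
# The literal four characters \\?\ that switch off path normalization.
NT_VERBATIM_PREFIX = "\\\\?\\"


def normalize_separators(path: str) -> str:
    # Hypothetical helper: imitates only the separator rule from the quote.
    if path.startswith(NT_VERBATIM_PREFIX):
        # Paths with the verbatim prefix are passed through untouched,
        # so a "/" inside them stays a "/".
        return path
    # Everywhere else, forward slashes are treated as backslashes.
    return path.replace("/", "\\")


if __name__ == "__main__":
    print(normalize_separators("C:/Windows/System32"))
    print(normalize_separators("\\\\?\\C:/Windows/System32"))
```

This is also why forward slashes cannot be relied on once you opt into the prefix for long paths.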
I'd expect that this will happen for Metro apps first – those are all modern and since most of them are supposed to run on e.g. ARM anyway legacy code compatibility matters considerably less.
I'd be surprised if they changed the path separator, but dropping the device filenames and FAT-compatible short names seems worth doing for security alone.
Thanks. I think that some backwards compatibility can be handled by more code and not necessarily more layers of abstraction, but I get your point.
It seems like there are lots of layers in Linux distributions because of the Unix philosophy. For example, to put a window on a screen, aren't the layers something like this:
No, the window manager isn't in that chain. You can kill the window manager and your applications can keep running and drawing to the screen, but you can't move or resize windows. Also, instead of "Desktop environment" you mean GUI widget toolkit.
Gnome or KDE is just a package including a window manager and a bunch of mostly-standalone programs that give you a toolbar, file browser, settings panels, etc. It's not a layer that anything goes through. You don't even have to use the window manager it comes with.
> By desktop environment I meant e.g. Gnome or KDE. Aren't they separate from the widget toolkits?
I was being charitable. The desktop environment doesn't belong in the graph you drew. The GUI widgets belong in the place where you wrote "desktop environment". The charitable assumption is that you reasonably conflated the GUI widget toolkits with desktop environments, since the very popular KDE and Gnome are strongly tied to Qt and gtk, respectively.
Audio with ALSA was very simple, but didn't support hot-plugging. So rather than improve ALSA, PulseAudio was created. When you add legacy OSS to the mix, which was deprecated probably 10+ years ago, yeah, it's complicated. It didn't have to be, though, if the good parts of PulseAudio were implemented as ALSA modules instead.
Really interesting article, sounds like a fun adventure. Couple of points:
1.) "0x12b9b0a5. This equals 314159269 in decimal. Yep, that’s the first 8 digits of pi right there" -- actually, it's not. The first 8 digits of pi are 3.14159265, it's not clear why the Microsoft developers ended with a 9 instead of a 5, perhaps a mistake? perhaps to help with coprime-ness?
2.) The "ANSI C" version uses mbstowcs which isn't any kind of ANSI C function I've ever heard of.
Fantastic article, really enjoyed reading, many thanks.
Those are the first 9 digits if you count 3 - the 9 at the end is the 9th digit, not the 8th, so the first 8 digits are correct. You're correct about the ANSI C part though, it's actually C99 - I've corrected that part. Thanks.
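For anyone who wants to check the digit counting themselves, a quick sketch (the variable names are just for illustration):

```python
import math

value = 0x12B9B0A5
digits = str(value)  # decimal digits of the constant

# Digits of pi with the decimal point removed, cut to the same length.
pi_digits = f"{math.pi:.15f}".replace(".", "")[: len(digits)]

# Count how many leading digits agree before the first mismatch.
matching = 0
for a, b in zip(digits, pi_digits):
    if a != b:
        break
    matching += 1

if __name__ == "__main__":
    print(digits, pi_digits, matching)
```

The constant and pi agree on the first eight digits and differ only in the ninth.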
It claims to be "standard C", which usually these days means ISO C rather than ANSI C. For details about mbstowcs, see the C99 standard, section 7.20.8.1.
> I’d need to write a device driver to call it, which is both something I’ve never touched at all before, and something that’s nigh impossible without access to the DDK or NDK.
The routine itself would be copyrighted, but someone should be able to make a clean room reimplementation of the original algorithm, assuming it's not protected by patents. (IANAL)
Which, as far as I understand it, could be someone posting a human-language description of the algorithm here (based on the blog post), then ReactOS writing code based on that description. (IANAL.)
A few years ago I implemented the storage system for a special-purpose diagnostic camera. The specification defined (very long) filenames for the saved images using a timestamp and some other data. I used a mostly off-the-shelf microcontroller/NAND/USB mass storage reference implementation, hooked it up to the FPGA over a side channel, and had everything working pretty nicely. Until the test harness that just continuously commanded pictures to be taken reached 105 iterations. After that, the camera timed out waiting on the storage subsystem to store the image.
The problem turned out to be the code that found the 8.3 filename: it tried longfi~1.bin, checked to see if that file existed and if so, incremented to longfi~2.bin, then checked that... but never did the checksum trick described here, just kept iterating. (Bear in mind this was a tiny 8-bit microcontroller that didn't have the RAM to just read all the directory entries at once and keep them around for comparison.) Finding the proper 8.3 filename this way took longer than the timeout period after 104 collisions.
Of course, we only cared about the long filename and never saw the 8.3 filename, so my fix was simply to use an appropriate hash of the long filename to ensure a good probability of uniqueness.
If the checksums collide, then the number after the tilde is incremented again (e.g. SOBC84~2.ASP). This time, it won't stop at ~4, so you can go up to ~10 and beyond. The file name will be shortened accordingly to fit the number in (e.g. SOBC8~10.ASP). This was tested on Windows 7 x64.
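Here is a rough sketch of the naming strategy as described in this thread: try ~1 through ~4 on the plain prefix, then switch to two letters plus four hex digits of a checksum, and keep counting from there while trimming the base to fit. The function name `short_name` is made up, and the checksum is a stand-in (the low 16 bits of CRC-32), not the routine Windows actually uses. Character filtering is also simplified.

```python
import zlib


def short_name(long_name: str, existing: set) -> str:
    """Pick an 8.3-style alias for long_name that is not already in `existing`."""
    stem, _, ext = long_name.rpartition(".")
    if not stem:  # no usable dot: treat the whole name as the stem
        stem, ext = long_name, ""
    # Simplification: keep only letters and digits, uppercased.
    stem = "".join(c for c in stem.upper() if c.isalnum())
    ext = ext.upper()[:3]

    def join(base: str, n: int) -> str:
        suffix = f"~{n}"
        if len(suffix) > 8:
            raise ValueError("no room left for the numeric tail")
        # Trim the base so that base + suffix fits in 8 characters.
        name = base[: 8 - len(suffix)] + suffix
        return f"{name}.{ext}" if ext else name

    # Phase 1: plain six-character prefix with ~1 .. ~4.
    for n in range(1, 5):
        candidate = join(stem[:6], n)
        if candidate not in existing:
            return candidate

    # Phase 2: two letters + four hex digits of a STAND-IN checksum,
    # then ~1, ~2, ... with no upper bound other than the 8-character limit.
    digest = f"{zlib.crc32(long_name.encode('utf-8')) & 0xFFFF:04X}"
    base = stem[:2] + digest
    n = 1
    while True:
        candidate = join(base, n)
        if candidate not in existing:
            return candidate
        n += 1
```

With an empty directory, `short_name("longfilename.bin", set())` gives `LONGFI~1.BIN`; once ~1 through ~4 are taken, the fifth alias has the form `LOxxxx~1.BIN`, where `xxxx` depends on the stand-in checksum.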
And now the last question, how far can you take that logic? What happens at 1,000,000 (one million file names generated) when there are no more characters left to remove from the left side?
A directory is stored as a linked list of clusters like regular files, and the 32-bit filesize field is irrelevant for a directory, so it theoretically could be as big as the whole volume - just keep adding clusters to the chain.
Filesystem drivers may give up long before then, and access to such a huge directory would be very slow, but there's nothing in the filesystem structures itself that would prevent it. I've written FAT code for an embedded device that I can confidently say would have no problems with large directories, since it'll just keep following the cluster chain to the end.
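A minimal sketch of "just keep following the cluster chain", assuming FAT32 conventions (32-byte directory entries, table values at or above 0x0FFFFFF8 marking end of chain). The `fat` mapping and both function names are hypothetical stand-ins for a real on-disk table, and free or bad cluster markers are not handled.

```python
END_OF_CHAIN = 0x0FFFFFF8  # FAT32 values at or above this end a chain
DIR_ENTRY_SIZE = 32        # bytes per FAT directory entry


def walk_chain(fat, start):
    """Yield the cluster numbers of a chain, beginning at `start`."""
    seen = set()
    cluster = start
    while cluster < END_OF_CHAIN:
        if cluster in seen:
            # A corrupted table could loop forever without this check.
            raise ValueError("cycle in cluster chain")
        seen.add(cluster)
        yield cluster
        # Only the low 28 bits of a FAT32 entry are significant.
        cluster = fat[cluster] & 0x0FFFFFFF


def directory_capacity(fat, start, cluster_size):
    """Number of directory entries that fit in the chain's clusters."""
    clusters = sum(1 for _ in walk_chain(fat, start))
    return clusters * cluster_size // DIR_ENTRY_SIZE


if __name__ == "__main__":
    fat = {2: 5, 5: 9, 9: 0x0FFFFFFF}  # toy table: 2 -> 5 -> 9 -> end
    print(list(walk_chain(fat, 2)), directory_capacity(fat, 2, 4096))
```

Nothing in the loop depends on a file-size field, so in this sketch the directory grows for as long as clusters keep being appended to the chain.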
Actually, it doesn't look like it. It might need to be a race condition done with a program (delete the old files to keep it going to a million). Might be tricky.