I'm not sure why you think the web is less parseable now. The HTML5 spec fully defines the parsing algorithm, error recovery included, and it's easy to get a compliant HTML5 parser for whatever language. Back in the day, people were just doing regex.
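For instance, a minimal Python sketch (the URL is a placeholder); html5lib implements the spec's parsing algorithm, so it chews through the same tag soup a browser would:

    import requests
    from bs4 import BeautifulSoup

    html = requests.get("https://example.com").text
    soup = BeautifulSoup(html, "html5lib")  # spec-compliant HTML5 parser
    for link in soup.find_all("a", href=True):
        print(link["href"])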
There's a separate issue that a lot of stuff requires JS, but the JS mostly just calls JSON endpoints, so that's easy to scrape. The tricky thing is scraping ASPX sites that jump through a bunch of hoops instead of having a simple backend API.
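Something like this usually does the job (a hypothetical sketch; the endpoint URL and response shape are made up, you'd find the real ones in your browser's dev tools):

    import requests

    resp = requests.get(
        "https://example.com/api/items",        # spotted in the network tab
        params={"page": 1},
        headers={"Accept": "application/json"},
    )
    resp.raise_for_status()
    for item in resp.json():                    # assuming a JSON array comes back
        print(item)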
Your timeline is completely confused. ASPX came out in 2002, after HTML4. HTML2 was the era when CGI scripts were dominant.
Next, HTML2+SGML were also well-designed, and people weren't just doing regexps. The mess didn't come in until HTML3, and even more so HTML4.
Today, it's easy to parse /specific/ pages. If I want to automate one web page, and it's well-formed, HTML5+AJAX makes that easy.
However, in contrast to HTML2, it's very hard to parse pages /generically/. That's why I gave the example of AltaVista and a11y tools, which need to work with any web site.
Try to make something like that today: a generic search spider, a web browser, or an a11y tool. See how far you get talking to JSON endpoints. They're often easy enough to reverse-engineer for a specific web site, but you need a human-in-the-loop for each web site. With HTML2, one would build tools which could work with /any/ web site.
And boy were there a lot of tools. Look at all the web browsers of the nineties, and the innovation there.
Oooo well said. One of the first programs I ever wrote was a scraper for such an ASPX site. Parsing state ids and reposting them over and over again... what a joy it was.
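For anyone who hasn't had the pleasure, the dance looks roughly like this (a sketch with a made-up URL and control name; the hidden fields are the standard WebForms ones):

    import requests
    from bs4 import BeautifulSoup

    URL = "https://example.com/results.aspx"    # placeholder
    session = requests.Session()
    soup = BeautifulSoup(session.get(URL).text, "html5lib")

    def hidden(name):
        # WebForms stashes serialized server-side state in hidden inputs
        tag = soup.find("input", {"name": name})
        return tag["value"] if tag else ""

    payload = {
        "__VIEWSTATE": hidden("__VIEWSTATE"),
        "__VIEWSTATEGENERATOR": hidden("__VIEWSTATEGENERATOR"),
        "__EVENTVALIDATION": hidden("__EVENTVALIDATION"),
        "__EVENTTARGET": "ctl00$gvResults$btnNext",  # hypothetical control id
        "__EVENTARGUMENT": "",
    }
    next_page = session.post(URL, data=payload)  # repost it all to "click" next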
Much as an argument confusing World War II Germany with the Holy Roman Empire might be called 'well-said.' You're confusing the early web era with the dot-com boom/bust period.
The early web was the era of dozens, perhaps hundreds, of competing web browsers, made possible by simple, well-engineered web standards. Pages were served statically or with CGI scripts. You had a whole swarm of generic spiders, crawlers, and bots which automated things on the web for you. Anyone could write a web browser, so many people did.
The dot-com boom/bust had companies doubling in size every few months, people who could barely code HTML making 6-figure salaries, Netscape imploding, early JavaScript (which, at the time, looked like a high schooler's attempt at a programming language), and web standards with every conceivable ill-thought-out idea grafted in.
If one of the first programs you ever wrote was a scraper for an ASPX site, you never saw the elegance of the early days. ASPX came out not just after HTML3, but after HTML4.
If you define the early web as pre-1998, then you’re essentially talking about five guys who all had computer science backgrounds. Yes, they were good at their jobs, but it was never going to last. Increasing the number of web developers 1000x necessarily dragged their average skill level down toward that of the population at large.
Most definitions of the early web include the PHP Cambrian explosion, because essentially all websites today got their start then, and only a few horseshoe-crab sites (mostly the homepages of CS profs!) predating it survive. Gopher sites were probably really easy to scrape too. ;-)
It was before your time, kid. (1) I think you underestimate the early web by quite a bit. It had a lot more awesome than you give it credit for, and if not for the dot-com bubble + bust, it would have evolved in a much more thoughtful way. (2) The dot-com boom and growing developers 1000x didn't need to involve the Netscape, Microsoft/IE, or W3C implosions of the time. Those were a question of management decisions and personalities.
But my original comment was 100% unambiguous: "I liked HTML2. I hated basically everything which went into HTML3 and HTML4."
Y'all responded by citing things from the HTML3/HTML4 era as examples of what went wrong...
---
Note: Before I get jumped on for "kid," it's the username.
Fair enough. I actually was a kid in 1998. I believe I started “programming” HTML in 1997 or so (copying view source and uploading to my internet host). There were some cool things like HotWired and Suck.com (and the bus on the MSN splash page!), but it was just a vastly smaller space than now. Even GeoCities doesn’t really make your cutoff, so it’s hard to compare.