Forking HTML into a static language doesn't make sense

recursivedoubts · on June 22, 2020

HTML doesn't need to be simplified to reign in the complexity.

Rather, it needs to be completed as a hypertext, so that people can build complete software systems using (only or mostly) that language.

See the htmx examples page for stuff that should be doable in plain HTML:

https://htmx.org/examples/

The fact that only anchor and forms, only clicks and submits, only GET and POST and only full page replacement is available in vanilla HTML is the core issue. If HTML had htmx-like functionality you would see far less pressure for large, complex front end frameworks, and it would be completely within the original REST-ful web model[1] that Roy Fielding described.

[1] - https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arc...

Grumbledour · on June 22, 2020

I think this is an interesting and important argument, because I always felt many shortcomings of the modern web were actually caused by browser vendors and their reluctance to improve their platforms and the web.

We have so many custom UI Elements, because forms did and still do suck. RSS always had a back seat, because it always was second class in browsers. Semantic web died outside of search engines, because browsers did not offer features like "Save event to my calendar". And boy, have I been promised for a long time I might someday be able to sort HTML tables by click.

Just a non exhaustive list of my pet peeves, but there are many more examples.

earthboundkid · on June 22, 2020

Forms suck, but if you actually make a form, it becomes clear that while you can do 90% of stuff declaratively, there's a leftover 10% that just has to be described imperatively, which requires something like JS. Modern forms actually give you a lot for free with pattern matching and whatnot. I could imagine it going a little further and adding properties for "dirty" or not, which right now you have to add using JS onblur listeners. Probably, the standards bodies could add in credit card field types and input masking to get things 99% of the way there. But there's still left over things like: your display name should default to being your personal name plus your family name, but also overridable. I don't see how you can create a declarative language for relationships like that without stumbling into a Turing tarpit.

Rotten194 · on June 23, 2020

The thing is, (at least personally I feel) JavaScript isn't the problem. The problem is because of the lack of tools people have leaned into doing everything with JS, instead of trying to manage the awkward 50-50 split we had before. If 95% could be done with HTML++, then a couple form customizations and a bit of business logic being done with 100 lines of JS would make sense and not be as horrible as the modern web.

fardelian · on June 22, 2020

You want "Save to my calendar" but we still don't have full support for date inputs. Chrome has supported the main features since 2012, Firefox joined only in 2017, and we're still waiting on Safari to make a move.

https://caniuse.com/#search=input-date

Grumbledour · on June 22, 2020

Well, I wanted "Save to my calendar" when our self-built date inputs still opened new windows with html tables as calendars to pick a date, but I get your point and I lumped this under "forms suck". I still shudder when thinking about implementing multi-file uploads in php.

ultrarunner · on June 22, 2020

I have an application that is partially broken in Safari only because of the lack of support for input type="datetime-local". Very frustrating.

recursivedoubts · on June 22, 2020

I find the projects that the W3C chooses to work on incomprehensible.

gsnedders · on June 22, 2020

The projects that SDOs decide to work on are those which people there are willing to spend their time on.

An SDO could say "we're doing project X", but if nobody shows up…

recursivedoubts · on June 22, 2020

Sure, but htmx is small (~1.7K loc[1]), written mostly by one guy part time and, while not perfect, advances HTML significantly as a hypermedia. That something like this hasn't been discussed in a significant, public manner by the w3c in over a decade[2] indicates to me there is either a lack of understanding or of interest in advancing HTML as a hypermedia.

I readily admit I may be missing something.

[1] - https://github.com/bigskysoftware/htmx/blob/master/src/htmx....

[2] - https://en.wikipedia.org/wiki/XHTML#XHTML_2.0

Touche · on June 22, 2020

Advancing HTML is indeed something that the w3c members aren't super interested in. There are a few reasons for this:

1) Changes to HTML can create security issues, especially if it results in a change to the HTML parser. You have to consider that non-browser use cases still need to be able to parse HTML correctly. Since we opted to make HTML versionless with <!doctype html> it became harder to do.

2) Big changes are harder to get consensus on. This is one of the reasons why they opt to do smaller, less controversial changes.

3) w3c and whatwg are dominated by large corporations, who hire thousands of engineers to work on their web properties. They are able to overcome the shortcomings of HTML easier than small teams and single-developer websites.

I agree though, this is a major problem. HTML is essentially a dead language, no major changes have been made in almost a decade.

WorldMaker · on June 22, 2020

4) The W3C HTML spec is now mostly just a placeholder that says "Whatever WHATWG thinks is best" and it is almost but not quite true that WHATWG's Living Spec is increasingly "Whatever works in Chromium/Chrome"

zokier · on June 22, 2020

Well XHTML 2.0 was such a roaring success that I'm sure they'll be super stoked to take on another project to improve HTML as hypertext without the baggage of browser backing and buyin.

WorldMaker · on June 22, 2020

Correct, the debate over XHTML 2.0 was exactly why the WHATWG was formed. It's indeed how the W3C lost most of its teeth in browser standards. XHTML 2.0 had some good ideas, but it shows how hard it is to govern standards from ivory towers.

dragonwriter · on June 22, 2020

What you seem to fail to recognize is that the reason nothing has been discussed (whether of the style you want or otherwise) at W3C regarding advancing HTML since XHTML2 is that the colossal failure of the “we’ll right a from-first-principles spec with no concern for pragmatics” approach with XHTML2 resulted in W3C being completely sidelined for HTML development. No one is listening to them, the browser vendors all work through WHATWG and that's where everything that matters happens.

W3C isn't interested in advancing HTML as anything and, even if they were, they'd be screaming into the night because none of the people who build browsers are interested in what they have to say. They’ve got their own club, and that's where HTML happens (and WHATWG and the browser vendors don't seem to be interested in pushing functionality into HTML in any big way, they are happy with the basic shape of the existing HTML/CS/JS trinity and don't see a big need to push for standalone HTML as a platform.)

sergeykish · on June 22, 2020

XHTML was not that bad. It was quite conservative - make serialization/deserialization idempotent, remove document.write, quirks mode, DOM Level 0 accessors.

The stupid part was advertising of writing XML by hand - both XHTML and XSLT. That is machine format - "application" in "application/xml". Not more sense than writing JS AST. And semantic web has no advantage for author.

XHTML 2.0 - XML Events is like IE <script for>, <img>alt</img> is good, <h> is good, removal of <i>, <b> - no one would see, XForms - partly adopted, and RDF - don't know.

Though it kind of understandable - XML was in rage. Everyone pushed it, Microsoft above all, IE5 accepted "text/xml" [1]. And Microsoft pushed XSLT without XSL-FO.

[1] https://www.xml.com/pub/a/1999/03/ie5/first-x.html

recursivedoubts · on June 22, 2020

thank you for clarifying

i knew there was some drama betweeen the two organziations but didn't know the history

Hello71 · on June 22, 2020

not to mention certificate authentication...

whereistimbo · on June 22, 2020

That reminds me of XHTML2 https://en.wikipedia.org/wiki/XHTML#XHTML_2.0

Grumbledour · on June 22, 2020

The Gophersphere seems to prosper and I have seen some interest in Project Gemini[0] over on Mastodon in the last weeks. So I guess People are trying and somewhat succeeding.

Personally, I am also quite skeptical of the benefits of these approaches. I would like a web more focused on documents, but many of the "forbidden" features these communities define are actually things I was excited about when they were introduced and still would not want to miss on the web. At the same time, just limiting the technology will not make it magically better.

I would love to see more experimentation towards more hypertext and the things it could offer to the user, but just using a subset of old technology seem not really like a great way forward.

At the same time, why can having a cleaner, more performant web not be a social movement amongst developers? I still remember when many of our websites features "Valid XHTML" or "Made with Web Standards" Badges, to show our peers and the world what was important to us. Maybe setting some rough guidelines what we envision a modern, performant and user respecting web to be and whipping up a few badges to show our colors could get as further then trying to reinvent the technology from scratch?

[0] https://gemini.circumlunar.space/

smacktoward · on June 22, 2020

Once upon a time (aka the early '90s), Gopher and WWW were both simple systems for publishing hypertext. Neither one was appreciably more complex or ambitious than the other.

WWW eventually pulled out into the lead, due largely to licensing considerations; WWW had been developed at CERN, which explicitly disclaimed any ownership over it, while Gopher had been developed at the University of Minnesota, which preferred to leave its ownership claims ambiguous. People gravitated towards WWW for the simple reason that they knew no one would sue them for using it.

As WWW's userbase grew, demand grew as well to add features to it. Mosaic added images and image maps; Netscape added JavaScript; and on and on and on. Eventually WWW grew to the ginormous, do-it-all system we know and (ahem) love today. Because Gopher had been left behind in adoption, it didn't have those pressures to extend its capabilities; it was free to remain the simple hypertext system it was in the early '90s.

I see a lot of people today point to Gopher as an example of what the Web should be. But this misses the point; the Web isn't what it is because of some design decision it made that Gopher did not, the Web is what it is because it had users. The more users there were, the more things people wanted to do with it; and the more things people wanted to do with it, the more features got tacked on.

If Gopher had been the one with the permissive licensing back in the '90s, it's very possible that it would have been the hypertext system everybody used, and today we'd be complaining about how complex Gopher has become and asking "hey, whatever happened to that toy Tim Berners-Lee was hacking on back in the day? Remember how simple that was?"

leephillips · on June 23, 2020

That’s not why the web won over gopher. This is why:

“Both Gopher and the Web embraced the idea of hypertext. Both allowed users to follow a conceptual path through a virtual space by following links, with little need to understand the structure that existed underneath. They differed considerably, however, in the information architecture that they established for laying out hyperlinked information. The main difference between the two is that the HyperText Transfer Protocol (HTTP) of the Web was built up around documents in HyperText Markup Language (HTML). This markup language allowed document creators to place links within documents, whereas Gopher provided pointers to documents through menu files, which existed separate from the documents themselves. The indexing component of the two information architectures -- i.e. the part that enumerated what items existed within the information space and where they could be found -- thus differed considerably.” From https://ils.unc.edu/callee/gopherpaper.htm

I was using Gopher and WAIS (nobody seems to remember that), both improved versions of FTP, when the web appeared. It immediately captured my imagination, and I and everyone around me forgot about the other protocols (except FTP, which hung on for ever). It had nothing to do with licensing. It was hypertext.

sergeykish · on June 22, 2020

Search engine that prioritizes pages with no JS and minimal CSS would help.

JS requires audit - even small script can game rank by encoding JS and CSS in other resources to pretend there is not much of it. Audit requires reproducible response, can be gamed, requires web of trust.

In the end it is all about goodwill, which may be hard to find in spyware ridden web.

wegs · on June 22, 2020

From my perspective, the reason to have a language for static documents is parsability. With HTML5+CSS+JS, there are few organizations at the scale which can sustain making something like a web browser, a general-purpose web crawler, a11y tools, or similar.

Could it be a common language? Perhaps. I liked HTML2. I hated basically everything which went into HTML3 and HTML4. HTML5 has made a lot of good progress on rolling this back, but we're not there yet. It's possible to write parsable HTML5, but most organizations don't do it, and the ones which do don't have common ways to do it.

It's hard to remember, but back in the HTML2 days, the web had a certain type of client-side programmatic experimentation which is absent today. Anyone could write a tool which would grab things from the web and do meaningful things with them, and many people did. Google evolved from Altavista, which was a tech demo at DEC.

To be clear: The web is a lot more programmatic today in a general sense (client-side browser extensions, AJAX APIs, embedding webkit, etc. etc. etc.), but one part of that flexibility is gone, and it was important.

earthboundkid · on June 22, 2020

I'm not sure why you think the web is less parseable now. HTML5 is well described and it's easy to get a compliant HTML5 parser for whatever language. Back in the day, people were just doing regex.

There's a separate issue that a lot of stuff requires JS, but the JS mostly just calls JSON endpoints, so that's easy to scrape. The tricky thing is scraping ASPX sites that jump through a bunch of hoops instead of having a simple backend API.

wegs · on June 22, 2020

Your timeline is completely confused. ASPX came out in 2002, after HTML4. HTML2 was the era when CGI scripts were dominant.

Next, HTML2+SGML were also well-designed, and people weren't just doing regexp. The mess didn't come in until HTML3 and even more so, HTML4.

Today, it's easy to parse /specific/ pages. If I want to automate one web page, and it's well-formed, HTML5+AJAX makes that easy.

However, in contrast to HTML2, it's very hard to parse pages /generically/. That's why I gave the example of Altavista and a11y tools, which need to work with any web site.

Try to make something like that today: a generic search spider, a web browser, or an a11y tool. See how far you get talking to JSON endpoints. They're often easy enough to reverse-engineer for a specific web site, but you need a human-in-the-loop for each web site. With HTML2, one would build tools which could work with /any/ web site.

And boy were there a lot of tools. Look at all the web browsers of the nineties, and the innovation there.

bonestormii_ · on June 22, 2020

Oooo well said. One of the first programs I ever wrote was a scraper for such an ASPX site. Parsing state ids and reposting them over and over again... what a joy it was.

wegs · on June 22, 2020

Much as an argument confusing World War II Germany with the Holy Roman Empire might be called 'well-said.' You're confusing the early web era with the dot-com boost/bust period.

The early web were the era of dozens, perhaps hundreds of competing web browsers which were made possible by simple, well-engineered web standards. Pages were served statically, or with CGI scripts. You had a whole swarm of generic spiders, crawlers, and bots which automated things on the web for you. Anyone could write a web browser, so many people did.

The dot-com boom/bust had companies doubling in size every few months, people who could barely code HTML making 6-figure salaries, Netscape imploding, early JavaScript (which, at the time, looked like a high schooler's attempt at a programming language), and web standards with every conceivable ill-thought-out idea grafted in.

If one of the first programs you ever wrote was a scraper for an ASPX site, you never saw the elegance of the early days. ASPX came out not just after HTML3, but after HTML4.

earthboundkid · on June 23, 2020

If you define early web as pre-1998, then you’re essentially talking about five guys who all had computer science backgrounds. Yes, they were good at their jobs, but it was never going to last. Increasing the number of web developers by 1000x by definition had to drag down their average skill level to the average skill level of the population at large.

Most definitions of the early web include the PHP Cambrian explosion because essentially all websites today got their start then and only a few horseshoe crabs sites (mostly the homepages for CS profs!) predating it survive. Gopher sites were also probably really easy to scrape too. ;-)

wegs · on June 23, 2020

It was before your time, kid. (1) I think you underestimate the early web by quite a bit. It had a lot more awesome than you give it credit for, and if not for dot-com bubble + bust, it would have evolved in a much more thoughtful way (2) And dot-com boom and growing developers 1000x didn't need to involve Netscape, Microsoft/IE, or the W3C implosions of the time. Those were a question of management decisions and personalities.

But my original comment was 100% unambiguous: "I liked HTML2. I hated basically everything which went into HTML3 and HTML4."

Y'all responded by citing bad examples from the HTML3 / HTML4 era as examples of things going wrong...

---

Note: Before I get jumped on for "kid," it's the username.

earthboundkid · on June 24, 2020

Fair enough. I actually was a kid in 1998. I believe I started “programming” HTML in 1997 or so (copying view source and uploading to my internet host). There were some cool things like Hot Wired and Suck.com (and the bus on the MSN splash page!), but it was just a vastly smaller space than now. Even Geocities doesn’t really make your cutoff, so it’s hard to compare.

earthboundkid · on June 23, 2020

Good news, ASPX is alive and well in the United States governments many state backends 🇺🇸 🇺🇸

whereistimbo · on June 22, 2020

XHTML2. You just need XML parser for that.

wegs · on June 22, 2020

Well, no. As much as I liked SGML back in the days for human-readability, having an SGML/XML/etc. parser is not the problem. The problem is understanding what the document is.

Today, much of the web consists of an empty DIV, populated with JavaScript from JSON objects. I can't do anything with that.

One step down, I have a structured, semantic document, but the semantics are defined as `class` elements. I can do more with that, but not a lot.

With HTML2.0, I couldn't keep much on the web, but it was semantic. I knew what body text was, what a header was, etc. Semantics are defined in the DTD, which is super-nice.

HTML5 points a path forward, with elements like `article` and what-not, but it's still got a ways to go before I can understand the content of a page in the way I could with HTML2.0.

There's a deep anti-pattern in there, wrapping things in wrappers, but that's a longer story.

frenchy · on June 22, 2020

That's vastily oversimplifying things. XHTML2 was a markedly superior language to HTML in a few ways, but you still would have needed to have the same CSS & JS components to interact with an XHTML page.

ezequiel-garzon · on June 22, 2020

I firmly believe truly plain HTML (sans even CSS) would be used more if browsers didn't seem to go out of their ways to make default pages ugly. Every element, down to H6, should be legible by default (even without the proprietary-turned-CSS viewport meta tag).

Sure, nothing is guaranteed about style, but that doesn't mean, for instance, that tables can't be styled by default and must instead look like a collection of lost strings on the screen.

I must admit I was pleasantly surprised by the bold implementations of DETAILS and SUMMARY, but still, there seems to be some sort of unspoken industry-wide manifesto against this.

sergeykish · on June 22, 2020

I imagine change in user.css would break existing pages. Individual user can make such decision but browser care about those who don't. Have you updated yours? It should start somewhere...

ezequiel-garzon · on June 22, 2020

I don’t see how making text legible and adding some style to tables would break existing unstyled pages.

With regard to your second point, I’m not talking about user style sheets, but about the prospect of writing unstyled HTML. If I ever get to post anything on my personal website I will walk the walk and use plain HTML, yes, but I doubt this will have any meaningful impact!

sergeykish · on June 22, 2020

It would make them look different, maybe conflicting with other applied styles. Browsers still supports quirks mode. It is only possible to change defaults if no content exists - XHTML has no quirks mode.

I'm trying to explore this direction, http://sergeykish.com/live-pages - just a few rules. Publish it, this page is from 2015, finally published. I believe the main obstacle is authoring and I want to resolve it. Write in markdown, run generator, apply themes, explore in devtools, host it - too much hassle, too professional.

We can write right on the page. PUT that page on server as it is. GET it back. Sync with static storage. Write own tools. That's simplicity, it is empowering.

ezequiel-garzon · on June 22, 2020

Hey, thanks for your replies, and interesting project! Correct me if I'm wrong, but quirks mode applies when the doctype declaration is missing, right? I'm talking about modern, validated HTML documents that have no CSS. And I'm sorry to repeat myself if you go over my comment history, but the whole thing about having to specify all of a sudden in a proprietary tag that I suggest to use the device(-)width as the width... well, is my ultimate pet peeve. Again, I'm talking about valid documents that make no use of CSS at all, and that failing to include that render the text font microscopically. That would be my suggested point of departure for the improvements, and generally speaking I feel the overall default aesthetics could be improved without breaking anything.

Here is a relevant example. The very first website ever made [1] renders like this [2] without a viewport meta style on a current iPhone, but like [3] when you include this tag which made its debut decades later. Again, I believe way more could safely be done from the browser side, but this would be a sane start.

Edit: I realize I'm giving the oldest possible example, which necessarily triggers quirks mode, but the same can be said of any modern, proper HTML document without CSS.

[1] http://info.cern.ch/hypertext/WWW/TheProject.html

[2] https://imgur.com/gallery/0THBhQ9

[3] https://imgur.com/gallery/HQkxXFF

sergeykish · on June 23, 2020

Thank you. Ah, so you grudge is <meta name="viewport" content="width=device-width, initial-scale=1">, sorry, I was confused by default awful table style.

That is much harder, there was no simple answer [1], browsers displayed viewport, some (Opera Mobile or Mini) had text reflow option. One company decided that it is authors responsibility [2], others followed. And now it is bad default.

So what can we do? Improve own experience – vote with your feet, inject setting in extension, patch and compile open source browser – plenty of options, and write about it. From my experience – Linux, uMatrix, Stylus, patch Firefox, have to write now (I'll edit later). That's why why meant by "have you updated yours (user.css)?".

If anything we need more choice. Build your own browser -Ungoogled, Iridium, Tor browser, Beaker Browser. Hack on smaller codebase - Netsurf, Dillo. There are graphical DOS browsers, console browsers, BeOS browser, Plan 9 browser, Emacs browser.

I mentioned quirks mode (correct, without doctype) to show how insignificant details with almost no presence still guarded by browsers. That page was written before pixel perfect reproduction was a thing. There are a lot of pages from another era https://theoldnet.com/

[1] https://www.quirksmode.org/mobile/viewports2.html

[2] https://developer.apple.com/library/archive/documentation/Ap...

ezequiel-garzon · on June 23, 2020

Thank you. Ah, so you grudge is <meta name="viewport" content="width=device-width, initial-scale=1">, sorry, I was confused by default awful table style.

Thank you, Sergey! And now that I've taken a look at your home webpage I'm happy to see you agree with me!

Thanks for the detailed explanations and the links. Comments like yours make HN the Internet gem it is!

jasonlotito · on June 22, 2020

Developers can already do this. They can just use HTML. Plain old HTML. They don't have to use JavaScript. They don't have to use CSS. They can just use plain boring HTML. That they can do this now makes me ask the question: what problem are we solving? If people aren't using the existing solution enough, and the existing solution solves the "issues" of CSS and JavaScript being used, then what is the problem?

chacha2 · on June 22, 2020

The global spynet, the waste of computing devices that need to be constantly upgraded to handle the new minimum requirements of the web and the degrading accessibility of many websites.

enumjorge · on June 22, 2020

Yes those are problems, but we already have a technical solution to address that, plain HTML. But content providers aren’t using that. They aren’t incentivized to do so. Forking web technologies attempts to solve the already-solved technical problem without addressing the real problem which is how do you incentivize people to create content without all the bloat?

sergeykish · on June 22, 2020

Technology is here, each modern browser can be restricted to HTML only per page by user. If it is not enough use NetSurf, Dillo, a bunch of console browsers.

hinkley · on June 22, 2020

What people can do and what the will do have very little in common.

Without the cooperation of my coworkers, bosses, and third party suppliers, very little of what I do will stick, and I have to be clever or at least subtle for that to happen. Any broad-stroke things like 'Just use HTML'? You've got to be kidding.

boomlinde · on June 23, 2020

solderpunk addressed this (in terms of Gemini): https://portal.mozz.us/gemini/gemini.circumlunar.space/%7Eso...

zokier · on June 22, 2020

I think the crucial thing is that it could be a "soft fork", something that is fully interoperable with current browsers. The important thing is to foster a community of people who are willing accept a common ground. Think of Medium or tumblr etc, but with different angle. Sure, it won't be for everyone, but it doesn't need to be massively popular to be a success and to thrive. You could build interesting link aggregations and search engines around sites built with the "dogma" spec, which allows people more readily spend time on the sort of sites the like. Nobody thinks that you can't build low-fi sites already now, but connecting authors and readers is the tricky part.

Of course I have my own ideas how it could work, there could be an optional headers, one from browsers saying "I want light/fat version" and one from server indicating if it's lightweight so that browser can apply better default styling and disable features. I also think it makes sense to define different levels or profiles of features so that you don't need to try to get everyone to agree on everything, but that might be problematic.

tannhaeuser · on June 22, 2020

Forking HTML might not make sense indeed, but I think this post doesn't make a compelling point as to why or why not.

MaxBarraclough · on June 22, 2020

The article offers a few reasons but they strike me as weak sauce.

> persuading Web developers to use it is where the real problem lies

So what? There are various alternative markup formats, like Gemini, Gopher, Finger, Troff (used for Unix man pages), and perhaps we'd also count Markdown, and even RTF. Not every project aims for world domination. Websites interested in being lightweight, already can. HackerNews, Pinboard, and SourceHut, all manage it.

> getting your entire constituency to agree on a specific list of allowed features is going to be very difficult

Maybe so, but it doesn't sound insurmountable. This sounds no more challenging than any other committee-driven process.

> Video? Animated GIFs? CSS :hover? CSS media queries? Different people will give different answers to these questions.

My answer is no on all counts, for what it's worth. None of those things belong in a simple minimal document format.

> we could build a much more efficient browser (or browser mode) for that subset. As a former Mozilla distinguished engineer, I'm skeptical.

It's obvious that the system requirements for such a browser would be far lower than those of Firefox. We already have a project much like this in Castor. [0] I figure that's the point here. No-one is suggesting that Firefox, running on a modern desktop machine, struggles to render basic HTML. It's still heavyweight software though. The idea of running Firefox on an Arduino, for example, is laughable.

[0] https://sr.ht/~julienxx/Castor/

edit Forgot to mention, this question of subsetting HTML, vs using an alternative like Gemini in the Castor browser, cropped up in recent discussion at https://news.ycombinator.com/item?id=23165029

edit 2 See also Point 2.5, Why not just use a subset of HTTP and HTML?, at https://gemini.circumlunar.space/docs/faq.html

enumjorge · on June 22, 2020

I don’t understand your first point. As you state yourself, there’s already alternative formats for minimal websites. Current HTML can already do simple, content-only documents. But developers aren’t using those approaches in significant numbers. The existence of these options is not enough to solve the problem otherwise it’d be considered solved. You still need to convince devs to create content for it, which goes back to the authors original point.

MaxBarraclough · on June 22, 2020

I'll rephrase what I think you're saying, as I think it's a valid point:

Given that we already have the ability to make lightweight HTML-based web-pages, and we also already have a variety of lightweight alternative markup languages to choose from, what's the point of picking one particular subset of HTML and giving it a brand-name? The real goal should be to encourage web developers to make better, more lightweight websites, and/or to encourage use of lightweight alternative markup technologies (to better support very simple browser solutions).

Again I think it's a fair point.

Another related point: the Gemini project seems to be capable of automatic conversion to HTML. Any such project should make this ability a priority. If you then go ahead and use that, you get an HTML subset 'for free', as the generated HTML files will be lightweight and use few HTML features.

We do this today with Markdown. HackerNews won't let me generate a <blink> tag, but we're allowed <i> tags.

jillesvangurp · on June 22, 2020

If you step back a little what we have today is a combination of a common browser implementations (aka. the DOM tree + render logic) and bunch of standardized parsing and interpretation logic that maps textual versions of CSS and HTML to this.

For better or for worse, this stuff is standardized via WhatWG & W3C as well as the three remaining browser engines (chromium, gecko, and safari) whose intersection of behavior is the de-facto reference and also what is driving standardization forward. Forking those definitely does not make sense unless you are the size of Apple or Google and more than one big company has backed out of pushing their own implementation (e.g. MS).

Most so-called full stack apps bypass the business of parsing of html5 & css in favor of just directly driving the DOM api. This provides greater flexibility and generally sidesteps a lot of complexity, bugs, etc. related to subtle differences between the 3 implementations. A typical index.html contains little more than the bare essentials to load the javascript; which then drives the DOM directly. This annoys some purists/traditionalists on the web definitely not in a position to fork or support their fork of any browser but makes little difference to search engines, users, or browser implementations.

So now having established that this is mostly an academic discussion of forking something that is highly unlikely to be ever actually forked (successfully), we can look at the underlying question.

This would be whether the DOM still an appropriate way to represent highly complex and dynamic applications that are not primarily documents and increasingly cover the whole range of any kind of interactive application possible (terminal UIs, windows based UIs, games, VR/AR, voice driven chat bots, etc.). For better or worse, most web development is about providing an illusion of more interactivity than a typical document would provide. The business of rendering static text to a browser is kind of a solved problem. It's everything else that's kind of hard to deal with via a DOM api. WASM is breaking this discussion wide open as suddenly people are porting decades worth of native code to run in a browser. Just because you can doesn't mean you should of course. But there's a lot of UI that can run in a browser that doesn't need or completely bypasses the DOM these days. Accessibility is a good remaining reason to still use it. But beyond that?

tannhaeuser · on June 22, 2020

> The business of rendering static text to a browser is kind of a solved problem. It's everything else that's kind of hard to deal with via a DOM api. WASM is breaking this discussion wide open as suddenly people are porting decades worth of native code to run in a browser.

If it is so hard to layer a coherent and not laughably overcomplicated runtime on top of a document viewer in 20 years, then it follows that it probably wasn't such a good idea to begin with. Unless you're hellbent to freeride on the web's success, and ruining it in the process. At which point we're trying to solve an economic problem (someone else's) via a technical solution.

To get a sense how far of the mark browsers still are, you only have to look into the problem of rich text editing on browsers, which is a natural step up from pure text browsing, and something the first browser did already. Today, this is only barely possible thanks to a couple rich text editor projects who made heroic efforts to work out browser quirks (with contenteditable such that the browser's spellchecking can be leveraged, or alternatively using Canvas/WebGl, essentially developing a browser-in-browser).

In a situation where browser vendors struggle to keep up, adding additional tech such as WASM is the last thing you want to do. When and if WASM gets even basic language infrastructure such as gc, DOM or WebGl bindings, or anything even mildly interesting, then it'll immediately become another maintenance problem. And as you said yourself, at best WASM can help port apps over to a browser. It's however not clear what the purpose of that exercise should be when said apps run just fine outside the browser. I'd say if there were indeed so many apps waiting to be ported over to running in the browser, then there have been sufficiently sophisticated transpilers for like 15 years now (emscripten, gwt, many others), so I'm not buying that as an argument to make browsers even more complex.

sergeykish · on June 22, 2020

You are not in charge. If contenteditable is such a nuisance you can fix it, fill bugs and help with triage. I've been recently playing with it. There are so many bugs like noone use it. A lot of people want WASM and drive it. It is not zero sum game.

There already several tiers of the browsers. Top tier opens competition with native/mobile. This may be huge - Windows, Mac, *nix, iOS, Android - universal application without walled garden.

Next tier works with mostly static content - NetSurf and Dillo. Dillo is fast, no need to throw away web. Any open source browser has entire history - pull and compile. First Mozilla check in is before Gecko - Layout Classic [1].

It is not browsers who ruins experience but authors. I browse without JS, it works.

[1] https://github.com/sergeykish/gecko-dev/tree/init/lib/layout

kodablah · on June 22, 2020

I had begun to spec out a strict subset of HTML[0] until I realized I had just recreated AMP. What we need is an AMP-like HTML subset w/out the custom tags and ads (same issue with htmx). Any such HTML "fork" should make it a goal to be easily implement a viewer/browser of said type which means instead of just adding things, things need to be removed too.

0 - https://github.com/cretz/narrow

bjoli · on June 22, 2020

I think there is a rather big divide between people like me who thinks the web should be "content-only" and people who see it as an application platform.

Most of what I do on the web resolves around reading things, not interacting in any meaningful way. Most of the web tries it's best to ruin that experience. Most of what I do online would be better served by gopher. I understand I am a part of a minuscule minority, but that doesn't stop me from whining.

sergeykish · on June 22, 2020

But what is the difference once you disable JS and CSS? And reader mode? I find it great that I can progressively enable features.

bjoli · on June 22, 2020

I can't browser it in emacs, the world's best tool for navigating and working with text :)

I have thought about what I want since writing. I want a format where the receiving end decides how to display it.

Tabular data? Give me a default sort and leave the rest to me. Images? Should they break text or be showed as a thumbnail with textflowibg around it?

I haven't thought a lot about it, but I want content with metadata attached.

sergeykish · on June 22, 2020

Sure, and browser is a javascript virtual machine to execute HTML/CSS/JS serialized images fetched from network. It just happens that some images can be converted to readable text.

I still don't get it, most of the web perfectly works without JS, so what is the problem? Most of the authors equally don't care about your and my case. Maybe text around article? It is easier with CSS support, thought it should be possible to replicate reader mode heuristics if it was not done already, fetch what to hide from UserScripts or create alternative.

Images may come in different resolutions (up to 8K).

fortran77 · on June 22, 2020

Remember XML/XSLT?

You were supposed to write your webpage in pure XML -- this way the content was pure, and can be indexed, processed by other programs, etc. -- and then styled by XSL transformations.

I spent a lot of time learning this. It never caught on. Browsers still support it.

http://xml.silmaril.ie/browsers.html

majewsky · on June 22, 2020

Our hackerspace's website is written in XML/XSLT that renders HTML during `make`. I haven't been around when it was built. All I know is that there are only 2 people who know the guts of the XSLT stuff. Everyone else just copy-pastes XML snippets from somewhere else and edits them to publish a new news item or calendar entry.

(I don't know how much of that is attributable to XSLT's nature, and how much of it is due to its relative obscurity. But it's certainly not a stack that I would choose for a new website based on that experience.)

bjoli · on June 22, 2020

I just style XML directly with CSS. That works fine if you just want to display it in a browser. XSLT can of course do a lot more complex transformations on your data.

sergeykish · on June 22, 2020

Could you please share a bit more? I know I can style XML with <style>, it can be handy to make RSS readable - http://feeds.bbci.co.uk/news/rss.xml

Anything else where it shines?

bjoli · on June 22, 2020

You can make any document into a full blown web page. I remember looking at the source of a snappy yet functional web page recently. It was the raw XML output from their code comments with an XML front page. It was all just raw XML and CSS.

There is no "time until valid paint". Everything renders when it's done, which means you get a lag before things show up in screen where you otherwise already have things rendering.

danShumway · on June 22, 2020

One idea that I bring up whenever I see arguments around what features should be added to HTML or what splits should happen is that the DOM is not primarily a layout language, it is a presentation layer.

The really clever, wonderful part of HTML/DOM that is worth keeping around is that it forces you to describe your current page/application state as pure text.

Imagine if every time you wanted to write a GUI app with GTK, you were first required to build a functioning terminal interface, and then (for the most part) only allowed to pull elements from that terminal interface into the main GUI of your app. That is how the web do.

So proposals around, "HTML needs to be its own static thing" kind of miss the point in my mind. Suppose we had an amp-like purely declarative format that replaced HTML and it had some basic controls around data-binding, and infinite scroll, and whatever. At that point, it would cease to be a semantic presentation layer, it would be a document layout tool.

Of course, no system is perfect, there are some exceptions in HTML that blur that line, largely because the web is very messy and its hard to correct mistakes once they're out in the wild. And yes, Google is pushing templates and shadow-DOM, both of which are bad ideas that make the web worse. We can't win everything. But I will still strongly assert that HTML is not here to make you happy as a developer -- HTML is for the user. The point of HTML is that it forces you to describe your interface as semantic data, not pixels.

----

To extend off of that point, the other thing I'll go to my grave asserting is that once you start to think of HTML as a user-accessible presentation layer for what your state is right now, you start to realize that the distinction between web pages and applications is kind of garbage.

Most apps are just interactive documents, regardless of whether they're online or on native platforms. There are some exceptions (3D editing software, maps, etc...), but for the most part what I tell UX designers is that if they can't sit down and describe the current state of an application as an XML tree, they probably don't have a very good grasp of what that state is or how to organize it for the user.

I think I can count on one hand the number of apps I have installed on my desktop computer that couldn't be expressed in HTML. Even apps like Photoshop turn out to be XML documents with a few embedded canvases once you really think about them.

protomyth · on June 22, 2020

Heck, go back to using WML. It would be interesting to build that browser again, the cards would work.

https://en.wikipedia.org/wiki/Wireless_Markup_Language

miohtama · on June 22, 2020

I worked with some of very first mobile web sites. It was WML and Nokia 7110. While it seems rosy now, it was not that pleasant to work with. Pushing out any meaningful user interaction was painful. It was similar to remote control menus on TVs.

Plus it did not help that the entire device rebooted on a markup parsing errors and such bugs.

protomyth · on June 22, 2020

How much of the problem was the lack of power in the platform versus the actual design of WML. I remember it worked ok, but I was using a better platform at the time. Then again, I really didn't go much farther than working through the ORA book.

I suppose, strictly, it did have WMLscript to go with it so might not be suitable for the original author.

miohtama · on June 22, 2020

As far as I remember WML was processed to a binary payload by an operator gateway and the phone never tried to interpret XML itself Thus, a special operator support was required with GPRS connection.

In 2000 they also had WAP-over-SMS. That was slow. And expensive. One SMS could carry 140 bytes.

protomyth · on June 22, 2020

Yeah, I remember the binary convert thing, but the WML itself wasn't so bad.

devit · on June 22, 2020

Seems easy to define such a fork: disable JavaScript, and load all resources immediately and unconditionally to disable tracking based on what is visible.

To prevent custom UI, also disable CSS and use a decent default stylesheet (e.g. the reader mode one or a Markdown one).

badsectoracula · on June 22, 2020

Yes, the post says exactly that - forking the HTML is easy, the hard part is convincing developers to use that fork instead of the more featureful "mainstream" version.

Consider that these are the same devs complaining about the feature differences between Chrome and Firefox, let alone the vast majority of developers would consider targeting something like Internet Explorer 11 (which already contains A TON of functionality, WAY more than you'd need for a "trimmed down fork" and is still a VERY complex project if you wanted to make a full clone from scratch, despite it being a few years out of date) as something they'd do only in their worst nightmares.

This is why such propositions are hard. Well, that and all the existing web sites that use the existing tech, most of them not being made by huge corporations with big pockets and often are left alone to work with barely any modifications (web development is often made fun of for being too brittle, but this brittleness exists at the framework level, not the web browser level that has very strong backward compatibility).

tobr · on June 22, 2020

You don’t need to convince existing developers to adopt it. Remember the early web spirit, where people imagined that everyone would run their own server and publish their own homepage? Turns out it was way too complicated to do, and required you to be a tech enthusiast. Largely, I would say, because HTML was not designed to accommodate for a lot of the things people immediately wanted to do. Even a basic thing like keeping navigation consistent across the multiple pages is impossible.

I think if you designed a document language that took advantage of 30 years of experience of what people actually want to do on the web, you could find an entirely new set of users. People who want to feel ownership of what they publish, want to be able to be creative and improvise, but don’t have the technical skills to create a site in the current web stack. It would probably look more like a social media service, so it could run on top of the web as it is now, or have dedicated native clients.

badsectoracula · on June 22, 2020

Honestly, this 'entirely new set of users' sounds like wishful thinking for a problem that does not exist as you describe it.

Back in the early web people couldn't run their own server not because HTML was hard (WYSIWYG editors not only existed since the Windows 3.1 days, but they were very widespread at the time - Netscape Gold even came with one included and that could do pretty much everything you'd see in most pages) but because it was hard to have and maintain the necessary hardware and internet connection.

This still exists and is still an issue today if you want the full ownership down to running your own server. But if you do not care about running your own server and you are fine with shared hosting (which existed even in the 90s, see geocities) or a VPS, then outside of a basic setup you do not need to be much of a technical user (and many hosting and VPS providers have tools to do that setup for you, often for free). For the slightly more technical users, there are tools like Publii (stupid name, but the tool works) that can do mostly full WYSIWYG site editing, management, syncing, etc.

And a dedicated native client? From a user's perspective there is nothing to win here, they already have a browser (and the less technical users are confused by even that), why would they run another browser that wont even work with the majority of the content they want to access?

Really, these are not practical solutions for practical problems. That doesn't mean you shouldn't try to make something like this, but they'll just be toys for fun, not real solutions to real problems and if you expect them to be anything like that you'd be disappointed.

After all Gopher, for example, exists and can be targeted and used, but all of its users are using it for fun and because they can, not because they expect it to compete with the web (well, outside of edgy "the web sux, gopher is the future" comments that are at the same level as "M$ suxx0rz, linux rulez" you'd see not so long ago).

tobr · on June 22, 2020

I think we are imagining very different things! For what I tried to describe, I think a dedicated native mobile client would be the most natural way to use it. I don’t think it’s a strange idea at all - consider how popular dedicated apps like Instagram, TikTok, YouTube, Facebook etc are. It’s not confusing or a burden in any way.

Publii or anything you could one-click-install on a web host is nowhere near fully featured for what people actually want to do on the web. Again, look at the type of activity that happen on most social networks - liking, sharing, commenting, replying, bookmarking, retweeting, remixing, curating playlists and galleries, etc. Those communal space-building activities have become as foundational concepts for the internet as linking, but are exceptionally difficult to implement on the current web.

chriswarbo · on June 22, 2020

> Really, these are not practical solutions for practical problems. That doesn't mean you shouldn't try to make something like this, but they'll just be toys for fun, not real solutions to real problems and if you expect them to be anything like that you'd be disappointed.

I agree. I like the idea, but there's no way it would get traction without solving a concrete (rather than aesthetic) problem.

It is possible though: the more expressive/powerful a system is, the less we can know/guarantee/figure-out about it. Adding features does have downsides. For example, people are starting to pay attention to pure functional programming, since the restrictions it requires (e.g. immutability, referential transparency, confluent evaluation, etc.) provide lots of nice solutions for things like concurrent, distributed systems.

The classic example on the Web is search: if sites didn't work with dumb, lightweight user agents (search bots), it could have a real commercial impact. These days the big search engines use browser-derived bots, which has changed the dynamics a little; but the core point remains the same. If people find concrete problems or opportunities faced by Web sites, which would be useful to solve automatically, but where the complexities of the current Web prevent that, then perhaps people could be convinced to use a restricted subset of the Web.

tannhaeuser · on June 22, 2020

A browser doesn't have to win a beauty contest in front of developers; that would be futile anyway, given how inherently generational web development is. It only needs to win over users (let's call them readers, because that's what a browser is for in the end).

jasonhansel · on June 22, 2020

Really I just want browsers to have built-in support for running single-page apps. So you could go to a site and have it just send your browser a JS file, avoiding even needing to fetch and HTML file with a script tag.

ultrarunner · on June 22, 2020

Sounds a little too much like side loading to me, which means everyone with an app store is probably not terribly motivated to implement it.

d--b · on June 22, 2020

> Quite often someone proposes something like this

Who says this?

I mean is it worth to write a response to some vague idea that no serious person has actually put forward?

coldtea · on June 22, 2020

>Who says this?

Several people. If you follow HN and geek blogs, it comes up often...

babbledabbler · on June 23, 2020

I'm working on a framework that treats HTML as a static language https://findingmyhtmlgoddess.com

I agree, we shouldn't need to create a new standard to do what's already possible with existing HTML and some basic constraints and best practices.

ironmagma · on June 23, 2020

It's funny that people pick the most successful app development platform, well, ever, to knock down as broken and fundamentally flawed. Compared to what? All the other platforms are not as popular, probably for a reason.

udbhavs · on June 22, 2020

I wonder if it would be practical to make a VPN-like service that acts like a middleman and handles all the JS code, forwarding only HTML, stylesheets and other static assets to the client using websockets. So something like a hover or click event would be sent to the server, which would run the javascript code and it would send back the resulting changes to the DOM which the browser can render.

satyrnein · on June 22, 2020

I think AMP actually meets the criteria, but I doubt it's what people have in mind.

madmaniak · on June 22, 2020

I like the original statement!

foobar_ · on June 22, 2020

There's already a fork. Its called epub.

gridlockd · on June 22, 2020

It does make sense from the point of view of someone embedding HTML as a markup language, for users to edit documents online on a shared domain, and for tooling to be designed alongside these constraints.

As it is, every piece of tooling has to define its own arbitrarily limited subset or an unlimited set of features (and associated attack surface).