Re: [dev] XML vs HTML (was: Article about suckless on

From: FRIGN <>
Date: Fri, 21 Feb 2014 12:03:00 +0100

On Fri, 21 Feb 2014 13:34:41 +0100
Eckehard Berns <> wrote:

> There has been a lot of discussion why strict XML parsers don't belong
> in a browser. There even are XHTML enthusiasts that are against it.

Yes, I've been listening to both sides for a few years now.

> You only write a parser once. But you write some magnitude more markup
> that is going to be parsed by it. So optimizing the markup specification
> for authoring has a better net gain than to optimize the protocol just to
> get away with a simpler parser.

This would be an appropriate point if the SGML-parsers weren't lossy in
this regard.
I've read lots of HTML markup and have often run into problems where
people didn't take care of well-formedness.
Often, they hit quirks and their browser's SGML parser fixes them.
However, there's no guarantee another browser would do the same and
damn, don't ever try to modify the markup later!
This is not an edge case. I run into these problems day by day.

> That's why HTML uses only a subset of SGML.

The point is that they allow ambiguity.

> That said, I don't want to defend HTML and the web as such, but it would
> be much worse with XML IMO. At least from my perspective.

I really don't see your point why exactly XML should be bad for the web.
If you write proper, well-formed markup, nothing really changes for
you, except that the browser _knows_ it's dealing with proper markup
and doesn't have to "fire up" its forgiving but sloppy SGML-parser.

It may not be clear here that switching from SGML to XML parsing only
involves changing the MIME type from text/html to application/xhtml+xml.
If your markup is messed up, it throws an error and stops parsing
(which is really helpful), instead of silently attempting to fix errors
like the SGML parser, which is a real chore to implement.
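
To make the difference concrete, here is a rough sketch using Python's
standard library. It is only an illustration, not what any browser
actually does: xml.etree.ElementTree stands in for a strict XML parser,
and html.parser.HTMLParser stands in for a lenient one (it does not
repair anything, it just never complains). The names TagLogger,
strict_parse and lenient_parse are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Lenient stand-in: records tag events, never checks that they nest."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def strict_parse(markup):
    """Return (True, None) for well-formed XML, else (False, error text)."""
    try:
        ET.fromstring(markup)
        return True, None
    except ET.ParseError as err:
        # The strict parser stops at the first well-formedness error.
        return False, str(err)


def lenient_parse(markup):
    """Feed markup to the lenient parser and return the tag events it saw."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


good = "<p>Hello <b>world</b></p>"
bad = "<p>Hello <b>world</p></b>"  # closing tags are mis-nested

print(strict_parse(good))   # well-formed: accepted
print(strict_parse(bad))    # rejected, with the parser's error message
print(lenient_parse(bad))   # accepted silently, nesting problem and all
```

The strict parser tells you where the markup is broken. The lenient one
hands back the mis-nested tags without a word, and whoever consumes them
has to guess what the author meant.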

XML parsing is not a simple thing either, but at least you don't have
to deal with bloody guesswork!



Received on Fri Feb 21 2014 - 12:03:00 CET
