> Why didn't we end up with SGML -> XML "compilers"?
We did; both the osx command-line tool of the venerable SP/OpenSP package and sgmlproc (sgmljs) with output_format=xml do exactly that: they output canonical XML markup with shortrefs resolved, omitted tags inferred, attribute values quoted and attribute names prepended where not already present, conditional marked sections included or omitted depending on parameter entities, entity references expanded, and so on. But SGML can also output HTML proper, unlike XML.
Indeed, SGML mostly adds authoring features over XML, but it also adds a number of concepts: much more powerful notations (used as a general extension mechanism, such as for math or parametric macro expansion) and stylesheets, i.e. link process declarations with state-dependent assignment of attributes and pipelining to yield markup projections, transforms, and views.
I was a big supporter of SGML-based languages: markup languages designed for humans to author.
However, the trend in computing in the late 90s and early 2000s was to come up with more easily parsed languages, and so we got things like XML: a markup language tuned for computers to produce.
But let's be honest here: parsing most XML can be done very simply, whereas supporting even basic SGML was only possible with OpenSP.
SGML is a specification of over 1000 pages of dense text, and that's before you get a language DTD on top of it (like the HTML or DocBook or TEI DTDs). Basically, it is too complex and too flexible, and it was too expensive to produce the tooling to support it (GUI editors, processing tools, making them performant...).
I mean, we are looking at Markdown here, which is even less flexible than HTML: simplicity wins even if it only caters to 90% of the use cases!
HTML4 was the last "HTML-is-an-SGML-application" attempt (that was the terminology for a document type defined with an SGML DTD) before XHTML 1.0 came out.
SGML is what allows implicit closing tags, for instance.
Of course, even XHTML failed because it was too strict and browsers couldn't trust websites to follow it to the letter, so we ended up with the HTML of today: clearly descended from both, but not really either of them anymore.
Me, too, having worked at IBM and used SGML there. But JSON is what really killed XML. It can be harder to read, especially at first, but it's shorter and fulfills all the same roles.
I wouldn't say JSON killed XML: it's still widely used for documents whose type definition rarely changes and which are more content oriented. The one benefit of XML/SGML languages is that you've got simple, ubiquitous support for "attributes", plain text content and nested tree content.
I.e. to represent
<p>An <acronym expanded="HyperText Markup Language">HTML</> page was the driver for interactive web.
in JSON, you have to come up with your own conventions for attributes and content.
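One such convention, purely as an illustration, might look like the output of the Python sketch below. The key names `tag`, `attrs` and `children` and the helper `to_json_node` are made up here, not any standard, and the input is a well-formed XML variant of the snippet above (XML has no `</>` short end tag and needs the closing `</p>`).

```python
import json
import xml.etree.ElementTree as ET

# Well-formed XML version of the paragraph quoted above.
SOURCE = (
    '<p>An <acronym expanded="HyperText Markup Language">HTML</acronym>'
    ' page was the driver for interactive web.</p>'
)


def to_json_node(elem):
    """Convert an element to a dict using an ad hoc (hypothetical) convention:
    "tag" for the name, "attrs" for attributes, "children" for mixed content."""
    children = []
    if elem.text:
        children.append(elem.text)          # text before the first child
    for child in elem:
        children.append(to_json_node(child))
        if child.tail:
            children.append(child.tail)     # text following this child
    return {"tag": elem.tag, "attrs": dict(elem.attrib), "children": children}


doc = to_json_node(ET.fromstring(SOURCE))
print(json.dumps(doc, indent=2))
```

Someone else could just as reasonably pick `"@expanded"` keys, `"#text"` entries or nested arrays instead, which is exactly the interoperability problem.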
So with JSON, everyone comes up with their own format. And in the cases XML was designed for (marking up textual content), it handily beats JSON in expressiveness, simplicity and terseness too. The fact that it was misused for defining protocols and objects (e.g. SOAP, ugh) is a different matter.
I would say that SGML/XML languages still have this benefit over even Markdown, where any contextual modifier is either impossible or uses a one-off syntax (like images or links with text).
It's said that the father of LISP, John McCarthy, lamented the W3C's choice of SGML as the basis for HTML: « An environment where the markup, styling and scripting is all s-expression based would be nice. »