Markdown is a convenient but deeply limited markup language with only a small subset of html's features. And yes, limitations are good because we want documents not web apps, etc, etc, but I mean "images can't have captions" limited, "navigation bars don't exist" limited. Actual important features of html don't exist in markdown, which is why almost every markdown platform ends up adding extensions and shortcodes. Why use markdown at all? Just use html.
"But html isn't style-agnostic" yes it is. CSS isn't style-agnostic. Instead of a markdown browser, how about a browser with a fixed stylesheet and no js? You don't even need a browser for that, that could just be a userscript that gets plugged into an existing browser. It'd break non-compliant websites that require javascript or custom css, but so would a markdown browser. Most people wouldn't write content for it, but most people wouldn't write content for a markdown browser either.
"But html is cluttered" it doesn't have to be. This is a valid webpage:
<!doctype html>
<title>Page Title</title>
<h1>Page Title</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Ut ac lorem ut massa euismod vestibulum.
<p>Nullam rutrum blandit eleifend. Aenean a varius diam.
Morbi sodales velit nunc, vel vestibulum lorem tempus sodales.
Personally, I prefer writing in markdown, but that's no reason to insert a markdown renderer into browsers. HTML can already be as sleek and readable as you want. If we added a new type of markup for anybody with a personal preference, we'd never stop.
The fatal flaw of HTML (and XML for that matter) is that the tags have the same visual weight as the text they're delimiting, which makes for a sense of clutter even in your minimal example.
Markdown really scores here, by having a pleasing plain text representation as a goal from the outset, and I'd love to see it used more widely for web pages.
I'd also love to see it more widely used for offline reading, too - the help files in an application really shouldn't need to invoke a web browser to view them when a lightweight markdown viewer would do the job. Not that there is a lightweight markdown viewer, mind you!
HTML is based on SGML, and SGML has short references to handle lightweight custom syntaxes. For example, you can define that an asterisk appearing in your content within a <p> element is replaced by <em>, and moreover define that an asterisk appearing within <em> content is replaced by </em>, toggling emphasized text tags. So SGML very much acknowledges the need for lightweight markup, but the SHORTREF feature, like everything else requiring markup declarations, didn't make it into the XML subset of SGML.
HTML itself doesn't have these and other features (such as basic text macros) because SGML was understood to be available at least at authoring time.
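To make the asterisk toggling concrete, here is a rough Python emulation of that one behaviour. It is not SGML and does not use a SHORTREF map; the function name toggle_emphasis and the choice to close a dangling <em> at the end of the text are my own assumptions for the sketch.

```python
def toggle_emphasis(text, marker="*"):
    """Replace alternating markers with <em> and </em> (emulation only)."""
    out = []
    in_em = False  # are we currently inside emphasized text?
    for ch in text:
        if ch == marker:
            # outside <em> the marker opens emphasis, inside it closes it
            out.append("</em>" if in_em else "<em>")
            in_em = not in_em
        else:
            out.append(ch)
    if in_em:
        # assumption: close emphasis that was never toggled off
        out.append("</em>")
    return "".join(out)


print(toggle_emphasis("SGML *very much* acknowledges this"))
```

A real SGML parser does this from declarations in the DTD rather than from hand-written code, which is the whole point of the feature.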
Why didn't we end up with SGML -> XML "compilers"?
As I understand it, XML is intended to be equivalent to SGML, with less syntactic flexibility to make it easier to parse. So once you've done the hard work of parsing SGML, it seems like it should be straightforward to emit the same data as XML for further machine processing.
Or are there some SGML features that cannot be represented with an equivalent in XML?
> Why didn't we end up with SGML -> XML "compilers"?
We did; both the osx command-line tool of the venerable SP/OpenSP package and sgmlproc (sgmljs) with output_format=xml do exactly that: output canonical XML markup with shortrefs resolved, omitted tags inferred, attribute values put in quotes and attribute names prepended where not already present, conditional marked sections included or omitted depending on parameter entities, and also entity references expanded, etc. But SGML can also output HTML proper, unlike XML.
SGML mostly has additional authoring features over XML indeed, but a number of additional concepts as well: much more powerful notations (used as general extension mechanism such as for math or parametric macro expansion) and stylesheets ie. link process declarations with state-dependent assignment of attributes and pipelining to yield markup projections, transforms, and views.
I was a big supporter of SGML-based languages: markup languages written for humans to author.
However, the trend in computing in the late 90s and early 2000s was to come up with more easily parsed languages, thus came things like XML: a markup language tuned for computers to produce.
But let's be honest here: parsing most XML can be done very simply, whereas supporting basic SGML was only possible with OpenSP.
SGML is a specification of over 1000 pages of dense text, and that's before you get a language DTD on top of it (like the HTML or DocBook or TEI DTDs). Basically, it is too complex and too flexible, and it was too expensive to produce the tooling to support it (GUI editors, processing tools, making them performant...).
I mean, we are looking at MD here that is even less flexible than HTML: simplicity wins even if it only caters to 90% of the usecases!
HTML4 was the last "HTML-is-an-SGML-application" attempt (that was the terminology for a document type defined with an SGML DTD) before XHTML 1.0 came out.
SGML is what allows implicit closing tags, for instance.
Of course, even XHTML failed because it was too strict and browsers couldn't trust websites to follow it to the letter, so we ended up with HTML as of today: clearly coming out of both, but not really either of them anymore.
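For anyone wondering what "implicit closing tags" means in practice, here is a small Python sketch that makes the omitted end tags explicit. The IMPLIED_END table is a hand-picked subset I made up for illustration, nowhere near the real HTML or SGML rules, and attributes are dropped to keep it short.

```python
from html.parser import HTMLParser

# Assumption: a tiny subset of "this start tag implies that end tag" rules.
IMPLIED_END = {
    "p": {"p", "h1", "ul", "ol"},
    "li": {"li"},
}


class TagInferrer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open = []  # stack of currently open elements
        self.out = []   # pieces of the explicit output

    def handle_starttag(self, tag, attrs):
        # close the innermost open element if this start tag implies its end
        if self.open and tag in IMPLIED_END.get(self.open[-1], ()):
            self.out.append(f"</{self.open.pop()}>")
        self.open.append(tag)
        self.out.append(f"<{tag}>")  # attributes are ignored in this sketch

    def handle_endtag(self, tag):
        if tag not in self.open:
            return  # stray end tag: ignore it
        # close everything up to and including the named element
        while self.open:
            top = self.open.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace like a browser
        if text:
            self.out.append(text)

    def close(self):
        super().close()  # flushes any buffered text first
        while self.open:
            self.out.append(f"</{self.open.pop()}>")


def explicit(html):
    parser = TagInferrer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(explicit("<h1>Page Title</h1><p>Lorem ipsum<p>Nullam rutrum"))
```

In SGML the equivalent table comes from the DTD's content models and tag omission flags, so nothing like this has to be hard-coded per language.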
Me, too, having worked at IBM and used SGML there. But JSON is what really killed XML. It can be harder to read, especially at first, but it's shorter and fulfills all the same roles.
I wouldn't say JSON killed XML: it's still widely in use for documents whose type definition changes rarely and which are more content oriented. The one benefit to XML/SGML languages is that you've got simple, ubiquitous support for "attributes", plain text content and nested tree content.
I.e. to represent
<p>An <acronym expanded="HyperText Markup Language">HTML</> page was the driver for interactive web.
in JSON, you have to come up with your own conventions for attributes and content:
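One possible convention, purely as an illustration (the key names tag, attrs and children and the helper to_markup are made up here, not any standard), written in Python so the round trip can be checked:

```python
import json
from html import escape

# Hypothetical convention: an element is {"tag", "attrs", "children"},
# and plain text is just a string inside "children".
paragraph = {
    "tag": "p",
    "attrs": {},
    "children": [
        "An ",
        {
            "tag": "acronym",
            "attrs": {"expanded": "HyperText Markup Language"},
            "children": ["HTML"],
        },
        " page was the driver for interactive web.",
    ],
}


def to_markup(node):
    """Serialize the convention above back to fully tagged markup."""
    if isinstance(node, str):
        return escape(node, quote=False)
    attrs = "".join(f' {k}="{escape(v)}"' for k, v in node["attrs"].items())
    body = "".join(to_markup(child) for child in node["children"])
    return f"<{node['tag']}{attrs}>{body}</{node['tag']}>"


print(json.dumps(paragraph, indent=2))
print(to_markup(paragraph))
```

The JSON form runs to a dozen lines for one short sentence, and another project could just as reasonably pick a different shape for the same data.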
So with JSON, everyone comes up with their own format. And in the cases XML was designed for (marking up textual content), it handily beats JSON in expressiveness, simplicity and terseness too. The fact that it was misused for defining protocols and objects (i.e. SOAP, ugh) is a different matter.
I would say that SGML/XML languages still have this benefit over even Markdown, where any contextual modifier is either impossible or uses a one-off syntax (like images or links with text).
Markdown is undoubtedly more readable, but HTML can be more readable than most people make it. And considering that the ultimate goal is to wind up with a laid-out, styled document, its capabilities in that regard are just plain-old more important, especially since markdown isn't going to replace WYSIWYG editors any time soon, and almost everybody who needs to know HTML can learn it relatively easily. Browsers collapse white space by default, so you've got a lot of flexibility with its formatting:
<!doctype html>
<title>
Page Title
</title>
<h1>
Page Title
</h1>
<p>
Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Ut ac lorem ut massa
euismod vestibulum.
<p>
Nullam rutrum blandit eleifend. Aenean a
varius diam. Morbi sodales velit nunc, vel
vestibulum lorem tempus sodales.
--or--
<!doctype html>
<title> Page Title </title>
<h1> Page Title </h1>
<p> Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Ut ac lorem ut massa
euismod vestibulum.
<p> Nullam rutrum blandit eleifend. Aenean a
varius diam. Morbi sodales velit nunc, vel
vestibulum lorem tempus sodales.
I get why many developers like this idea... Web developers are responsible for implementing the complex user-facing parts, and their primary weapon is text: doing extra work sucks, and when you're a hammer, everything looks like a nail. But developers are not designers, and design not being left to developers in mature organizations is no accident. Absolute, deliberate, limiting simplicity is always an attractive argument if you dismiss the value of, or maybe don't even understand the reason for, the complexity. I won't deny the advantages of reader-view-level simplicity in web design: it's easier to visually parse, more performant, and easier to navigate compared to most web pages, similar to how books compare to magazines, but about 225 million people per year in the US read magazines and I assure you most of them would not choose to have textual printouts in lieu of their current form. While people like having the option of a uniform, grey, easily visually parseable mode to view webpages, that's probably not what they want even most of the time, let alone as a deliberate limitation.
One problem with this style is that if you copy any text from this website you will have trailing spaces after each paragraph. To avoid that you have to close the tags (or open the next one) directly after the text.
Sure, if a trailing space in copy/paste functionality or precise :after placement is important then you'd need to modify the ending tag placement... but prioritizing that use case seems like a premature optimization. I don't think that makes a drastic difference. Compared to markdown, you've still got a heck of a lot more formatting flexibility without changing the rendered product.
I don't think that's as rare as people say, especially in smaller organizations.
Having an art school design education and a bit over a decade in (mostly back-end) web development, I've had plenty of deseloper type roles. If they fall under a design or marketing department, they'll spend 80% of the time doing design work and try to throw it together on some shitty wysiwyg monstrosity, ignoring performance, stability, maintainability, etc. If they fall under technical departments, design, ahem, decoration and polish is something to be applied at the end, if there's time, after the real work is done. Either way, having the same group of people responsible for two halves of that coin rarely yields a good balance, and they almost never pay any real attention to usability ... at least not for use cases that don't exactly mirror their own. Seems to me that replacing the flexibility of current markup and styling tools with simple markdown and reader-type layouts is just trying to apply the tech-focused solution to the entire problem the way Flash tried to do the opposite.
Yes, that's a perfectly viable workaround, but it's still a band-aid that requires expending resources that wouldn't need to be spent if the markup method had been better chosen for readability. (To be specific, I believe the angle-brackets are the main culprit.)
Technically XML has some machinery to support more lightweight notations. It won't parse these notations, of course, but the information is accessible to users of an XML reader. The mechanism should work like this:
<?xml version="1.0"?>
<!DOCTYPE myDoc [
<!NOTATION markdown PUBLIC "https://authority.org/markdown/v1.23">
<!NOTATION rest PUBLIC "urn:restructured-text/v4.56">
<!ELEMENT myDoc (note+)>
<!ELEMENT note (#PCDATA)>
<!ATTLIST note
notation NOTATION (markdown|rest) #REQUIRED>
]>
<myDoc>
<note notation="rest">
restructured text goes here
</note>
<note notation="markdown">
markdown goes here
</note>
</myDoc>
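As a rough check that the notation information really does reach the application, here is a Python sketch using the standard library's expat bindings. The helper name read_notes is mine, and the sketch declares note as (#PCDATA), since that is the content model XML's DTD grammar accepts; treat it as one way of reading the document, not the only one.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE myDoc [
<!NOTATION markdown PUBLIC "https://authority.org/markdown/v1.23">
<!NOTATION rest PUBLIC "urn:restructured-text/v4.56">
<!ELEMENT myDoc (note+)>
<!ELEMENT note (#PCDATA)>
<!ATTLIST note
  notation NOTATION (markdown|rest) #REQUIRED>
]>
<myDoc>
<note notation="rest">
restructured text goes here
</note>
<note notation="markdown">
markdown goes here
</note>
</myDoc>
"""


def read_notes(doc):
    """Return (declared notations, list of (notation, text) per note)."""
    notations = {}  # notation name -> public identifier
    notes = []
    current = None  # [notation, text chunks] while inside a <note>

    def on_notation(name, base, system_id, public_id):
        notations[name] = public_id

    def on_start(name, attrs):
        nonlocal current
        if name == "note":
            current = [attrs.get("notation"), []]

    def on_text(data):
        if current is not None:
            current[1].append(data)

    def on_end(name):
        nonlocal current
        if name == "note" and current is not None:
            notes.append((current[0], "".join(current[1]).strip()))
            current = None

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.StartElementHandler = on_start
    parser.CharacterDataHandler = on_text
    parser.EndElementHandler = on_end
    parser.Parse(doc, True)
    return notations, notes


notations, notes = read_notes(DOC)
for notation, text in notes:
    # the application would hand `text` to whichever renderer it has
    # registered for this notation's public identifier
    print(notation, notations[notation], repr(text))
```

The parser only reports the notation names and identifiers. Mapping an identifier to an actual Markdown or reStructuredText renderer is left entirely to the application.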
Not entirely sure what you mean by "two-armed key-chord". It's shift-, or shift-. -- my keyboard's bottom line goes <shift>\ZXCVBNM,./<shift>. < is right-index and right-pinky, and > is right-index and right-pinky (as shift is so much wider)
Now sure, some are home row aficionados, and having # on the home row is certainly beneficial to those as your right-index can stay on J as god intended
Or do you have a different keyboard layout to me? Keyboard layouts - especially the location of things like ,./<>?@;'#:@~[]{} vary a lot depending on the country you are in.
It's very similar yet much fuller-featured than commonmark, with support for definition lists, footnotes, tables, several new kinds of inline formatting (insert, delete, highlight, superscript, subscript), math, smart punctuation, attributes that can be applied to any element, and generic containers for block-level, inline-level, and raw content. In addition, it resolves ambiguities in the commonmark spec and parses in linear time with no backtracking.
Exactly what I was thinking: by omitting the <html>, <head> and <body> tags, HTML can be quite concise [1]. Additionally, the closing </li> can be omitted from lists, and <li> is barely a step up from using - for bullet points.
The worst part about HTML is the links, though. Anchor tags are awful. Having to repeatedly type <a href="..."> and close with </a> is wayyy too much boilerplate for something that is simply surrounded with [square](brackets) in markdown.
I have the opposite problem. HTML <a href> links are consistent with the rest of the language. <a href>Something</a> makes the same kind of sense as <em>something</em>.
But markdown? I'm always forgetting the order of the (link)[text] or [link](text) or [text](link) or (text)[link]. It's just something that's invented, and not consistent with the rest of itself.
And, for the specific syntax: parentheses to surround the URL is just bad because parentheses are URL code points, so you can’t just insert regular serialised URLs in Markdown in all cases. (See https://news.ycombinator.com/item?id=33340097 for more explanation.)
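A small Python sketch of the failure mode, using a deliberately naive link pattern of my own (real parsers are smarter: CommonMark, for example, accepts balanced parentheses and backslash escapes). The example.com URL and the helper names are made up for the demonstration.

```python
import re

# Deliberately naive: the URL is taken to end at the first ")".
NAIVE_LINK = re.compile(r"\[([^\]]*)\]\(([^)]*)\)")


def naive_url(markdown):
    """Return the URL a naive parser would extract, or None."""
    match = NAIVE_LINK.search(markdown)
    return match.group(2) if match else None


def safe_link(text, url):
    """Percent-encode parentheses so the URL cannot end the link early."""
    escaped = url.replace("(", "%28").replace(")", "%29")
    return f"[{text}]({escaped})"


url = "https://example.com/a)b"  # a URL with an unbalanced parenthesis
print(naive_url(f"[text]({url})"))       # truncated at the ")"
print(naive_url(safe_link("text", url)))  # survives, but is no longer verbatim
```

The workaround means rewriting the URL before pasting it, and whether the percent-encoded form is treated as the same resource is up to the server, which is exactly the annoyance being described.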
It's said that the father of LISP, John McCarthy, lamented the W3C's choice of SGML as the basis for HTML : « An environment where the markup, styling and scripting is all s-expression based would be nice. » The {lambda way} project could be an answer, small and simple: http://lambdaway.free.fr/lambdawalks/
In lambdatalk, HTML code such as
<h1>Page Title</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Ut ac lorem ut massa euismod vestibulum.
<p>Nullam rutrum blandit eleifend. Aenean a varius diam.
Morbi sodales velit nunc, vel vestibulum lorem tempus sodales.
is written like this
_h1 Page Title
_p Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut ac lorem ut massa euismod vestibulum.
_p Nullam rutrum blandit eleifend. Aenean a varius diam. Morbi sodales velit nunc, vel vestibulum lorem tempus sodales.
And you can also compute 3x4 writing {x 3 4} or compute the factorial of 100, compute a Fast Fourier Transform, draw complex graphics, ... it's a true programming language with a coherent syntax, unlike Markdown.
<ul>
<li> Item one
<li> Item two
<li> Item three
<li> Item four
</ul>
It has advantages over markdown lists, too: You never need to mess with semantic indentation to add additional paragraphs to a given line, and you don't have to manually number ordered lists the way some markdown flavours ask you to.
I guess my point is that browsers all easily provide (some with extensions) exactly this already: a mode where it's just HTML with some standard readable CSS.
You can also do default user stylesheets.
This is a positive point for things being pretty well set up today.
On the contrary - Markdown is essentially a superset of HTML, so unless you're using a renderer that strips it from the input, you can have the best of both worlds.
This property was super useful for a lightweight CMS I threw together a few years ago and which is still used by the original customer today. 99% of what they need to render is easily authored in Markdown, and this further helps ensure a commonality of style and device portability.
The original markdown parser supported html because it was basically just a preprocessor that added some syntactic sugar to html. The proposal here isn't just "what if browsers had a markdown preprocessor" (although I also think that would be questionable), but "what if browsers limited content down to only markdown, so that the web was all just clean, style-agnostic documents," and that clearly requires that markdown not support arbitrary html.
Uh, yeah, that's a valid web-page, but I don't see how that counters "html is cluttered" statement. This is cluttered. It… just is. I know some people who suffered some mind deformation in academia and now claim LaTeX is the perfect markup for blogs, but I don't think I've encountered the same for html until now. I mean, does somebody really compose text in html?!
Markdown is deeply limited, that's true, but I often think that there is just a tiny bit of syntax lacking to make it just fine. Some actually is implemented in software like Pandoc or RedCarpet, there are a couple of ways to make tables (some better than others), LaTeX can be employed for formulas, some implementations have checklists, strikethrough, etc. It's just poorly standardized, and the spirit of the original proposal (and its misleading name) is at fault here as well, since later attempts to invent a standard mean very little when there are a dozen different common implementations and not a single one is reasonably complete.
By the way, the fact that HTML was supposed to serve as an addition to Markdown doesn't help: you just cannot allow people to submit arbitrary HTML everywhere where something like Markdown is needed. To use it in comments on a forum you need to fully parse it anyway, explicitly enabling or disabling different features of some ubiquitous "full implementation".
Obviously, you cannot make an atrocity like a modern landing page in Markdown+. But… ok, I shouldn't be judgemental and claim such atrocities shouldn't exist — they can, but most blogs, forums (such as this one), etc. — really could have been just "viewer programs" of some standardized format, much more restrictive than HTML+CSS+JS, but a little less limited than Markdown.
All of this isn't very much related to the original topic, but seriously, I dream of some better version of Markdown someday becoming a de-facto standard markup language for all forums, messengers, blogging engines, whatever the general name for Jira is… You know what I mean.
I don't really have a solution, it just really feels like there shouldn't be that many additional features. A couple more of emphasis options, a couple less ways to do the same thing (I mean, it's stupid to convert all of */-/+ to the same <li> elements), colors, better image embeddings (with captions), sidenotes, better ways to handle formulas (there are enough dedicated literals in Unicode to construct most simple formulas without the need for LaTeX, but they still need to be parsed to be rendered pretty) and simple UML-like stuff… I'm pretty sure the comprehensive list of features for 6-σ usecases cannot be THAT huge. Big, yes. Not endless. And most features surely have some "plain-text" (or very light special syntax) representations.
I realize that it was pretty much the intention behind HTML + CSS. But HTML + CSS stopped being that a very, very long time ago. 30 years have passed. By now, we should have a little better sense of what's needed to write & render most texts.
> I mean, does somebody really compose text in html?!
Yes.
I use HTML the way people use markdown: as an open, easy to read, easy to write, plain text format for taking notes, writing articles, etc.
I find this quite intuitive and easy – partly because I’m an old-school web-developer from days of yore and I have HTML deeply internalised; partly because I use the abbreviated version of HTML noted above; and partly because I use a VIM plugin called Emmet which allows you to construct complex HTML fragments with a basic shorthand.
The reason why I use HTML instead of markdown is threefold.
* The first is that simple HTML, written with a little care, is readable as-is, and requires no transformation to see it looking pretty (just open in a browser). Markdown requires pandoc to turn it into something else.
* The second is that it is a semantically rich language, full of useful tags for expressing document structure and context for words and sentences. I find Markdown really confining.
* The third reason is that, if I take the care to fill in the basic author/keyword/desc meta-tags I can run scripts over my directories looking for and indexing things. Who cares that search engines don’t use some of those tags anymore. I do.
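For the curious, the kind of indexing script meant in the third point might look roughly like this in Python. It is a sketch, not the actual script: the names MetaCollector, read_meta and index_directory are hypothetical, and it assumes notes are UTF-8 files ending in .html.

```python
from html.parser import HTMLParser
from pathlib import Path


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from one document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def read_meta(html_text):
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    return collector.meta


def index_directory(root):
    """Map each .html file under root to its meta tags."""
    return {
        str(path): read_meta(path.read_text(encoding="utf-8", errors="replace"))
        for path in sorted(Path(root).rglob("*.html"))
    }


sample = (
    "<!doctype html><title>Note</title>"
    '<meta name="author" content="A. Writer">'
    '<meta name="keywords" content="html, notes">'
    "<p>Body text."
)
print(read_meta(sample))
```

Calling index_directory on a notes folder gives a dictionary that is easy to filter by author or keyword, with no front-matter format to invent.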
Possibly they’re not entirely compelling reasons for anyone else to adopt HTML over markdown, but they work for me.
> it just really feels like there shouldn't be that many additional features. A couple more of emphasis options, a couple less ways to do the same thing
There was a language like that once, it was called HTML. It had a very basic set of features initially, but then someone needed text to blink, someone needed to display videos, someone needed to send forms, someone needed to use it to play games and here we are today, and it's not done yet. If it were implemented today, you would get exactly the same result in the near future, because everyone's "small set of features" together adds up to infinity.
We broke a weird little markup language into something it was never meant to be because the last tower of crap got too high and collapsed on itself.
Now a webpage is html+css+javascript+a dozen frameworks. People are sick of it and want something better. Well HTML2 is better. Just HTML2, nothing else.