
Markdown is a convenient but deeply limited markup language with only a small subset of html's features. And yes, limitations are good because we want documents not web apps, etc, etc, but I mean "images can't have captions" limited, "navigation bars don't exist" limited. Actual important features of html don't exist in markdown, which is why almost every markdown platform ends up adding extensions and shortcodes. Why use markdown at all? Just use html.

"But html isn't style-agnostic" yes it is. CSS isn't style-agnostic. Instead of a markdown browser, how about a browser with a fixed stylesheet and no js? You don't even need a browser for that, that could just be a userscript that gets plugged into an existing browser. It'd break non-compliant websites that require javascript or custom css, but so would a markdown browser. Most people wouldn't write content for it, but most people wouldn't write content for a markdown browser either.

"But html is cluttered" it doesn't have to be. This is a valid webpage:

    <!doctype html>
    <title>Page Title</title>
    <h1>Page Title</h1>

    <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    Ut ac lorem ut massa euismod vestibulum.

    <p>Nullam rutrum blandit eleifend. Aenean a varius diam.
    Morbi sodales velit nunc, vel vestibulum lorem tempus sodales.
Personally, I prefer writing in markdown, but that's no reason to insert a markdown renderer into browsers. HTML can already be as sleek and readable as you want. If we added a new type of markup for anybody with a personal preference, we'd never stop.


The fatal flaw of HTML (and XML for that matter) is that the tags have the same visual weight as the text they're delimiting, which makes for a sense of clutter even in your minimal example.

Markdown really scores here, by having a pleasing plain text representation as a goal from the outset, and I'd love to see it used more widely for web pages.

I'd also love to see it more widely used for offline reading, too - the help files in an application really shouldn't need to invoke a web browser to view them when a lightweight markdown viewer would do the job. Not that there is a lightweight markdown viewer, mind you!


HTML is based on SGML, and SGML has short references to handle lightweight custom syntaxes. For example, you can define that an asterisk appearing in your content within a <p> element is replaced by <em>, and moreover define that an asterisk appearing within <em> content is replaced by </em>, toggling emphasized text tags. So SGML very much acknowledges the need for lightweight markup, but the SHORTREF feature, like everything else requiring markup declarations, didn't make it into the XML subset of SGML.

HTML itself doesn't have these and other features (such as basic text macros) because SGML was understood to be available at least at authoring time.
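
To make the toggling behaviour concrete, here is a small Python simulation. It is not SGML and does not use any SGML tooling; `expand_shortref` is a hypothetical helper name, and the sketch only mimics the idea that an asterisk maps to `<em>` in paragraph content and to `</em>` once inside emphasized content.

```python
def expand_shortref(text: str, delimiter: str = "*") -> str:
    """Replace each delimiter with <em> or </em>, alternating.

    Illustration only: real SGML short references are declared in the DTD
    and resolved by the parser, not by application code like this.
    """
    out = []
    inside_em = False  # which "map" is active: plain <p> content or <em> content
    for ch in text:
        if ch == delimiter:
            out.append("</em>" if inside_em else "<em>")
            inside_em = not inside_em
        else:
            out.append(ch)
    return "".join(out)


print(expand_shortref("This is *really* simple."))
```

The single boolean plays the role of the two short reference maps: which replacement an asterisk gets depends only on the element it appears in.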


Why didn't we end up with SGML -> XML "compilers"?

As I understand it, XML is intended to be equivalent to SGML, with less syntactic flexibility to make it easier to parse. So once you've done the hard work of parsing SGML, it seems like it should be straightforward to emit the same data as XML for further machine processing.

Or are there some SGML features that cannot be represented with an equivalent in XML?


> Why didn't we end up with SGML -> XML "compilers"?

We did; both the osx command-line tool of the venerable SP/OpenSP package and sgmlproc (sgmljs) with output_format=xml do exactly that: output canonical XML markup with shortrefs resolved, omitted tags inferred, attribute values put in quotes and attribute names prepended where not already present, conditional marked sections included or omitted depending on parameter entities, and also entity references expanded, etc. But SGML can also output HTML proper, unlike XML.

SGML mostly has additional authoring features over XML indeed, but a number of additional concepts as well: much more powerful notations (used as general extension mechanism such as for math or parametric macro expansion) and stylesheets ie. link process declarations with state-dependent assignment of attributes and pipelining to yield markup projections, transforms, and views.


Maybe I'm too young for all this, but that sure seems like something I've always wanted. Why aren't we using SGML for authoring HTML in 2022?


I was a big supporter of SGML-based languages: markup language written for humans to author.

However, the trend in computing in late 90s and early 2000s was to come up with more easily parsed languages, thus came things like XML: a mark-up language tuned for computers to produce.

But let's be honest here: parsing most XML can be done very simply, whereas supporting basic SGML was only possible with OpenSP.

SGML is a specification of over 1000 pages of dense text, and that's before you get a language DTD on top of it (like the HTML or DocBook or TEI DTDs). Basically, it is too complex and too flexible, and it was too expensive to produce the tooling to support it (GUI editors, processing tools, making them performant...).

I mean, we are looking at MD here that is even less flexible than HTML: simplicity wins even if it only caters to 90% of the usecases!


I have been on this journey since html 4 was coming out next year, and I’ve never ever heard of SGML. Wow.

I can see how JSON would be a reaction to that.


HTML4 was the last of "HTML-is-an-SGML-application" (that was the terminology when you define a document type with a SGML DTD) attempt before XHTML 1.0 came out.

SGML is what allows implicit closing tags, for instance.

Of course, even XHTML failed because it was too strict and browsers couldn't trust websites with following it to the letter, so we ended up with HTML as of today: clearly coming out of both, but not really either of them anymore.


Me, too, having worked at IBM and used SGML there. But JSON is what really killed XML. It can be harder to read, especially at first, but it's shorter and fulfills all the same roles.


I wouldn't say JSON killed XML: it's still widely in use for documents whose type definition changes rarely and which are more content oriented. The one benefit to XML/SGML languages is that you've got simple, ubiquitous support for "attributes", plain text content and nested tree content.

I.e. to represent

  <p>An <acronym expanded="HyperText Markup Language">HTML</> page was the driver for interactive web.
in JSON, you have to come up with your own conventions for attributes and content:

  [
    {
     "type": "p", 
     "content": [
       {"type": "STRING", "value": "An "},
       {"type": "acronym", "attributes": {"expanded": "HyperText Markup Language"}, "content": [{"type": "STRING", "value": "HTML"}]},
       {"type": "STRING", "value": " page was the driver for interactive web."}
     ]
    }
  ]
I know which one I'd prefer ;)

So with JSON, everyone comes up with their own format. And in these cases that XML was designed for (to mark up textual content), it handily beats JSON in expressiveness, simplicity and terseness too. The fact that it was misused for defining protocols and objects (i.e. SOAP, ugh) is a different matter.

I would say that SGML/XML languages still have this benefit over even Markdown: any contextual modifier is either impossible or uses a one-off syntax (like images or links with text).
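
As a rough illustration of the "own conventions" point, the following Python sketch converts the marked-up sentence into one possible JSON shape. The key names (`type`, `attributes`, `content`, and `value` for string nodes) are an assumed convention for this example, not a standard, and the input is written as well-formed XML with explicit end tags so that the standard-library parser accepts it.

```python
import json
import xml.etree.ElementTree as ET


def to_nodes(elem):
    """Convert an element into an ad hoc {"type", "attributes", "content"} dict."""
    content = []
    if elem.text:
        content.append({"type": "STRING", "value": elem.text})
    for child in elem:
        content.append(to_nodes(child))
        # Text that follows a child element lives in that child's .tail
        if child.tail:
            content.append({"type": "STRING", "value": child.tail})
    node = {"type": elem.tag}
    if elem.attrib:
        node["attributes"] = dict(elem.attrib)
    node["content"] = content
    return node


SOURCE = (
    '<p>An <acronym expanded="HyperText Markup Language">HTML</acronym>'
    " page was the driver for interactive web.</p>"
)

tree = to_nodes(ET.fromstring(SOURCE))
print(json.dumps([tree], indent=2))
```

Every consumer of the JSON has to know these key names in advance, whereas the markup form carries the attribute and mixed-content distinction in its syntax.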


Indeed. And I do miss XSLT. There's nothing like that for JSON.

Being document oriented from its SGML roots, XML always had more markup, rendering, and search options than JSON.


Because back in 1995 - 2005, getting a site up and running quickly was more important than doing it right.

Cue (and queue) endless kludges to re-do parts of SGML, badly.


Both XML and HTML are implementations of SGML. SGML is the parent reference and includes things that aren't in either subset.

SGML was invented by IBM for "pubs," authoring documents electronically but shipping them as printed manuals.

Lacking the need for print (in HTML) and for display (in XML) led to those versions.


It's said that the father of LISP, John McCarthy, lamented the W3C's choice of SGML as the basis for HTML : « An environment where the markup, styling and scripting is all s-expression based would be nice. »

The {lambda way} project could be an answer, small and simple: http://lambdaway.free.fr/lambdawalks/


Markdown is undoubtedly more readable, but HTML can be more readable than most people make it. And considering that the ultimate goal is to wind up with a laid-out, styled document, its capabilities in that regard are just plain-old more important, especially since markdown isn't going to replace WYSIWYG editors any time soon, and almost everybody who needs to know HTML can learn it relatively easily. Browsers collapse white space by default, so you've got a lot of flexibility with its formatting:

    <!doctype html>

    <title>
        Page Title
    </title>

    <h1>
        Page Title
    </h1>

    <p>
        Lorem ipsum dolor sit amet, consectetur
        adipiscing  elit. Ut ac lorem ut massa 
        euismod vestibulum.
    
    <p>
        Nullam rutrum blandit eleifend. Aenean a 
        varius diam. Morbi sodales velit nunc, vel 
        vestibulum lorem tempus sodales.
--or--

    <!doctype html>

    <title>    Page Title    </title>
    <h1>       Page Title    </h1>

    <p>    Lorem ipsum dolor sit amet, consectetur
           adipiscing  elit. Ut ac lorem ut massa 
           euismod vestibulum.

    <p>    Nullam rutrum blandit eleifend. Aenean a 
           varius diam. Morbi sodales velit nunc, vel 
           vestibulum lorem tempus sodales.
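
A quick way to check that two source layouts carry the same text is to extract the text nodes and collapse whitespace runs. The Python sketch below does that with the standard-library `html.parser`; it is only an approximation of what a browser does (it ignores `pre`, CSS `white-space`, and block boundaries), and `TextCollector` and `collapsed_text` are hypothetical names for this example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text nodes and ignore the tags around them."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def collapsed_text(markup: str) -> str:
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    # Approximate whitespace collapsing: any run of whitespace becomes one space.
    return " ".join("".join(parser.chunks).split())


COMPACT = (
    "<h1>Page Title</h1>\n"
    "<p>Lorem ipsum dolor sit amet.\n"
    "<p>Nullam rutrum blandit eleifend."
)

SPREAD = """
<h1>
    Page Title
</h1>

<p>
    Lorem ipsum dolor
    sit amet.

<p>
    Nullam rutrum   blandit eleifend.
"""

print(collapsed_text(COMPACT) == collapsed_text(SPREAD))
```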

I get why many developers like this idea... Web developers are responsible for implementing the complex user-facing parts, and their primary weapon is text: doing extra work sucks, and when you're a hammer, everything looks like a nail. But developers are not designers, and design not being left to developers in mature organizations is no accident. Absolute, deliberate, limiting simplicity is always an attractive argument if you dismiss the value of, or maybe don't even understand the reason for the complexity. I won't deny the advantages of reader-view-level simplicity in web design: it's easier to visually parse, more performant, and easier to navigate compared to most web pages, similar to how books compare to magazines-- but about 225 million people per year in the US read magazines and I assure you most of them would not choose to have textual printouts in lieu of their current form. While people like having the option of a uniform, grey, easily visually parseable mode to view webpages, that's probably not what they want even most of the time, let alone as a deliberate limitation.


The problem is that this only really works well for "documents". Most webpages are anything but "documents", even those that do mostly focus on text.

Of course, HTML templating can help a lot with that: adding footers and sidebars and so on. But it's still no good for "web apps".


Compared to markdown I think HTML works a lot better as a general purpose format.


One problem with this style is that if you copy any text from this website you will have trailing spaces after each paragraph. To avoid that you have to close the tags (or open the next one) directly after the text.


Sure, if a trailing space in copy/paste functionality or precise :after placement is important then you'd need to modify the ending tag placement... but prioritizing that use case seems like a premature optimization. I don't think that makes a drastic difference. Compared to markdown, you've still got a heck of a lot more formatting flexibility without changing the rendered product.


I may be that rare exception - a hobbyist developer who does design work as part of $dayjob.


I don't think that's as rare as people say, especially in smaller organizations.

Having an art school design education and a bit over a decade in (mostly back-end) web development, I've had plenty of deseloper type roles. If they fall under a design or marketing department, they'll spend 80% of the time doing design work and try to throw it together on some shitty wysiwyg monstrosity, ignoring performance, stability, maintainability, etc. If they fall under technical departments, design, ahem, decoration and polish is something to be applied at the end, if there's time, after the real work is done. Either way, having the same group of people responsible for two halves of that coin rarely yields a good balance, and they almost never pay any real attention to usability ... at least not for use cases that don't exactly mirror their own. Seems to me that replacing the flexibility of current markup and styling tools with simple markdown and reader-type layouts is just trying to apply the tech-focused solution to the entire problem the way Flash tried to do the opposite.


> tags have the same visual weight as the text they're delimiting

IMO this issue should and can be easily solved by editor/viewer by rendering tags with lower contrast.


Yes, that's a perfectly viable workaround, but it's still a band-aid that requires expending resources that wouldn't need to be spent if the markup method had been better chosen for readability. (To be specific, I believe the angle-brackets are the main culprit.)


Technically XML has some machinery to support more lightweight notations. It won't parse these notations, of course, but the information is accessible to users of an XML reader. The mechanism should work like this:

    <?xml version="1.0"?>
    <!DOCTYPE myDoc [
      <!NOTATION markdown PUBLIC "https://authority.org/markdown/v1.23">
      <!NOTATION rest     PUBLIC "urn:restructured-text/v4.56">
      <!ELEMENT myDoc (note+)>
      <!ELEMENT note (#PCDATA)>
      <!ATTLIST note 
        notation NOTATION (markdown|rest) #REQUIRED>
      ]>
    <myDoc>
      <note notation="rest">
        restructured text goes here
      </note>
      <note notation="markdown">
        markdown goes here
      </note>
    </myDoc>
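
To show how an application might get at that information, here is a hedged Python sketch using the standard-library `xml.parsers.expat` module. It assumes the `note` element is declared as `(#PCDATA)`, which is the form XML's DTD grammar accepts for text-only content, and it only records which notation each note claims; actually rendering Markdown or reStructuredText would be up to the application.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE myDoc [
  <!NOTATION markdown PUBLIC "https://authority.org/markdown/v1.23">
  <!NOTATION rest     PUBLIC "urn:restructured-text/v4.56">
  <!ELEMENT myDoc (note+)>
  <!ELEMENT note (#PCDATA)>
  <!ATTLIST note notation NOTATION (markdown|rest) #REQUIRED>
]>
<myDoc>
  <note notation="rest">restructured text goes here</note>
  <note notation="markdown">markdown goes here</note>
</myDoc>"""

notations = {}  # notation name -> public identifier from the DTD
notes = []      # [notation name, text] pairs in document order
state = {"in_note": False}


def on_notation(name, base, system_id, public_id):
    notations[name] = public_id


def on_start(name, attrs):
    if name == "note":
        notes.append([attrs["notation"], ""])
        state["in_note"] = True


def on_end(name):
    if name == "note":
        state["in_note"] = False


def on_text(data):
    # Character data may arrive in several pieces, so accumulate it.
    if state["in_note"]:
        notes[-1][1] += data


parser = xml.parsers.expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.StartElementHandler = on_start
parser.EndElementHandler = on_end
parser.CharacterDataHandler = on_text
parser.Parse(DOC, True)

for notation, text in notes:
    # An application would dispatch to a Markdown or reST renderer here.
    print(notation, notations[notation], repr(text))
```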


IMO the main issue with writing HTML is it takes a two-armed key-chord to do a < or > char.


Not entirely sure what you mean by "two-armed key-chord". It's shift-, or shift-. -- my keyboard's bottom line goes <shift>\ZXCVBNM,./<shift>. < is right-index and right-pinky, and > is right-index and right-pinky (as shift is so much wider)

Now sure, some are home row aficionados, and having # on the home row is certainly beneficial to those as your right-index can stay on J as god intended

Or do you have a different keyboard layout to me? Keyboard layouts - especially the location of things like ,./<>?@;'#:@~[]{} vary a lot depending on the country you are in.


This is a great point.


For a static document markup language, Djot does a rather good job: https://github.com/jgm/djot

It's very similar yet much fuller-featured than commonmark, with support for definition lists, footnotes, tables, several new kinds of inline formatting (insert, delete, highlight, superscript, subscript), math, smart punctuation, attributes that can be applied to any element, and generic containers for block-level, inline-level, and raw content. In addition, it resolves ambiguities in the commonmark spec and parses in linear time with no backtracking.

Further discussion lower in this thread: https://news.ycombinator.com/item?id=33553293

Quickstart for Markdown users: https://github.com/jgm/djot/blob/main/doc/quickstart-for-mar...

Some more in-depth examples, showing how Djot would be rendered into HTML: https://htmlpreview.github.io/?https://github.com/jgm/djot/b...


Exactly what I was thinking, by omitting the <html> <head> and <body> HTML can be quite concise [1]. Additionally the closing </li> can be omitted from lists and <li> is barely a step over using - for bullet points.

The worst part about HTML is the links, though. Anchor tags are awful. Having to repeatedly type <a href="..."> and closing with </a> is wayyy too much boilerplate for something that is simply surrounded with [square](brackets) in markdown.

[1] I go to https://meiert.com/en/blog/optional-html/ for reference.


I have the opposite problem. HTML <a href> links are consistent with the rest of the language. <a href>Something</a> makes the same kind of sense as <em>something</em>.

But markdown? I'm always forgetting the order of the (link)[text] or [link](text) or [text](link) or (text)[link]. It's just something that's invented, and not consistent with the rest of itself.


And, for the specific syntax: parentheses to surround the URL is just bad because parentheses are URL code points, so you can’t just insert regular serialised URLs in Markdown in all cases. (See https://news.ycombinator.com/item?id=33340097 for more explanation.)
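
One common workaround is to percent-encode parentheses before dropping a URL into the link syntax. The Python sketch below shows the idea; `markdown_link` is a hypothetical helper, it does not escape `]` in the link text, and it assumes the target server treats `%28`/`%29` the same as literal parentheses, which is usual but not guaranteed.

```python
def markdown_link(text: str, url: str) -> str:
    """Build an inline Markdown link, percent-encoding parentheses in the URL."""
    safe_url = url.replace("(", "%28").replace(")", "%29")
    return f"[{text}]({safe_url})"


# An unbalanced ")" would otherwise end the link destination early.
print(markdown_link("smiley", "https://example.com/:)"))
```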


The org-mode version is better in my opinion. Either [[link]] or [[link][text]].

Only uses square brackets and the optional text comes second which makes logical sense.


It's said that the father of LISP, John McCarthy, lamented the W3C's choice of SGML as the basis for HTML : « An environment where the markup, styling and scripting is all s-expression based would be nice. » The {lambda way} project could be an answer, small and simple: http://lambdaway.free.fr/lambdawalks/

In lambdatalk such a HTML code

    <h1>Page Title</h1>

    <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    Ut ac lorem ut massa euismod vestibulum.

    <p>Nullam rutrum blandit eleifend. Aenean a varius diam.
    Morbi sodales velit nunc, vel vestibulum lorem tempus sodales.
is written like this

    _h1 Page Title

    _p Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut ac lorem ut massa euismod vestibulum.

    _p Nullam rutrum blandit eleifend. Aenean a varius diam.  Morbi sodales velit nunc, vel vestibulum lorem tempus sodales.
And you can also compute 3x4 writing {x 3 4} or compute the factorial of 100, compute a Fast Fourier Transform, draw complex graphics, ... it's a true programming language with a coherent syntax, unlike Markdown.


Write a list in HTML and write one in Markdown.

The difference in legibility is pronounced.

Especially if you don't want weird whitespace things.


But why does it matter what the underlying code looks like to the end user?

Nobody complains that the assembly code that runs their application is cluttered and illegible.


Yet people tend to choose python instead of assembly when introducing coding to a friend. It is easier to write, in part because it’s easier to read.


It doesn't.

But it matters to the author.

Assembler is not a popular language.


I don't find this illegible in the least.

    <ul>
      <li> Item one
      <li> Item two
      <li> Item three
      <li> Item four
    </ul>
It has advantages over markdown lists, too: You never need to mess with semantic indentation to add additional paragraphs to a given line, and you don't have to manually number ordered lists the way some markdown flavours ask you to.


Plain HTML would be great, it's just browsers holding it back. Without CSS, it looks unacceptably ugly.


Agreed, classless css to the rescue, eg https://newcss.net/


You can get good looking "plain" html (i.e. readable margins, linespacing, fonts and text size) with a very tiny amount of CSS.


just hit reader mode


My browser doesn't have one, and even if it did the point is that it should look good without any effort.


I guess my point is browsers all do easily provide (some with extensions) exactly this already - a mode where it's just HTML with some standard readable CSS already.

You can also do default user stylesheets.

This is a positive point for things being pretty well set up today.


Imho, reader mode should've been the default stylesheet from the very beginning.


> only a small subset of html's features.

On the contrary - Markdown is essentially a superset of HTML, so unless you're using a renderer that strips it from the input, you can have the best of both worlds.

This property was super useful for a lightweight CMS I threw together a few years ago and which is still used by the original customer today. 99% of what they need to render is easily authored in Markdown, and this further helps ensure a commonality of style and device portability.


The original markdown parser supported html because it was basically just a preprocessor that added some syntactic sugar to html. The proposal here isn't just "what if browsers had a markdown preprocessor" (although I also think that would be questionable), but "what if browsers limited content down to only markdown, so that the web was all just clean, style-agnostic documents," and that clearly requires that markdown not support arbitrary html.


Having re-read the article, I must say that this is another incorrect claim. It proposes no such thing. This is one straw man after another.

All it actually suggests is this:

> Let's have markdown rendering in all major browsers soon


That would be the worst world. I love that we have semantic and accessible elements and Markdown is pretty bad in both those categories.


Flavors of Markdown might be a superset, but as it is normally used, I don't think many would say that Markdown has all the abilities of HTML.


The original Markdown spec is very clear that HTML is allowed. So MD itself is absolutely a superset of HTML

https://daringfireball.net/projects/markdown/syntax#html

But in practice people mostly use MD variants, such as the "GitHub Flavored Markdown Spec" which may have some limits on HTML usage

https://github.github.com/gfm/#html-blocks


Uh, yeah, that's a valid web-page, but I don't see how that counters "html is cluttered" statement. This is cluttered. It… just is. I know some people who suffered some mind deformation in academia and now claim LaTeX is the perfect markup for blogs, but I don't think I've encountered the same for html until now. I mean, does somebody really compose text in html?!

Markdown is deeply limited, that's true, but I often think that there is just a tiny bit of syntax lacking to make it just fine. Some actually is implemented in software like Pandoc or RedCarpet, there are a couple of ways to make tables (some better than others), LaTeX can be employed for formulas, some implementations have checklists, strikethrough, etc. It's just poorly standardized, and the spirit of the original proposal (and misleading name) is at fault here as well, since later attempts to invent a standard mean very little when there are a dozen different common implementations and not a single one is reasonably complete.

By the way, the fact that HTML was supposed to serve as an addition to Markdown doesn't help: you just cannot allow people to submit arbitrary HTML everywhere where something like Markdown is needed. To use it in comments on a forum you need to fully parse it anyway, explicitly enabling or disabling different features of some ubiquitous "full implementation".

Obviously, you cannot make an atrocity like a modern landing page in Markdown+. But… ok, I shouldn't be judgemental and claim such atrocities shouldn't exist — they can, but most blogs, forums (such as this one), etc. — really could have been just "viewer programs" of some standardized format, much more restrictive than HTML+CSS+JS, but a little less limited than Markdown.

All of this isn't very much related to the original topic, but seriously, I dream of some better version of Markdown someday becoming a de-facto standard markup language for all forums, messengers, blogging engines, whatever the general name for Jira is… You know what I mean.

I don't really have a solution, it just really feels like there shouldn't be that many additional features. A couple more of emphasis options, a couple less ways to do the same thing (I mean, it's stupid to convert all of */-/+ to the same <li> elements), colors, better image embeddings (with captions), sidenotes, better ways to handle formulas (there are enough dedicated literals in Unicode to construct most simple formulas without the need for LaTeX, but they still need to be parsed to be rendered pretty) and simple UML-like stuff… I'm pretty sure the comprehensive list of features for 6-σ usecases cannot be THAT huge. Big, yes. Not endless. And most features surely have some "plain-text" (or very light special syntax) representations.

I realize that it was pretty much the intention behind HTML + CSS. But HTML + CSS stopped being that a very, very long time ago. 30 years have passed. By now, we should have a little better sense of what's needed to write & render most texts.


> I mean, does somebody really compose text in html?!

Yes.

I use HTML the way people use markdown: as an open, easy to read, easy to write, plain text format for taking notes, writing articles, etc.

I find this quite intuitive and easy – partly because I’m an old-school web-developer from days of yore and I have HTML deeply internalised; partly because I use the abbreviated version of HTML noted above; and partly because I use a VIM plugin called Emmet which allows you to construct complex HTML fragments with a basic shorthand.

The reason why I use HTML instead of markdown is threefold.

* The first is that simple HTML, written with a little care, is readable as-is, and requires no transformation to see it looking pretty (just open in a browser). Markdown requires pandoc to turn it into something else.

* The second is that it is a semantically rich language, full of useful tags for expressing document structure and context for words and sentences. I find Markdown really confining.

* The third reason is that, if I take the care to fill in the basic author/keyword/desc meta-tags I can run scripts over my directories looking for and indexing things. Who cares Search Engines don’t use some of those tags anymore. I do.

Possibly they’re not entirely compelling reasons for anyone else to adopt HTML over markdown, but they work for me.
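
For the third point, an indexing script can be quite small. The Python sketch below is only one way it might look: it uses the standard-library `html.parser`, and the helper names (`MetaCollector`, `read_meta`, `index_directory`) and the choice of `author`, `keywords`, and `description` as the tags of interest are assumptions for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path

WANTED = {"author", "keywords", "description"}


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs for the wanted names."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in WANTED and attr.get("content") is not None:
            self.meta[name] = attr["content"]


def read_meta(markup: str) -> dict:
    collector = MetaCollector()
    collector.feed(markup)
    collector.close()
    return collector.meta


def index_directory(root: str) -> dict:
    """Map every .html file under root to the metadata found in it."""
    return {
        str(path): read_meta(path.read_text(encoding="utf-8"))
        for path in sorted(Path(root).rglob("*.html"))
    }


SAMPLE = """<!doctype html>
<title>Notes</title>
<meta name="author" content="A. Writer">
<meta name="keywords" content="sgml, markdown">
<h1>Notes</h1>
"""

print(read_meta(SAMPLE))
```

Calling `index_directory` on a notes folder would give a dictionary that can be searched or dumped to JSON; the sample above works on a string so it runs without any files on disk.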


> it just really feels like there shouldn't be that many additional features. A couple more of emphasis options, a couple less ways to do the same thing

There was a language like that once, it was called HTML. It had a very basic set of features initially, but then someone needed text to blink, someone needed to display videos, someone needed to send forms, someone needed to use it to play games and here we are today, and it's not done yet. If it were implemented today, you would get exactly the same result in the near future, because everyone's "small set of features" together adds up to infinity.


I've yet to have anyone explain why markdown with its dozen flavours is better than HTML2: https://datatracker.ietf.org/doc/html/rfc1866

We broke a weird little markup language into something it was never meant to be because the last tower of crap got too high and collapsed on itself.

Now a webpage is html+css+javascript+a dozen frameworks. People are sick of it and want something better. Well HTML2 is better. Just HTML2, nothing else.



