If you're serializing the JSON, that is UTF8. You're signing UTF8 bytes... don't...

Vendan · on July 24, 2019

SSB signs objects like so:

    {
      "previous": "%XphMUkWQtomKjXQvFGfsGYpt69sgEY7Y4Vou9cEuJho=.sha256",
      "author": "@FCX/tsDLpubCPKKfIrw4gc+SQkHcaD17s7GI6i/ziWY=.ed25519",
      "sequence": 2,
      "timestamp": 1514517078157,
      "hash": "sha256",
      "content": {
        "type": "post",
        "text": "Second post!"
      }
    }

gets signed like this:

    {
      "previous": "%XphMUkWQtomKjXQvFGfsGYpt69sgEY7Y4Vou9cEuJho=.sha256",
      "author": "@FCX/tsDLpubCPKKfIrw4gc+SQkHcaD17s7GI6i/ziWY=.ed25519",
      "sequence": 2,
      "timestamp": 1514517078157,
      "hash": "sha256",
      "content": {
        "type": "post",
        "text": "Second post!"
      },
      "signature": "z7W1ERg9UYZjNfE72ZwEuJF79khG+eOHWFp6iF+KLuSrw8Lqa6IousK4cCn9T5qFa8E14GVek4cAMmMbjqDnAg==.sig.ed25519"
    }

which means that you have to be careful about how you remove the signature in order to verify the original. The "main" Node implementation does this by parsing the JSON, removing the field, and then re-serializing, which forces any alternate implementation to exactly match the Node serialization in order to be compatible.
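A minimal sketch of why that remove-and-re-serialize step is fragile. SSB actually signs with Ed25519; HMAC-SHA256 from Python's standard library stands in here only to keep the example self-contained, and `KEY`, `serialize`, `sign_in_band` and `verify_in_band` are hypothetical names, not anything from the SSB codebase. The `indent` argument is an assumed stand-in for "whatever formatting the reference implementation happens to emit":

```python
import hashlib
import hmac
import json

KEY = b"demo-key"  # hypothetical shared secret; SSB uses Ed25519 key pairs instead


def serialize(obj, indent=2):
    # Assumed stand-in for the reference implementation's serializer.
    return json.dumps(obj, indent=indent).encode("utf-8")


def sign_in_band(msg, indent=2):
    # Sign the serialized message, then add the tag as one more field.
    tag = hmac.new(KEY, serialize(msg, indent), hashlib.sha256).hexdigest()
    return {**msg, "signature": tag}


def verify_in_band(signed, indent=2):
    # Parse, drop the signature field, re-serialize, and compare tags.
    unsigned = {k: v for k, v in signed.items() if k != "signature"}
    expected = hmac.new(KEY, serialize(unsigned, indent), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])


msg = {"sequence": 2, "content": {"type": "post", "text": "Second post!"}}
signed = sign_in_band(msg)

same_serializer = verify_in_band(signed, indent=2)
other_serializer = verify_in_band(signed, indent=None)  # same data, different bytes
```

The second check fails even though nothing in the message changed: the verifier simply formatted the same data differently from the signer.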

ThrustVectoring · on July 25, 2019

I'd be really curious as to why they chose not to go with something like

    {
      "data": {...},
      "signature": "z7W1ER..."
    }

Vendan · on July 25, 2019

As far as I can tell, it's because they are hardcore JS devs, where JSON is seen as almost a part of the language, and little to no effort was put into things like future-proofing. A lot of the failings have been fixed as the community has grown, but this one is core to how the whole thing works, so changing it at this point is fairly non-trivial (it would basically involve completely breaking any kind of backwards compat, and potentially even some forwards compat).

sytringy05 · on July 25, 2019

probably they did that, then Roy Fielding came in, unplugged their router and wouldn't let them back on the internet til they stopped polluting their resources with metadata

wyldfire · on July 24, 2019

Gee, I would just naively wonder whether the scope of the signature includes the outermost closing brace. Does it?

Vendan · on July 24, 2019

At least when I implemented it, it was the full message, as a complete valid JSON object. The signature would get added as another field, and then the object was reserialized.

tracker1 · on July 24, 2019

This is why JWT is better... JWT signs "header.body", with each part being the base64 of its JSON, joined with a period. The content of the body and header is immaterial.
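Roughly what that looks like for HS256, as a sketch using only Python's standard library rather than a real JWT library. `KEY`, `b64url`, `jwt_sign` and `jwt_verify` are made-up names, and this skips header and claim validation entirely, so treat it as an illustration of the signing input and nothing more:

```python
import base64
import hashlib
import hmac
import json

KEY = b"demo-key"  # hypothetical shared secret


def b64url(data: bytes) -> str:
    # JWS uses URL-safe base64 with the trailing '=' padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwt_sign(payload: dict) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        b64url(json.dumps(header).encode("utf-8"))
        + "."
        + b64url(json.dumps(payload).encode("utf-8"))
    )
    tag = hmac.new(KEY, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(tag)


def jwt_verify(token: str) -> bool:
    # The tag is checked over the received characters; no JSON is re-serialized.
    signing_input, _, tag = token.rpartition(".")
    expected = hmac.new(KEY, signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), tag)


token = jwt_sign({"a": 1})
```

The verifier never parses the JSON before checking the tag, so whitespace and key order inside the header and body cannot affect the result.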

lvh · on July 24, 2019

It's not "better", it solves a materially different problem. The article already acknowledges that if you can afford to, you should just stick a tag on the outside. (That's what JWT does, but the actual thing recommended, just HMAC, is better still, for reasons mentioned elsewhere in the comments.)

juliusmusseau · on July 24, 2019

The one approach you missed on that blog post: put an envelope inside an envelope. Does anyone ever do this?

  {
    "original":"json",
    "goes":"here"
  }

Signed:

  {
    "original_json_contents_base64":"ewogICJvcmlnaW5hbCI6Impzb24iLAogICJnb2VzIjoiaGVyZSIKfQo=",
    "hmac_sha256_of_base64":"bf1f4cb95ce8633aff46888e1717873e32bb2a770b3d4b5b74a59e5e9adefeda"
  }

This way you have full control over the raw bytes you want to sign (by forcing them into Base64 where other systems can't get their dirty paws on them).
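A rough Python sketch of that wrapping, using the field names from the example above. The key is a made-up placeholder (the key behind the HMAC shown above is unknown), and `wrap` and `unwrap` are hypothetical helper names:

```python
import base64
import hashlib
import hmac
import json

KEY = b"demo-key"  # hypothetical shared secret


def wrap(original_json: bytes) -> str:
    b64 = base64.b64encode(original_json).decode("ascii")
    tag = hmac.new(KEY, b64.encode("ascii"), hashlib.sha256).hexdigest()
    return json.dumps(
        {"original_json_contents_base64": b64, "hmac_sha256_of_base64": tag}
    )


def unwrap(envelope: str) -> bytes:
    outer = json.loads(envelope)
    b64 = outer["original_json_contents_base64"]
    expected = hmac.new(KEY, b64.encode("ascii"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, outer["hmac_sha256_of_base64"]):
        raise ValueError("bad tag")
    # Only decode the payload after the tag has been checked.
    return base64.b64decode(b64)


original = b'{\n  "original":"json",\n  "goes":"here"\n}\n'
envelope = wrap(original)
```

An intermediary can reformat the outer object however it likes; as long as the base64 string survives intact, `unwrap` still returns the original bytes.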

I guess the problem here is if intermediate systems want to do stuff based on the payload (but without validating it), they won't like this.

But if the problem is just intermediate systems barfing on non-json, this might work!

p.s. enjoyable blog post - as they always are! ;-)

lvh · on July 24, 2019

Yep! That works, but it's essentially the first option ("How to sign a JSON object") but with JSON as the outer serialization format instead of a comma in the middle.

You also correctly identified why that is different from the other schemes: they don't change the structure of the outer object.

Skunkleton · on July 25, 2019

You could also serialize the JSON with a placeholder string (all spaces or zeroes or something), calculate the HMAC, and substitute it for the placeholder. You could then do that in reverse on the receiving end. The deserializer could easily note the offset of the HMAC, which could then be verified against the original bytes.
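Something like this, as a sketch. It assumes the tag is a hex HMAC-SHA256 stored in a top-level `"tag"` field that is serialized last, and it finds the offset with a regex instead of a real offset-tracking parser; `KEY`, `PLACEHOLDER` and the function names are all hypothetical:

```python
import hashlib
import hmac
import json
import re

KEY = b"demo-key"  # hypothetical shared secret
PLACEHOLDER = b"0" * 64  # same length as a hex-encoded SHA-256 tag
TAG_FIELD = re.compile(rb'"tag"\s*:\s*"([0-9a-f]{64})"')


def _tag_span(raw: bytes):
    # Assumes the tag is the last such field in the document.
    matches = list(TAG_FIELD.finditer(raw))
    if not matches:
        raise ValueError("no tag field found")
    return matches[-1].span(1)


def sign_with_placeholder(obj: dict) -> bytes:
    raw = json.dumps({**obj, "tag": PLACEHOLDER.decode("ascii")}).encode("utf-8")
    start, end = _tag_span(raw)
    tag = hmac.new(KEY, raw, hashlib.sha256).hexdigest().encode("ascii")
    # Splice the real tag over the placeholder without re-serializing.
    return raw[:start] + tag + raw[end:]


def verify_with_placeholder(raw: bytes) -> bool:
    start, end = _tag_span(raw)
    tag = raw[start:end]
    # Put the placeholder back to recover the exact bytes that were signed.
    restored = raw[:start] + PLACEHOLDER + raw[end:]
    expected = hmac.new(KEY, restored, hashlib.sha256).hexdigest().encode("ascii")
    return hmac.compare_digest(expected, tag)


wire = sign_with_placeholder({"a": 1})
```

This only works on the bytes as received. If anything in between parses and re-emits the JSON, verification fails, same as the other in-band schemes.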

lvh · on July 25, 2019

How is that distinct from the bait and switch trick in the post?

gritzko · on July 25, 2019

http://seriot.ch/parsing_json.php

Yep, JSON has no round-trip guarantees
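A quick way to see this with Python's standard `json` module (the sample documents are made up for illustration): each of these is valid JSON, yet none comes back byte-for-byte after a parse and re-serialize.

```python
import json

samples = [
    '{"a":1}',            # no whitespace after the colon
    '{"n": 1e2}',         # exponent notation
    '{"s": "\\u0041"}',   # escaped ASCII character
    '{"k": 1, "k": 2}',   # duplicate key
]

# Keep every sample whose text changes after a parse/serialize round trip.
changed = [s for s in samples if json.dumps(json.loads(s)) != s]
```

With default settings, every sample ends up in `changed`, so any signature computed over the original text would no longer match the round-tripped text.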

steveklabnik · on July 24, 2019

JSON is not always guaranteed to be UTF-8. It wasn't updated to mandate that until 2017 https://www.tbray.org/ongoing/When/201x/2017/12/14/RFC-8259-...

That said, I certainly hope that it is, generally speaking.

tracker1 · on July 24, 2019

The original spec said UTF-8/16/32, but I'm unaware of any reference implementation that used anything other than UTF-8... though who knows with hand-rolled crap, and Windows in the mix.

Vendan · on July 24, 2019

https://tc39.es/ecma262/#sec-json.stringify

> The stringify function returns a String in UTF-16 encoded JSON format representing an ECMAScript value, or undefined.

masklinn · on July 25, 2019

Python's `json` returns text, which you can encode however you want.
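For instance (a small sketch; the sample text is arbitrary), the same serialized string hashes differently depending on which encoding you pick before signing:

```python
import hashlib
import json

# json.dumps returns a str, not bytes; ensure_ascii=False keeps the accented letter as-is.
text = json.dumps({"text": "caf\u00e9"}, ensure_ascii=False)

# Hash the same text under three encodings.
digests = {
    enc: hashlib.sha256(text.encode(enc)).hexdigest()
    for enc in ("utf-8", "utf-16-le", "utf-32-le")
}
```

All three digests differ, so signer and verifier have to agree on the encoding, not just on the JSON text.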

lvh · on July 24, 2019

That's true if you have the luxury of a traditional tag on the outside, but falls apart when you have systems where signatures are in-band, like SAML, which is where canonicalization shows up. (Both for UTF-8 itself, which isn't canonical, but especially for your serialization format e.g. XML/JSON, which is usually far hairier.)

tracker1 · on July 24, 2019

If you treat the original UTF8 as bytes, then you still have that as bytes... this is part of why JWT uses base64 around the JSON string's bytes. In any case, you don't create a signature on a string per se, you create it on bytes, even if those bytes are derived from a string. Also, you don't validate a signature by re-creating the bytes in question... that's a flawed approach, and not the approach for example JWT takes.

lvh · on July 24, 2019

> In any case, you don't create a signature on a string per se, you create it on bytes, even if those bytes are derived from a string.

Everyone is on the same page that at some point bytes go into a hash function. That's not the problem.

You mention JWT, but JWT only does external signing, which doesn't trigger the problematic case several people are describing to you. Perhaps an example would be more useful. If you start with a JSON like:

    {"a": 1}

how do you build a JSON like:

    {"a": 1, "tag": "deadbeefdeadbeefdeadbeef"}

with a signing and verification algorithm that works?

> Also, you don't validate a signature by re-creating the bytes in question... that's a flawed approach, and not the approach for example JWT takes.

Can you describe an HMAC validation process that doesn't involve recreating the bytes in the HMAC tag?

tracker1 · on July 24, 2019

You don't... you could create one like...

    {"body":"base64-json","tag":"hash"}

It seems to me that trying to do what you're saying is a flawed approach.

lvh · on July 24, 2019

That is literally the first thing suggested in the post. You can say the other thing is a "flawed approach" but that's the design problem being solved, so your answer is simply not responsive to the question.

Vendan · on July 24, 2019

Don't worry, multiple people (including me) have told the community in question that it's a flawed approach. Doesn't seem to have done much to fix it though... :shrug: