
It's much easier to debug and observe traffic (browser Network tab).

The DX for JSON things is much better. The UX for protobufs is much better (faster, less data over the wire, etc). Which you optimize for is up to you, but there isn't a straightforward "Use this tech because it's the best one."



> faster, less data over the wire, etc.

I've always wondered about this. Firstly, I'm fairly sure clientside JSON parsing is significantly faster than protobuf decoding, but even for data over the wire: JSON compresses pretty well, so surely the gains there are going to be marginal. Surely never enough benefit to UX to warrant the DX trade-off, right?


protobuf parsing is far faster - it's a binary protocol. The underlying code is highly optimized and has to handle about 1/10th the total bytes. In computing, reducing memory access is often the best way to optimize.

PB can always be decoded to a text representation if you need to inspect it.


JS __is__ dumb at handling binary. The overhead is significant. The first thing to do when optimizing a Node.js program is always to replace loops that iterate through individual bytes of binary data with some native (wasm?) equivalent. JSON, on the other hand, isn't affected by this overhead, because JSON.parse is a native method on every platform.

I was once mixing two buffers containing PCM data: a simple task that takes two numbers, averages them, and puts the result into another buffer. The native implementation was about 10x faster than the one I wrote in JS (or consumed 10x less CPU time).
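For concreteness, here's a minimal sketch of that kind of mixing task in plain JS (assuming 16-bit little-endian PCM; the function name is made up). It's exactly the sort of per-sample loop a native routine easily beats:

```javascript
// Average two buffers of 16-bit little-endian PCM samples, sample by
// sample, into a new buffer. A native/WASM implementation of this loop
// is reportedly around 10x faster than running it in the JS engine.
function mixPcm16(a, b) {
  const len = Math.min(a.length, b.length) & ~1; // whole samples only
  const out = Buffer.alloc(len);
  for (let i = 0; i < len; i += 2) {
    const sa = a.readInt16LE(i);
    const sb = b.readInt16LE(i);
    out.writeInt16LE((sa + sb) >> 1, i); // average of the two samples
  }
  return out;
}
```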

A native Protobuf is definitely going to beat a native JSON implementation. A JS Protobuf is also likely to beat a JS JSON implementation.

But a JS Protobuf against a native JSON? I doubt it.


Do you have any links showing protobuf is faster?

There's nothing in your comment that hadn't already been said before by sibling commenters, but as far as I've seen in the real world, JSON appears to be faster in practice. Which is all that counts.

You and the many other commenters making the same assumption (it's binary, ergo it must be fast) make a really good case for PB's adoption being rooted in theoretical assumptions rather than real-world benefit.

I get it. It makes sense that it should be faster. Nothing is self evident though. You gotta measure it.


I don't really get your attitude here. In particular, I'm not disagreeing that JSON parsers in the browser could be currently faster than protocol buffers.

I'm saying that computer science and hardware dictate that protocol buffers are faster for a wide range of reasons. That part's not in question: smaller data encodings have better cache use, and require far fewer dictionary (hash table) lookups at parse time, as well as far less work parsing strings. If you want to argue against my point there, I don't know what to say.

If it were a priority to write a blindingly fast protocol buffer parser in JS, it's almost certain that an expert could write one faster than a similar JSON parser.


> I don't really get your attitude here

My attitude here is that I made a specific observation: JSON is likely to be faster or at least negligibly slower in browsers in practice.

Everyone is replying either with theoretical speed comparisons or server-to-server non-JavaScript benchmarks, which don't seem relevant to the very specific observation I made up top...


If you had bothered to benchmark it, you'd have realized that lots of protobuf libraries are actually surprisingly slow.


> In our tests, it was demonstrated that this protocol performed up to 6 times faster than JSON.

https://auth0.com/blog/beating-json-performance-with-protobu...


The 6 times faster benchmark from that article is describing a Java server and Java client.

This thread is about protobuf vs JSON in a JavaScript environment.

The article you linked _does_ talk about JavaScript environments, too, but the numbers are much less impressive.


Json parsing is orders of magnitude slower than protobuf decoding.


I did some brief googling after reading your comment and I did find one article showing clientside protobuf being faster than JSON[0]. However, they didn't isolate parsing - the only thing they measure is total request time to a Java Spring application, so the JSON slowdown will include the Java JSON serialisation overhead as well as the request size/network overhead. My instinct is that these two will heavily favour protobuf, making the JSON parse still likely to be faster.

It also shows a difference of 388ms (protobuf) vs 396ms (JSON) which is pretty negligible. Certainly not orders of magnitude.

Do you have other sources?

[0] https://auth0.com/blog/beating-json-performance-with-protobu...


Oh come on... how can one assume a binary, roughly TLV-encoded format is not faster than parsing strings (JSON is generally schemaless btw, and the dynamicity adds on top of that, while yes, protobuf also has variable-sized containers). It's like claiming that parsing a string to an int has no overhead over a straight int (yes, I know protobufs still require varint decoding; still a huge difference).
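To make that varint concession concrete, here's a minimal sketch of protobuf-style varint decoding (not any particular library's code; it only handles values that fit in 31 bits because of JS's 32-bit bitwise operators, while real decoders handle 64-bit varints):

```javascript
// Decode a protobuf varint: each byte carries 7 payload bits, and the
// high bit says whether another byte follows. Contrast with scanning a
// decimal string character by character as a JSON parser must.
function decodeVarint(bytes, offset = 0) {
  let result = 0;
  let shift = 0;
  let pos = offset;
  while (true) {
    const byte = bytes[pos++];
    result |= (byte & 0x7f) << shift; // low 7 bits are payload
    if ((byte & 0x80) === 0) break;   // high bit clear: last byte
    shift += 7;
  }
  return { value: result, next: pos };
}
```

For example, the number 300 is encoded as the two bytes `0xAC 0x02`: 44 from the first byte plus 2 << 7 = 256 from the second.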

It's also not only speed: size is usually an order of magnitude off too (and no, compression doesn't cut it, and trades size for computation again).

Sure, if size and speed don't matter, it's strange that you considered protobuf at all... but claiming they are never needed just means you have never worked on resource-constrained systems?

As for what you cite there: I assume most of that 400ms has nothing to do with the message encoding at all, btw.


(a) You're making assumptions based on rule of thumb, I'm talking about real world usage: your points make sense in theory but don't necessarily reflect reality

(b) I'm talking about a narrow & specific case. PB may outperform JSON in most cases but I'm very specifically referring to browsers where JSON is native (& highly optimised) whereas PB is provided by a selection of open source libraries written in javascript. So that domain heavily favours JSON perf-wise.


> You're making assumptions based

No, not at all... coming from embedded, where speed, memory size, and bandwidth did count, JSON was not just worse, it simply wouldn't have been feasible (because our protobufs already barely fit memory and MTU constraints).


One important thing to consider with JSON is that a lot of people really, really care about JSON performance -- optimising parsing in assembler, and rewriting internal data structures just to make serialising + deserialising JSON faster.

I'm sure given two implementations of equal quality protobuf would easily outperform JSON, but I can also believe the JSON implementation in (for example) v8 is very, very hard to beat.



I just benchmarked it on my computer -- the protobuf is twice as fast (well, 1.8x), which is good, but I don't think I'd use that as a basis for choosing the technology I use.

Of course, I might use protobuf because I prefer it in my code to JSON, and it certainly is faster (if only twice).


Have you stepped through protobuf processing code? There are a lot of special cases, ifs, branches here and there. Protobufs within protobufs. It's not like it's a size, then 100 floats packed together; there's more overhead than you'd think (not to mention the client-side allocations, etc). I use protoc compiled to wasm for protobufs and it is fast, but there's a lot of wasm overhead to execute that code.

JSON parsing also has a lot of special cases and error testing, but the v8 team has spent a huge amount of time optimising JSON parsing (there are a few blog posts on it). I'm not assuming either way, but it's definitely not as cut and dried as one would assume.


Stepped through? Yes... as I hinted, coming from an embedded environment, I measured and compared highly optimized JSON parsing code (which even had many limitations, like very limited nesting and no lists) vs nanopb => a clear winner on all points (memory requirements, performance, encoded size) - which is really not that surprising?


There are two ways to encode a repeated field (100 floats, but it could also be any size up to the limits of repeated fields): "Ordinary (not packed) repeated fields emit one record for every element of the field." That means type, value, type, value, etc.

However, "packed" fields are exactly a length followed by a byte array of the typed data. Not making this the default was an oversight in the original proto2 which is unlikely to be corrected, but packed is the default in proto3.


100 (or any N) floats prefixed by a size is exactly what you would get from `repeated float f = 1 [packed=true];`
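A sketch of what reading such a packed float field can look like (hedged: the function name is made up, and it assumes the caller has already consumed the field's tag byte and that the length fits in a single varint byte):

```javascript
// Decode a packed `repeated float` field: a varint byte length followed
// by raw little-endian float32 data, read directly at known offsets.
function decodePackedFloats(buf, offset) {
  const byteLen = buf[offset]; // assume single-byte varint length
  const floats = new Float32Array(byteLen / 4);
  const view = new DataView(buf.buffer, buf.byteOffset + offset + 1, byteLen);
  for (let i = 0; i < floats.length; i++) {
    floats[i] = view.getFloat32(i * 4, true); // true = little-endian
  }
  return floats;
}
```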


They didn't assume, you did. They showed some real data and you reacted emotionally.


If there's a JSON parser faster than a PB parser (for the same underlying data content) it just means the JSON parser was optimized more. By every rule in computing, PB parsing is far faster than JSON for every use case for a simple reason: the messages use less RAM, and therefore, moving the data into the processor and decoding it takes less time.


Theoretical performance doesn't matter in UX, only real-world performance. Yes, conceptually it's possible to make protobufs faster than JSON, but someone still has to build that. Fast native JSON parsers already exist; that's the benchmark protobufs have to beat significantly to make the worse DX worth it.


I believe the answer is "it depends": https://medium.com/aspecto/protobuf-js-vs-json-stringify-per....


yes, sure it depends on the implementation, as the poster above said. You need to compare similarly optimized implementations... but really: no surprise?!?


How can JavaScript code (PB decoder) be faster than native code (JSON parser)?


Much, much less processing to do. Most of pb decoding is just reading bytes until you fill your data structure.


It's protocol 101: pb is a binary protocol with a known schema, so of course it has to be faster than JSON for encoding/decoding. Now, that does not mean it's going to be faster all the time; it depends on the maturity of the library/language, but on paper, yes, it is faster.


> it does not means that it's going to be faster all the time, it depends of the maturity of the library / language

I feel like I'm having to repeat myself a lot here, as no one seems to have read the original comment correctly: we're talking about one specific language in one specific known environment here. No one is claiming that JSON outperforms PB in general: only that it does in browsers, where it's actually relevant for UX.


It’s relevant for UX throughout the entire stack.

Where I’m working now, we have a REST API for users to interact with and every call behind the scenes is proto. As we deal with quite large objects, the benefits of avoiding repeated serialization and deserialization add up quickly.

From the user’s perspective we have a performant app, and much of this is possible due to proto.


Thank you. Finally someone answered my original question.

So it sounds like the trade-off can be worthwhile in some cases: particularly for large objects where serialisation is a significant serverside bottleneck.

I'm curious: you say PB helps avoid "repeated serialisation/deserialisation": how? In my mind, when architecting an app that uses JSON/PB on the wire, serialisation happens once on output & deserialisation happens once on input. For both transfer formats. Surely you wouldn't be passing massive JSON strings around your app in memory?

Also curious which is the bigger bottleneck for your large objects: input or output. How large is large?



First two links are Go, so not relevant to client-side.

Third link is also server-side, but since it's NodeJS it's at least close enough / more relevant to client-side perf.

Here's the benchmark from the third link:

    benchmark        time (avg)             (min … max)
    ---------------------------------------------------
    encode-JSON  342.37 µs/iter   (311.93 µs … 1.19 ms)
    decode-JSON   435.9 µs/iter   (384.44 µs … 1.41 ms)
    encode-PB    946.43 µs/iter   (777.38 µs … 3.13 ms)
    decode-PB    770.79 µs/iter   (688.99 µs … 1.78 ms)
    encode-PBJS  696.75 µs/iter   (618.43 µs … 2.43 ms)
    decode-PBJS  455.36 µs/iter   (413.66 µs … 1.09 ms)

showing JSON to be significantly faster


ahh yea, i'm not sure why the rest of my comment didn't upload. i was going to say that i thought the common use case for protobufs was to more ergonomically communicate between microservices?

in any case, that's the only time i've ever seen it used in production. the first link is a go benchmark that i felt represented why someone would use it for those purposes, the second was linked to show that despite numerous (successful!) attempts to make deserializing/serializing data faster and smaller, JSON is still the most heavily used and i would wager it's mostly due to how easy it is to use as far as browsers are concerned. the third was a link to justify that claim and show that js-land is much, much different than go-land as far as proto's and JSON encoding/decoding are concerned!


Java to Java uncompressed is 6x faster, per that article.

So yeah not a whole order of magnitude. I was using my experience as a guide where JSON parsing is a huge compute hog and Protobuf is not.

I've never experimented w/ Javascript or compression or any of the other things in that article, I guess YMMV.


I specifically referred to clientside in my original comment, so not talking about java to java.

Clientside is always going to be the pertinent metric for UX since it's processed on the user's device.


at least in the frontend (without WASM), it depends.

a few months ago i tested https://github.com/mapbox/pbf and while it was faster for deep/complex structs vs an unoptimized/repetitive JSON blob, it was much slower at shallow structs and flat arrays of stuff. if you spend a bit of time encoding stuff as flat arrays to avoid mem alloc, JSON parsing wins by a lot since it goes through highly optimized C or assembly, while decoding protobuf in the JS JIT does not.

of course it's not always feasible to make optimized over-the-wire JSON structs if you have a huge/complex API that can return many shapes of complex structs.
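The flat-array trick mentioned above can be sketched like this (field names are made up for illustration):

```javascript
// Instead of shipping an array of objects (one allocation per point
// after JSON.parse), ship parallel flat arrays so the parser allocates
// a handful of big arrays rather than many small objects.
function toFlat(points) {
  return {
    x: points.map((p) => p.x),
    y: points.map((p) => p.y),
  };
}

const nested = [{ x: 1, y: 2 }, { x: 3, y: 4 }];
const wire = JSON.stringify(toFlat(nested)); // '{"x":[1,3],"y":[2,4]}'
```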


At pbf speeds, decoding is usually no longer a bottleneck, but bandwidth might be when comparing with gzipped JSON. Also, one of the primary advantages of pbf is being able to decode partially and lazily (and adjust how things are decoded at low level), which is very important in use cases like vector maps.


> At pbf speeds, decoding is usually no longer a bottleneck, but bandwidth might be when comparing with gzipped JSON.

we were streaming a few hundred float datapoints spread across a dozen(ish) flat arrays over websocket at 20-40hz and needed to decode the payload eagerly. plain JSON was a multi-factor speedup over pbf for this case. but it's fully possible i was holding it wrong, too!

even when your "bottleneck" is rendering/rasterization (10ms), but your data pipe takes 3ms instead of 1ms, it's a big effect on framerate, battery, thermals, etc.

i'm a big fan of your work! while i have you here, would you mind reviewing this sometime soon? ;)

https://github.com/mourner/flatbush/pull/44


protobufs have the great property of having a schema (and then generating code), which means it's pretty easy to set up a system where an accidental change of API fails CI tests for mobile apps and web.

This is doable with JSON, but I've never seen a JSON based setup actually work well at catching these kind of regressions.


OpenAPI?


Assuming your developer time is constrained, improved DX often also leads to better UX (more features). So even if you are optimizing for UX you may well be better off with JSON.


> also leads to better UX (more features)

More features is not a measure of better UX. In many cases (most cases!?) it's the opposite.


Sorry; I meant more polished features as much as more by count.


I don't develop in JS so can't comment on DX there, but I've found the DX to be pretty good when using protobuf in other languages.

That's mostly been down to having IDE autocompletion for data structures and fields once the protobuf code's been generated.

For many JSON APIs I've worked with there's only been human readable documentation, making them more error prone to work with (e.g. having to either craft JSON manually for requests, or writing a client library if one doesn't already exist).


There's also msgpack. Best of both worlds.


So does that make GraphQL the best then? JSON + faster/less data over the wire.


Not when you count the DX of the backend developers. Good luck making a performant GraphQL backend that doesn't suffer the N+1 problem, and have fun whitelisting the GraphQL queries produced by your frontend, because attackers will be supplying their own queries with no regards to performance.


Best experience I had with GraphQL was a B2B app where we had a fair amount of users, as well as the "backoffice" app also powered by GraphQL. Bad users we could just ban (the user base were great folks but could barely operate a computer, so it was fine).

Backend was with Absinthe+Elixir, so it was great (if I had to do it again today I would instead use Liveview, this was in 2017 where I had to retrofit a React app into something useable).

Public user facing is a different story, the last major one I saw was Tableau, though they are also business facing where they can just ban bad users. Github also has deprecated their GraphQL endpoints[0].

[0] https://github.blog/changelog/2022-08-18-deprecation-notice-...


Re: GitHub, that deprecation notice appears to be for GitHub Packages specifically. I don't see a deprecation notice on the general API: https://docs.github.com/en/graphql


> Bad users we could just ban

To be fair, it sounds like that would just make the DX wonderful no matter which stack you were using?


GraphQL has a DataLoader (to avoid N+1) and query complexity utilities to avoid those issues.
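The batching idea behind DataLoader can be hand-rolled in a few lines (this is a sketch of the concept, not the real library's implementation): queue up all key lookups made during one microtask tick, then issue a single batched fetch instead of N separate queries.

```javascript
// Minimal batching loader: callers get a promise per key, but the
// batch function runs once per tick with all collected keys.
function makeLoader(batchFn) {
  let queue = [];
  let scheduled = false;
  return function load(key) {
    return new Promise((resolve, reject) => {
      queue.push({ key, resolve, reject });
      if (!scheduled) {
        scheduled = true;
        queueMicrotask(async () => {
          const batch = queue;
          queue = [];
          scheduled = false;
          try {
            // one batched fetch instead of N individual queries
            const results = await batchFn(batch.map((item) => item.key));
            batch.forEach((item, i) => item.resolve(results[i]));
          } catch (err) {
            batch.forEach((item) => item.reject(err));
          }
        });
      }
    });
  };
}
```

Three synchronous `load()` calls in a resolver tree then trigger one batched fetch rather than three round trips. The real DataLoader adds per-key caching and error handling on top of this.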


I know. Good luck implementing it performantly while also considering filtering, pagination, etc. It's doable of course, just not nearly as easy as people like to make it sound.


GraphQL isn't magically faster. The equivalent endpoint in REST will be faster, as you won't need to translate the query to your backend persistence layer. GraphQL's benefits are not execution speed.


> JSON + faster

Only if you have a very competent backend team, who, apart from dataloader, will have to figure out caching.

> /less data

Graphql responses tend to be pretty deeply nested.


Apollo's Federation makes caching much easier to reason about as you can now selectively cache sub-query pieces at the service level for that specific responsible subgraph.



