Hacker News

I gave up on protobufs years ago. The protobuf team has no idea how to write PHP and JS libraries. I got segfaults from using the PHP extension. The built-in toJSON would return invalid JSON (missing braces for binary types). Ridiculous stuff.

I really just prefer to use JSON for everything. It's much easier to debug and observe traffic (browser Network tab). I like JSON-RPC, very simple spec (basically one page long). I don't like REST.

All that said, I'm really glad to see the community take things into their own hands.



> It's much easier to debug and observe traffic (browser Network tab).

The DX for JSON things is much better. The UX for protobufs is much better (faster, less data over the wire, etc). Which you optimize for is up to you, but there isn't a straightforward "Use this tech because it's the best one."


> faster, less data over the wire, etc.

I've always wondered about this. Firstly, I'm fairly sure clientside JSON parsing is significantly faster than protobuf decoding. But even for data over the wire: JSON can be pretty compressible, so surely the gains here are going to be marginal. Surely never enough benefit to UX to warrant the DX trade-off, right?


protobuf parsing is far faster: it's a binary protocol. The underlying code is highly optimized and has to handle about 1/10th the total bytes. In computing, reducing memory access is often the best way to optimize.

PB can always be decoded to a text representation if you need to inspect it.


JS __is__ dumb at handling binary. The overhead is significant. The first thing to do when optimizing a Node.js program is to replace loops that iterate through individual bytes of a binary with some native (wasm?) equivalent. JSON, on the other hand, isn't affected by this overhead, because JSON.parse is a native method on every platform.

I was once mixing two buffers containing PCM audio. A simple task that takes two numbers, averages them, and puts the result into another buffer. The native implementation was about 10x faster than the one I wrote in JS (or consumed 10x less CPU time).

A native Protobuf is definitely going to beat a native JSON implementation. A JS Protobuf is also likely to beat a JS JSON implementation.

But a JS Protobuf vs. native JSON? I doubt it.
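For context, the PCM mix described above is roughly this kind of loop (my reconstruction, not the commenter's actual code):

```javascript
// Reconstruction of the task described above: mix two 16-bit PCM
// buffers by averaging each pair of samples. In pure JS the JIT has
// to touch every sample individually; a native/wasm version of this
// same loop is where the ~10x difference comes from.
function mixPcm16(a, b) {
  const len = Math.min(a.length, b.length);
  const out = new Int16Array(len);
  for (let i = 0; i < len; i++) {
    out[i] = (a[i] + b[i]) >> 1; // integer average of two signed samples
  }
  return out;
}
```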


Do you have any links showing protobuf is faster?

There's nothing in your comment that hadn't already been said before by sibling commenters but as far as I've seen in the real world JSON appears to be faster in practice. Which is all that counts.

You and the many other commenters making the same assumption (it's binary, ergo it must be fast) make a really good case for PB's adoption being rooted in theoretical assumptions rather than real-world benefit.

I get it. It makes sense that it should be faster. Nothing is self evident though. You gotta measure it.


I don't really get your attitude here. In particular, I'm not disagreeing that JSON parsers in the browser could be currently faster than protocol buffers.

I'm saying that computer science and hardware dictate that protocol buffers are faster for a wide range of reasons. That part's not in question: smaller data encodings have better cache use, and require far fewer dictionary (hash table) lookups at parse time, as well as far less work parsing strings. If you want to argue against my point there, I don't know what to say.

If it was a priority to write a blindingly fast protocol buffer parser in JS, it's almost certain that an expert could write a faster one than a similar JSON parser.


> I don't really get your attitude here

My attitude here is that I made a specific observation: JSON is likely to be faster or at least negligibly slower in browsers in practice.

Everyone is replying either with theoretical speed comparisons or server-to-server non-JavaScript benchmarks, which don't seem relevant to my very specific observation I made up top...


If you had bothered to benchmark it, you'd have realized that lots of protobuf libraries are actually surprisingly slow.


> In our tests, it was demonstrated that this protocol performed up to 6 times faster than JSON.

https://auth0.com/blog/beating-json-performance-with-protobu...


The 6 times faster benchmark from that article is describing a Java server and Java client.

This thread is about protobuf vs JSON in a JavaScript environment.

The article you linked _does_ talk about JavaScript environments, too, but the numbers are much less impressive.


JSON parsing is orders of magnitude slower than protobuf decoding.


I did some brief googling after reading your comment and I did find one article showing clientside protobuf being faster than JSON[0]. However, they didn't isolate parsing: the only thing they measure is total request time to a Java Spring application, so the JSON slowdown will include the Java JSON serialisation overhead as well as the request size/network overhead. My instinct is that these two will heavily favour protobuf, making the JSON parse itself still likely to be faster.

It also shows a difference of 388ms (protobuf) vs 396ms (JSON) which is pretty negligible. Certainly not orders of magnitude.

Do you have other sources?

[0] https://auth0.com/blog/beating-json-performance-with-protobu...


Oh come on... how can one assume a binary, TLV-encoded format is not faster than parsing strings? (Generally JSON is schemaless, btw; the dynamicity also adds on top. Though yes, protobuf also has variable-sized containers.) It is like claiming that parsing a string to an int has no overhead over a straight int (yes, I know protobufs still require varint decoding; still a huge difference).

It is also not only speed: size is usually a magnitude off as well (and no, compression doesn't cut it, and it trades size for computation again).

Sure, if size and speed do not matter, it is strange that you considered protobuf at all... but claiming they are never needed just means you have never worked on resource-constrained systems?

What you cite there: I assume most of that 400ms has nothing to do with the message encoding at all, btw.


(a) You're making assumptions based on rule of thumb, I'm talking about real world usage: your points make sense in theory but don't necessarily reflect reality

(b) I'm talking about a narrow & specific case. PB may outperform JSON in most cases but I'm very specifically referring to browsers where JSON is native (& highly optimised) whereas PB is provided by a selection of open source libraries written in javascript. So that domain heavily favours JSON perf-wise.


> You're making assumptions based

No, not at all... coming from embedded, where speed, memory size, and bandwidth all count, JSON was not just worse; it just wouldn't have been feasible (because our protobufs already barely fit memory and MTU constraints).


One important thing to consider with JSON is that a lot of people really, really care about JSON performance -- optimising parsing in assembler, and rewriting internal datastructures just to make serialising + deserialising JSON faster.

I'm sure given two implementations of equal quality protobuf would easily outperform JSON, but I can also believe the JSON implementation in (for example) v8 is very, very hard to beat.



I just benchmarked it on my computer -- the protobuf is twice as fast (well, 1.8x), which is good, but I don't think I'd use that as a basis for choosing the technology I use.

Of course, I might use protobuf because I prefer it in my code to JSON, and it certainly is faster (if only twice).


Have you stepped through protobuf processing code? There are a lot of special cases, ifs, branches here and there. Protobufs within protobufs. It's not like it's a size, then 100 floats packed together; there's more overhead than you'd think. (Not to mention the client-side allocations, etc.) I use protoc compiled to wasm for protobufs and it is fast, but there's a lot of wasm overhead to execute that code.

JSON parsing also has a lot of special cases and error testing, but the v8 team has spent a huge amount of time optimising JSON parsing (there are a few blog posts on it). I'm not assuming either way, but it's definitely not as cut and dried as one would assume.


Stepped through? Yes... as I hinted, coming from an embedded environment, I measured and compared highly optimized JSON parsing code (that even had many limitations, like very limited nesting and no lists) vs nanopb => clear winner on all points (memory reqs, performance, encoded size), which is really not that surprising?


There are two ways to encode a repeated field (100 floats, but it could be any size up to the limits of repeated fields): "Ordinary (not packed) repeated fields emit one record for every element of the field." That means type, value, type, value, etc.

However, "packed" fields are exactly a length followed by a byte array of the typed data. This was an oversight in the original proto2 which is unlikely to be corrected, but packed is the default in proto3.
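To make the packed layout concrete, here's an illustrative decoder sketch in JS (not code from any real protobuf library, and it assumes the length fits in a single varint byte):

```javascript
// Illustrative sketch (not a real protobuf library): decode a proto3
// packed `repeated float` field. Wire layout: one tag byte
// ((field_number << 3) | 2, i.e. length-delimited), a varint byte
// length (assumed < 128 here so it fits in one byte), then
// length/4 little-endian IEEE-754 floats back to back.
function decodePackedFloats(bytes) {
  if ((bytes[0] & 0x07) !== 2) throw new Error('not a length-delimited field');
  const len = bytes[1]; // single-byte varint length assumed
  const view = new DataView(bytes.buffer, bytes.byteOffset + 2, len);
  const out = new Float32Array(len / 4);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getFloat32(i * 4, true); // true = little-endian
  }
  return out;
}
```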


100 (or any N) floats prefixed by a size is exactly what you would get from `repeated float f = 1 [packed=true];`


They didn't assume, you did. They showed some real data and you reacted emotionally.


If there's a JSON parser faster than a PB parser (for the same underlying data content) it just means the JSON parser was optimized more. By every rule in computing, PB parsing is far faster than JSON for every use case for a simple reason: the messages use less RAM, and therefore, moving the data into the processor and decoding it takes less time.


Theoretical performance doesn't matter in UX, only real world. Yes, conceptually it's possible to make protobufs faster than JSON, but someone still has to build that. Fast native JSON parsers already exist; that's the benchmark protobufs have to beat significantly to make the worse DX worth it.


I believe the answer is "it depends": https://medium.com/aspecto/protobuf-js-vs-json-stringify-per....


yes, sure, it depends on the implementation, as the poster above said. You need to compare similarly optimized implementations... but really: no surprise?!


How can JavaScript code (PB decoder) be faster than native code (JSON parser)?


Much, much less processing to do. Most of pb decoding is just reading bytes until you fill your data structure.
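To illustrate what "just reading bytes" means here, a hand-rolled sketch of protobuf's core primitive (my own example, not any library's code):

```javascript
// Hand-rolled sketch of protobuf's core primitive, the varint:
// 7 payload bits per byte, high bit set on every byte except the last.
// Decoding really is "reading bytes until the value is filled".
function readVarint(bytes, pos = 0) {
  let value = 0;
  let shift = 0;
  for (;;) {
    const b = bytes[pos++];
    value += (b & 0x7f) * 2 ** shift; // multiply, not <<, to survive past 32 bits
    if ((b & 0x80) === 0) return { value, pos };
    shift += 7;
  }
}
```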


It's protocol 101: pb is a binary protocol with a known schema, so of course it has to be faster than JSON for encoding/decoding. Now, that does not mean it's going to be faster all the time; it depends on the maturity of the library/language, but on paper, yes, it is faster.


> it does not means that it's going to be faster all the time, it depends of the maturity of the library / language

I feel like I'm having to repeat myself a lot here, as no one seems to have read the original comment correctly: we're talking about one specific language in one specific known environment here. No one is claiming that JSON outperforms PB in general: only that it does in browsers, where it's actually relevant for UX.


It’s relevant for UX throughout the entire stack.

Where I’m working now, we have a REST API for users to interact with and every call behind the scenes is proto. As we deal with quite large objects, the benefits of avoiding repeated serialization and deserialization add up quickly.

From the user’s perspective we have a performant app, and much of this is possible due to proto.


Thank you. Finally someone answered my original question.

So it sounds like the trade-off can be worthwhile in some cases: particularly for large objects where serialisation is a significant serverside bottleneck.

I'm curious: you say PB helps avoid "repeated serialisation/deserialisation": how? In my mind, in an app that uses JSON/PB on the wire, serialisation happens once on output & deserialisation happens once on input. For both transfer formats. Surely you wouldn't be passing massive JSON strings around your app in memory?

Also curious which is the bigger bottleneck for your large objects: input or output. How large is large?



First two links are Go, so not relevant to client-side.

Third link is also server-side, but since it's NodeJS it's at least close enough / more relevant to client-side perf.

Here's the benchmark from the third link:

    benchmark        time (avg)             (min … max)
    ---------------------------------------------------
    encode-JSON  342.37 µs/iter   (311.93 µs … 1.19 ms)
    decode-JSON   435.9 µs/iter   (384.44 µs … 1.41 ms)
    encode-PB    946.43 µs/iter   (777.38 µs … 3.13 ms)
    decode-PB    770.79 µs/iter   (688.99 µs … 1.78 ms)
    encode-PBJS  696.75 µs/iter   (618.43 µs … 2.43 ms)
    decode-PBJS  455.36 µs/iter   (413.66 µs … 1.09 ms)

showing JSON to be significantly faster


ahh yea, i'm not sure why the rest of my comment didn't upload. i was going to say that i thought the common use case for protobufs was to more ergonomically communicate between microservices?

in any case, that's the only time i've ever seen it used in production. the first link is a go benchmark that i felt represented why someone would use it for those purposes, the second was linked to show that despite numerous (successful!) attempts to make deserializing/serializing data faster and smaller, JSON is still the most heavily used and i would wager it's mostly due to how easy it is to use as far as browsers are concerned. the third was a link to justify that claim and show that js-land is much, much different than go-land as far as proto's and JSON encoding/decoding are concerned!


Java to Java uncompressed is 6x faster, per that article.

So yeah not a whole order of magnitude. I was using my experience as a guide where JSON parsing is a huge compute hog and Protobuf is not.

I've never experimented w/ Javascript or compression or any of the other things in that article, I guess YMMV.


I specifically referred to clientside in my original comment, so not talking about java to java.

Clientside is always going to be the pertinent metric for UX since it's processed on the user's device.


at least in the frontend (without WASM), it depends.

a few months ago i tested https://github.com/mapbox/pbf and while it was faster for deep/complex structs vs an unoptimized/repetitive JSON blob, it was much slower at shallow structs and flat arrays of stuff. if you spend a bit of time to encode stuff as flat arrays to avoid mem alloc, JSON parsing wins by a lot since it goes through highly optimized C or assembly, while decoding protobuf in the JS JIT does not.

of course it's not always feasible to make optimized over-the-wire JSON structs if you have a huge/complex API that can return many shapes of complex structs.
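For the curious, the "flat arrays" trick described above looks roughly like this (a made-up example, not mapbox/pbf code): ship parallel arrays instead of an array of objects, so JSON.parse allocates a handful of big arrays rather than thousands of small objects.

```javascript
// Hypothetical example of the flat-array encoding described above.
// Instead of: [{"x":1,"y":2},{"x":3,"y":4}, ...]  (one object per point)
// ship:       {"x":[1,3,...],"y":[2,4,...]}       (two arrays total)
const wire = '{"x":[1,3,5],"y":[2,4,6]}';
const cols = JSON.parse(wire); // a few large allocations, no per-point objects

// Read point i out of the parallel arrays on demand.
function pointAt(cols, i) {
  return { x: cols.x[i], y: cols.y[i] };
}
```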


At pbf speeds, decoding is usually no longer a bottleneck, but bandwidth might be when comparing with gzipped JSON. Also, one of the primary advantages of pbf is being able to decode partially and lazily (and adjust how things are decoded at low level), which is very important in use cases like vector maps.


> At pbf speeds, decoding is usually no longer a bottleneck, but bandwidth might be when comparing with gzipped JSON.

we were streaming a few hundred float datapoints spread across a dozen(ish) flat arrays over websocket at 20-40hz and needed to decode the payload eagerly. plain JSON was a multi-factor speedup over pbf for this case. but it's fully possible i was holding it wrong, too!

even when your "bottleneck" is rendering/rasterization (10ms), if your data pipe takes 3ms instead of 1ms, it has a big effect on framerate, battery, thermals, etc.

i'm a big fan of your work! while i have you here, would you mind reviewing this sometime soon? ;)

https://github.com/mourner/flatbush/pull/44


protobufs have a great property of having a schema (and then generating code). Which means that it's pretty easy to set up a system where an accidental change of API fails CI tests for mobile apps and web.

This is doable with JSON, but I've never seen a JSON based setup actually work well at catching these kind of regressions.


OpenAPI?


Assuming your developer time is constrained, improved DX often also leads to better UX (more features). So even if you are optimizing for UX you may well be better off with JSON.


> also leads to better UX (more features)

More features is not a measure of better UX. In many cases (most cases!?) it's the opposite.


Sorry; I meant more polished features as much as more by count.


I don't develop in JS so can't comment on DX there, but I've found the DX to be pretty good when using protobuf in other languages.

That's mostly been down to having IDE autocompletion for data structures and fields once the protobuf code's been generated.

For many JSON APIs I've worked with there's only been human readable documentation, making them more error prone to work with (e.g. having to either craft JSON manually for requests, or writing a client library if one doesn't already exist).


There's also msgpack. Best of both worlds.


So does that make GraphQL the best then? JSON + faster/less data over the wire.


Not when you count the DX of the backend developers. Good luck making a performant GraphQL backend that doesn't suffer the N+1 problem, and have fun whitelisting the GraphQL queries produced by your frontend, because attackers will be supplying their own queries with no regards to performance.


Best experience I had with GraphQL was a B2B app where we had a fair amount of users, as well as the "backoffice" app also powered by GraphQL. Bad users we could just ban (the user base were great folks but could barely operate a computer, so it was fine).

Backend was with Absinthe+Elixir, so it was great (if I had to do it again today I would instead use Liveview, this was in 2017 where I had to retrofit a React app into something useable).

Public user facing is a different story, the last major one I saw was Tableau, though they are also business facing where they can just ban bad users. Github also has deprecated their GraphQL endpoints[0].

[0] https://github.blog/changelog/2022-08-18-deprecation-notice-...


Re: GitHub, that deprecation notice appears to be for GitHub Packages specifically. I don't see a deprecation notice on the general API: https://docs.github.com/en/graphql


> Bad users we could just ban

To be fair, it sounds like that would just make the DX wonderful no matter which stack you were using?


GraphQL has a DataLoader (to avoid N+1) and query complexity utilities to avoid those issues.


I know. Good luck implementing it performantly while also considering filtering, pagination, etc. It's doable of course, just not nearly as easy as people like to make it sound.


GraphQL isn't magically faster. The equivalent endpoint in REST will be faster, as you won't need to translate the query to your backend persistence. GraphQL's benefits are not execution speed.


> JSON + faster

Only if you have a very competent backend team, who, apart from dataloader, will have to figure out caching.

> /less data

Graphql responses tend to be pretty deeply nested.


Apollo's Federation makes caching much easier to reason about as you can now selectively cache sub-query pieces at the service level for that specific responsible subgraph.


I think protobuf really works well on the backend, specifically with compiled languages like Go or C++, as seen by the usage at Google and the adoption of gRPC for Go-based cloud tooling. Beyond that it's a huge failure. The generated code and usage for other languages is not idiomatic. In fact it's a hindrance, and you can see that by the lack of adoption except by the largest orgs, who are enforcing it using some sort of grpc-web bridge with types for the frontend. Ultimately you can just convert proto to OpenAPI specs and do a much better job with custom client libs from that.

I'm not a frontend dev. Most of my time was spent on the backend but what I'll say is I much prefer the fluidity and dynamic nature of JavaScript and the built in ability to deal with JSON that naturally become objects. All the type stuff is easy to do but with docs you can get away with not needing it.

My feeling: Protobuf lives on for gRPC server-side stuff, but for everywhere else OpenAPI is winning.


It's worth checking out our take on a lot of these problems: https://buf.build/blog/connect-web-protobuf-grpc-in-the-brow...


Yea I'm aware of that. I wish you guys the best of luck. I tried a lot of this with Micro. I think it's the right direction especially if you can simplify the tooling. The hard part is just the adoption curve but I think you have a lot of funding to find your way through that.


JSON parsing is a minefield, especially in cross-platform scenarios (language and/or library). You won't encounter those problems on toy projects or simple CRUD applications. For example, as soon as you deal with (u)int64 where values are greater than 2^53, a simple round-trip through javascript can wreak silent havoc.

See http://seriot.ch/projects/parsing_json.html

Protobuf support for google's first-class citizen languages is usually very good, i.e. C++, Java, Python and Go. For other languages, it depends on each implementation.
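To see the silent havoc concretely (runnable in Node or a browser console; the string-plus-BigInt workaround at the end is one common convention, not a standard):

```javascript
// 2^53 + 1 is not representable as a JS number, so JSON.parse
// silently rounds it. No error, no warning.
const lossy = JSON.parse('{"id": 9007199254740993}').id;
// lossy is now 9007199254740992, off by one

// One common workaround: send int64s as strings and parse them as BigInt.
const safe = BigInt(JSON.parse('{"id": "9007199254740993"}').id);
// safe is exactly 9007199254740993n
```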


Though you're not wrong, in what common cases are integers larger than 2^53 required?


Timestamps in nanoseconds is one.


That's not common, JS's built in Date doesn't even support nanoseconds.


I guess it depends in which domain you work? In system programming, "clock_gettime" gives you nanoseconds. If you work with GPS timestamps, you have nanoseconds.

Could it be that JS's Date doesn't support nanoseconds because it cannot represent them, which is the issue we are talking about here?

Don't get me wrong, I understand this is not something that everyone uses every day, but to me it's a pretty straightforward example that can happen in a wide range of situations. It certainly happened to me/colleagues several times in several companies.


Nice article


As always, each protocol/data format has its place. You need to maximize the amount of data you send in each packet? Then protobuf is better than JSON. Need to support a large number of clients without any fuss? Then JSON is better. Wanna pass around data you don't know the schema of? JSON again.

Contexts matters, there is no silver bullets, everything has trade offs and so on, and so on.


JSON messages in a compressed websocket stream are surprisingly tiny. Bigger than compressed protobuf packets but not by much, and much smaller than uncompressed protobuf packets.


Yeah, which is probably fine in most cases but sometimes not: maybe the overhead is just 1.5x, but if you're doing thousands of messages per second (not the usual API<>browser communication for web users), then it matters. Again it's trade-offs, and highly contextual.


Honestly, gzipped json is likely much smaller than uncompressed protobuf.

If you were going to use a binary protocol, why choose one that has no partial parsing/TOC these days? There are much better alternatives IMO (flatbuffers being one of them)


> Honestly, gzipped json is likely much smaller than uncompressed protobuf.

Likely not. See here for a comparison: https://nilsmagnus.github.io/post/proto-json-sizes/

Btw, binary formats can also be compressed, though it typically won't yield the same compression ratio as similar JSON would, since there will be less repetition in the binary format.


Or, we could have done a comparison with large strings and see the opposite result. Silly benchmark is silly (or should I say, specific)


> Wanna pass around data you don't know the schema of? JSON again.

This is a red herring. If you don't know the schema on the receiving (or sending, for that matter) side, then you can't do anything with the data other than pass it on. If you _do_ know what it looks like, then it has an implicit schema whether you call it a schema or not.


At the time, we needed interop with C. So that's why we chose protobufs. But it was a nightmare to work with in other languages. Including C++ for cross platform desktop apps where cross compiling became a problem too.

JSON in C is unfortunately way harder than in other modern languages (e.g. Go which makes it a breeze with struct tags and a great stdlib).


Surely the technical requirements of my specific use case are applicable to any use case.


The problem I see with JSON is its limited set of “native” types. I really wish it had specified support for proper numeric types (int, uint, various widths) and not just doubles. A timestamp type would be great as well.

What I really like about Protocol Buffers is that you must write a schema to get started. No more JSON.stringify anything. Everything else sucks though.


I think we could remove about a quarter of all Javascript programming time if JSON had a native Date type.
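For what it's worth, the usual workaround is a JSON.parse reviver like the sketch below. The regex is an assumption about what the API emits, and any string that happens to match gets converted, which is exactly the footgun a native date type would remove.

```javascript
// Revive ISO-8601 timestamp strings into Date objects during parsing.
// The pattern is an assumption about the wire format; anything that
// matches it will be converted, wanted or not.
const ISO_8601 = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

function parseWithDates(json) {
  return JSON.parse(json, (key, value) =>
    typeof value === 'string' && ISO_8601.test(value) ? new Date(value) : value
  );
}
```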


Hi there, I am the primary maintainer of the PHP library as of the last few years. I have heard that there used to be a lot of crashes; the code was almost completely rewritten in 2020 and is in a much better state now. If you find a segfault and you have a repro, file a bug and we will fix it.


I recommend Capnproto. Parsing time is zero, you can pretend you're a Microsoft programmer in the early 90s and just use the in-RAM struct as your wire format. Maybe it doesn't make sense for in-browser JS applications (though WASM is a different story) but for IPC and RPC in the general case, all parsing and unparsing does is generate waste heat.

ALWAYS favor a binary format unless you have a really good reason otherwise.


Capnproto is designed by Kenton, a former Google engineer who did a lot of work with protobufs at Google. I see Capnproto as the spiritual successor of protobuf, fixing many issues in protobufs.

Also, Capnproto is quite extensively used in some Cloudflare products.


I like protobufs but I was also disappointed at the JS protobuf options. I disliked both the JS object representation and RPC transport.

grpc-web in particular requires an Envoy proxy which seems absurdly heavyweight. I ended up using Twirp because Buf connect wasn't yet released or planned.

I rolled my own JS representation. The major differences from Connect:

- Avoid undefined if the message is not present on the wire and use an empty instance of the object instead. For recursive types, find the minimal set of fields to initialize as undefined instead of empty.

- Transparently promote some protobuf types, like google.protobuf.Timestamp to a proper Instant type (from js-joda or similar library). This makes a surprisingly large difference on reducing the number of jumps from the UI to the API.


What about tRPC?


I would use tRPC if I used TypeScript in the backend. But I use PHP, so it's not viable.


your problem is that you're using PHP


Bad take. Modern PHP is great.


lmao



