Hacker News

I gave up on protobufs years ago. The protobuf team has no idea how to write PHP and JS libraries. I got segfaults from using the PHP extension. The built-in toJSON would return invalid JSON (missing braces for binary types). Ridiculous stuff.

I really just prefer to use JSON for everything. It's much easier to debug and observe traffic (browser Network tab). I like JSON-RPC, very simple spec (basically one page long). I don't like REST.

All that said, I'm really glad to see the community take things into their own hands.



> It's much easier to debug and observe traffic (browser Network tab).

The DX for JSON things is much better. The UX for protobufs is much better (faster, less data over the wire, etc). Which you optimize for is up to you, but there isn't a straightforward "Use this tech because it's the best one."


> faster, less data over the wire, etc.

I've always wondered about this. Firstly, I'm fairly sure clientside JSON parsing is significantly faster than protobuf decoding. But even for data over the wire: JSON can be pretty compressible, so surely the gains here are going to be marginal. Surely never enough benefit to UX to warrant the DX trade-off, right?


protobuf parsing is far faster: it's a binary protocol. The underlying code is highly optimized and has to handle about 1/10th the total bytes. In computing, reducing memory access is often the best way to optimize.

PB can always be decoded to a text representation if you need to inspect it.


JS __is__ dumb at handling binary. The overhead is significant. The first thing to do when optimizing a Node.js program is to replace loops that iterate through individual bytes of a binary with some native (wasm?) equivalent. JSON, on the other hand, isn't affected by this overhead, because JSON.parse is a native method on every platform.

I was once mixing two buffers containing PCM audio. A simple task that takes two numbers, averages them, and puts the result into another buffer. The native implementation was about 10x faster than the one I wrote in JS (or consumed 10x less CPU time).

A native Protobuf is definitely going to beat a native JSON implementation. A JS Protobuf is also likely to beat a JS JSON implementation.

But a JS Protobuf vs. native JSON? I doubt it.
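For context, the PCM mix described above is roughly this kind of loop (my reconstruction, not the commenter's actual code):

```javascript
// Reconstruction of the task described above: mix two 16-bit PCM
// buffers by averaging each pair of samples. In pure JS the JIT has
// to touch every sample individually; a native/wasm version of this
// same loop is where the ~10x difference comes from.
function mixPcm16(a, b) {
  const len = Math.min(a.length, b.length);
  const out = new Int16Array(len);
  for (let i = 0; i < len; i++) {
    out[i] = (a[i] + b[i]) >> 1; // integer average of two signed samples
  }
  return out;
}
```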


Do you have any links showing protobuf is faster?

There's nothing in your comment that hadn't already been said before by sibling commenters but as far as I've seen in the real world JSON appears to be faster in practice. Which is all that counts.

You and the many other commenters making the same assumption (it's binary, ergo it must be fast) make a really good case for PB's adoption being rooted in theoretical assumptions rather than real-world benefit.

I get it. It makes sense that it should be faster. Nothing is self evident though. You gotta measure it.


I don't really get your attitude here. In particular, I'm not disagreeing that JSON parsers in the browser could be currently faster than protocol buffers.

I'm saying that computer science and hardware dictate that protocol buffers are faster for a wide range of reasons. That part's not in question: smaller data encodings have better cache use, and require far fewer dictionary (hash table) lookups at parse time, as well as far less work parsing strings. If you want to argue against my point there, I don't know what to say.

If it was a priority to write a blindingly fast protocol buffer parser in JS, it's almost certain that an expert could write a faster one than a similar JSON parser.


> I don't really get your attitude here

My attitude here is that I made a specific observation: JSON is likely to be faster or at least negligibly slower in browsers in practice.

Everyone is replying either with theoretical speed comparisons or server-to-server non-JavaScript benchmarks, which don't seem relevant to my very specific observation I made up top...


If you had bothered to benchmark it, you'd have realized that lots of protobuf libraries are actually surprisingly slow.


> In our tests, it was demonstrated that this protocol performed up to 6 times faster than JSON.

https://auth0.com/blog/beating-json-performance-with-protobu...


The 6 times faster benchmark from that article is describing a Java server and Java client.

This thread is about protobuf vs JSON in a JavaScript environment.

The article you linked _does_ talk about JavaScript environments, too, but the numbers are much less impressive.


JSON parsing is orders of magnitude slower than protobuf decoding.


I did some brief googling after reading your comment and I did find one article showing clientside protobuf being faster than JSON[0]. However, they didn't isolate parsing: the only thing they measure is total request time to a Java Spring application, so the JSON slowdown will include the Java JSON serialisation overhead as well as the request size/network overhead. My instinct is that these two will heavily favour protobuf, making the JSON parse itself still likely to be faster.

It also shows a difference of 388ms (protobuf) vs 396ms (JSON) which is pretty negligible. Certainly not orders of magnitude.

Do you have other sources?

[0] https://auth0.com/blog/beating-json-performance-with-protobu...


Oh come on... how can one assume a binary, TLV-encoded format is not faster than parsing strings? (Generally JSON is schemaless, btw; the dynamicity also adds on top. Though yes, protobuf also has variable-sized containers.) It is like claiming that parsing a string to an int has no overhead over a straight int (yes, I know protobufs still require varint decoding; still a huge difference).

It is also not only speed: size is usually a magnitude off as well (and no, compression doesn't cut it, and it trades size for computation again).

Sure, if size and speed do not matter, it is strange that you considered protobuf at all... but claiming they are never needed just means you have never worked on resource-constrained systems?

What you cite there: I assume most of that 400ms has nothing to do with the message encoding at all, btw.


(a) You're making assumptions based on rule of thumb, I'm talking about real world usage: your points make sense in theory but don't necessarily reflect reality

(b) I'm talking about a narrow & specific case. PB may outperform JSON in most cases but I'm very specifically referring to browsers where JSON is native (& highly optimised) whereas PB is provided by a selection of open source libraries written in javascript. So that domain heavily favours JSON perf-wise.


> You're making assumptions based

No, not at all... coming from embedded, where speed, memory size, and bandwidth all count, JSON was not just worse; it just wouldn't have been feasible (because our protobufs already barely fit memory and MTU constraints).


One important thing to consider with JSON is that a lot of people really, really care about JSON performance -- optimising parsing in assembler, and rewriting internal datastructures just to make serialising + deserialising JSON faster.

I'm sure given two implementations of equal quality protobuf would easily outperform JSON, but I can also believe the JSON implementation in (for example) v8 is very, very hard to beat.



I just benchmarked it on my computer -- the protobuf is twice as fast (well, 1.8x), which is good, but I don't think I'd use that as a basis for choosing the technology I use.

Of course, I might use protobuf because I prefer it in my code to JSON, and it certainly is faster (if only twice).


Have you stepped through protobuf processing code? There are a lot of special cases, ifs, branches here and there. Protobufs within protobufs. It's not like it's a size, then 100 floats packed together; there's more overhead than you'd think. (Not to mention the client-side allocations, etc.) I use protoc compiled to wasm for protobufs and it is fast, but there's a lot of wasm overhead to execute that code.

JSON parsing also has a lot of special cases and error testing, but the v8 team has spent a huge amount of time optimising JSON parsing (there are a few blog posts on it). I'm not assuming either way, but it's definitely not as cut and dried as one would assume.


Stepped through? Yes... as I hinted, coming from an embedded environment, I measured and compared highly optimized JSON parsing code (that even had many limitations, like very limited nesting and no lists) vs nanopb => clear winner on all points (memory reqs, performance, encoded size), which is really not that surprising?


There are two ways to encode a repeated field (100 floats, but it could be any size up to the limits of repeated fields): "Ordinary (not packed) repeated fields emit one record for every element of the field." That means type, value, type, value, etc.

However, "packed" fields are exactly a length followed by a byte array of the typed data. This was an oversight in the original proto2 which is unlikely to be corrected, but packed is the default in proto3.
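To make the packed layout concrete, here's an illustrative decoder sketch in JS (not code from any real protobuf library, and it assumes the length fits in a single varint byte):

```javascript
// Illustrative sketch (not a real protobuf library): decode a proto3
// packed `repeated float` field. Wire layout: one tag byte
// ((field_number << 3) | 2, i.e. length-delimited), a varint byte
// length (assumed < 128 here so it fits in one byte), then
// length/4 little-endian IEEE-754 floats back to back.
function decodePackedFloats(bytes) {
  if ((bytes[0] & 0x07) !== 2) throw new Error('not a length-delimited field');
  const len = bytes[1]; // single-byte varint length assumed
  const view = new DataView(bytes.buffer, bytes.byteOffset + 2, len);
  const out = new Float32Array(len / 4);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getFloat32(i * 4, true); // true = little-endian
  }
  return out;
}
```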


100 (or any N) floats prefixed by a size is exactly what you would get from `repeated float f = 1 [packed=true];`


They didn't assume, you did. They showed some real data and you reacted emotionally.


If there's a JSON parser faster than a PB parser (for the same underlying data content) it just means the JSON parser was optimized more. By every rule in computing, PB parsing is far faster than JSON for every use case for a simple reason: the messages use less RAM, and therefore, moving the data into the processor and decoding it takes less time.


Theoretical performance doesn't matter in UX, only real world. Yes, conceptually it's possible to make protobufs faster than JSON, but someone still has to build that. Fast native JSON parsers already exist; that's the benchmark protobufs have to beat significantly to make the worse DX worth it.


I believe the answer is "it depends": https://medium.com/aspecto/protobuf-js-vs-json-stringify-per....


yes, sure, it depends on the implementation, as the poster above said. You need to compare similarly optimized implementations... but really: no surprise?!


How can JavaScript code (PB decoder) be faster than native code (JSON parser)?


Much, much less processing to do. Most of pb decoding is just reading bytes until you fill your data structure.
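To illustrate what "just reading bytes" means here, a hand-rolled sketch of protobuf's core primitive (my own example, not any library's code):

```javascript
// Hand-rolled sketch of protobuf's core primitive, the varint:
// 7 payload bits per byte, high bit set on every byte except the last.
// Decoding really is "reading bytes until the value is filled".
function readVarint(bytes, pos = 0) {
  let value = 0;
  let shift = 0;
  for (;;) {
    const b = bytes[pos++];
    value += (b & 0x7f) * 2 ** shift; // multiply, not <<, to survive past 32 bits
    if ((b & 0x80) === 0) return { value, pos };
    shift += 7;
  }
}
```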


It's protocol 101: pb is a binary protocol with a known schema, so of course it has to be faster than JSON for encoding/decoding. Now, that does not mean it's going to be faster all the time; it depends on the maturity of the library/language, but on paper, yes, it is faster.


> it does not means that it's going to be faster all the time, it depends of the maturity of the library / language

I feel like I'm having to repeat myself a lot here, as no one seems to have read the original comment correctly: we're talking about one specific language in one specific known environment here. No one is claiming that JSON outperforms PB in general: only that it does in browsers, where it's actually relevant for UX.


It’s relevant for UX throughout the entire stack.

Where I’m working now, we have a REST API for users to interact with and every call behind the scenes is proto. As we deal with quite large objects, the benefits of avoiding repeated serialization and deserialization add up quickly.

From the user’s perspective we have a performant app, and much of this is possible due to proto.


Thank you. Finally someone answered my original question.

So it sounds like the trade-off can be worthwhile in some cases: particularly for large objects where serialisation is a significant serverside bottleneck.

I'm curious: you say PB helps avoid "repeated serialisation/deserialisation": how? In my mind, in an app that uses JSON/PB on the wire, serialisation happens once on output & deserialisation happens once on input. For both transfer formats. Surely you wouldn't be passing massive JSON strings around your app in memory?

Also curious which is the bigger bottleneck for your large objects: input or output. How large is large?



First two links are Go, so not relevant to client-side.

Third link is also server-side, but since it's NodeJS it's at least close enough / more relevant to client-side perf.

Here's the benchmark from the third link:

    benchmark        time (avg)             (min … max)
    ---------------------------------------------------
    encode-JSON  342.37 µs/iter   (311.93 µs … 1.19 ms)
    decode-JSON   435.9 µs/iter   (384.44 µs … 1.41 ms)
    encode-PB    946.43 µs/iter   (777.38 µs … 3.13 ms)
    decode-PB    770.79 µs/iter   (688.99 µs … 1.78 ms)
    encode-PBJS  696.75 µs/iter   (618.43 µs … 2.43 ms)
    decode-PBJS  455.36 µs/iter   (413.66 µs … 1.09 ms)

showing JSON to be significantly faster


ahh yea, i'm not sure why the rest of my comment didn't upload. i was going to say that i thought the common use case for protobufs was to more ergonomically communicate between microservices?

in any case, that's the only time i've ever seen it used in production. the first link is a go benchmark that i felt represented why someone would use it for those purposes, the second was linked to show that despite numerous (successful!) attempts to make deserializing/serializing data faster and smaller, JSON is still the most heavily used and i would wager it's mostly due to how easy it is to use as far as browsers are concerned. the third was a link to justify that claim and show that js-land is much, much different than go-land as far as proto's and JSON encoding/decoding are concerned!


Java to Java uncompressed is 6x faster, per that article.

So yeah not a whole order of magnitude. I was using my experience as a guide where JSON parsing is a huge compute hog and Protobuf is not.

I've never experimented w/ Javascript or compression or any of the other things in that article, I guess YMMV.


I specifically referred to clientside in my original comment, so not talking about java to java.

Clientside is always going to be the pertinent metric for UX since it's processed on the user's device.


at least in the frontend (without WASM), it depends.

a few months ago i tested https://github.com/mapbox/pbf and while it was faster for deep/complex structs vs an unoptimized/repetitive JSON blob, it was much slower at shallow structs and flat arrays of stuff. if you spend a bit of time to encode stuff as flat arrays to avoid mem alloc, JSON parsing wins by a lot since it goes through highly optimized C or assembly, while decoding protobuf in the JS JIT does not.

of course it's not always feasible to make optimized over-the-wire JSON structs if you have a huge/complex API that can return many shapes of complex structs.
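For the curious, the "flat arrays" trick described above looks roughly like this (a made-up example, not mapbox/pbf code): ship parallel arrays instead of an array of objects, so JSON.parse allocates a handful of big arrays rather than thousands of small objects.

```javascript
// Hypothetical example of the flat-array encoding described above.
// Instead of: [{"x":1,"y":2},{"x":3,"y":4}, ...]  (one object per point)
// ship:       {"x":[1,3,...],"y":[2,4,...]}       (two arrays total)
const wire = '{"x":[1,3,5],"y":[2,4,6]}';
const cols = JSON.parse(wire); // a few large allocations, no per-point objects

// Read point i out of the parallel arrays on demand.
function pointAt(cols, i) {
  return { x: cols.x[i], y: cols.y[i] };
}
```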


At pbf speeds, decoding is usually no longer a bottleneck, but bandwidth might be when comparing with gzipped JSON. Also, one of the primary advantages of pbf is being able to decode partially and lazily (and adjust how things are decoded at low level), which is very important in use cases like vector maps.


> At pbf speeds, decoding is usually no longer a bottleneck, but bandwidth might be when comparing with gzipped JSON.

we were streaming a few hundred float datapoints spread across a dozen(ish) flat arrays over websocket at 20-40hz and needed to decode the payload eagerly. plain JSON was a multi-factor speedup over pbf for this case. but it's fully possible i was holding it wrong, too!

even when your "bottleneck" is rendering/rasterization (10ms), if your data pipe takes 3ms instead of 1ms, it has a big effect on framerate, battery, thermals, etc.

i'm a big fan of your work! while i have you here, would you mind reviewing this sometime soon? ;)

https://github.com/mourner/flatbush/pull/44


protobufs have a great property of having a schema (and then generating code). Which means that it's pretty easy to set up a system where an accidental change of API fails CI tests for mobile apps and web.

This is doable with JSON, but I've never seen a JSON based setup actually work well at catching these kind of regressions.


OpenAPI?


Assuming your developer time is constrained, improved DX often also leads to better UX (more features). So even if you are optimizing for UX you may well be better off with JSON.


> also leads to better UX (more features)

More features is not a measure of better UX. In many cases (most cases!?) it's the opposite.


Sorry; I meant more polished features as much as more by count.


I don't develop in JS so can't comment on DX there, but I've found the DX to be pretty good when using protobuf in other languages.

That's mostly been down to having IDE autocompletion for data structures and fields once the protobuf code's been generated.

For many JSON APIs I've worked with there's only been human readable documentation, making them more error prone to work with (e.g. having to either craft JSON manually for requests, or writing a client library if one doesn't already exist).


There's also msgpack. Best of both worlds.


So does that make GraphQL the best then? JSON + faster/less data over the wire.


Not when you count the DX of the backend developers. Good luck making a performant GraphQL backend that doesn't suffer the N+1 problem, and have fun whitelisting the GraphQL queries produced by your frontend, because attackers will be supplying their own queries with no regards to performance.


Best experience I had with GraphQL was a B2B app where we had a fair amount of users, as well as the "backoffice" app also powered by GraphQL. Bad users we could just ban (the user base were great folks but could barely operate a computer, so it was fine).

Backend was with Absinthe+Elixir, so it was great (if I had to do it again today I would instead use Liveview, this was in 2017 where I had to retrofit a React app into something useable).

Public user facing is a different story, the last major one I saw was Tableau, though they are also business facing where they can just ban bad users. Github also has deprecated their GraphQL endpoints[0].

[0] https://github.blog/changelog/2022-08-18-deprecation-notice-...


Re: GitHub, that deprecation notice appears to be for GitHub Packages specifically. I don't see a deprecation notice on the general API: https://docs.github.com/en/graphql


> Bad users we could just ban

To be fair, it sounds like that would just make the DX wonderful no matter which stack you were using?


GraphQL has a DataLoader (to avoid N+1) and query complexity utilities to avoid those issues.


I know. Good luck implementing it performantly while also considering filtering, pagination, etc. It's doable of course, just not nearly as easy as people like to make it sound.


GraphQL isn't magically faster. The equivalent endpoint in REST will be faster, as you won't need to translate the query to your backend persistence. GraphQL's benefits are not execution speed.


> JSON + faster

Only if you have a very competent backend team, who, apart from dataloader, will have to figure out caching.

> /less data

Graphql responses tend to be pretty deeply nested.


Apollo's Federation makes caching much easier to reason about as you can now selectively cache sub-query pieces at the service level for that specific responsible subgraph.


I think protobuf really works well on the backend, specifically with compiled languages like Go or C++, as seen by the usage at Google and the adoption of gRPC for Go-based cloud tooling. Beyond that it's a huge failure. The generated code and usage for other languages is not idiomatic. In fact it's a hindrance, and you can see that by the lack of adoption except by the largest orgs, who are enforcing it using some sort of grpc-web bridge with types for the frontend. Ultimately you can just convert proto to OpenAPI specs and do a much better job with custom client libs from that.

I'm not a frontend dev. Most of my time was spent on the backend but what I'll say is I much prefer the fluidity and dynamic nature of JavaScript and the built in ability to deal with JSON that naturally become objects. All the type stuff is easy to do but with docs you can get away with not needing it.

My feeling: Protobuf lives on for gRPC server-side stuff, but for everywhere else OpenAPI is winning.


It's worth checking out our take on a lot of these problems: https://buf.build/blog/connect-web-protobuf-grpc-in-the-brow...


Yea I'm aware of that. I wish you guys the best of luck. I tried a lot of this with Micro. I think it's the right direction especially if you can simplify the tooling. The hard part is just the adoption curve but I think you have a lot of funding to find your way through that.


JSON parsing is a minefield, especially in cross-platform scenarios (language and/or library). You won't encounter those problems on toy projects or simple CRUD applications. For example, as soon as you deal with (u)int64 where values are greater than 2^53, a simple round-trip through javascript can wreak silent havoc.

See http://seriot.ch/projects/parsing_json.html

Protobuf support for google's first-class citizen languages is usually very good, i.e. C++, Java, Python and Go. For other languages, it depends on each implementation.
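To see the silent havoc concretely (runnable in Node or a browser console; the string-plus-BigInt workaround at the end is one common convention, not a standard):

```javascript
// 2^53 + 1 is not representable as a JS number, so JSON.parse
// silently rounds it. No error, no warning.
const lossy = JSON.parse('{"id": 9007199254740993}').id;
// lossy is now 9007199254740992, off by one

// One common workaround: send int64s as strings and parse them as BigInt.
const safe = BigInt(JSON.parse('{"id": "9007199254740993"}').id);
// safe is exactly 9007199254740993n
```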


Though you're not wrong, in what common cases are integers larger than 2^53 required?


Timestamps in nanoseconds is one.


That's not common, JS's built in Date doesn't even support nanoseconds.


I guess it depends in which domain you work? In system programming, "clock_gettime" gives you nanoseconds. If you work with GPS timestamps, you have nanoseconds.

Could it be that JS's Date doesn't support nanoseconds because it cannot represent them, which is the issue we are talking about here?

Don't get me wrong, I understand this is not something that everyone uses every day, but to me it's a pretty straightforward example that can happen in a wide range of situations. It certainly happened to me/colleagues several times in several companies.


Nice article


As always, each protocol/data format has its place. You need to maximize the amount of data you send in each packet? Then protobuf is better than JSON. Need to support a large number of clients without any fuss? Then JSON is better. Wanna pass around data you don't know the schema of? JSON again.

Contexts matters, there is no silver bullets, everything has trade offs and so on, and so on.


JSON messages in a compressed websocket stream are surprisingly tiny. Bigger than compressed protobuf packets but not by much, and much smaller than uncompressed protobuf packets.


Yeah, which is probably fine in most cases but sometimes not: maybe the overhead is just 1.5x, but if you're doing thousands of messages per second (not the usual API<>browser communication for web users), then it matters. Again it's trade-offs, and highly contextual.


Honestly, gzipped json is likely much smaller than uncompressed protobuf.

If you were going to use a binary protocol, why choose one that has no partial parsing/TOC these days? There are much better alternatives IMO (flatbuffers being one of them)


> Honestly, gzipped json is likely much smaller than uncompressed protobuf.

Likely not. See here for a comparison: https://nilsmagnus.github.io/post/proto-json-sizes/

Btw, binary formats can also be compressed, though it typically won't yield the same compression ratio as similar JSON would, since there will be less repetition in the binary format.


Or, we could have done a comparison with large strings and see the opposite result. Silly benchmark is silly (or should I say, specific)


> Wanna pass around data you don't know the schema of? JSON again.

This is a red herring. If you don't know the schema on the receiving (or sending, for that matter) side, then you can't do anything with the data other than pass it on. If you _do_ know what it looks like, then it has an implicit schema whether you call it a schema or not.


At the time, we needed interop with C. So that's why we chose protobufs. But it was a nightmare to work with in other languages. Including C++ for cross platform desktop apps where cross compiling became a problem too.

JSON in C is unfortunately way harder than in other modern languages (e.g. Go which makes it a breeze with struct tags and a great stdlib).


Surely the technical requirements of my specific use case are applicable to any use case.


The problem I see with JSON is its limited set of “native” types. I really wish it had specified support for proper numeric types (int, uint, various widths) and not just doubles. A timestamp type would be great as well.

What I really like about Protocol Buffers is that you must write a schema to get started. No more JSON.stringify anything. Everything else sucks though.


I think we could remove about a quarter of all Javascript programming time if JSON had a native Date type.
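For what it's worth, the usual workaround is a JSON.parse reviver like the sketch below. The regex is an assumption about what the API emits, and any string that happens to match gets converted, which is exactly the footgun a native date type would remove.

```javascript
// Revive ISO-8601 timestamp strings into Date objects during parsing.
// The pattern is an assumption about the wire format; anything that
// matches it will be converted, wanted or not.
const ISO_8601 = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

function parseWithDates(json) {
  return JSON.parse(json, (key, value) =>
    typeof value === 'string' && ISO_8601.test(value) ? new Date(value) : value
  );
}
```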


Hi there, I am the primary maintainer of the PHP library as of the last few years. I have heard that there used to be a lot of crashes; the code was almost completely rewritten in 2020 and is in a much better state now. If you find a segfault and you have a repro, file a bug and we will fix it.


I recommend Capnproto. Parsing time is zero, you can pretend you're a Microsoft programmer in the early 90s and just use the in-RAM struct as your wire format. Maybe it doesn't make sense for in-browser JS applications (though WASM is a different story) but for IPC and RPC in the general case, all parsing and unparsing does is generate waste heat.

ALWAYS favor a binary format unless you have a really good reason otherwise.


Capnproto is designed by Kenton, a former Google engineer who did a lot of work with protobufs at Google. I see Capnproto as the spiritual successor of protobuf, fixing many issues in protobufs.

Also, Capnproto is quite extensively used in some Cloudflare products.


I like protobufs but I was also disappointed at the JS protobuf options. I disliked both the JS object representation and RPC transport.

grpc-web in particular requires an Envoy proxy which seems absurdly heavyweight. I ended up using Twirp because Buf connect wasn't yet released or planned.

I rolled my own JS representation. The major differences from Connect:

- Avoid undefined if the message is not present on the wire and use an empty instance of the object instead. For recursive types, find the minimal set of fields to initialize as undefined instead of empty.

- Transparently promote some protobuf types, like google.protobuf.Timestamp to a proper Instant type (from js-joda or similar library). This makes a surprisingly large difference on reducing the number of jumps from the UI to the API.


What about tRPC?


I would use tRPC if I used TypeScript in the backend. But I use PHP, so it's not viable.


your problem is that you're using PHP


Bad take. Modern PHP is great.


lmao



