Zero-copy protobuf and ConnectRPC for Rust
Posted by PaulHoule 4 days ago
Comments
Comment by ethegwo 1 day ago
1. Zero-copy means bytes are always inlined in the raw message buffer, which means the app should always access bytes by reference/pointer.
2. You cannot compress the RPC message if you want to fully leverage the advantages of zero serdes/copy.
3. RC itself
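A minimal Rust sketch of what point 1 means, assuming a hand-written decoder for the protobuf wire format (`read_varint` and `read_bytes_field` are hypothetical helpers, not the API of the library being announced). The returned payload is a sub-slice of the input buffer, so the caller reads the bytes in place. It also shows why point 2 holds: a compressed buffer contains no plain bytes to borrow from until it has been decompressed into a fresh allocation.

```rust
/// Decodes a protobuf base-128 varint at `buf[*pos]` and advances `pos`.
/// Returns None if the input is truncated or longer than 10 bytes.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    for shift in (0..64u32).step_by(7) {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
    }
    None
}

/// Reads one length-delimited field (wire type 2) at `*pos`.
/// The returned payload is a sub-slice of `buf`: nothing is copied.
fn read_bytes_field<'a>(buf: &'a [u8], pos: &mut usize) -> Option<(u32, &'a [u8])> {
    let key = read_varint(buf, pos)?;
    if key & 0x7 != 2 {
        return None; // this sketch only handles length-delimited fields
    }
    let field_number = u32::try_from(key >> 3).ok()?;
    let len = usize::try_from(read_varint(buf, pos)?).ok()?;
    let end = pos.checked_add(len)?;
    let payload = buf.get(*pos..end)?; // bounds-checked borrow, no copy
    *pos = end;
    Some((field_number, payload))
}
```

For example, decoding `[0x0a, 0x05, b'h', b'e', b'l', b'l', b'o']` (field 1, wire type 2, length 5) yields field number 1 and a slice whose pointer is `msg[2..].as_ptr()`, i.e. the "hello" bytes inside the original message.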
Comment by nopurpose 21 hours ago
Comment by stevefan1999 21 hours ago
Comment by nopurpose 1 day ago
Comment by brancz 1 day ago
Comment by secondcoming 1 day ago
Comment by nopurpose 1 day ago
Comment by nly 3 hours ago
It's the same for any other high-performance decoding of TLV formats (FIX in finance, for instance).
Comment by eklitzke 18 hours ago
In some message schemas even though this isn't truly zero copy it may be close to it in terms of actual overhead and CPU time, in other schemas it doesn't help at all.
Comment by jeffbee 23 hours ago
To a very close approximation you can say that the official protobuf C++ library always copies and owns strings.
Comment by secondcoming 22 hours ago
Even the decoder makes a copy even though it's returning a string_view? What's the point, then?
I can understand encoders having to make copies, but not in a decoder.
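For comparison, in Rust a decoder can hand out a string view without copying, because the borrow checker ties the view's lifetime to the input buffer and rejects code that frees the buffer while the view is still in use. A minimal sketch, assuming the string field's bytes have already been located in the buffer; `PersonView`, `decode_view`, and `decode_owned` are hypothetical names, not part of any protobuf library.

```rust
use std::str::Utf8Error;

/// Hypothetical decoded view of a message whose only field is a name.
/// `name` borrows from the input buffer, so the view cannot outlive it.
struct PersonView<'a> {
    name: &'a str,
}

/// Zero-copy: `from_utf8` validates the bytes in place and allocates nothing.
fn decode_view(name_bytes: &[u8]) -> Result<PersonView<'_>, Utf8Error> {
    Ok(PersonView { name: std::str::from_utf8(name_bytes)? })
}

/// Owning alternative: one heap allocation and copy per string field.
fn decode_owned(name_bytes: &[u8]) -> Result<String, Utf8Error> {
    Ok(std::str::from_utf8(name_bytes)?.to_owned())
}
```

With `let buf = b"alice".to_vec();`, `decode_view(&buf)` returns a view whose `name.as_ptr()` equals `buf.as_ptr()`, while `decode_owned` returns a separate allocation. Calling `drop(buf)` while the view is still used afterwards fails to compile.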
Comment by akshayshah 20 hours ago
(For full disclosure, I started the ConnectRPC project - so of course I’m excited about that part of the announcement too.)
Comment by willvarfar 1 day ago
I have been on a similar odyssey making a 'zero copy' Java library that supports protobuf, parquet, thrift (compact) and (schema'd) json. It does allocate a long[] and break out the structure for O(1) access but doesn't create a big clump of object wrappers and strings and things; internally it just references a big pool buffer or the original byte[].
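The index-instead-of-objects idea can be sketched in Rust as well. This is a minimal sketch, assuming a protobuf message with only varint (wire type 0) and length-delimited (wire type 2) fields; `FieldIndex` and its methods are hypothetical and are not the commenter's Java library. One pass records a packed `(start << 32) | len` span per field in a flat `Vec<u64>`, the Rust counterpart of a `long[]`, and every later lookup is O(1) and borrows straight from the original buffer.

```rust
/// Protobuf base-128 varint decoder (7 payload bits per byte).
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    for shift in (0..64u32).step_by(7) {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
    }
    None
}

/// Hypothetical structural index over one message: flat arrays only,
/// no per-field objects or copied strings.
struct FieldIndex<'a> {
    buf: &'a [u8],
    field_numbers: Vec<u32>,
    spans: Vec<u64>, // (start << 32) | len of each field's value bytes
}

impl<'a> FieldIndex<'a> {
    /// Single pass over the top-level fields; other wire types are rejected.
    fn build(buf: &'a [u8]) -> Option<Self> {
        let (mut field_numbers, mut spans) = (Vec::new(), Vec::new());
        let mut pos = 0;
        while pos < buf.len() {
            let key = read_varint(buf, &mut pos)?;
            let start = match key & 0x7 {
                0 => {
                    let s = pos;
                    read_varint(buf, &mut pos)?; // skip over the varint value
                    s
                }
                2 => {
                    let len = usize::try_from(read_varint(buf, &mut pos)?).ok()?;
                    let s = pos;
                    pos = pos.checked_add(len).filter(|&end| end <= buf.len())?;
                    s
                }
                _ => return None,
            };
            let s = u32::try_from(start).ok()?;
            let l = u32::try_from(pos - start).ok()?;
            field_numbers.push(u32::try_from(key >> 3).ok()?);
            spans.push((u64::from(s) << 32) | u64::from(l));
        }
        Some(FieldIndex { buf, field_numbers, spans })
    }

    /// O(1) access to the i-th field's raw value bytes, borrowed from `buf`.
    fn get(&self, i: usize) -> Option<(u32, &'a [u8])> {
        let packed = *self.spans.get(i)?;
        let start = (packed >> 32) as usize;
        let len = (packed & 0xffff_ffff) as usize;
        let buf: &'a [u8] = self.buf;
        Some((self.field_numbers[i], &buf[start..start + len]))
    }
}
```

Packing offset and length into one `u64` keeps the index to a single allocation per array; the trade-off is a 4 GiB limit on message size, which the `u32::try_from` checks enforce.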
The speed demons use tail calls in Rust and C++ to eat protobuf (https://blog.reverberate.org/2021/04/21/musttail-efficient-i...) at 2+ GB/sec. In Java I'm super pleased to be getting 4 cycles per touched byte and 500 MB/sec.
Currently looking at how to merge a fast footer parser like this into the Apache Parquet Java project.
Comment by arianvanp 1 day ago
If this fixes that I might consider switching.
However, Google is also working on a new grpc-rust implementation, and I have faith in them getting it right, so I'm holding tight a little bit longer.
Comment by gobdovan 1 day ago
I ended up building a protocol for my own use around a very strict subprocess boundary for Python (initially at least; the protocol is meant to be universal). It has explicit payload shape, timeout, and error semantics. I already went a little too far beyond my use case with deterministic canonicalization for some common pitfall data types (pickle users would understand, I think). It still needs some documentation polish, but if anyone would actually use it, I can document it properly and publish it.
Comment by codedokode 11 hours ago
Comment by mgaunard 1 day ago
Plain C structs that fit in a UDP datagram that you can reinterpret_cast from is still best. You can still provide schemas and UUIDs for that, and dynamically transcode to JSON or whatever.
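A minimal safe-Rust sketch of decoding such a fixed-size datagram, assuming a hypothetical 16-byte `Quote` message laid out in little-endian order with no padding. It reads each field at a fixed offset instead of casting the buffer to a struct pointer, which avoids alignment concerns and pins the byte order down explicitly; a raw `reinterpret_cast` silently assumes the sender's layout and endianness match the receiver's. Crates such as `zerocopy` or `bytemuck` offer checked casts that are closer to the C approach.

```rust
/// Hypothetical fixed-size message: 16 bytes, little-endian, no padding.
#[derive(Debug, PartialEq)]
struct Quote {
    instrument_id: u32, // bytes 0..4
    price_ticks: i64,   // bytes 4..12
    quantity: u32,      // bytes 12..16
}

const QUOTE_LEN: usize = 16;

impl Quote {
    /// Writes the fields at fixed offsets in little-endian order.
    fn encode(&self) -> [u8; QUOTE_LEN] {
        let mut out = [0u8; QUOTE_LEN];
        out[0..4].copy_from_slice(&self.instrument_id.to_le_bytes());
        out[4..12].copy_from_slice(&self.price_ticks.to_le_bytes());
        out[12..16].copy_from_slice(&self.quantity.to_le_bytes());
        out
    }

    /// Rejects datagrams of the wrong size instead of reading garbage.
    fn decode(datagram: &[u8]) -> Option<Quote> {
        let b: &[u8; QUOTE_LEN] = datagram.try_into().ok()?;
        Some(Quote {
            instrument_id: u32::from_le_bytes(b[0..4].try_into().ok()?),
            price_ticks: i64::from_le_bytes(b[4..12].try_into().ok()?),
            quantity: u32::from_le_bytes(b[12..16].try_into().ok()?),
        })
    }
}
```

A round trip through `encode` and `decode` returns the same `Quote`, and a 15-byte datagram decodes to `None`. Transcoding to JSON or another format can then be driven from the same fixed offsets.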
Comment by bluGill 21 hours ago
Comment by codedokode 11 hours ago
Comment by pjc50 21 hours ago
- you agree never to care about endianness (can probably get away with this now)
- you don't want to represent anything complicated or variable length, including strings
Comment by codedokode 11 hours ago
Comment by benterix 1 day ago
Comment by mgaunard 1 day ago
For topics which are sending the state of something, a gap naturally self-recovers so long as you keep sending the state even if it doesn't change.
For message buses that need to be incremental, you need to have a separate snapshot system to recover state. That's usually pretty rare outside of things like order books (I work in low-latency trading).
For request/response, I find it's better to tell the requester their request was not received rather than transparently re-send it, since by the time you re-send it, it might already be stale. So what I do at the protocol level is just have ack logic, but no retransmit. It's also datagram-oriented rather than byte-oriented, so overall it gives much nicer guarantees than TCP (as long as all your messages fit in one UDP payload).
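A minimal sketch of the receiver side of this gap handling, assuming every datagram carries a per-topic sequence number. The `TopicReceiver` type and `Action` enum are hypothetical, not taken from the commenter's system, and the ack path for requests is left out. There is deliberately no retransmission: a gap on a state topic is skipped over, and a gap on an incremental topic is reported so the application can resync from a snapshot.

```rust
/// What the receiver should do with one incoming datagram.
#[derive(Debug, PartialEq)]
enum Action {
    /// In order: apply it.
    Apply,
    /// State topic after a gap: the newest state supersedes what was missed.
    ApplyAfterGap { missed: u64 },
    /// Incremental topic after a gap: deltas were lost, resync from a snapshot.
    NeedSnapshot { missed: u64 },
    /// Older than something already seen (duplicate or reordered): ignore it.
    DropStale,
}

/// Hypothetical per-topic receiver state. Note there is no retransmit request.
struct TopicReceiver {
    next_seq: u64,
    incremental: bool,
}

impl TopicReceiver {
    fn on_datagram(&mut self, seq: u64) -> Action {
        if seq < self.next_seq {
            return Action::DropStale;
        }
        let missed = seq - self.next_seq;
        self.next_seq = seq.saturating_add(1);
        match (missed, self.incremental) {
            (0, _) => Action::Apply,
            (missed, false) => Action::ApplyAfterGap { missed },
            (missed, true) => Action::NeedSnapshot { missed },
        }
    }
}
```

For a state topic, receiving sequence numbers 0, 3, 2 yields `Apply`, `ApplyAfterGap { missed: 2 }`, and `DropStale`; the same gap on an incremental topic yields `NeedSnapshot` instead.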
Comment by codedokode 11 hours ago
TL;DR: protobuf has version compatibility and compact number encoding.
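The "compact number encoding" is protobuf's base-128 varint: each byte carries 7 payload bits, low-order groups come first, and the high bit marks that another byte follows, so small integers take one byte instead of a fixed four or eight. A minimal sketch of the scheme (the function names are hypothetical; real protobuf libraries ship their own implementations):

```rust
/// Protobuf-style base-128 varint encoding: 7 payload bits per byte,
/// low-order groups first, high bit set on every byte except the last.
fn encode_varint(mut v: u64, out: &mut Vec<u8>) {
    while v >= 0x80 {
        out.push(((v as u8) & 0x7f) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Returns (value, bytes consumed), or None if truncated or over 10 bytes.
fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().take(10).enumerate() {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

Encoding 300 produces the two bytes `0xAC 0x02`, while `u64::MAX` needs the maximum of ten. Negative `int32`/`int64` values also take ten bytes, which is why protobuf offers the zigzag-encoded `sint32`/`sint64` types.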
Comment by nu11ptr 15 hours ago
Comment by codedokode 11 hours ago
Comment by sa46 6 hours ago
I had the same issue when looking to adopt ConnectRPC for Go, which uses a custom wrapper type to model requests.
Comment by secondcoming 1 day ago
Comment by jeffbee 1 day ago
Comment by rballpug 1 day ago
Comment by Futurmix 11 hours ago
Comment by slipknotfan 4 days ago
Comment by josephg 1 day ago
Comment by alfiedotwtf 1 day ago
If anything, there should be “less than blessed” “*-awesome” libraries
Comment by echelon 1 day ago
Didn't we learn this with Python?
How many Python HTTP client libraries are in the dumping ground that is the Python "batteries included" standard library?
And yet people always reach for the one that is outside the stdlib.
Comment by kjuulh 1 day ago
It is not that everything should go into the stdlib, but having syn, procmacro and serde would be a good start imo. And, like Go, having a native HTTP stack would be really awesome: every time you have to do any HTTP, you end up pulling in some C-based crypto lib, which can really mess up your day when you want to cross-compile. With Go it mostly just works.
It isn't really in the flavor of Rust to do that, so I don't think it is going to happen, but when building services it is nice to be able to avoid most dependencies.
Comment by bigfishrunning 22 hours ago
A second-tier stdlib would turn out like the Boost C++ libraries -- an 800 lb gorilla of a common dependency that gets pulled in just to do something very simple; although, to be fair, most of the Boost functionality is already in Rust's stdlib.
Comment by SAI_Peregrinus 21 hours ago
Comment by Valodim 1 day ago
Comment by arlort 22 hours ago
Comment by j1elo 1 day ago
Only when it falls short of my needs would I drop the stdlib and go in search of a good-quality, reputable, and reliable third-party lib (which is easier said than done).
This has worked well for me with Go and Python. I would enjoy the same with Rust, or at a minimum, a list of libraries officially curated and pointed to directly by the language docs.