gRPC in Production
Liveblog by Beyang Liu (@beyang)
Alan Shreve is a hacker, entrepreneur, and the creator of ngrok.com. ngrok is the best way to expose services behind a NAT or firewall to the internet for demos, webhook development, and IoT connectivity. Today, he's giving us a whirlwind tour of gRPC and how to use it in your production web service.
Q: How do microservices talk to each other?
A: SOAP. Just kidding. Today, it's mostly HTTP + JSON.
I will die happy if I never write another REST client library in my life.
It's a lot of repetitive boilerplate.
Why do REST APIs suck?
- Streaming is difficult (nigh-impossible in some languages)
- Bi-directional streaming isn’t possible at all
- Operations are difficult to model (e.g. ‘restart the machine’)
- Inefficient (textual representations aren’t optimal for networks)
- Your internal services aren’t RESTful anyways, they’re just HTTP endpoints
- Hard to get many resources in a single request (the problem GraphQL tackles)
- No formal (machine-readable) API contract
- Corollary: writing client libraries requires humans
- Corollary: humans are expensive and don’t like writing client libraries
So let's use gRPC to build a cache service
gRPC is a "high-performance open source universal RPC framework."
Let's build a caching service together using gRPC.
We don't define it in code. We actually define it in an Interface Definition Language (IDL), in this case, protobufs.
Here's our caching service definition:
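The slide itself isn't reproduced here, but a minimal sketch of what the protobuf definition might look like (the service and message names below are illustrative, not Alan's exact ones):

```proto
syntax = "proto3";

package cache;

// A cache speaks two RPCs: Store writes a key/value pair, Get reads one back.
service Cache {
  rpc Store (StoreReq) returns (StoreResp) {}
  rpc Get (GetReq) returns (GetResp) {}
}

message StoreReq {
  string key = 1;
  bytes val = 2;
}

message StoreResp {}

message GetReq {
  string key = 1;
}

message GetResp {
  bytes val = 1;
}
```

Running this through `protoc` with the gRPC plugin generates client and server code for your language of choice.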
We won't dive into the generated code itself, but let's see how we can use it.
Let's look at how the server and client use the generated code:
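A hedged sketch of both sides in Go, assuming the generated package is imported as `pb` and the usual imports (`net`, `log`, `context`, `google.golang.org/grpc`):

```go
func runServer() {
	// grpc-go handles HTTP/2, framing, and protobuf (de)serialization.
	lis, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}
	srv := grpc.NewServer()
	pb.RegisterCacheServer(srv, newCacheServer()) // our implementation, filled in below
	log.Fatal(srv.Serve(lis))
}

func runClient() {
	conn, err := grpc.Dial("localhost:8080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	client := pb.NewCacheClient(conn)
	// From here on, remote calls look like local method calls.
	_, err = client.Store(context.Background(), &pb.StoreReq{Key: "gopher", Val: []byte("hello")})
	if err != nil {
		log.Fatal(err)
	}
}
```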
Note: we don't have to write any networking or serialization code ourselves.
Is this just WSDL all over again?
gRPC compared to SOAP/WSDL
- Inextricably tied to XML (gRPC's serialization is pluggable)
- Very heavyweight service definition format: an XML/XSD nightmare
- Unnecessarily complex, bloated with features few people need (two-phase commit?!)
- Inflexible and intolerant of forward compatibility (unlike protobuf)
- Performance and streaming problems still not solved
- Machine-readable API contracts are actually a really great idea
- Clients were responsible for generating libraries instead of vendors
gRPC compared to Swagger
- Solves the machine-readable contract problem, but none of the other problems with HTTP/JSON (performance, streaming, modeling)
- Swagger definitions are cumbersome and incredibly verbose. Compared to writing gRPC protobuf definitions, they’re a gigantic pain
gRPC compared to Thrift
- Thrift is actually a really great idea, with very similar project goals
- It never achieved the same ubiquity and ease of use. This is really hard: it requires all major language implementations to be:
- well documented
- highly performant
- easy to install
Implementing the methods
Let's fill in the stubs that gRPC generated for us:
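A sketch of what the filled-in server might look like, carrying over the hypothetical `pb` names from above (the mutex is there because gRPC runs handlers concurrently):

```go
type cacheServer struct {
	mu    sync.RWMutex
	store map[string][]byte
}

func newCacheServer() *cacheServer {
	return &cacheServer{store: make(map[string][]byte)}
}

func (s *cacheServer) Store(ctx context.Context, req *pb.StoreReq) (*pb.StoreResp, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.store[req.Key] = req.Val
	return &pb.StoreResp{}, nil
}

func (s *cacheServer) Get(ctx context.Context, req *pb.GetReq) (*pb.GetResp, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	val, ok := s.store[req.Key]
	if !ok {
		// gRPC errors carry a status code; more on these below.
		return nil, status.Errorf(codes.NotFound, "no such key: %q", req.Key)
	}
	return &pb.GetResp{Val: val}, nil
}
```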
There are a number of error codes, each corresponding to a type of error. It's kind of like HTTP status codes, but with no response body. There's a simple API for returning them on the server and inspecting them on the client:
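On the server, you return a coded error with `status.Errorf` (as the `Get` handler above already does). On the client, you branch on the code. A sketch, using the hypothetical names from earlier and the `google.golang.org/grpc/status` and `codes` packages:

```go
// lookup treats NotFound as an ordinary cache miss and surfaces
// every other error to the caller.
func lookup(ctx context.Context, client pb.CacheClient, key string) ([]byte, bool, error) {
	resp, err := client.Get(ctx, &pb.GetReq{Key: key})
	switch status.Code(err) {
	case codes.OK:
		return resp.Val, true, nil
	case codes.NotFound:
		return nil, false, nil // cache miss
	default:
		return nil, false, err
	}
}
```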
What's going on underneath the hood? How does it work?
protobuf serialized over HTTP/2:
- protobuf serialization (pluggable)
- Clients open one long-lived connection to a gRPC server
- A new HTTP/2 stream for each RPC call
- Allows simultaneous in-flight RPC calls
- Allows client-side and server-side streaming
There are three implementations at the moment, all high-performance and event-loop driven:
- The "C core": Ruby, Python, node.js, PHP, C#, Objective-C, and C++ are all bindings to it (PHP via a PECL extension, running under Apache or nginx/php-fpm)
- Java: Netty + BoringSSL via JNI
- Go: a pure Go implementation using the stdlib crypto/tls package
Where did gRPC come from?
- Originally pioneered by a team at Google
- Next generation version of an internal Google project called ‘stubby’
- Now an F/OSS project with a completely open spec and contributors from many companies
- Development is still primarily executed by Google devs
What if you have misbehaving clients?
Let's add multitenancy (associating a client ID with each request):
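One way to do it (a sketch; the `client-id` metadata key and `service-foo` value are made up for illustration) is to send the client's identity as gRPC metadata on every call:

```go
// Client: attach our identity to the outgoing context before each call.
func withClientID(ctx context.Context) context.Context {
	return metadata.AppendToOutgoingContext(ctx, "client-id", "service-foo")
}

// Server: pull the ID back out, e.g. inside a handler or interceptor,
// to rate-limit or audit per tenant.
func clientID(ctx context.Context) string {
	md, ok := metadata.FromIncomingContext(ctx)
	if !ok || len(md.Get("client-id")) == 0 {
		return "unknown"
	}
	return md.Get("client-id")[0]
}
```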
"It's too slow"
Now someone tells you your service is too slow. And here you realize you have no visibility. What do you do?
Option 1: add logging:
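The naive version wraps every call site by hand, something like:

```go
// One of these wrappers per RPC method, at every call site... it adds up.
func loggedGet(ctx context.Context, client pb.CacheClient, key string) (*pb.GetResp, error) {
	start := time.Now()
	resp, err := client.Get(ctx, &pb.GetReq{Key: key})
	log.Printf("Cache.Get key=%q err=%v duration=%s", key, err, time.Since(start))
	return resp, err
}
```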
Note: this is a lot of boilerplate. Turns out gRPC has something for that: the client interceptor. Every time you make a remote call, the interceptor middleware will be invoked.
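A sketch of a logging interceptor using grpc-go's `UnaryClientInterceptor` hook:

```go
// logUnaryCall times every outgoing unary RPC and logs the result.
func logUnaryCall(ctx context.Context, method string, req, reply interface{},
	cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
	start := time.Now()
	err := invoker(ctx, method, req, reply, cc, opts...)
	log.Printf("%s err=%v duration=%s", method, err, time.Since(start))
	return err
}

func dial(addr string) (*grpc.ClientConn, error) {
	// Installed once at dial time; every call through this connection flows through it.
	return grpc.Dial(addr, grpc.WithInsecure(), grpc.WithUnaryInterceptor(logUnaryCall))
}
```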
On the server side, there's a server interceptor.
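And the server-side equivalent:

```go
// logUnaryServe wraps every incoming unary RPC on the server.
func logUnaryServe(ctx context.Context, req interface{},
	info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	start := time.Now()
	resp, err := handler(ctx, req)
	log.Printf("%s err=%v duration=%s", info.FullMethod, err, time.Since(start))
	return resp, err
}

// Registered once: grpc.NewServer(grpc.UnaryInterceptor(logUnaryServe))
```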
Now you look in the logs and find you're still failing your SLA: some round trips take 2.2 seconds (more than the 2-second budget). Why? Your server-side timeout only covers a portion of the full request/response round trip.
So let the client set its own timeout:
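In grpc-go this is just a context deadline, which bounds the entire round trip. A minimal sketch:

```go
func getWithTimeout(client pb.CacheClient, key string) (*pb.GetResp, error) {
	// The deadline covers everything: queueing, the wire, and server processing.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return client.Get(ctx, &pb.GetReq{Key: key})
}
```

If the budget is blown anywhere along the way, the call fails with `codes.DeadlineExceeded`.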
How does that interact with the timeout you set previously on the server? Simple: "the context propagates through."
Now let's say you want to call your service with a dry-run flag: run the request without side effects, and have it work across every mutating API.
This is simple to do by passing the right gRPC metadata (analogous to HTTP headers).
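A sketch (the `dry-run` metadata key is made up for illustration):

```go
// Client: mark the outgoing request as a dry run.
func dryRun(ctx context.Context) context.Context {
	return metadata.AppendToOutgoingContext(ctx, "dry-run", "1")
}

// Server: every mutating handler checks the flag before committing anything.
func isDryRun(ctx context.Context) bool {
	md, ok := metadata.FromIncomingContext(ctx)
	return ok && len(md.Get("dry-run")) > 0
}
```

Because it rides in metadata rather than in each request message, it works uniformly across every method without touching the .proto.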
Let's add retry logic.
If the operations are idempotent, we can retry safely:
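A minimal client-side sketch (three attempts is an arbitrary choice here):

```go
// Get is idempotent, so blind retries can't corrupt anything.
func getWithRetry(ctx context.Context, client pb.CacheClient, key string) (resp *pb.GetResp, err error) {
	for i := 0; i < 3; i++ {
		resp, err = client.Get(ctx, &pb.GetReq{Key: key})
		if err == nil {
			return resp, nil
		}
	}
	return nil, err
}
```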
But now failed operations are slow. What's going on? Every error gets retried, even ones that can never succeed.
So in your response message, you add a new field that indicates whether it is possible to retry in the case of an error, and the client retries only when it's set.
You really want a structured error, not just a code and string. You want a full object's worth of parameters.
Something like this:
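For example (a sketch: `CacheError` stands in for a hypothetical message added to the .proto, serialized and smuggled through the status string):

```go
// Server: marshal the structured error and stuff it into the status message.
func storageError(e *pb.CacheError) error {
	b, err := proto.Marshal(e)
	if err != nil {
		return status.Errorf(codes.Internal, "encoding error details: %v", err)
	}
	return status.Errorf(codes.ResourceExhausted, "%s", base64.StdEncoding.EncodeToString(b))
}

// The client has to base64-decode and proto.Unmarshal the status message
// to recover the CacheError on the other side.
```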
Unfortunately, this gets a little messy... a lot of manual serialization and deserialization is required, since gRPC errors don't have response bodies.
This is one of the larger frustrations working with gRPC compared to HTTP.
Another feature request: Cache dump
Let's say you now want to add the ability to dump the contents of your cache service.
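Server-side streaming fits here. Assuming the .proto gains a hypothetical `rpc Dump(DumpReq) returns (stream DumpResp) {}`, a sketch of both ends:

```go
// Server: send one message per cache entry on the stream.
func (s *cacheServer) Dump(req *pb.DumpReq, stream pb.Cache_DumpServer) error {
	s.mu.RLock()
	defer s.mu.RUnlock()
	for k, v := range s.store {
		if err := stream.Send(&pb.DumpResp{Key: k, Val: v}); err != nil {
			return err
		}
	}
	return nil
}

// Client: read until io.EOF marks the end of the stream.
func dumpAll(ctx context.Context, client pb.CacheClient) error {
	stream, err := client.Dump(ctx, &pb.DumpReq{})
	if err != nil {
		return err
	}
	for {
		item, err := stream.Recv()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		log.Printf("%s = %q", item.Key, item.Val)
	}
}
```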
Now you run into out-of-memory errors. What defensive measures can you add?
Set a max number of concurrent streams (simultaneous HTTP/2 streams per client connection):
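In grpc-go this is a single server option (64 is an arbitrary number for illustration):

```go
func newServer() *grpc.Server {
	// Cap simultaneous HTTP/2 streams, i.e. in-flight RPCs, per client connection.
	return grpc.NewServer(grpc.MaxConcurrentStreams(64))
}
```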
gRPC also lets you register an InTap handler: a piece of code just like the server interceptor, but it runs a little earlier in the request lifecycle, before resources are allocated for the RPC.
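A sketch using grpc-go's `tap` package (`allow` stands in for whatever rate-limiting policy you use):

```go
// rateLimitTap rejects calls before the RPC handler machinery spins up,
// which makes it cheaper than an interceptor for load shedding.
func rateLimitTap(ctx context.Context, info *tap.Info) (context.Context, error) {
	if !allow(info.FullMethodName) { // hypothetical rate check
		return nil, status.Errorf(codes.ResourceExhausted, "too many requests")
	}
	return ctx, nil
}

// Registered as: grpc.NewServer(grpc.InTapHandle(rateLimitTap))
```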
But what if that doesn't solve all your memory issues?
There's not enough time to go into detail, but one thing to note: The fact that you're establishing a single persistent connection means that every request you make goes to the same server.
So, you have to put the load-balancing logic in the client.
You can of course put this logic into a middleware server that does this for you, so the actual client doesn't have to worry about this. This is still pretty new, the spec is experimental.
gRPC clients in other languages
Using your service from Python takes no extra work: the same .proto generates the Python client.
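A sketch, assuming the hypothetical cache.proto from above has been compiled with grpcio-tools:

```python
import grpc

import cache_pb2
import cache_pb2_grpc

# One channel per server, same as the Go client's single connection.
channel = grpc.insecure_channel("localhost:8080")
client = cache_pb2_grpc.CacheStub(channel)

client.Store(cache_pb2.StoreReq(key="gopher", val=b"hello"))
resp = client.Get(cache_pb2.GetReq(key="gopher"))
print(resp.val)
```

It isn't all roses, though. Some warts remain: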
- Load balancing must be handled client-side (as described above)
- Structured error handling is unfortunate
- No support for browser JS
- Breaking API changes
- Poor documentation for some languages
- No standardization across languages
Where is gRPC used in production?
- ngrok — all 20+ internal services communicate via gRPC
- Square — replacement for all of their internal RPC; one of the very first adopters of and contributors to gRPC.
- CoreOS — Production API for etcd v3 is entirely gRPC.
- Google — Production APIs for Google Cloud Services (e.g. PubSub, Speech Rec)
- Netflix, Yik Yak, VSCO, Cockroach, + many more
The future of gRPC
The future of gRPC is easy to track: watch the grpc/proposal repository and the grpc-io mailing list.
- New languages (Swift and Haskell are currently experimental)
- Further stability, reliability, performance improvements
- Increasingly fine-grained APIs for customizing behavior (connection management, channel tracing)
- Browser JS!