Managing Massive Schemas with Codegen

By Adam Kramer on October 26, 2017


Adam Kramer has been a software engineer at Facebook since 2008 and has spent the last year and a half working on the largest GraphQL server in the world. In addition to his engineering career, he holds a Ph.D. in Psychology and is interested in developing highly usable engineering systems that make software engineering more accessible to human beings. His top three hobbies are refereeing roller derby, karaoke, and explaining jokes.

Abstract

At Facebook, our GraphQL schema has over 10,000 object types. How do we manage such a massive, rapidly changing schema with a team of just 5 engineers? The secret is writing code that writes code, which gives us cleaner schemas, safer typing, and faster execution times.

Defining a schema

Adam argues that there doesn't need to be a big debate over what the schema should look like.

Adam starts by asking: "What's your server language?"

Facebook's server language is Hack, and Adam argues that the principles of your server language should carry over to your schema.

Most languages have "classes" and "methods". Analogously, GraphQL has "types" and "fields".

Adam suggests that these concepts correspond closely, and that we should take advantage of that correspondence when defining a schema.

Below is an example GraphQL schema and abbreviated Hack class describing the same type.

[Figure: the GraphQL type User beside an abbreviated Hack class FBUser]

Notice the similarities between the GraphQL type User and the Hack class FBUser.

Schemas are a type system

GraphQL is strongly typed. Types have fields, fields have types.

Your language might be strongly typed too.

Your schema definition is the bridge between GraphQL and your language.
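To make that bridge concrete, here is a minimal sketch of translating a server-language type annotation into a GraphQL type reference. Python stands in for Hack. The scalar table and the rule that `Optional[X]` stays nullable while every other hint becomes non-null (`!`) are assumptions chosen for illustration, not Facebook's actual mapping. The function name `to_graphql_type` is hypothetical.

```python
from typing import Optional, Union, get_args, get_origin

# Hypothetical mapping from Python scalar types to GraphQL built-in scalars.
SCALARS = {int: "Int", float: "Float", str: "String", bool: "Boolean"}


def to_graphql_type(hint) -> str:
    """Translate a Python type hint into a GraphQL type reference.

    Assumption: non-Optional hints become non-null (`!`); Optional[X] stays nullable.
    """
    nullable = False
    # Optional[X] is Union[X, None]: strip the None and remember nullability.
    if get_origin(hint) is Union:
        args = [a for a in get_args(hint) if a is not type(None)]
        if len(args) != 1:
            raise TypeError(f"unsupported union: {hint!r}")
        hint, nullable = args[0], True
    if get_origin(hint) is list:
        (item,) = get_args(hint)
        inner = "[" + to_graphql_type(item) + "]"
    elif hint in SCALARS:
        inner = SCALARS[hint]
    elif isinstance(hint, type):
        # Assume any other class maps to a GraphQL object type of the same name.
        inner = hint.__name__
    else:
        raise TypeError(f"unsupported type hint: {hint!r}")
    return inner if nullable else inner + "!"
```

With this sketch, `to_graphql_type(list[int])` gives `[Int!]!` and `to_graphql_type(Optional[str])` gives `String`. The point is that nullability and list-ness are already encoded in the server language's types, so the schema does not need to restate them by hand.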

Your server's implementation matters. Do not blindly abstract it away.

Data layers vs. implementation layers

Even though client engineers operate above the data layer, we as API engineers should care how the server executes requests.

At Facebook, the server APIs usually already expose the right data. Adam relies on this to tie the Hack and GraphQL type systems together as cleanly as possible.

Don't write your own schema

Below is a new schema definition language that was presented previously at another conference.

[Figure: the proposed schema definition language]

Adam suggests that the internal server API should inform the external server API.

He suggests expressing what the schema should look like in your server language itself.
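One way to picture this is the following minimal sketch. Python decorators stand in for whatever annotation mechanism your server language provides, and they only record schema hints on the server code. The decorators `graphql_type` and `graphql_field` and the helper `exposed_fields` are hypothetical names, not a real library. The class names FBUser and User come from the example above.

```python
from typing import Callable, Optional

# Hypothetical decorators: they only record schema hints on the server code;
# nothing here talks to a GraphQL library.


def graphql_type(name: Optional[str] = None) -> Callable[[type], type]:
    """Mark a server class as exposed, optionally under a different GraphQL name."""
    def mark(cls: type) -> type:
        cls.__graphql_name__ = name or cls.__name__
        return cls
    return mark


def graphql_field(name: Optional[str] = None) -> Callable[[Callable], Callable]:
    """Mark a method as a GraphQL field, optionally renaming it."""
    def mark(fn: Callable) -> Callable:
        fn.__graphql_field__ = name or fn.__name__
        return fn
    return mark


@graphql_type(name="User")
class FBUser:
    def __init__(self, user_id: int, name: str) -> None:
        self._id = user_id
        self._name = name

    @graphql_field()
    def id(self) -> int:
        return self._id

    @graphql_field(name="name")
    def get_name(self) -> str:
        return self._name

    def internal_score(self) -> float:
        # Not annotated, so it stays out of the schema.
        return 0.0


def exposed_fields(cls: type) -> dict:
    """Collect {graphql_field_name: method_name} for the annotated methods."""
    return {
        member.__graphql_field__: attr
        for attr, member in vars(cls).items()
        if callable(member) and hasattr(member, "__graphql_field__")
    }
```

Here `exposed_fields(FBUser)` yields `{"id": "id", "name": "get_name"}`. The server class stays the single source of truth, and exposure is opt-in per method.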

Codegen to save us from ourselves

But what happens if the implementation changes?

Assume the implementation and the schema are tightly coupled, but don't tie the two together directly. Abstract the connection away instead.

Generated runtime artifacts map between the two systems.

This keeps the runtime safe, efficient, and decoupled.
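As a rough illustration of such a runtime artifact, the sketch below assumes a lookup table (hypothetically named `RESOLVERS`) that a codegen step would write out. At request time the server consults only that table and does no reflection over annotations. The table, the `resolve` function, and this `FBUser` stand-in are assumptions for the example.

```python
from typing import Any

# Stand-in for a generated artifact: in a codegen setup a generator would
# write this table out; it would not be maintained by hand.
RESOLVERS: dict[tuple[str, str], str] = {
    ("User", "id"): "id",
    ("User", "name"): "get_name",
}


class FBUser:
    def __init__(self, user_id: int, name: str) -> None:
        self._id = user_id
        self._name = name

    def id(self) -> int:
        return self._id

    def get_name(self) -> str:
        return self._name


def resolve(type_name: str, field: str, obj: Any) -> Any:
    """Resolve a GraphQL field using only the generated table."""
    try:
        method_name = RESOLVERS[(type_name, field)]
    except KeyError:
        raise KeyError(f"{type_name}.{field} is not in the schema") from None
    return getattr(obj, method_name)()
```

A dictionary lookup per field keeps the runtime cheap. Because the runtime never reads the annotations, you can change how hints are written without touching the execution path.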

[Figure: server code and the code generated from it]

The code on the bottom is generated from the code on the top.

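The screenshots don't survive as text, so here is a minimal Python sketch of the same idea. A generator reads hints from a server class and emits two artifacts: the schema SDL and a source module containing the runtime lookup table. To keep the example short, the hints are plain class attributes (`GRAPHQL_TYPE`, `GRAPHQL_FIELDS`), and only scalar return types are supported. Those attribute names and the `generate` function are hypothetical.

```python
from typing import get_type_hints

SCALARS = {int: "Int", str: "String", bool: "Boolean", float: "Float"}


class FBUser:
    # Hypothetical schema hints: GraphQL type name plus {field: method} pairs.
    GRAPHQL_TYPE = "User"
    GRAPHQL_FIELDS = {"id": "id", "name": "get_name"}

    def id(self) -> int: ...

    def get_name(self) -> str: ...


def generate(classes: list[type]) -> tuple[str, str]:
    """Emit (schema SDL, Python source for a resolver table) from annotated classes."""
    sdl_parts, table_lines = [], []
    for cls in classes:
        type_name = cls.GRAPHQL_TYPE
        lines = [f"type {type_name} {{"]
        for field, method in cls.GRAPHQL_FIELDS.items():
            # The return annotation on the server method decides the field type.
            ret = get_type_hints(getattr(cls, method))["return"]
            lines.append(f"  {field}: {SCALARS[ret]}!")
            table_lines.append(f"    ({type_name!r}, {field!r}): {method!r},")
        lines.append("}")
        sdl_parts.append("\n".join(lines))
    sdl = "\n\n".join(sdl_parts) + "\n"
    module = (
        "# Generated by a hypothetical codegen step; do not edit by hand.\n"
        "RESOLVERS = {\n" + "\n".join(table_lines) + "\n}\n"
    )
    return sdl, module
```

For `generate([FBUser])`, the SDL comes out as `type User { id: Int!  name: String! }` (one field per line). Both artifacts derive from the same source, so they cannot drift apart as long as they are regenerated.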

Compile time vs. build time vs. run time

This lets you ensure that your native code is sound at compile time, and that your generated code is sound at build time.

Code generation has already done the hard work.

As a result, your schema and queries are sound.

... so your type systems match at runtime!
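What a build-time check might look like is sketched below in Python, under stated assumptions. It walks a generated lookup table and verifies that every entry still points at an existing method on its backing class. In a language with a static type checker, running the checker over the generated code could catch the same mismatch. This Python sketch does it with an explicit check instead. `RESOLVERS`, `TYPE_TO_CLASS`, and `check_artifact` are hypothetical names.

```python
class FBUser:
    def id(self) -> int:
        return 4

    def get_name(self) -> str:
        return "Zuck"


# Stand-in for a checked-in generated artifact and its type-to-class mapping.
RESOLVERS = {("User", "id"): "id", ("User", "name"): "get_name"}
TYPE_TO_CLASS: dict[str, type] = {"User": FBUser}


def check_artifact(resolvers: dict, classes: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the artifact matches the code."""
    problems = []
    for (type_name, field), method in resolvers.items():
        cls = classes.get(type_name)
        if cls is None:
            problems.append(f"{type_name}: no backing class")
        elif not callable(getattr(cls, method, None)):
            problems.append(f"{type_name}.{field}: {cls.__name__}.{method} is missing")
    return problems


if __name__ == "__main__":
    # Fail the build if the generated code no longer matches the server code.
    found = check_artifact(RESOLVERS, TYPE_TO_CLASS)
    if found:
        raise SystemExit("stale generated code:\n" + "\n".join(found))
```

If someone renames `get_name` without regenerating, the check reports `User.name: FBUser.get_name is missing` before the code ships, rather than failing on a live request.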

Summary

Adam's suggestion: write good APIs for your server, annotate the server code with hints about how to generate the schema, and then generate the schema from those annotations.