Give your backend a design system

Contents #

The core need #

To be clear, a rigorous API spec that generates all sorts of things is not a new concept (add footnote to prior art here). But its importance, position, and implications are rapidly changing in an AI-native world.

The core need is thus:

You have an existing backend system with SQL tables, data models, and usually API "services" — e.g. users, comments — and maybe some generated Swagger docs
You start to integrate "AI" into your system in various ways
And realize, hey, we need these table, field, and endpoint descriptions all over the place
Your coding, BI, security review, UI/client code generators, devops/sql helper agents all need this information

Normally you might just copy this information all over the place, or erect a team to manage it…

But in a super-fast-changing and increasingly guardrailed environment, that just isn’t going to work.

Getting carried away… #

So you send your trusty AI coding agent to generate a bunch of schemas for you based on your models, SQL tables, etc.

Buuuttt if you give a mouse a cookie…

He’s probably going to want to generate integration tests, admin / frontend code, client libraries, documentation, AI-evals, and of course make such a schema accessible for all AI agents across your company to access (securely, of course!) for all such cases.

And before long you realize that the schemas are the code — in a very real sense — and should probably be the source of truth.¹

This pattern isn’t just for data models: cross-platform test cases, design / branding systems, documentation / strings — hell, even accounting, legal, sales, HR, and others are rapidly converging on concise, canonical, and most importantly, AI-first specs or schemas. This is what tomorrow’s AI-native business will look like.

But we’ll stay focused on data + method schemas for today:

Myriad benefits #

As you go further down this rabbit hole, you realize just how many parts of your system can benefit from this.

You probably want to support MCP, but what about HTTP, SSE, WebSocket, command line, etc?
This gives you a single, internal API interface (if done right) and allows any protocol / transport to be plugged in.
Much of your routing, validation, error checking and handling, test generation, RBAC, etc can be generated (mostly) from these schema files.
Having one surface for authentication checks, rigidly defined and enforced, can make your system a lot more secure.²
You end up having way less code than a normal backend system with this approach.¹
And with much of the code generalized, the investment in insane levels of testing / verification (add link here to below section) and hardening makes total sense — each new test added to shared routing or validation improves all services and models at once.
Once you have a few schemas well-defined and working well, it’s incredibly easy to make more.
LLMs are fantastic with this sort of thing:
Simply dictate your new data model / service idea, make sure to reference your canonical examples, and poof.
You now also have a robust prompt / tool-description library that’s safely version-controlled and easily accessible.

So how do we get started?

But first, an example #

Emulating the great Rob Pike³, let’s show a detailed example in place of a lot more words.

# Similar in spirit to JSON Schema, but our own simple format that allows comments and
# addresses our needs (and of course is fully tested!).
title: Files
id: files

# AI/prompt-first description
description: |
  Content-addressed file storage. A file row IS the file: id is the SHA1 hex
  hash of the bytes, bytes live at ...

# For API/MCP routing (can be omitted; defaults to /files from id above)
service_path: /files
# Set to `-` for non-db-backed models
pg_table_name: file
# Defaults to `user`. Properties and methods can also specify role.
role: user

# Always in alpha order
model_properties:
  - id: checksum_verified_at
    type: date_time
    description: When file integrity was last verified (null if never)
  - id: content_draft
    type: text_long
    description: In-progress bytes for streaming files (voice recording). NULL when not streaming.
  - id: content_type
    type: text_mime
    description: Detected MIME type (e.g. image/png, application/pdf)
  - id: created
    type: date_time
    description: UTC timestamp when uploaded
  - id: id
    type: guid
    description: Standard GUID — primary key
  - id: metadata
    type: object_any
    description: Extracted metadata (EXIF, PDF info, provider response, ...)
  - id: original_name
    type: text_short
    description: Original filename as uploaded
  - id: parent_id
    type: guid
    description: Generic parent link (NOT for versioning)
  - id: private
    type: bool
    description: |
      Owner-only file. When true (paired with list_config.privacy_aware),
      even admin list / drill / user_id=all never surfaces this row —
      only the owning user_id sees it. Default false.
    default: false
  - id: size_bytes
    type: int_nonneg
    description: File size in bytes
  # ... more props

# Behavior declared in schema (not just shape) — auto-CRUD honors these
list_config:
  default_limit: 50
  max_limit: 200
  sortable_fields: [created, updated, size_bytes]
  default_sort: created desc
  user_scoped: true         # rows scoped to caller by default
  admin_default_scope: all  # admin list spans all users unless drilled
  privacy_aware: true       # honors per-row `private` flag even for admin

methods:
  # Auto-CRUD --------------------------------------------------------------
  - id: get
    description: Fetch a single file row by id (SHA1 hex)

  - id: list
    description: List the caller's files, newest first
    properties_optional: [group_id, status, source]

  - id: patch
    description: Update mutable metadata. Content is immutable.
    properties_optional:
      [date_effective, description, geo_alt, geo_lat, geo_lng, geo_location,
       metadata, parent_id, status, tags, tags_project, title, transcript]

  - id: delete
    description: Soft-delete (sets status=deleted). Bytes remain on disk.

  # Custom -----------------------------------------------------------------
  - id: classify
    description: |
      Retry Haiku classification for an existing file. Admin / debug —
      classification runs automatically on user uploads.
    http_method: POST
    path: /files/classify
    role: admin
    params:
      - id: id
        type: sha1_hex
        required: true
        description: File id (SHA1 hex)
    errors: [not_found, internal_error]

Notice what’s there besides shape: routing, RBAC, list defaults, privacy semantics, the difference between auto-CRUD and custom methods. That’s the schema doing real behavioral work, not just describing fields.

Hardcore verification #

One of my first thoughts upon becoming obsessed with GPT-3 was that, if nothing else, the extra power from LLM-coding should allow us to build the engineering org/team/stack we’ve always dreamed of but never had the time or budget to build beforehand.

Hardcore verification across all layers is a major part of this, and Claude excels at this kind of work.

In our case, the layout is simple:

A single directory of services, each in its own folder (e.g. users/).
Each service folder has one or more YAML schema / data-model files, plus business logic and supporting code.
Then a really robust, top-level test runner that will:
- Create a test database.
- Walk all services, and for each:
  - Confirm rigid service format.
  - Confirm rigid data-model / schema format.
  - Exercise all endpoints with positive and negative tests.
  - Confirm what’s in the DB using SQL (we have the spec, after all).
  - Allow for — and require — tests for non-standard (non-CRUD-like) cases.
And of course unit-test coverage for all tools and supporting code.

Our schemas are used to generate client libraries and tests, and used in various other capacities like generating admin tools with their own suite of tests.

The compounding effect is the real prize: each new test added to shared routing or validation strengthens every service in the system. You’re not testing one feature; you’re testing the platform.

Custom types: a real powerhouse #

Traditional validation params are nice, and work well (enum, min, max, etc), but they can be verbose — several fields — and not always specific enough.

Very specific, custom types make validation and AI tooling extremely tight.
They can also carry business information, e.g. chorus_user_id vs stripe_user_id.
They’re super easy for AI to reason about and manage, and there’s usually no harm in adding a lot (within reason). Even oddly specific ones earn their keep — like chorus_user_id_or_all for the admin tool that legitimately needs to fetch "all" users.
It’s very easy to test ALL boundaries of each param.
Each well-defined custom type provides positive and negative examples, and contains a robust description for AI agents to use.
They also further enhance your ability to generate dynamic UI — custom-type metadata is communicated via app metadata to client consumers (think dropdowns, select-many, etc).

A quick note on admin tools #

With this approach, it’s incredibly easy to generate entire admin/internal tools — including writes. But you might want to consider if you really need that tool.

Should humans be using such internal tools of old?
Or should agents be using your MCP, and your humans instead be reviewing reports?

Only you can say. :)

Appendix #

1. Prior art: schemas as source of truth is not new #

Google — Protocol Buffers + google.api: .proto files are the canonical IDL across the company. Stubby (internal RPC), gRPC, REST gateways via google.api.http annotations, all client SDKs, and docs are generated from protos. Per Buf: "since all services and clients derive from Protobuf definitions, these schemas become the canonical source of truth for the shape and behavior of APIs."⁴
Amazon — Smithy: open-sourced 2019, v2 in 2022. Evolved from "Coral," AWS’s internal IDL since the mid-2000s. Every AWS SDK is generated from Smithy models. Explicitly protocol-agnostic — the same model emits REST JSON, REST XML, RPC, etc.⁵
Microsoft / Azure — TypeSpec (formerly Cadl): a higher-level schema language that compiles to OpenAPI, Protobuf, JSON Schema, and (experimentally) clients and servers. Designed to bridge "design-first" and "code-first" camps.⁶
Stripe: famously has an internal API description format from which docs, SDKs, and the API reference site are all generated.
GraphQL SDL: schema-first by construction. The schema literally is the contract clients see.
FHIR (healthcare): resource definitions drive validators, code generation, and reference implementations across an industry.

2. What stays as code #

A skeptical reader is rightly asking this from paragraph one. Schemas describe the surface; code still implements the behavior. The split, concretely:

Business logic and side effects. LLM calls, image processing, file storage, payment flows, third-party integrations — none of this can live in YAML, nor should it.
Cross-field invariants, whole model validation, and custom validators.
- Type-level validation gets you "is this a valid sha1?"
- Code gets you "if status=streaming then content_draft must be non-null and size_bytes must be zero."
- The schema declares fields; code enforces the relationships between them.
- For write endpoints, we add a standard ValidateWhole() function to allow for validation beyond single params
Non-CRUD endpoints with awkward shapes. Streaming voice transcription, multipart uploads, long-running jobs, server-sent events. Declare them in the schema for routing and discoverability; implement them by hand.
Background jobs, schedulers, and anything stateful over time. Cron-style runners, queue workers, retries with backoff — orthogonal to the request / response surface the schema describes.
Hand-tuned hot paths. The 1–2% of endpoints where generated boilerplate isn’t fast enough. Schema still defines the contract; the handler is bespoke.
The boring glue. Database migrations, config loading, observability wiring. These touch every service but aren’t of any service.

The schema-as-source-of-truth claim is not "no code." It’s "the boilerplate that used to be 80% of a service is now generated, and the 20% that’s actually interesting is what you write."

Use caution when using legacy code generators. In our experience, these are more hassle than they’re worth; LLMs do this with ease given one or two good examples (have the LLM generate everything based on the schema).

3. Other fields and controls #

We don’t explicitly return schema, but our API / MCP / routing layer handles consistent JSON responses, and we assume each method returns a list of models or a model.
Schema versioning and evolution aren’t explicitly shown here, but are easily recreated via schema-update SQL files.

4. References / further reading #

Additional reading:

Buf — "Why a Protobuf schema registry?". Treat schemas as first-class distributable artifacts; get breaking-change detection and consumer-impact analysis across services.
Charlie Holland — "The Schema Language Question." Compares Avro, JSON Schema, and Protobuf along evolution, tooling, and validation. No universal winner — choice depends on whether you’re optimizing for events, HTTP, or RPC.
Total Shift Left — "What Is API Contract Testing?" Contract testing pins the API in a machine-readable spec; consumer-driven (Pact) vs schema-first / provider-owned (OpenAPI, Smithy).

Footnotes #

See appendix §2 — "What stays as code." TL;DR: schemas describe the surface; code still implements behavior, side effects, and anything stateful. ↩
Side benefit worth its own post: with auth declared in schema, qualitative AI security review becomes tractable — an agent can audit every method’s role + scoping rules in one pass. ↩
Rob Pike’s "hello, world" / Brian Kernighan lineage — for the "small example beats a thousand words" principle. ↩
Buf — "The real reason to use Protobuf is not performance." buf.build/blog/the-real-reason-to-use-protobuf. The real value isn’t speed; it’s an enforceable, versioned, language-agnostic contract between producers and consumers. ↩
AWS Smithy. smithy.io. Protocol-agnostic IDL, v2.0 (2022); descended from internal "Coral." Every AWS SDK generated from it. Core concept is @trait metadata. ↩
Microsoft TypeSpec (formerly Cadl). typespec.io. Concise TypeScript-flavored DSL that compiles down to OpenAPI, Protobuf, JSON Schema, and experimental clients / servers. ↩