Since their inception, data contracts have been written for two readers: humans and the validators that humans built. The schema described the shape, the quality rules described the promise, and the SLAs described the patience. Useful, but it quietly assumed that whoever consumed the data already knew what it meant.

That assumption no longer holds. The consumer is now, increasingly, an LLM or an agent that has never met your data dictionary and never will. So the next versions of both Bitol standards, ODCS v3.2 (in progress) and ODPS v1.1 (draft), share a single theme: AI is no longer a use case bolted on at the end. It is the baseline.

ODCS 3.2 is dedicated to the memory of Peter Flook, a longtime contributor whose work shaped our data quality testing, the negative-test suite, schema validation, and more vendor onboarding than most of us will ever do. This release carries his contributions forward. Merci, Peter.

The joint release of ODCS v3.2 and ODPS v1.1 focuses on what AI needs.

The headline: a context block for both standards

The centerpiece of this cycle is RFC-0038, a new optional context block that lands in ODCS and ODPS at the same time. Same shape, same intent, both layers of the stack. It exists for one reason: to give AI agents, LLMs, BI tools, and semantic layers the interpretive guidance that schema and quality rules never captured.

Three sub-fields, all optional:

  • instructions: how this thing should be used, in plain language. “Use for revenue analysis and order trends. Do not use for individual customer PII queries.”

  • verifiedStatements: canonical business questions, each with an optional curated answer. Entries with an answer should be returned verbatim when a query is semantically close; entries without an answer can be used to prime text-to-SQL and help with disambiguation.

  • constraints: the negatives. “Always aggregate to at least country level.” “Do not join with PII tables without approval.”
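The verifiedStatements behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `question` and `answer` key names, the sample entries, the `route` helper, and the 0.8 threshold are all hypothetical, and `difflib` is a crude lexical stand-in for the semantic matching a real tool would more likely do with embeddings.

```python
from difflib import SequenceMatcher

# Hypothetical entries; the "question"/"answer" key names are assumed,
# not taken from the RFC text.
VERIFIED_STATEMENTS = [
    {
        "question": "How is revenue defined?",
        "answer": "Revenue is the sum of net_amount over completed orders.",
    },
    # No curated answer: usable only to prime text-to-SQL.
    {"question": "Which countries had the most orders last month?"},
]


def similarity(a: str, b: str) -> float:
    """Lexical similarity in [0, 1]; a placeholder for semantic similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def route(query: str, statements: list, threshold: float = 0.8):
    """Return ("verbatim", answer) when the closest question has a curated
    answer and is similar enough; otherwise return ("prime", questions),
    where questions are the answerless entries to use as examples."""
    best = max(statements, key=lambda s: similarity(query, s["question"]))
    if "answer" in best and similarity(query, best["question"]) >= threshold:
        return ("verbatim", best["answer"])
    return ("prime", [s["question"] for s in statements if "answer" not in s])


print(route("how is revenue defined?", VERIFIED_STATEMENTS))
print(route("Which warehouses are closed on Sundays?", VERIFIED_STATEMENTS))
```

The first call matches a curated entry and returns its answer unchanged; the second finds nothing close enough and falls back to the answerless questions as priming material.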

If you have ever watched an LLM cheerfully hallucinate a join, that last field is the one you have been waiting for. Positive descriptions tell a model what a column is. They do not stop it from doing something stupid with it. Negative guidance does.
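To make the point concrete, here is a minimal sketch of how a tool might turn a context block into a prompt preamble, with the constraints spelled out as explicit rules. The dictionary layout (a string for `instructions`, a list of strings for `constraints`) and the `render_preamble` helper are assumptions for illustration, not the ratified shape.

```python
# Assumed layout: the RFC names the sub-fields, but whether each is a string
# or a list is a guess made for this sketch.
context = {
    "instructions": (
        "Use for revenue analysis and order trends. "
        "Do not use for individual customer PII queries."
    ),
    "constraints": [
        "Always aggregate to at least country level.",
        "Do not join with PII tables without approval.",
    ],
}


def render_preamble(ctx: dict) -> str:
    """Build a plain-text preamble; every sub-field is optional."""
    lines = []
    if ctx.get("instructions"):
        lines.append("How to use this dataset:")
        lines.append(ctx["instructions"])
    if ctx.get("constraints"):
        # Negative guidance goes in as explicit rules, one per line.
        lines.append("Rules you must not break:")
        lines.extend(f"- {rule}" for rule in ctx["constraints"])
    return "\n".join(lines)


print(render_preamble(context))
```

Because every sub-field is optional, an empty context block simply renders to an empty string.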

And before anyone accuses me of selling vibes, the research is nicely detailed:

  • Semantic catalog enrichment improved SQL accuracy by 27% (Tiger Data, 2026).

  • Column type annotations alone bought 8%; semantic descriptions, 12%; hybrid metadata, 20%-25% (Mishra, 2025).

  • Negative guidance and verified answers, the heart of Microsoft Fabric’s “Prep for AI,” prevent whole classes of hallucinated joins that positive descriptions never catch (Microsoft, 2025).

By standardizing this guidance, Bitol saves everyone from having to write it in a dozen incompatible proprietary formats.

The pipeline behind it: more AI-native building blocks

The context block is the part that is merged. Behind it sits a cluster of proposed RFCs, all targeting 3.2, that pull the same thread: make the contract legible to a machine that has to reason about meaning, not just structure. None of these are sealed yet, and a couple still have open decisions, but the intent is unmistakable.

Measures and dimensions (RFC-0034). Today, a contract describes columns. It does not describe the metric a business actually argues about, the “Total Revenue” or “Average Basket Value” that every dashboard recomputes slightly differently. This RFC lets a property declare itself a measure or a dimension, so a KPI is defined once and read the same way by BI tools, catalogs, and an AI assistant trying to answer a question in business terms.
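A rough sketch of why "defined once" matters: if each property carries its role, a tool can derive the same aggregate query every time. The `Property` class, the `role` and `aggregation` names, and the `orders` table are hypothetical; the RFC is still a proposal and its actual field names may differ.

```python
from dataclasses import dataclass


@dataclass
class Property:
    name: str
    role: str              # "measure" or "dimension" (assumed field name)
    aggregation: str = ""  # only meaningful for measures, e.g. "SUM"


def build_query(table: str, props: list) -> str:
    """Group by every dimension and aggregate every measure."""
    dims = [p.name for p in props if p.role == "dimension"]
    measures = [
        f"{p.aggregation}({p.name}) AS {p.aggregation.lower()}_{p.name}"
        for p in props
        if p.role == "measure"
    ]
    sql = f"SELECT {', '.join(dims + measures)} FROM {table}"
    if dims:
        sql += f" GROUP BY {', '.join(dims)}"
    return sql


props = [
    Property("country", "dimension"),
    Property("net_amount", "measure", "SUM"),
]
print(build_query("orders", props))
```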

Synonyms (RFC-0041). Nobody asks an LLM for the chiffre_d_affaires_eur column. They ask about turnover, or sales, or TO, or, in my native language, the chiffre d’affaires. A synonyms field attaches alternative names to any object, so a natural-language tool can map the human word to the right field, across teams and locales. The exact shape is a rich object, which the TSC just ratified.
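As an illustration, a natural-language tool could build a lookup index from declared synonyms. This sketch flattens synonyms to plain strings, which is a simplification of whatever object shape the standard ends up with; the `order_date` field and the helper names are hypothetical.

```python
# Synonyms flattened to plain strings for the sketch; "order_date" and its
# synonyms are invented for illustration.
FIELDS = {
    "chiffre_d_affaires_eur": ["turnover", "sales", "TO", "chiffre d'affaires"],
    "order_date": ["purchase date", "date of order"],
}


def normalize(term: str) -> str:
    """Case-insensitive form; typographic apostrophes are treated as plain ones."""
    return term.strip().replace("\u2019", "'").casefold()


def build_index(fields: dict) -> dict:
    """Map every field name and every synonym to the canonical field name."""
    index = {}
    for name, synonyms in fields.items():
        for term in [name, *synonyms]:
            index[normalize(term)] = name
    return index


def resolve(term: str, index: dict):
    """Return the canonical field for a human word, or None if unknown."""
    return index.get(normalize(term))


INDEX = build_index(FIELDS)
print(resolve("Turnover", INDEX))
print(resolve("margin", INDEX))
```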

Vector type (RFC-0042). This is the one that makes a contract truly AI-native. A new vector logical type, with the dimensionality, element type, distance metric, normalization flag, and the embedding model that produced the values. Right now, an embedding column gets smuggled in as a generic array, losing everything a retriever needs to know. Standardize it, and a RAG pipeline can discover, from the contract alone, how to query the vectors and which model to embed the question with. No reverse-engineering of someone’s conventions.
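Here is a sketch of what a retriever could do once that metadata is in the contract: check the dimensionality and pick the distance function from the declared metric. The `VectorType` class, its attribute names, the metric labels ("cosine", "euclidean", "dot"), and the model name are assumptions; the RFC may spell them differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorType:
    dimension: int
    element_type: str
    distance_metric: str   # assumed labels: "cosine", "euclidean", "dot"
    normalized: bool
    embedding_model: str


def distance(spec: VectorType, a: list, b: list) -> float:
    """Distance between two vectors according to the declared metric.
    Smaller always means closer, so the dot product is negated."""
    if len(a) != spec.dimension or len(b) != spec.dimension:
        raise ValueError(f"expected vectors of dimension {spec.dimension}")
    dot = sum(x * y for x, y in zip(a, b))
    if spec.distance_metric == "dot":
        return -dot
    if spec.distance_metric == "euclidean":
        return math.dist(a, b)
    if spec.distance_metric == "cosine":
        if spec.normalized:
            # Unit-length vectors: cosine similarity is just the dot product.
            return 1.0 - dot
        return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))
    raise ValueError(f"unknown metric: {spec.distance_metric}")


# "example-embedding-model" is a placeholder, not a real model name.
spec = VectorType(3, "float32", "cosine", True, "example-embedding-model")
print(distance(spec, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

The normalization flag earns its place here: when the producer guarantees unit-length vectors, the consumer can skip the norm computation entirely.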

Read together, the through-line is the same as the context block: the contract stops being a description a human reads and becomes an instruction set a machine can act on.

ODPS v1.1: products that announce what they are

On the product side, two changes are worth your attention beyond the shared context block.

  • A top-level type field (RFC-0029): sourceAligned, aggregate, consumerAligned, or your own taxonomy. It sounds modest. It is the difference between a catalog you can filter and one you scroll through.

  • Friendlier ports: every array object now carries an optional stable id, and input and output ports require only a name. Versions and contract IDs become optional, which makes early, iterative drafts far less ceremonious.
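Both changes can be sketched together: a catalog filtered by type, and a port check that insists only on a name. The key names (`type`, `inputPorts`, `outputPorts`, `id`, `version`) and the sample products are assumptions for illustration; check the draft schema for the real ones.

```python
# Hypothetical products; key names are assumed, not copied from the draft.
products = [
    {
        "name": "orders_raw",
        "type": "sourceAligned",
        "outputPorts": [{"name": "orders"}],  # a name is all a port needs
    },
    {
        "name": "revenue_by_country",
        "type": "aggregate",
        "inputPorts": [{"name": "orders", "id": "in-orders"}],
        "outputPorts": [{"name": "revenue", "version": "1.0.0"}],
    },
    {
        "name": "broken_draft",
        "type": "aggregate",
        "outputPorts": [{"version": "0.1.0"}],  # missing its name
    },
]


def port_errors(product: dict) -> list:
    """Report ports that lack the one required field: a name."""
    errors = []
    for key in ("inputPorts", "outputPorts"):
        for i, port in enumerate(product.get(key, [])):
            if not port.get("name"):
                errors.append(f"{key}[{i}] is missing a name")
    return errors


def by_type(catalog: list, product_type: str) -> list:
    """Filter a catalog by the top-level type; custom taxonomies pass through."""
    return [p["name"] for p in catalog if p.get("type") == product_type]


print(by_type(products, "aggregate"))
print(port_errors(products[2]))
```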

What this means if you are shipping data

  • Practitioners: Start adding context.instructions and context.constraints to your most-queried contracts first. That is where text-to-SQL accuracy is bleeding today, and where you will feel the lift fastest.

  • Leaders: This is the cheap insurance you keep asking for. AI projects fail for lack of trust and excess of ambiguity far more often than for lack of model quality. A context block is governance that an agent can actually read.

A word about how the sausage is made

None of this fell from the sky. It came out of the Bitol Technical Steering Committee, that small, stubborn, cross-company crowd who argue in public so you do not have to. I will let you in on a secret about running an open standard by consensus: pure democracy is slow, but it is remarkably effective. The entire TSC once burned real cycles deciding what to name a single field, the whole room cycling through request, statement, question, example, and prompt before landing on verifiedStatements. Glacial? Occasionally. But nothing ships until everyone in the room can defend it, and that is exactly why you can build on it. To the TSC: thank you, sincerely, and please do not read this paragraph at the next meeting.

Neither version is final, and the two labels mean slightly different things. ODPS 1.1 is a draft: the shape is still being formed, and fields can still come and go. ODCS 3.2 is in progress: the big decisions have been made, and approved RFCs are landing, but the release is not yet finalized. In both cases, the work continues in the open. The direction, though, is set, and it is worth getting ahead of: your contracts and products are about to acquire a second audience that does not read documentation, does not attend the kickoff, and does not forgive ambiguity. Best to write things down where it can find them.

Early JSON Schemas can be found in the ODCS dev branch and the ODPS dev branch. Don’t forget to star ODCS and ODPS!


Sources