In this mission, we’re diving deep into one of the most subtle threats to data reliability: schema drift. As your systems evolve — fields are added, modified, or removed — your once-perfect data contracts may silently go out of sync. Left unchecked, this can wreak havoc across producers and consumers alike. Your objective is to detect and analyze schema changes before they cause trouble.

This tutorial is the third of a series of four, explaining how you can use and get value from Data Contracts & Data Products in a very hands-on way. Over ninety people participated in the early alpha and beta in May and June 2025, with testimonies such as the one from Victor Tulus: Automation of schema drift checks is an extremely useful service that would reduce the cognitive load on the owners and facilitate adoption and long-term maintenance of data contracts!

Check out my list of data contracts & data products tutorials on Medium; you’ll find the link to the other three tutorials and the surveys. The associated GitHub repo contains extra information. When you finish the four tutorials and four surveys, you will receive a digital badge to show your recently acquired knowledge.

You’ll start by expanding your existing tables to include change data capture (CDC) fields — something many teams forget until it’s too late. Then, you’ll compare these changes against your base contract using temporary data contracts and Bitol’s built-in drift analysis. By the end, you’ll know exactly when to bump a version, when to pause, and when you’ve crossed into major-change territory.

Bitol’s ODCS is a perfect tool for monitoring schema drift

Setting up the playground

For simplicity, as I did previously, I will export all key values as environment variables, so it will be easier to copy/paste the curl calls. Remember, your values are different…

export BITOL_URL=https://cloud.jgp.ai/api
export BITOL_USER_PW=BitolRu7ez!
export BITOL_USER_EMAIL=jgp@jgp.net
export BITOL_API_KEY=97d09209-9021-4b24-89a9-620aae063d40
export BITOL_CONTRACT_ID=34cae6d7-7648-38b2-8f66-8db79e1e2ce4

Reminder: the repository and resources are available at https://github.com/jgpdotai/cloud-services/.

Let’s Modify our Tables

Remember the customer and address tables from the data contract tutorial? Well, I realized later that I needed to have some basic change data capture (CDC) fields in place. Let’s add a created_at and updated_at in our tables. While I was at it, I also added two extra fields for the street address.

The ERD diagram is now a tad bigger with those extra fields:

The DDL is also impacted.

-- Table: Customer
CREATE TABLE Customer (
customer_id SERIAL PRIMARY KEY,
first_name VARCHAR(255),
last_name VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
phone VARCHAR(20),

-- CDC fields
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NULL
);

-- Table: AddressType
CREATE TABLE AddressType (
address_type_id SERIAL PRIMARY KEY,
address_type VARCHAR(20) NOT NULL,

-- CDC fields
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NULL
);

-- Table: Address
CREATE TABLE Address (
address_id SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
street1 VARCHAR(255) NOT NULL,
street2 VARCHAR(255),
street3 VARCHAR(255),
street4 VARCHAR(255),
city VARCHAR(100) NOT NULL,
state VARCHAR(100),
postal_code VARCHAR(20),
country_cd CHAR(2) NOT NULL,
address_type_id INT,

-- CDC fields
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NULL,

CONSTRAINT fk_customer FOREIGN KEY (customer_id) REFERENCES Customer(customer_id) ON DELETE CASCADE,
CONSTRAINT fk_address_type FOREIGN KEY (address_type_id) REFERENCES AddressType(address_type_id)
);

Wait a second… This is a schema change, but is it dangerous? In this situation, the risk is moderate as you add fields.

It is a common misconception that adding fields to a table does not involve risk. It is not a guarantee. Imagine I do a “select * from address” and take the 6th element. In the previous version of the customer table, I would have gotten the state; in this version, I may get street4.

Analyzing the Drift with Data Contracts

Let’s generate a temporary data contract from DDL, but we’ll keep it light and disposable this time.

Process to control the drift

As we did in the data contract tutorial, let’s generate a data contract for this DDL script. However, I am not interested in permanently storing the contract this time, as I already have a base contract for my customer data.

The command line should start to get familiar by now:

cat resources/customer_cdc.sql|
curl -X POST "$BITOL_URL/v1/contracts?sourceFormat=DDL&
version=0.1.0&name=CustomerTemporary&domain=Customer&
tenant=ClimateQuantum&storage=temporary"
-H "X-API-KEY: $BITOL_API_KEY"
-H "X-USER-PASSWORD: $BITOL_USER_PW"
-F "file=@-"

Look at the last argument on the URL: storage=temporary. I am not interested in storing it, so I just ask the system to store it temporarily. It will be cleaned later but will be available for at least 24hr. There is also no check on whether it’s being overwritten. And you should get something like:

{
"domain": "Customer",
"name": "CustomerTemporary",
"id": "ff1a171a-6d52-3316-8608-8cf1cce07ac9",
"version": "v0.1.0",
"tenant": "ClimateQuantum",
"status": "draft"
}

I can export the contract’s id as:

export BITOL_CONTRACT_ID2=ff1a171a-6d52-3316-8608-8cf1cce07ac9

Let’s ask the service to analyze the drift.

curl -X GET "$BITOL_URL/v1/contracts/compare?
id1=$BITOL_CONTRACT_ID&
id2=$BITOL_CONTRACT_ID2&
storage2=temporary&
scope=schema"
-H "X-API-KEY: $BITOL_API_KEY"
-H "X-USER-PASSWORD: $BITOL_USER_PW" | jq

The return is a rather long JSON file that looks like:

{
"diffs": [
{
"type": "ADD",
"section": "schema",
"field": "properties",
"path": "/schema[name=Address]/properties[name=street3]",
"node1": null,
"node2": {
"logicalType": "string",
"name": "street3",
"physicalName": "street3",
"physicalType": "VARCHAR (255)"
},
"message": "Added at /schema[name=Address]/properties[name=street3]: {"logicalType":"string","name":"street3","physicalName":"street3","physicalType":"VARCHAR (255)"}",
"impactLevel": "MINOR"
},
...
],
"major": 0,
"minor": 8,
"patch": 0,
"id1": "34cae6d7-7648-38b2-8f66-8db79e1e2ce4",
"storage1": "permanent",
"version1": "v0.1.1",
"id2": "ff1a171a-6d52-3316-8608-8cf1cce07ac9",
"storage2": "temporary",
"version2": "v0.1.0",
"suggestedVersion": "v0.2.0"
}

The crunchy (read: interesting) is at the end. The service detected:

  • Zero major changes (that’s good!),
  • Eight minor changes, matching the eight fields I added,
  • Zero patches.

Based on those changes, the service suggests that the original contract (…2ce4) should now be version 0.2.0.

You can have a look at the process adapted to this case:

In this situation, the process can decide to deploy automatically the new schema.

I am pretty pleased with the result, so I can push this new contract as 0.2.0:

curl -X GET "$BITOL_URL/v1/contracts/$BITOL_CONTRACT_ID2?storage=temporary" 
-H "X-API-KEY: $BITOL_API_KEY"
-H "X-USER-PASSWORD: $BITOL_USER_PW"
--output $BITOL_CONTRACT_ID-0.2.0.odcs.yaml

curl -X POST "$BITOL_URL/v1/contracts?contractId=$BITOL_CONTRACT_ID&
version=0.2.0"
-H "X-API-KEY: $BITOL_API_KEY"
-H "X-USER-PASSWORD: $BITOL_USER_PW"
-F "file=@$BITOL_CONTRACT_ID-0.2.0.odcs.yaml"

On this call, I force the contractId as I do not want the contractId in the contract to be used. I want …2ce4, not …7ac9 (the temporary one).

Before pushing the new contract, you may need to copy manually some of the additional information from the previous version, like data quality, SLA, documentation, etc. You can use the compare service to make sure you saved everything. You can change the scope from schema to all in the URL.

Critical Changes

Oh, I hear you in the back of the classroom: “adding fields is not the problem, what happens if I remove some, like street4?

Here is the update DDL script, the only change is that street4 has been removed:

-- Table: Customer
CREATE TABLE Customer (
customer_id SERIAL PRIMARY KEY,
first_name VARCHAR(255),
last_name VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
phone VARCHAR(20),

-- CDC fields
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NULL
);

-- Table: AddressType
CREATE TABLE AddressType (
address_type_id SERIAL PRIMARY KEY,
address_type VARCHAR(20) NOT NULL,

-- CDC fields
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NULL
);

-- Table: Address
CREATE TABLE Address (
address_id SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
street1 VARCHAR(255) NOT NULL,
street2 VARCHAR(255),
street3 VARCHAR(255),
city VARCHAR(100) NOT NULL,
state VARCHAR(100),
postal_code VARCHAR(20),
country_cd CHAR(2) NOT NULL,
address_type_id INT,

-- CDC fields
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NULL,

CONSTRAINT fk_customer FOREIGN KEY (customer_id) REFERENCES Customer(customer_id) ON DELETE CASCADE,
CONSTRAINT fk_address_type FOREIGN KEY (address_type_id) REFERENCES AddressType(address_type_id)
);

Let’s see what happens…

cat resources/customer_cdc_wo_street4.sql |
curl -X POST "$BITOL_URL/v1/contracts?sourceFormat=DDL&
version=0.3.0&name=CustomerTemporary&domain=Customer&
tenant=ClimateQuantum&storage=temporary"
-H "X-API-KEY: $BITOL_API_KEY"
-H "X-USER-PASSWORD: $BITOL_USER_PW"
-F "file=@-"

As expected, you get a new data contract, from which I will use the id…

{
"domain": "Customer",
"name": "CustomerTemporary",
"id": "50c0a45b-f2e0-39c0-b6a8-97e78e2cfedd",
"version": "v0.3.0",
"tenant": "ClimateQuantum",
"status": "draft"
}

in an environment variable:

export BITOL_CONTRACT_ID2=50c0a45b-f2e0-39c0-b6a8-97e78e2cfedd

And now, you can compare the two contracts.

curl -X GET "$BITOL_URL/v1/contracts/compare?
id1=$BITOL_CONTRACT_ID&
id2=$BITOL_CONTRACT_ID2&
storage2=temporary&
scope=schema"
-H "X-API-KEY: $BITOL_API_KEY"
-H "X-USER-PASSWORD: $BITOL_USER_PW" | jq

And you will get a single major change:

{
"diffs": [
{
"type": "REMOVE",
"section": "schema",
"field": "properties",
"path": "/schema[name=Address]/properties[name=street4]",
"node1": {
"logicalType": "string",
"name": "street4",
"physicalName": "street4",
"physicalType": "VARCHAR (255)"
},
"node2": null,
"message": "Removed from /schema[name=Address]/properties[name=street4]: {"logicalType":"string","name":"street4","physicalName":"street4","physicalType":"VARCHAR (255)"}",
"impactLevel": "MAJOR"
}
],
"major": 1,
"minor": 0,
"patch": 0,
"id1": "34cae6d7-7648-38b2-8f66-8db79e1e2ce4",
"storage1": "permanent",
"version1": "v0.2.0",
"id2": "50c0a45b-f2e0-39c0-b6a8-97e78e2cfedd",
"storage2": "temporary",
"version2": "v0.3.0",
"suggestedVersion": "v1.0.0"
}

street4 has been removed from the Address table, creating a major change. Based on this major change, the suggested version, if you want to keep this version, is 1.0.0.

The process for a major change can be seen as:

A major change will require more work.

The Logic Behind the Analysis

Here is the logic on how the changes are measured. It is based on three criteria:

  • The section where it happens in the contract. Check the sections on the Bitol ODCS standard.
  • The action on whether it is an addition, a change, or a removal of an element.
  • The contract field itself.

Examples:

  • In the schema, if the logicalType changes, it is considered a major change.
  • if the contractCreatedTs is changed, it is considered a patch level of impact.

The following table summarizes and provides the authority for those changes.

Impact rules

An increase in field size (a VARCHAR(32) to VARCHAR(64), for example) is seen as a major change for now, when it should be a minor change. Let’s keep room for improvement.

Mission Debrief: Schema Drift Controlled

Well done, Agent. In this mission, you went beyond writing data contracts — you used them as powerful tools to detect and control schema drift. By comparing your base contract to updated DDL scripts, you identified both minor evolutions (like added fields) and major breaking changes (like removed ones). More importantly, you learned how the Bitol services on jgp.ai help you quantify the impact and suggest the appropriate version bump, all without guessing.

You also explored the value of temporary contracts — a lightweight, disposable way to experiment safely without cluttering your permanent registry.

You now know how to:

  • Generate temporary data contracts from DDL.
  • Detect and analyze drift across versions.
  • Interpret impact levels (minor, major, patch).
  • Apply semantic versioning with confidence.

Remember that managing schema drift is not a one-time task — it’s an ongoing discipline. But with tools like ODCS and the Bitol services, you’re not just reacting to change — you’re leading it.

Thanks to Pierre-Yves BONNEFOY for suggesting the process diagram and contributing the first version!

Updates:

  • 2025–07–31 Tutotial published.


Controlling Schema Drift was originally published in Data Mesh Learning on Medium, where people are continuing the conversation by highlighting and responding to this story.

Leave a Reply

Your email address will not be published. Required fields are marked *