A data contract is an agreement between a data producer and its consumers
A data contract is an agreement between a data producer and its consumers

On this day, when most countries celebrate work by not working, the PayPal teams, who worked tirelessly on Data Mesh, released under an Apache 2 license its template for a data contract. Data contracts are part of the future of data engineering and are essential to data products.

A data contract is a binding agreement between the consumers and producers of data. You can see it as a data schema on steroids or data schema++. The goal of the contract is to set expectations between the parties. It can be built as fit-for-purpose where the consumers and producer agree on what it should contain or can serve as a brochure for any consumer willing to access the data offered by this (data) product.

In its current version (v2.1.1), PayPal’s data contract focuses on eight sections: demographics, dataset & schema, data quality, pricing, stakeholders, roles, service-level agreement, and other properties. As you can read, it does not limit itself to a mere schema.

A data contract can contain a single dataset with multiple tables, with many possible details at both the table and column levels.

Data engineers can define data quality (DQ) rules at multiple levels. Defined rules are tool agnostic. You can leverage your own home-grown DQ tool or tools like Great Expectations.

In this version, pricing is experimental.

Stakeholders play a helpful role in creating trust in a specific data product. The contract describes the stakeholders as well as their history on the project. This does not prevent tribal knowledge but keeps track of its evolution.

Prior versions of the data contract template are not available on GitHub.

The data contract template is a YAML file that can be easily read by humans & software alike.

Check it out on AIDA User Group’s open-source GitHub at: https://github.com/AIDAUserGroup/data-contract-template (or https://jgp.ai/dct).

By releasing the template in open source, PayPal shows its leadership in state-of-the-art data engineering. Like any open-source project, the PayPal data contract team welcomes contributors, implementors, and other feedback. Feel free to like it, fork it, and share it!


Feature and illustration photo by Alexander Suhorucov.


Update:

  • 2023-07-06: Highlight AIDA User Group.

2 thoughts on “PayPal open sources its data contract template

Leave a Reply

Your email address will not be published. Required fields are marked *