It’s a scenario that’s painfully familiar to every data professional: a critical dashboard breaks, a stakeholder questions a report, or a late-night PagerDuty alert screams about a data quality failure. These issues almost always trace back to a simple, unstated assumption—a broken promise between a data producer and a data consumer. The solution gaining traction is the data contract, a formal agreement defining the shape, semantics, and service-level objectives of a dataset. While many teams turn to their transformation layer, typically dbt, to implement these checks, we’ve found a more robust and reliable source of truth. For us, our data contracts live in Postgres, providing a foundational layer of enforcement that dbt alone cannot.
This isn't about abandoning dbt. It's about putting the enforcement where it has the most leverage: directly within the database.
Why dbt Isn't the Final Frontier for Data Contracts
dbt has revolutionized the analytics engineering workflow. Its testing framework is a fantastic first step toward data quality and is often the entry point for teams thinking about data contracts. You can easily define tests for uniqueness, non-null values, and accepted values.
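Those built-in checks are declared in a model's YAML properties file. A minimal sketch (the model and column names here are illustrative):

```yaml
# models/schema.yml -- model and column names are illustrative
version: 2

models:
  - name: users
    columns:
      - name: user_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'pending', 'suspended']
```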
However, relying solely on dbt for contract enforcement has inherent limitations:
Reactive, Not Proactive: A dbt test typically runs after data has been loaded and transformed. It’s an excellent detective, flagging bad data after the fact. But a true data contract should be a security guard, preventing bad data from ever entering the warehouse in the first place. A failed dbt test means the pipeline is already broken and requires manual intervention.
Coupled to the Transformation Layer: Data contracts defined in dbt YAML files are intrinsically linked to the dbt project. This works perfectly for data created by dbt models. But what about data loaded by other tools like Fivetran, Airbyte, or custom Python scripts? Those pipelines completely bypass dbt’s testing framework, leaving a massive gap in your data governance.
Limited Enforcement: A failed dbt test raises an alert. It creates visibility, which is crucial, but it doesn't inherently stop the data from being propagated downstream. A BI tool or a reverse ETL job can still pick up and serve the faulty data before the on-call engineer has even had their morning coffee.
The Case for Implementing Data Contracts in Postgres
By moving the core of our contract enforcement to the database level, we shift from a reactive to a proactive stance. Implementing data contracts with PostgreSQL leverages the database's native capabilities to act as the ultimate arbiter of data quality, creating a more resilient and tool-agnostic system.
Leveraging Native Database Constraints for Proactive Enforcement
PostgreSQL is more than just a data store; it's a powerful data validation engine. We can map the clauses of a data contract directly to native database constraints:
Schema and Data Types: The table's CREATE TABLE statement is the most basic contract, defining column names and types (VARCHAR, INT, TIMESTAMPTZ).
Non-Null Guarantees: The NOT NULL constraint ensures a column must contain a value.
Uniqueness: A UNIQUE or PRIMARY KEY constraint guarantees no duplicate values in a column or set of columns.
Semantic Rules: The CHECK constraint is the real powerhouse. It allows you to enforce arbitrary business logic, such as ensuring a status column only contains specific values or that an email column matches a regex pattern.
Consider this simple example for a users table:
```sql
CREATE TABLE public.users (
    user_id    UUID PRIMARY KEY,
    email      VARCHAR(255) NOT NULL UNIQUE,
    status     VARCHAR(20) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT now(),
    -- This is the data contract enforcement
    CONSTRAINT email_must_be_valid CHECK (email ~* '^[A-Za-z0-9._+%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$'),
    CONSTRAINT status_must_be_valid CHECK (status IN ('active', 'pending', 'suspended'))
);
```
With these database-level data contracts, any INSERT or UPDATE statement that violates these rules will be rejected by Postgres with an error. The bad data is stopped at the door, period.
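For example, an INSERT carrying an out-of-contract status is refused outright; the comment shows the typical shape of the Postgres error for a CHECK violation (gen_random_uuid() assumes Postgres 13+ or the pgcrypto extension):

```sql
INSERT INTO public.users (user_id, email, status)
VALUES (gen_random_uuid(), 'jane@example.com', 'deleted');
-- ERROR:  new row for relation "users" violates check constraint "status_must_be_valid"
```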
Centralization and Tool Agnosticism
When your data contract is a CHECK constraint on a Postgres table, it doesn't matter who or what is writing the data. Whether it's a dbt model, a Fivetran connector, or an intern's Jupyter notebook, the rules are enforced universally.
This approach provides a single source of truth for your data quality rules. You no longer need to duplicate validation logic in different systems or worry about gaps in your coverage. This decoupling of validation from transformation logic dramatically simplifies your data governance strategy and increases trust in the data.
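That universality is easy to see in a stand-alone sketch. The snippet below uses Python's bundled SQLite driver purely for portability; Postgres enforces its constraints the same way, rejecting the write regardless of which client issued it:

```python
import sqlite3

# A minimal sketch of tool-agnostic enforcement, using SQLite only so the
# example runs anywhere -- the same contract in Postgres behaves identically.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        user_id TEXT PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE,
        status  TEXT NOT NULL
            CHECK (status IN ('active', 'pending', 'suspended'))
    )
""")

# A row that honors the contract is accepted...
conn.execute("INSERT INTO users VALUES ('u1', 'jane@example.com', 'active')")

# ...while a violating row is rejected at the door, no matter which
# tool or script tried to write it.
try:
    conn.execute("INSERT INTO users VALUES ('u2', 'bob@example.com', 'deleted')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("violating row rejected:", rejected)  # → violating row rejected: True
```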
A Practical Workflow: Combining Postgres, dbt, and CI/CD
This isn't an "either/or" choice. The most powerful setup combines the strengths of Postgres, dbt, and modern DevOps practices.
Define Contracts as Code: Store your table schemas and constraints as version-controlled SQL files in a Git repository (e.g., using a tool like Sqitch or just plain DDL scripts). This makes your data contracts reviewable, versionable, and auditable.
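A plain-DDL migration file in such a repository might look like this (the file name and table are illustrative):

```sql
-- migrations/V003__add_users_status_contract.sql
-- Reviewed, versioned, and merged like any other code change.
ALTER TABLE public.users
    ADD CONSTRAINT status_must_be_valid
    CHECK (status IN ('active', 'pending', 'suspended'));
```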
Automate Deployment with CI/CD: Create a CI/CD pipeline (e.g., using GitHub Actions) that automatically applies schema migrations to your database when changes are merged. This ensures that contract updates are deployed in a controlled and predictable manner.
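A sketch of such a pipeline, assuming Sqitch as the migration tool; the workflow layout, secret names, and database target are all illustrative:

```yaml
# .github/workflows/migrate.yml -- tool choice and secret names are assumptions
name: apply-migrations
on:
  push:
    branches: [main]
    paths: ['migrations/**']
jobs:
  migrate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Apply pending migrations
        run: sqitch deploy db:pg://$PGUSER@$PGHOST/analytics
        env:
          PGUSER: ${{ secrets.PGUSER }}
          PGHOST: ${{ secrets.PGHOST }}
          PGPASSWORD: ${{ secrets.PGPASSWORD }}
```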
Use dbt for What It's Best At: Let dbt handle complex, multi-table transformations and business logic validation. While Postgres can enforce that a user_id exists, a dbt test is better suited to check for more complex issues, like whether every active user has a corresponding record in a subscriptions table. The two tools are complementary.
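A cross-table check like that is a natural fit for a dbt singular test: a SELECT that returns the violating rows, failing the test if any come back (model names are illustrative):

```sql
-- tests/active_users_have_subscription.sql
-- Fails if any active user lacks a subscription record.
select u.user_id
from {{ ref('users') }} u
left join {{ ref('subscriptions') }} s
    on s.user_id = u.user_id
where u.status = 'active'
  and s.user_id is null
```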
This workflow treats your database schema with the same rigor as your application code, making your Postgres data contracts a core, reliable part of your infrastructure.
Addressing Common Concerns and Trade-offs
Adopting this database-first approach does come with considerations.
Performance Overhead: Yes, constraints add a small amount of overhead to write operations. However, for most analytics use cases, this is negligible compared to the cost of cleaning up bad data. It's a trade-off between write performance and data integrity, and we argue for prioritizing integrity.
Schema Evolution: Evolving a contract (e.g., adding a new status to a CHECK constraint) requires a formal migration (ALTER TABLE). This is a feature, not a bug. It forces a deliberate, version-controlled process for making changes, preventing the ad-hoc schema drift that causes so many data quality problems.
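In Postgres, evolving a CHECK constraint means dropping and re-adding it; the NOT VALID / VALIDATE CONSTRAINT pair lets you do this on a large table without a long blocking lock while existing rows are re-checked:

```sql
-- Migration: add 'archived' as an allowed status.
ALTER TABLE public.users
    DROP CONSTRAINT status_must_be_valid;
ALTER TABLE public.users
    ADD CONSTRAINT status_must_be_valid
    CHECK (status IN ('active', 'pending', 'suspended', 'archived'))
    NOT VALID;  -- new writes are checked immediately; existing rows are deferred
ALTER TABLE public.users
    VALIDATE CONSTRAINT status_must_be_valid;  -- re-checks existing rows without blocking writes
```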
Developer Experience: Writing DDL and migration scripts can feel more cumbersome than adding a dbt test. However, the long-term benefit of universal, proactive enforcement provides a far better experience for the entire organization by building a foundation of trust and reliability in your data.
Conclusion: Build on a Foundation of Trust
dbt is an indispensable tool for the modern data stack, but it shouldn't be the foundation of your data quality strategy. By grounding your data contracts in the database itself, you create a system that is proactive, universal, and fundamentally more reliable.
Placing your data contracts in Postgres shifts your team's posture from constantly firefighting data issues to building new products and insights with confidence. The database becomes an active participant in guaranteeing data quality, not just a passive receptacle for bits and bytes.
Ready to increase the reliability of your data? Start small. Pick one of your most critical data models, identify a key business rule, and implement it as a CHECK constraint in Postgres. See for yourself how moving enforcement to the database can create a powerful, lasting foundation for data trust.