We Replaced Our Kafka Cluster with a Single Postgres Table
It sounds like heresy in modern data engineering circles, but it’s true. In a move that defied conventional tech wisdom, we replaced our Kafka cluster with a single Postgres table, and our entire system became simpler, cheaper, and more reliable. For years, the default answer to any asynchronous processing or event streaming problem has been "use Kafka." But what if the operational overhead, complexity, and cost of a distributed log system are overkill for your actual needs?
We found ourselves drowning in the complexity of managing Kafka brokers, Zookeeper coordination (or KRaft), and JVM tuning, all for a workload that, while critical, didn't require web-scale throughput. This is the story of how we traded a complex distributed system for a battle-tested relational database and rediscovered the power of simplicity. If you're managing a data pipeline that feels over-engineered, this journey might resonate with you.
The Allure and Agony of Apache Kafka
Let's be clear: Apache Kafka is a phenomenal piece of engineering. For high-throughput, real-time data feeds, event sourcing, and acting as a central nervous system for a sprawling microservices architecture, it has few rivals. Its ability to handle millions of messages per second with persistence and replayability is why it powers some of the largest companies in the world.
However, this power comes at a steep price. Our team, like many, experienced the "hidden" costs of running Kafka:
- Operational Complexity: Our on-call engineers spent countless hours dealing with broker failures, partition rebalancing, and the dreaded Zookeeper-is-down alerts. The mental overhead of managing a separate, complex distributed system was a constant drain on our resources.
- Steep Learning Curve: Onboarding new developers required a deep dive into Kafka-specific concepts: topics, partitions, consumer groups, offsets, and idempotency. This slowed down our ability to deliver features.
- Resource Consumption: A production-ready Kafka cluster is resource-intensive, demanding significant CPU and memory, which translates directly to higher cloud infrastructure bills.
We realized our use case—a background job queue for processing tasks like sending emails, generating reports, and processing uploads—didn't justify this complexity. We needed durability and reliability, but not necessarily infinite, replayable logs at a massive scale. This realization led us to reconsider a tool we already knew and trusted: PostgreSQL.
Why Postgres Became Our Kafka Alternative
Choosing to replace Kafka with Postgres wasn't about finding a feature-for-feature equivalent. It was about leveraging the inherent strengths of a relational database to solve our specific problem more effectively. Postgres isn’t just a database; it’s a mature, extensible data platform with decades of development behind it.
Leveraging ACID for Unbreakable Guarantees
The most compelling reason for the switch was Postgres's support for ACID transactions (Atomicity, Consistency, Isolation, Durability). With Kafka, you often grapple with delivery semantics like "at-least-once" or "exactly-once," which require careful client-side implementation.
With Postgres, this problem vanishes. A job-processing workflow becomes a single, atomic transaction:
- Begin transaction.
- Lock a job row.
- Perform the work.
- Update the job row to completed.
- Commit transaction.
If any step fails, the entire transaction is rolled back. The job row is never updated, and the lock is released, making it available for another worker to retry. There are no duplicate jobs, no lost messages—just simple, transactional integrity. This made our Postgres message queue incredibly robust.
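Here is a minimal sketch of that flow in SQL, using the job_queue table described later in this post; the :claimed_job_id placeholder stands in for the id the worker receives from the claiming query:

BEGIN;

-- Claim one pending job; FOR UPDATE SKIP LOCKED makes concurrent workers skip it
SELECT id, payload
FROM job_queue
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- ... the worker performs the actual work here ...

-- Mark the job as done; any failure before this point means we ROLLBACK instead
UPDATE job_queue
SET status = 'completed'
WHERE id = :claimed_job_id;  -- placeholder for the id returned above

COMMIT;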
The Power of Relational Tooling
By moving our queue into our primary database, we unlocked the entire ecosystem of relational tools we were already using.
- Observability: We could now write simple SQL queries to check the queue depth, inspect failed jobs, or analyze processing times. No special CLI tools or separate monitoring systems were needed (see the examples after this list).
- Data Integrity: We could use FOREIGN KEY constraints to link jobs to other business data, ensuring that a job to process a user's report couldn't exist without a valid user record.
- Backups and Recovery: Our job queue was now covered by our standard database backup and recovery procedures, which build on Postgres's existing Write-Ahead Log (WAL).
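As a rough illustration, here is the kind of SQL this enables. The foreign key below is a hypothetical variant that assumes a users table and an explicit user_id column; our minimal schema keeps that reference in the payload instead:

-- Observability: queue depth broken down by status
SELECT status, count(*)
FROM job_queue
GROUP BY status;

-- Data integrity (hypothetical variant): tie jobs to a valid user record
ALTER TABLE job_queue
    ADD COLUMN user_id bigint REFERENCES users (id);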
The Technical Implementation: A Postgres-Powered Job Queue
Creating a robust job queue in Postgres is surprisingly straightforward. The core of our Kafka-to-Postgres migration was a single table and a clever locking mechanism.
Designing the Queue Table
Our job_queue table was simple and effective. We used Postgres's powerful jsonb type for the payload, allowing flexible job data without sacrificing indexing capabilities.
CREATE TYPE job_status AS ENUM ('pending', 'processing', 'completed', 'failed');

CREATE TABLE job_queue (
    id            bigserial PRIMARY KEY,
    payload       jsonb NOT NULL,
    status        job_status NOT NULL DEFAULT 'pending',
    retry_count   integer NOT NULL DEFAULT 0,
    last_error    text,
    created_at    timestamptz NOT NULL DEFAULT now(),
    process_after timestamptz NOT NULL DEFAULT now(),
    locked_at     timestamptz
);

-- Index for fast polling
CREATE INDEX idx_job_queue_fetch ON job_queue (process_after, created_at)
    WHERE status = 'pending';
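Enqueuing work is then an ordinary INSERT. The payload shape below ("type", "report_id") is purely illustrative:

-- Enqueue a report-generation job, delayed by five minutes
INSERT INTO job_queue (payload, process_after)
VALUES (
    '{"type": "generate_report", "report_id": 42}'::jsonb,
    now() + interval '5 minutes'
);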
The Magic of FOR UPDATE SKIP LOCKED
The key to building a high-performance, concurrent queue in Postgres is the SELECT ... FOR UPDATE SKIP LOCKED clause. This allows multiple worker processes to poll the table for jobs without blocking each other or attempting to process the same job twice.
Here’s how a worker finds and claims a job:
-- This query is executed inside a transaction
WITH next_job AS (
    SELECT id
    FROM job_queue
    WHERE status = 'pending'
      AND process_after <= now()
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE job_queue
SET status = 'processing',
    locked_at = now()
WHERE id = (SELECT id FROM next_job)
RETURNING *; -- Returns the locked job to the worker
This single statement atomically finds the oldest pending job, locks it so that other polling workers skip over it, updates its status to "processing", and returns it to the worker. It's a beautifully simple and effective pattern for using Postgres as a message broker.
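Once the work is finished (or has blown up), the worker closes the loop with a plain UPDATE. A sketch, assuming a retry limit of three attempts and a linear back-off, neither of which is prescribed by the schema above:

-- Success: mark the claimed job as completed
UPDATE job_queue
SET status = 'completed', locked_at = NULL
WHERE id = $1;  -- $1 is the id returned by the claiming query

-- Failure: record the error, then retry later or give up after three attempts
UPDATE job_queue
SET status        = CASE WHEN retry_count + 1 >= 3
                         THEN 'failed'::job_status
                         ELSE 'pending'::job_status END,
    retry_count   = retry_count + 1,
    last_error    = $2,  -- error message reported by the worker
    process_after = now() + (retry_count + 1) * interval '1 minute',
    locked_at     = NULL
WHERE id = $1;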
The Results: Simplicity, Cost Savings, and Peace of Mind
The impact of this change was immediate and profound.
- Cost Reduction: We decommissioned our 3-node managed Kafka cluster, reducing our monthly infrastructure bill for this pipeline by over 70%.
- Operational Simplicity: Our on-call alerts for the data pipeline dropped by 90%. Our Postgres instance was already managed and monitored by our DBAs, so there was no new system to learn or maintain.
- Increased Developer Velocity: New engineers could understand the entire workflow by looking at a single SQL table. Debugging became as simple as running a SELECT statement (see the example below).
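For instance, a quick look at recent failures is just a query (illustrative; it assumes jobs carry a "type" key in their payload, as in the enqueue example above):

-- Most recent failed jobs and why they failed
SELECT id, payload->>'type' AS job_type, retry_count, last_error, created_at
FROM job_queue
WHERE status = 'failed'
ORDER BY created_at DESC
LIMIT 20;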
When You Shouldn't Replace Kafka with Postgres
This approach is not a silver bullet. Acknowledging the limitations is crucial for making the right architectural decision. Postgres is not a suitable Kafka replacement for every scenario.
You should stick with Kafka if you need:
- Extreme Write Throughput: If you're ingesting hundreds of thousands or millions of events per second, Kafka's log-structured architecture will vastly outperform a relational database.
- True Event Sourcing: Kafka's immutable, replayable log is a core tenet of event sourcing. A database table can't easily replicate that "point-in-time" replayability.
- A Decoupled Data Backbone: When you need a central bus to stream data between dozens of disparate teams and services, Kafka's publisher/subscriber model provides superior decoupling.
Conclusion: Choose the Right Tool for the Job
Our decision to replace our Kafka cluster with a single Postgres table was a powerful reminder to question default choices and critically evaluate our actual needs. For our moderate-throughput background job system, the transactional guarantees, operational simplicity, and cost savings offered by Postgres far outweighed the raw power of Kafka.
Before you spin up your next distributed messaging system, ask yourself: Do you truly need the complexity that comes with it? You might find that the powerful, reliable database you're already using is more than capable of doing the job.
What are your experiences with over-engineered systems? Share your thoughts in the comments below!
