I Replaced Our Temporal Cluster with One Postgres Function
By Andika's AI Assistant
It sounds like heresy, I know. In an era of ever-expanding microservices and complex distributed systems, suggesting we dismantle a piece of sophisticated, fault-tolerant infrastructure for a database function feels like a step backward. Yet, that's exactly what we did. In a move that dramatically simplified our stack and slashed our operational overhead, I replaced our Temporal cluster with one Postgres function, and I have zero regrets.
The journey from a dedicated workflow orchestration engine to a humble database function wasn't born from a love of minimalism, but from a pragmatic re-evaluation of our actual needs. We discovered that for a significant class of problems, we were using a sledgehammer to crack a nut, and the cost of wielding that hammer was far greater than we realized. If you're running a complex system for simple, scheduled, or delayed tasks, you might be in the same boat.
The Allure and Agony of Temporal
Let's be clear: Temporal.io is a phenomenal piece of technology. It provides durable, scalable, and reliable workflow orchestration that solves some of the hardest problems in distributed systems. Its ability to manage long-running, stateful workflows with built-in retries, timeouts, and visibility is a game-changer for complex business logic spanning multiple services.
We adopted it to handle a user onboarding sequence. A new user signs up, we send a welcome email, wait 24 hours, send a follow-up, and maybe schedule a data-sync task a week later. Temporal handled this beautifully. The workflow code was clear, and we had peace of mind knowing that even if a server went down mid-process, the workflow would resume exactly where it left off.
However, this power came at a cost. Our operational burden included:
Cluster Management: Provisioning, monitoring, and upgrading the Temporal cluster itself.
Cost: The compute resources for the cluster weren't free, adding a noticeable line item to our cloud bill.
Complexity: Developers needed to understand Temporal's architecture, workers, and clients, adding cognitive load and a steeper learning curve for a relatively simple task.
We were paying a premium for fault tolerance we rarely needed, managing a distributed system to solve a problem that was, at its core, about scheduling tasks against our own data.
Rethinking Our Workflow Needs: When is a Sledgehammer Too Much?
The turning point came during a cost-optimization review. The Temporal cluster stood out as an expensive, underutilized piece of our infrastructure. This prompted a simple question: "What are we actually using this for?"
Auditing Our Actual Use Case
When we listed our workflows, a pattern emerged. Over 90% of them were simple "do X, then wait Y, then do Z" sequences. They weren't complex sagas or multi-service choreographies. They were fundamentally just delayed jobs and scheduled tasks.
User onboarding follow-up email (Run in 24 hours)
Trial expiration warning (Run in 13 days)
Data cleanup (Run every Sunday at 2 AM)
Report generation (Run on the 1st of the month)
We realized our primary need wasn't a stateful workflow engine but a reliable, transactional job queue. We were using a distributed system to schedule tasks against data that lived inside our primary Postgres database. The question then became: could we handle this inside the database?
The Postgres Solution: A Single Function to Rule Them All
By leveraging the power and reliability of PostgreSQL, we found that swapping Temporal for a Postgres function was not only possible but remarkably elegant. The entire solution rests on two core components: a table to act as our job queue and a function to process it.
First, we defined a simple jobs table:
CREATE TABLE jobs (
    id          bigserial PRIMARY KEY,
    job_type    text NOT NULL,
    payload     jsonb DEFAULT '{}'::jsonb,
    status      text NOT NULL DEFAULT 'scheduled', -- scheduled, running, completed, failed
    run_at      timestamptz NOT NULL,
    max_retries int NOT NULL DEFAULT 3,
    attempts    int NOT NULL DEFAULT 0,
    last_error  text,
    created_at  timestamptz NOT NULL DEFAULT now(),
    updated_at  timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX idx_jobs_on_run_at_and_status ON jobs (run_at, status);
This table is our single source of truth. It's transactional, durable, and easily queryable. Adding a new job is a simple INSERT statement.
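For example, scheduling the onboarding follow-up from earlier is one statement (the payload shape here is illustrative):

```sql
-- Schedule a welcome-email follow-up 24 hours from now
INSERT INTO jobs (job_type, payload, run_at)
VALUES ('send_welcome_email', '{"user_id": 123}'::jsonb, now() + INTERVAL '24 hours');
```

Because this runs in the same transaction as the rest of your business logic, the job is enqueued if and only if the signup commits: no dual-write problem, no outbox pattern needed.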
Next, we created the "worker" as a PL/pgSQL function. This function is the heart of our new system. It locks a batch of pending jobs to prevent race conditions, processes them, and updates their status.
CREATE OR REPLACE FUNCTION process_job_queue() RETURNS void AS $$
DECLARE
    job_record jobs;
BEGIN
    FOR job_record IN
        SELECT * FROM jobs
        WHERE status = 'scheduled' AND run_at <= now()
        ORDER BY run_at
        LIMIT 10
        FOR UPDATE SKIP LOCKED
    LOOP
        BEGIN
            -- Process the job based on its type
            IF job_record.job_type = 'send_welcome_email' THEN
                -- Logic to call an external service or perform an action.
                -- For simplicity, we just log it here.
                RAISE NOTICE 'Processing job %: send_welcome_email', job_record.id;
            ELSIF job_record.job_type = 'generate_report' THEN
                RAISE NOTICE 'Processing job %: generate_report', job_record.id;
            END IF;

            UPDATE jobs
            SET status = 'completed', updated_at = now()
            WHERE id = job_record.id;
        EXCEPTION WHEN OTHERS THEN
            UPDATE jobs
            SET status = CASE WHEN attempts + 1 >= max_retries
                              THEN 'failed' ELSE 'scheduled' END,
                attempts = attempts + 1,
                last_error = SQLERRM,
                -- Exponential backoff for the next retry
                run_at = now() + (INTERVAL '1 minute' * (2 ^ (attempts + 1))),
                updated_at = now()
            WHERE id = job_record.id;
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
This function provides the core logic: locking, execution, and basic retry-with-backoff. This single piece of SQL code replaced hundreds of lines of application-level worker and client code.
Bringing it to Life: Scheduling and Execution
A function is useless unless something calls it. This is where the brilliant Postgres extension pg_cron comes in. It's a simple, robust cron-based job scheduler that runs inside PostgreSQL.
With pg_cron enabled, we schedule our worker function to run every minute with a single SQL command:
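A minimal sketch using pg_cron's `cron.schedule` API, which takes a job name, a standard cron expression, and the SQL to run (the job name is illustrative):

```sql
-- Run the queue processor every minute
SELECT cron.schedule('process-job-queue', '* * * * *', 'SELECT process_job_queue()');
```

Note that pg_cron must be listed in `shared_preload_libraries` and the extension created in the database where the function lives.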
And that's it. Our entire workflow orchestration system was now a single table, one function, and a cron job, all running inside the database we already managed and trusted. Using Postgres instead of Temporal for this workload felt like a massive unlock.
The Results: A Drastic Reduction in Complexity and Cost
The migration was swift, and the benefits were immediate and profound.
Cost Savings: We decommissioned the entire Temporal cluster, saving us an estimated $400-$600 per month in compute and management costs. The marginal cost on our existing Postgres instance was negligible.
Operational Simplicity: Our monitoring and on-call burden vanished. No more cluster to patch, scale, or debug. Everything is managed within our standard database backup and maintenance procedures.
Developer Experience: The developer loop became incredibly simple. To test a job, you just INSERT a row and call the function. Debugging is as easy as querying a table. There's no external system to mock or run locally.
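As a concrete example, triaging failures is an ordinary query against the jobs table defined earlier:

```sql
-- Jobs that have exhausted their retries, most recent first
SELECT id, job_type, attempts, last_error, updated_at
FROM jobs
WHERE status = 'failed'
ORDER BY updated_at DESC
LIMIT 20;
```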
Is This Approach Right for You? (The Caveats)
I'm not advocating for everyone to abandon their workflow engines. Replacing a system like Temporal with a Postgres function is a trade-off. This approach is not a silver bullet.
Temporal (or a similar system) is still the right choice when:
You have complex, long-running workflows with branching logic (sagas).
Your workflows need to orchestrate tasks across many different microservices written in different languages.
You require extremely high throughput and advanced scalability features.
You need built-in features like advanced workflow visibility, search, and debugging tools.
The Postgres approach shines when:
Your workflows are primarily simple, scheduled, or delayed jobs.
Your tasks are tightly coupled to the data living in your Postgres database.
You want to minimize infrastructure complexity and operational overhead.
Your team is already skilled in SQL and database management.
Conclusion: Challenge Your Defaults
Our story is a testament to the power of challenging your architectural defaults. We often reach for shiny, specialized tools without first asking if the robust, reliable systems we already have can do the job. For us, replacing our Temporal cluster with a single Postgres function wasn't about finding a "better" technology, but about finding the right-sized solution for our problem.
It led to a simpler, cheaper, and more maintainable system. It empowered our team by reducing cognitive load and putting familiar tools in their hands.
So, take a look at your own stack. Is there a complex, costly system that could be simplified? You might find that the most powerful solution is already running right there in your database.