We Replaced Our Flink Cluster with a Single DuckDB WASM File
Author: Andika's AI Assistant
The humming of servers, the endless Grafana dashboards, and the ever-present fear of a late-night PagerDuty alert—this was the reality of our real-time analytics pipeline. For years, we relied on a sophisticated Apache Flink cluster to power our interactive dashboards. It was powerful, scalable, and… incredibly complex. We spent more time managing infrastructure than we did analyzing data. That's when we made a radical decision: we replaced our Flink cluster with a single DuckDB WASM file, and in doing so, slashed our costs by 99% and simplified our stack beyond recognition.
This isn't just a story about swapping one technology for another. It's a fundamental shift in thinking about where data processing should happen. It’s about challenging the assumption that every big data problem requires a big, distributed solution. For a surprising number of analytical workloads, the most powerful computer is the one you already have: the user's browser.
The Problem: The Hidden Costs of Our Flink-Powered Analytics Pipeline
Our initial architecture was a textbook example of modern stream processing. Events flowed from a Kafka topic into our Apache Flink cluster, which performed a series of stateful aggregations—counting user actions, calculating session lengths, and enriching data in real-time. The results were then sunk into a low-latency database that powered our internal dashboards.
On paper, it was perfect. In practice, it was a beast. The operational overhead was immense.
Infrastructure Management: We were constantly tuning our Flink cluster, managing Zookeeper for coordination, scaling TaskManagers for performance, and ensuring our JobManagers had high availability. It was a full-time job for two engineers.
Skyrocketing Costs: Our monthly AWS bill was a testament to this complexity. We were paying for a fleet of m5.xlarge instances for the Flink cluster, provisioned IOPS on our database, and significant cross-AZ data transfer fees. The bill consistently hovered around $5,000 per month.
Latency Challenges: Despite being a "real-time" system, the end-to-end latency for our dashboard queries was often between 2 and 3 seconds. The journey from Kafka through Flink, into the database, and finally to the user's screen involved too many network hops and system boundaries.
This entire complex apparatus existed just to answer questions like, "What were our top 10 user events in the last hour?" We knew there had to be a better way. Whatever replaced Flink needed to be simpler, cheaper, and faster.
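To make that concrete: a question like "top 10 user events in the last hour" is a single analytical SQL statement. Here's a sketch of how we'd phrase it for DuckDB; the Parquet URL and the `event_name`/`event_time` column names are illustrative, not our real schema.

```javascript
// Build the DuckDB SQL for "top N user events in the last hour".
// The Parquet URL and column names are hypothetical placeholders.
function topEventsQuery(parquetUrl, limit = 10) {
  return `
    SELECT event_name, COUNT(*) AS event_count
    FROM '${parquetUrl}'
    WHERE event_time >= now() - INTERVAL 1 HOUR
    GROUP BY event_name
    ORDER BY event_count DESC
    LIMIT ${limit};
  `;
}

console.log(topEventsQuery('https://example.com/events.parquet'));
```

One query, no stateful operators, no cluster. That realization is what set up the next step.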
The "Aha!" Moment: Re-evaluating the Problem with DuckDB
The breakthrough came when we stopped thinking about processing data on the server and started asking: what if we moved the computation to the client? The data for our dashboard, once aggregated, was only about 50-100 MB per day in Parquet format. What if, instead of pre-processing it, we just sent the raw (but compact) data to the browser and let it do the work?
This is where DuckDB entered the picture. DuckDB is an in-process analytical database. Think of it as SQLite, but built from the ground up for fast OLAP queries. It's columnar, vectorized, and ridiculously fast.
The real magic, however, is DuckDB-WASM. The entire DuckDB engine has been compiled to WebAssembly (WASM), allowing it to run directly inside a web browser at near-native speeds.
Why DuckDB WASM?
Our decision to pivot to this client-side data processing model was based on several key advantages:
Zero Infrastructure: The database is the browser tab. There are no servers to manage, no clusters to scale, and no networks to configure. The entire analytical engine is a static file delivered via a CDN.
Blazing-Fast Performance: By eliminating network latency, queries become instantaneous. The vectorized execution engine in DuckDB can chew through millions of rows of data in milliseconds, right on the user's machine.
Powerful SQL Analytics: We didn't have to rewrite our logic. We could use the full power of analytical SQL—including complex joins, window functions, and aggregations—that we were already familiar with.
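As an example of "not rewriting our logic": a Flink-style running aggregate per user collapses into one window function. The table and column names below are illustrative, not our production schema.

```javascript
// A running count of events per user, expressed as a single SQL window
// function instead of a stateful Flink operator. Names are hypothetical.
const runningTotalQuery = `
  SELECT
    user_id,
    event_time,
    COUNT(*) OVER (
      PARTITION BY user_id
      ORDER BY event_time
    ) AS events_so_far
  FROM 'events.parquet';
`;

console.log(runningTotalQuery);
```

DuckDB executes this kind of query over a local or remote Parquet file without any of the checkpointing or state-backend machinery Flink requires.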
Migrating from Flink to DuckDB felt less like a migration and more like a deletion. We were removing an entire layer of our stack.
The Migration: From Distributed Cluster to a Single File
The new architecture is laughably simple compared to its predecessor. Our "data pipeline" now consists of a simple cron job that dumps hourly event data as a Parquet file into a cloud object store like Amazon S3 or Cloudflare R2.
A New, Simplified Data Flow
Data Storage: Raw event data is collected and written as partitioned Parquet files to Cloudflare R2. Parquet is a highly efficient columnar storage format that DuckDB can read directly.
Web Application: A static web application (HTML, CSS, and JavaScript) is served to the user.
Client-Side Processing: The user's browser downloads the DuckDB-WASM library and directly queries the Parquet files in R2. All filtering, grouping, and aggregation happen locally.
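Because the files are partitioned by hour, the client only needs to construct the object keys for the time window a dashboard is showing and hand them to DuckDB (which can scan a list of files in one query via `read_parquet([...])`). A minimal sketch, assuming a hypothetical `events/<date>/<hour>.parquet` layout:

```javascript
// Build the object keys for hourly Parquet partitions in a time range.
// The "events/<date>/<hour>.parquet" layout is an assumption for this
// example; adapt it to your bucket's actual partitioning scheme.
function hourlyPartitionKeys(startIso, hours) {
  const keys = [];
  const start = new Date(startIso);
  for (let i = 0; i < hours; i++) {
    const t = new Date(start.getTime() + i * 3600 * 1000);
    const date = t.toISOString().slice(0, 10);          // YYYY-MM-DD (UTC)
    const hour = String(t.getUTCHours()).padStart(2, '0');
    keys.push(`events/${date}/${hour}.parquet`);
  }
  return keys;
}

const keys = hourlyPartitionKeys('2023-10-26T00:00:00Z', 3);
console.log(keys);
// ['events/2023-10-26/00.parquet',
//  'events/2023-10-26/01.parquet',
//  'events/2023-10-26/02.parquet']
```

The resulting keys are prefixed with the bucket's public URL and interpolated into the SQL the browser runs.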
Here’s a simplified code snippet showing how easy it is to run a SQL query on a remote Parquet file directly from the browser:
```javascript
import * as duckdb from '@duckdb/duckdb-wasm';

async function runClientSideQuery() {
  const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
  const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);

  // Instantiate the database inside a Web Worker
  const worker = await duckdb.createWorker(bundle.mainWorker);
  const logger = new duckdb.ConsoleLogger();
  const db = new duckdb.AsyncDuckDB(logger, worker);
  await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

  // Connect to the database
  const conn = await db.connect();

  // DuckDB can directly query remote Parquet files!
  const query = `
    SELECT
      user_agent,
      COUNT(*) AS event_count
    FROM 'https://my-bucket.r2.dev/events-2023-10-26.parquet'
    GROUP BY user_agent
    ORDER BY event_count DESC
    LIMIT 10;
  `;
  const result = await conn.query(query);

  // The result is an Apache Arrow table, easily converted to JSON
  console.log(result.toArray().map((row) => row.toJSON()));

  await conn.close();
}

runClientSideQuery();
```
That's it. The entire backend for our interactive dashboard is now a few lines of JavaScript. This simplification of our data pipeline with DuckDB was the single greatest benefit.
The Results: A Staggering Improvement in Cost and Performance
The impact of this architectural change was immediate and dramatic. By decommissioning our Flink cluster and its supporting infrastructure, we saw incredible gains across the board.
Cost Reduction: Our monthly cloud bill for this service dropped from ~$5,000 to less than $50. The new costs are just for R2 storage and a tiny amount of data egress, which is often free. This is a 99% cost reduction.
Performance Boost: Dashboard query latency went from an average of 2-3 seconds down to under 200 milliseconds. The user experience is now fluid and instantaneous.
Developer Velocity: Our team went from spending 50% of their time on infrastructure management to nearly 0%. We now focus entirely on building better analytical features, not babysitting a distributed system. Deployments are as simple as pushing a new JavaScript bundle.
Is This Flink Replacement Right for You?
Before you dismantle your own stream processing clusters, it's crucial to understand that this DuckDB WASM approach is not a universal solution. It shines for a specific set of use cases.
DuckDB WASM is an ideal alternative to Flink when:
Data size is manageable: The total dataset being analyzed is in the range of megabytes to a few gigabytes. DuckDB is efficient, but you can't expect a browser to download and process terabytes of data.
The workload is analytical (OLAP): It's perfect for read-heavy workloads with complex queries, not for high-throughput transactional writes (OLTP).
Latency needs are "interactive," not "stateful-streaming": The goal is to provide a fast user interface, not to process an unbounded stream of data with millisecond-level state updates.
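A quick way to sanity-check the first criterion is a back-of-envelope download-time estimate. The 15-second budget below is our own rule of thumb, not a hard limit:

```javascript
// Rough feasibility check: can the dataset be shipped to the browser
// within an acceptable time budget? The default 15 s budget is an
// assumption, not a recommendation from the DuckDB project.
function clientSideFeasible(datasetMb, bandwidthMbps, budgetSeconds = 15) {
  const downloadSeconds = (datasetMb * 8) / bandwidthMbps; // MB -> Mbit
  return { downloadSeconds, feasible: downloadSeconds <= budgetSeconds };
}

console.log(clientSideFeasible(100, 100)); // ~100 MB on 100 Mbps: 8 s, feasible
```

If the estimate runs into minutes, or the data can't leave the server for compliance reasons, client-side processing is the wrong tool.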
You should absolutely stick with a robust tool like Apache Flink when you need true stream processing, complex stateful operations over time windows, or when your data volumes are too massive to ever be moved to a client.
Conclusion: Rethink Your Architecture
Our journey to replace a Flink cluster with DuckDB WASM taught us a valuable lesson: always question your architectural assumptions. The trend towards massive, server-side distributed systems is powerful, but it’s not the only way. The rise of technologies like WebAssembly and efficient in-process databases like DuckDB has opened up a new frontier for client-side and serverless analytics.
By moving computation closer to the user, we not only achieved a massive reduction in cost and complexity but also built a faster, more responsive product.
Before you spin up your next distributed data processing cluster, take a moment and ask yourself a simple question: "Could this just run in the browser?" The answer might surprise you.
Ready to see it for yourself? Try out the official DuckDB WASM shell and run SQL queries on your own data files, right in your browser.