We Replaced Our Datadog Bill with a ClickHouse Cluster
The monthly invoice arrives like clockwork, and with it, a familiar sense of dread. For many engineering teams, the Datadog bill is a line item that grows relentlessly, often outpacing the growth of the business itself. We were no different. The value was undeniable, but the cost was becoming untenable. That's when we made a bold decision: we replaced our Datadog bill with a ClickHouse cluster, and in doing so, we not only slashed our costs but also gained unprecedented control and performance over our observability data.
This isn't just another story about cost savings; it's about reclaiming ownership of your most critical data. If you're feeling the squeeze from your SaaS observability provider, this is the journey we took to build a powerful, cost-effective alternative.
The Breaking Point: Why Our Datadog Bill Became Untenable
Datadog is an exceptional product. It provides a seamless, unified view of metrics, traces, and logs. But this convenience comes at a premium, priced along dimensions that grow with your infrastructure. Our primary pain points were:
- Compounding Cost Scaling: The pricing model charges separately for hosts, ingested gigabytes, and custom metrics, and all three grew at once. Every new microservice, every increase in log verbosity, and every new metric translated directly into a higher bill. We found ourselves making engineering decisions based on cost implications rather than observability needs.
- Data Retention Trade-offs: To keep costs manageable, we were forced into short retention windows. Need to investigate an incident from 45 days ago? That data had likely aged out of hot storage and was either gone or archived to a cold tier we couldn't query directly. This severely limited our ability to perform long-term trend analysis or forensic investigations.
- Limited Querying Power: While Datadog's query language is user-friendly for common use cases, we often hit its limits when trying to run complex analytical queries. We wanted to join different data sets or run sophisticated aggregations that the platform simply wasn't built for. The platform is an observability tool, not a dedicated analytical database.
We realized we were paying a premium for a managed service that was forcing us to compromise on our core observability goals. The alternative was clear: build a system where we could store everything, retain it for as long as we needed, and query it in any way we imagined.
Why ClickHouse? The Case for a DIY Observability Stack
When we started exploring alternatives to Datadog, our search led us to ClickHouse. Originally developed at Yandex and now maintained by ClickHouse, Inc., ClickHouse is an open-source, columnar OLAP database designed for fast analytical queries. It's not a direct replacement for Datadog; it's the powerful engine around which you can build a custom observability platform.
Here’s why ClickHouse was the perfect fit:
- Incredible Performance: On suitable hardware, ClickHouse can scan billions of rows per second, and well-targeted queries often return in milliseconds. Its columnar storage format means it only reads the columns a query actually references, making it very efficient for the wide, sparse datasets typical of logs and metrics.
- Amazing Data Compression: Columnar data compresses exceptionally well. We saw compression ratios of 5-10x for our log data, drastically reducing storage costs compared to traditional row-based databases or even raw text files.
- SQL-based Querying: It uses a familiar SQL dialect, empowering anyone on the team to write complex queries without learning a proprietary language. This democratized data access beyond the SRE team.
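To make the compression figure concrete, here is a back-of-the-envelope sketch in Python. The 5-10x ratios are the ones quoted above; the 500 GB/day ingest volume and the function name are hypothetical, chosen only for illustration.

```python
def compressed_size_gb(raw_gb: float, ratio: float) -> float:
    """On-disk size after compression, given a raw size and a compression ratio."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_gb / ratio


# Hypothetical raw ingest volume; substitute your own measurement.
RAW_GB_PER_DAY = 500.0

for ratio in (5.0, 10.0):
    per_day = compressed_size_gb(RAW_GB_PER_DAY, ratio)
    print(f"{ratio:.0f}x compression: {per_day:.0f} GB/day on disk")
```

Actual ratios depend heavily on the data and the codecs chosen per column, so treat the result as a planning estimate rather than a guarantee.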
Key Components of Our New Stack
Our replacement for the Datadog ecosystem consists of three core, open-source components:
- Data Collection (The Agent): We use Vector as our high-performance agent. It runs on our hosts, collects logs and metrics, and reliably forwards them to our ClickHouse cluster. Its powerful transformation capabilities allow us to parse, enrich, and sample data at the edge before it's even ingested.
- Data Storage & Querying (The Engine): Our self-hosted ClickHouse cluster serves as the central repository for all logs, metrics, and traces. This is the heart of our new system.
- Visualization & Alerting (The UI): Grafana provides the user interface. Using its native ClickHouse data source, we rebuilt our dashboards and configured a more flexible and powerful alerting system.
The Migration Process: A Phased Rollout
Migrating from a fully integrated platform like Datadog to a self-hosted ClickHouse cluster is a significant undertaking, but we approached it systematically. We didn't try to boil the ocean. Instead, we started with a single, high-volume data source: application logs.
Architecting the Ingestion Pipeline
The first step was getting data from our servers into ClickHouse. We deployed Vector as a DaemonSet in our Kubernetes clusters. A simple Vector configuration was all it took to start tailing log files and shipping them.
Here's a simplified example of a vector.toml configuration:
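    # vector.toml

    # Collect container logs from the node this agent runs on
    [sources.kubernetes_logs]
    type = "kubernetes_logs"

    [transforms.parsed_logs]
    type = "remap"
    inputs = ["kubernetes_logs"]
    source = '''
    # Replace the event with its parsed JSON payload (errors if .message is not valid JSON)
    . = parse_json!(.message)
    # Stamp the event with ingestion time
    .timestamp = now()
    '''

    [sinks.clickhouse]
    type = "clickhouse"
    inputs = ["parsed_logs"]
    endpoint = "http://your-clickhouse-host:8123"
    database = "logs"
    table = "logs_raw"

    # Batching for performance (current Vector releases nest these options under `batch`)
    batch.max_bytes = 1048576
    batch.timeout_secs = 10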
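For intuition about what reaches ClickHouse on port 8123, the following Python sketch builds (but does not send) the kind of request its HTTP interface accepts: an INSERT query in the URL and one JSON object per line in the body, using the JSONEachRow format. Vector takes care of batching and retries in practice; the row contents and column names here are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_insert_request(endpoint: str, database: str, table: str, rows: list[dict]) -> Request:
    """Build (but do not send) an HTTP INSERT for ClickHouse's HTTP interface."""
    query = f"INSERT INTO {database}.{table} FORMAT JSONEachRow"
    url = f"{endpoint}/?{urlencode({'query': query}, quote_via=quote)}"
    # JSONEachRow expects one JSON object per line.
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    return Request(url, data=body, method="POST")


# Hypothetical rows; the column names are assumptions, not a real schema.
rows = [
    {"timestamp": "2024-01-01 00:00:00", "service": "checkout", "level": "error", "message": "payment timeout"},
    {"timestamp": "2024-01-01 00:00:01", "service": "checkout", "level": "info", "message": "retry succeeded"},
]
request = build_insert_request("http://your-clickhouse-host:8123", "logs", "logs_raw", rows)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would perform the insert against a reachable cluster; the sketch stops short of that so it can be run anywhere.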
In ClickHouse, we created a table using the powerful MergeTree engine, optimized for time-series data. We partitioned the table by day and ordered it by timestamp to ensure lightning-fast queries for recent data.
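A table along those lines could be defined roughly as follows. This is a minimal sketch under stated assumptions: the column set (service, level, message) is illustrative rather than a real schema, and the Python wrapper exists only to keep the DDL in one printable, reusable place.

```python
def build_logs_table_ddl(database: str = "logs", table: str = "logs_raw") -> str:
    """Return a CREATE TABLE statement for a day-partitioned MergeTree log table."""
    return f"""
CREATE TABLE IF NOT EXISTS {database}.{table}
(
    timestamp DateTime64(3),           -- event time, millisecond precision
    service   LowCardinality(String),  -- hypothetical column
    level     LowCardinality(String),  -- hypothetical column
    message   String
)
ENGINE = MergeTree
PARTITION BY toDate(timestamp)         -- one partition per day
ORDER BY timestamp                     -- sort key: time-range scans read contiguous data
""".strip()


print(build_logs_table_ddl())
```

Daily partitions make it cheap to drop or move a whole day of data at once, while the sort key determines how much data a time-bounded query has to read.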
Recreating Dashboards and Alerts
With data flowing in, the next phase was visualization. We connected Grafana to our ClickHouse instance and began the process of recreating our most critical dashboards. While this took time, the payoff was immense. We were no longer limited by Datadog's UI. We could write complex SQL queries directly in Grafana to build charts that were previously impossible, correlating data across different services in a single view.
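As an illustration of the kind of query a dashboard panel might run, here is a hedged sketch that builds a per-service error-rate query in fixed time buckets. The table name comes from the Vector configuration; the service and level columns are assumptions, and in Grafana the time bounds would normally come from the dashboard's time picker rather than from Python.

```python
from datetime import datetime


def error_rate_query(start: datetime, end: datetime, bucket_seconds: int = 60) -> str:
    """Build a per-service error-rate query over [start, end) in fixed time buckets."""
    if end <= start:
        raise ValueError("end must be after start")
    fmt = "%Y-%m-%d %H:%M:%S"
    return f"""
SELECT
    toStartOfInterval(timestamp, INTERVAL {int(bucket_seconds)} SECOND) AS bucket,
    service,
    countIf(level = 'error') / count() AS error_rate
FROM logs.logs_raw
WHERE timestamp >= toDateTime('{start.strftime(fmt)}')
  AND timestamp <  toDateTime('{end.strftime(fmt)}')
GROUP BY bucket, service
ORDER BY bucket, service
""".strip()


print(error_rate_query(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 1, 0)))
```

Because the result has one row per bucket and service, a single panel can plot every service side by side.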
The Results: Massive Cost Savings and Unlocked Potential
The impact of this migration was immediate and profound. Let's talk numbers.
- Total Cost of Ownership (TCO): Our monthly Datadog bill was averaging $18,000. Our new stack, including the cost of EC2 instances for the ClickHouse cluster and S3 for long-term backups, costs us approximately $3,000 per month. That's an 83% reduction in cost.
- Query Performance: Queries that used to take 10-20 seconds in the Datadog UI now return in under a second in Grafana, even when scanning terabytes of data.
- Data Ownership: We now have a 1-year hot retention policy for all logs, with indefinite cold storage in S3. We are no longer afraid to log verbosely or add high-cardinality metrics. We own our data, completely.
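The cost figure above is easy to check. The sketch below reproduces the arithmetic from the two monthly totals quoted in this section; the function and constant names are ours for illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


DATADOG_MONTHLY = 18_000     # average monthly Datadog bill, USD
CLICKHOUSE_MONTHLY = 3_000   # EC2 + S3 for the new stack, USD

reduction = percent_reduction(DATADOG_MONTHLY, CLICKHOUSE_MONTHLY)
annual_savings = (DATADOG_MONTHLY - CLICKHOUSE_MONTHLY) * 12
print(f"{reduction:.1f}% reduction, ${annual_savings:,} saved per year")
# prints: 83.3% reduction, $180,000 saved per year
```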
By moving to a ClickHouse-based observability solution, we transformed observability from a cost center into a strategic asset.
Is This Approach Right for You?
Replacing Datadog with ClickHouse is not a silver bullet. It's a trade-off. You are trading a smaller bill for more engineering time and operational responsibility.
This solution is a great fit if:
- Your observability bill is a significant line item in your budget.
- You have an in-house SRE or DevOps team comfortable with managing stateful infrastructure.
- You frequently hit the limits of your current provider's query capabilities.
- Data ownership and long-term retention are critical business requirements.
You might want to stick with a SaaS provider if:
- You are a small team and prefer to focus exclusively on product development.
- You want a fully managed, out-of-the-box solution with enterprise support.
- The operational overhead of managing a database cluster is not something you're prepared to take on.
Conclusion: Take Back Control of Your Data
Our journey to replace Datadog with a ClickHouse cluster was challenging but ultimately transformative. We didn't just find a cheaper alternative; we built a platform tailored to our specific needs. We now have a faster, more flexible system that we can scale on our own terms, at a fraction of the cost.
If you're staring down another five-figure observability bill, it's time to ask yourself if there's a better way. The open-source ecosystem, powered by tools like ClickHouse, Vector, and Grafana, offers a compelling path to not only save money but to build a truly best-in-class observability stack.
Are you considering a similar migration away from a SaaS provider? Share your thoughts and questions in the comments below.

Created by Andika's AI Assistant
