Key Cost Drivers in Databricks π§
Databricks pricing primarily depends on:
Data volume (especially for ETL)
Concurrency and session duration (number of users Γ usage time)
Warehouse type and size (Standard vs Pro vs Serverless)
Workload type:
Jobs Compute: for scheduled pipelines (ETL)
SQL Compute: for BI dashboards (like Astrato)
All-purpose compute: notebooks and ad hoc exploration
Auto-scaling and auto-termination settings
Rules of Thumb π
ETL / Data Engineering (Jobs Compute)
Small Cluster (1 node) - ~50GB/day
Estimated DBU/hr: ~2β3 | Cost: $1β3/hr
Medium Cluster (2β4 nodes) - ~500GB/day
Estimated DBU/hr: ~6β10 | Cost: $5β10/hr
Large Cluster (>4 nodes) - ~1TB/day
Estimated DBU/hr: ~12+ | Cost: $10β20/hr
π‘Tip: Most ETL jobs are batch-based, running 1β3 hours per day.
BI App Usage (SQL Compute) with Astrato β Optimized Costs π
Light Usage
5 users, 1M rows (~2GB/query)
Usage: 2β3 hrs/day
Cluster: 1β2 node SQL warehouse
Est. Monthly Cost: $80β200/mo
Moderate Usage
20 users, 10M rows (~10β20GB)
Usage: 8 hrs/day
Cluster: 2β4 node SQL warehouse
Est. Monthly Cost: $400β900/mo
Heavy Usage
50 users, 50β100M rows (~100GB)
Usage: 8β10 hrs/day
Cluster: 4β8 node SQL warehouse
Est. Monthly Cost: $1,200β3,000/mo
Assumes Serverless SQL, result caching, auto-pause, and lean warehouse config. β
Astratoβs zero-copy architecture means compute is aligned with actual usage β no background refresh costs.
Metrics That Matter for BI Costing π―
Beyond data volume and concurrent users, key cost drivers include:
Query frequency and complexity (joins, filters, aggregation depth) π
Caching utilization (Databricks SQL cache can reduce costs ~30β60%) πΎ
Data freshness requirements (real-time = more compute than hourly/daily updates)β
User interaction level (passive consumption vs heavy exploration) π§
Auto-pause / resume thresholds (shorter = lower cost) π€
Concurrent query load (peak concurrency = warehouse size) π§©
Example Scenarios excluding caching π
Example 1: Light Embedded BI in SaaS App β
10 external users log in 1β2Γ/day
Each runs 3β5 simple queries on ~5β10M row dataset
SQL Compute: 2-node Serverless warehouse with auto-pause
Monthly BI Cost: ~$100β200
ETL Pipelines: Run nightly on 50GB β ~$300/month
β Ideal for MVP or early-stage analytics
Example 2: Enterprise Internal BI Platform β
50 internal users, ~20 concurrently active during peak
Datasets of 100M rows, moderate joins
SQL Compute: 4β6 node Serverless with caching, aggressive auto-pause
Monthly BI Cost: $500β1,200
ETL: Runs hourly on ~1TB β $1,500β2,500/month
β Modern enterprise BI with high interactivity and governance
Additional Advice β
Databricks SQL Serverless is ideal for Astratoβs live-query, low-latency use cases
BI Cost β direct function of dashboard usage & business value - no usage = no cost, usage = cost = business value
Leverage dashboard telemetry in Astrato to right-size warehouse sizes
Split workloads: Custom Reports may need separate tuning from dashboards