An independent case study

Databricks: the lakehouse that became an AI platform

A neutral, evidence-first reading of the data + AI company now at a $5.4B revenue run-rate and a $134B private valuation — assembled from company disclosures, primary filings and independent analysts so you can reach your own conclusion.

55 sources · As of June 2026 · 8 analysis sections

In a little over a decade, seven Berkeley researchers who built Apache Spark turned an open-source engine into the company that coined the lakehouse — and then, on the back of generative AI, into the most valuable private enterprise-software company in the world at $134 billion.

The genuinely open question is not whether Databricks is impressive — a $5.4B run-rate growing >65% with positive cash flow speaks for itself[4]. It is whether a consumption business with compressing margins can sustain a software-multiple valuation while open formats commoditize its moat, hyperscalers bundle against it, and an AI cycle its own CEO calls a bubble cools. The evidence cuts both ways on every question below. This study lays out both cases; the verdict is yours.

The climb that frames the debate

Company-disclosed revenue run-rates (US$B; annualized, not audited GAAP). The recent slope — $1.6B for FY2024 to a $5.4B run-rate two years later — is what both the bull and bear cases argue over.

[Chart: revenue run-rate (US$B, company-disclosed), FY2020 through Feb'26]
⚖️
What reasonable people disagree about
Whether a ~$5.4B annualized run-rate justifies a $134B mark at a richer revenue multiple than Snowflake's[46]; whether the open-format moat survives Iceberg's rise[33]; whether AI revenue is durable or bubble-exposed[47]; and whether staying private this long serves employees or mainly insiders[40]. Informed observers land in different places; by design, this study does not pick for you.

How to read this

Eight sections, each built the same way: a neutral synthesis, framework visuals, a two-sided case-for / case-against ledger, dated quotes, and the sources used. Start with the question that interests you, or read in order from Overview & Timeline.

🔍
Independent research artifact, not affiliated with or endorsed by Databricks. Databricks is private and does not publish audited financials; revenue, margin and valuation figures here are clearly-labeled run-rates, estimates or negotiated round marks. Where the research could not verify a claim, the relevant section says so. See Methodology & Limits.
Overview & Timeline

From Apache Spark to the Data Intelligence Platform

What Databricks does, how it got here, and the scale it operates at today.

Founded 2013 · San Francisco · 15,000+ customers

Databricks turned an open-source engine (Apache Spark) into a platform thesis (the lakehouse) and then into an AI company — reaching 15,000+ customers[2] and a $5.4B revenue run-rate[4] while still private. The same acquisition-led speed that built the platform is also its sharpest execution risk.

What it does

Databricks sells a single cloud platform for storing, processing, governing and building on enterprise data — spanning data engineering, data warehousing (Databricks SQL), data science/ML, and, increasingly, generative AI. Its founding bet was the lakehouse: rather than copy data between a cheap-but-unstructured data lake and an expensive-but-governed data warehouse, run both on one set of open, direct-access files (Delta Lake / Parquet)[28]. The platform runs on all three major clouds and is sold as a first-party Microsoft service on Azure[30].

The company grew out of the UC Berkeley AMPLab, where its founders created Apache Spark; Spark, Delta Lake, MLflow and Unity Catalog were all open-sourced, and that open-source distribution is central to how Databricks seeds adoption[1].

How it got here

2013

Founded out of the UC Berkeley AMPLab by the seven creators of Apache Spark; Ali Ghodsi later becomes CEO.[1]

2019

Open-sources Delta Lake; raises a $400M Series F at a $6.2B valuation.[1]

2021

Coins and formalizes the 'lakehouse' architecture in a CIDR research paper; raises to a $38B valuation.[28]

2023

Acquires MosaicML for $1.3B, entering generative-AI model training (becomes Mosaic AI); $43B Series I.[5]

2024

Launches DBRX open LLM; acquires Tabular (Apache Iceberg's creators); $10B Series J at $62B.[7]

2025

Acquires Neon ($1B) for Lakebase; signs Anthropic and OpenAI deals; Series K lifts the mark above $100B.[6]

2026

Reports a $5.4B revenue run-rate (>65% YoY) and a $1.4B AI run-rate; $134B Series L completes.[4]

Today's scale

As of February 2026 Databricks reported a $5.4B revenue run-rate growing >65% year over year, with a $1.4B AI-products run-rate, net revenue retention >140%, more than 800 customers at $1M+ and 70+ at $10M+ run-rate, and positive free cash flow over the trailing twelve months[4]. These are company disclosures, not audited figures (see Financials).

With this new capital, we'll double down on Lakebase so developers can create operational databases built for AI agents.
Ali Ghodsi · Co-founder & CEO, Databricks · Feb 2026

What the trajectory shows

  • Durable platform expansion: from data engineering to SQL warehousing to AI, with growth accelerating at a $5B+ scale[4].
  • Open-source roots (Spark, Delta, MLflow, Unity Catalog) seed bottom-up adoption and a large installed base[1].
  • A credible AI pivot, anchored by the MosaicML acquisition and the DBRX open model[5][7].

What it leaves open

  • Roughly $3B+ of acquisitions (MosaicML, Tabular, Neon) creates integration debt and product overlap[8].
  • Scale figures are annualized run-rates from a private company, not audited revenue[4].
  • Each new pillar (warehousing, OLTP via Lakebase, agents) pushes Databricks into a rival's home turf[6].
Market & Industry

A large, fast-growing market — with blurring boundaries

Where Databricks sits in the cloud data and AI stack, how big that market is, and why the lines between its sub-markets are dissolving.

Databricks competes across several large, growing markets — cloud data warehousing (forecast to roughly quadruple to ~$49B by 2031[9]) plus data engineering, ML and now AI. But the category's edges are converging: lake and warehouse, and the open table formats beneath them, are merging, which both expands the opportunity and erodes the moats[12].

How big is the market?

There is no single clean number, because Databricks straddles several categories and the analyst estimates diverge widely. Three reference points, all clearly third-party:

  • Cloud data warehouse — forecast to grow from ~$11.8B (2025) to ~$49.1B by 2031 at a ~27% CAGR, with the four largest vendors (AWS, Microsoft, Google Cloud, Snowflake) ~68% of revenue[9].
  • Big-data analytics (broad) — estimated at ~$395B in 2025, projected to reach ~$1.18T by 2034 at a ~13% CAGR[10].
  • Data lakehouse (narrow) — estimates range enormously, from ~$4.75B (2025, ~11% CAGR) to ~$14.2B (2025, ~25% CAGR), a sign the category is still being defined[11].
📐
Treat the lakehouse-specific TAM figures with caution: they come from report-mill vendors, diverge by roughly 3× on base-year market size and by more than 2× on growth rate, and are not load-bearing here. The cloud-data-warehouse and big-data-analytics figures are more defensible directional anchors.
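
The growth rates quoted above can be sanity-checked from their endpoints. A minimal sketch, assuming simple annual compounding between the base year and the forecast year; the `cagr` helper and the dictionary layout are illustrative, and the inputs are the third-party estimates cited above[9][10].

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1

# (start US$B, end US$B, years between base year and forecast year)
forecasts = {
    "cloud data warehouse, 2025-2031": (11.8, 49.1, 2031 - 2025),
    "big-data analytics, 2025-2034": (395.0, 1180.0, 2034 - 2025),
}

implied = {name: cagr(*args) for name, args in forecasts.items()}
for name, rate in implied.items():
    print(f"{name}: implied CAGR {rate:.1%}")
```

Both implied rates land close to the quoted ~27% and ~13%, so the endpoints and growth rates are at least internally consistent.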

The structural shift: convergence

For a decade the industry split into two camps — data lakes (cheap object storage, flexible, hard to govern) and data warehouses (governed, fast SQL, expensive). Databricks' lakehouse thesis was that one open platform could do both. That thesis is now the consensus: warehouse-first Snowflake has added open Iceberg tables and a Polaris catalog, while lakehouse-first Databricks has built a $1B+ SQL warehousing business[21][3]. As formats become interchangeable, observers argue the competitive boundary has moved up to the catalog and governance layer, not the storage format[12].

Layered on top is the AI shift. Enterprises want to build models and agents on their own governed data, which is exactly the adjacency Databricks (and every rival) is racing into — expanding the market while intensifying the contest for it.

Why the market favors Databricks

  • Large, double-digit-growth markets in warehousing, analytics and AI give years of runway[9][10].
  • The lakehouse thesis Databricks pioneered is now the industry direction of travel[12].
  • The AI wave expands demand for governed, model-ready enterprise data — Databricks' core asset.

Why the market cuts against it

  • Convergence means rivals are entering Databricks' space as fast as it enters theirs[21].
  • Open table formats commoditize the storage layer, eroding lock-in[12].
  • The largest share of the warehouse market already sits with hyperscalers that can bundle[9].
Business Model

Consumption pricing, two-layer billing, compressing margins

How Databricks makes money, what the unit economics look like, and where the model is under pressure.

Databricks charges for consumption, not seats: customers buy compute in Databricks Units (DBUs) and pay the cloud provider separately for the underlying infrastructure[15]. That model drives >140% net retention[4] but also exposes Databricks to cloud and GPU costs — and its gross margin appears to be slipping from ~85% toward ~80%[13].

How the money works

Pricing is pay-as-you-go, billed per second, with Committed Use Contracts giving volume discounts that flex across clouds[14]. A DBU is a normalized unit of processing; different workloads carry different DBU rates (on AWS Premium, roughly $0.15/DBU for Jobs Compute up to $0.70/DBU for serverless SQL)[15]. Crucially, the customer pays two bills: one to Databricks for DBUs, and one to AWS/Azure/GCP for the VMs and storage underneath. That cloud-infrastructure layer routinely accounts for 50–70% of total Databricks spend[15]. On Azure, where Databricks is a first-party service, both appear on a single Microsoft bill[30].
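
The two-bill structure can be sketched as simple arithmetic. A minimal illustration, assuming the ~$0.15/DBU Jobs Compute rate quoted above and a cloud-infrastructure share of 50–70% of total spend[15]; the DBU volume and the function names are hypothetical, and real bills depend on instance types, discounts and workload mix.

```python
def dbu_charge(dbus: float, rate_per_dbu: float) -> float:
    """Databricks-side bill: DBUs consumed times the per-DBU rate."""
    return dbus * rate_per_dbu

def implied_total(dbu_cost: float, infra_share: float) -> float:
    """Total spend if the cloud bill makes up `infra_share` of the total."""
    if not 0 <= infra_share < 1:
        raise ValueError("infra_share must be in [0, 1)")
    return dbu_cost / (1 - infra_share)

# Hypothetical month: 200,000 DBUs of Jobs Compute at ~$0.15/DBU.
databricks_bill = dbu_charge(200_000, 0.15)

for share in (0.50, 0.70):
    total = implied_total(databricks_bill, share)
    cloud_bill = total - databricks_bill
    print(f"infra share {share:.0%}: Databricks ${databricks_bill:,.0f}, "
          f"cloud ${cloud_bill:,.0f}, total ${total:,.0f}")
```

Under these assumptions, a budget built on the DBU charge alone would miss half or more of the total.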

Revenue mix (estimated)

Independent estimate of the ~$4.8B run-rate mix (late 2025). Databricks does not break out segment revenue; the AI and data-warehousing lines have each crossed a $1B run-rate per the company[3].

Estimated revenue mix (Sacra)
  • Core data platform: 60%
  • Data warehousing (SQL): 20%
  • AI / Mosaic AI: 20%

Source: Sacra estimate[16]. Treat as directional, not disclosed.

The unit-economics tension

The expansion engine is genuinely strong: net revenue retention above 140% means the existing customer base spends more than 40% more each year, and more than half of customers use six or more products[4]. But the cost side is where the SaaS comparison breaks down. Real-time AI inference needs dedicated GPUs running continuously, so AI revenue carries a structurally lower margin than classic software; independent analysts put Databricks' gross margin near 74% on AI-heavy mix versus ~88% for pure-SaaS peers[18], and Sacra pegs the blended figure at ~80%, down from ~85%[13].
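
Net revenue retention compounds, which is why it carries so much of the growth story. A minimal sketch, assuming a hypothetical cohort that starts at 100 (any unit) and an NRR held constant at 140%, the lower bound of the disclosed figure[4]; real cohorts will not hold a flat rate.

```python
def cohort_revenue(start: float, nrr: float, years: int) -> list[float]:
    """Revenue from one customer cohort, compounding at a constant NRR."""
    return [start * nrr ** year for year in range(years + 1)]

# Hypothetical cohort: starting spend of 100, NRR fixed at 1.40.
path = cohort_revenue(start=100.0, nrr=1.40, years=3)
for year, revenue in enumerate(path):
    print(f"year {year}: {revenue:.1f}")
```

On these assumptions the cohort grows to roughly 2.7× its starting spend in three years without any new customers.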

⚠️
Bill predictability is the most common customer complaint. Because total cost = DBUs + a separate cloud bill, FinOps write-ups warn teams can underestimate spend by 50–200% when they budget only the DBU charge[17], and interactive clusters can sit idle and waste a large share of spend[48].

Strengths of the model

  • Consumption pricing aligns cost with value and produces >140% net retention[4].
  • Land-and-expand across many products lifts revenue per customer and stickiness[16].
  • First-party Azure billing removes procurement friction for a huge enterprise base[30].

Pressures on the model

  • Gross margin is compressing toward ~80% (lower on AI) versus ~88% for pure software[13][18].
  • Two-layer billing exposes Databricks (and its customers) to cloud and GPU cost inflation[15].
  • Unpredictable consumption bills are a recurring source of customer friction[17].
Competitive Landscape

Fighting Snowflake — while renting from AWS, Azure and Google

Who Databricks competes with, where it sits in the market, and the structural awkwardness of depending on its own rivals.

Databricks' closest public comp is Snowflake, and the two are converging on each other's turf. But its most structurally awkward rivals are the hyperscalers it runs on — AWS, Azure and Google — and Microsoft Fabric, which bundles a competing stack through the same Azure that resells Databricks as a first-party service[23].

The field

Snowflake ($4.47B FY2026 product revenue, +29%, 125% net retention) is the head-to-head rival; warehouse-first to Databricks' lakehouse-first, it has added Iceberg tables, the Polaris catalog and Cortex AI to converge on the same ground[19][22]. Microsoft Fabric (21,000+ customers, 70%+ of the Fortune 500) bundles engineering, warehousing and Power BI[23]. Google BigQuery and AWS Redshift/SageMaker are at once native-stack competitors and products of the clouds Databricks depends on. The 2025 Gartner Cloud DBMS Leaders quadrant alone holds nine vendors (AWS, Google, Microsoft, Oracle, Databricks, Snowflake, MongoDB, IBM and Alibaba Cloud)[25], a sign of how crowded the top of the market is.

Five Forces


Cloud data + AI platforms
Competitive rivalry: High. Snowflake ($4.47B FY26 product revenue, +29%), Microsoft Fabric (>21,000 customers, >70% of the Fortune 500), Google BigQuery and AWS Redshift all overlap Databricks; rivals are converging on the same lakehouse + AI surface. Gartner's Cloud DBMS Leaders quadrant alone holds nine vendors.[20][24][26]
🏚️
The supplier-power problem is the structural one. Databricks runs on the infrastructure of AWS, Azure and Google — companies that also sell competing data and AI platforms. Cloud and GPU costs are both a margin drag and a competitive lever held by its rivals[15].

Positioning

Two axes that actually separate this market: openness/multicloud (horizontal) and how AI/ML-native versus BI/warehouse-native a platform is (vertical).

[Positioning map: Where the platforms sit. Horizontal axis: single-cloud / proprietary to multicloud / open. Vertical axis: BI / warehouse-native to AI / ML-native. Platforms plotted: Databricks, Snowflake, Microsoft Fabric, Google BigQuery, AWS Redshift, Palantir.]

Placement is the author's qualitative read of sourced positioning, not a quantitative score.

Databricks needs a compelling answer for why enterprises should choose them over Microsoft Fabric.
Sanjeev Mohan · Principal, SanjMo (industry analyst) · Dec 2025

Where Databricks is winning

  • Out-growing Snowflake (>65% vs +29%) and taking warehouse share with a $1B+ SQL business[19][26].
  • A Gartner Leader in Cloud DBMS for five straight years, now scored on OLTP too[24].
  • Multicloud neutrality is a real selling point for enterprises avoiding single-cloud lock-in[30].

Where it is exposed

  • Microsoft Fabric can bundle a competing stack through Azure billing in ways a neutral vendor cannot[23].
  • Snowflake's convergence (Iceberg, Polaris, Cortex) erodes Databricks' differentiation[22].
  • Dependence on AWS/Azure/GCP means its suppliers are also its best-capitalized competitors[15].
Strategy & Moats

Open by strategy — but how durable is the moat?

Databricks' stated strategy of open formats and multicloud neutrality, the advantages it actually creates, and the case that those advantages are commoditizing.

Databricks' moat rests on three things: the Spark/Delta ecosystem and its installed base, multicloud neutrality (plus a unique first-party Azure channel), and an early AI platform position. The live debate is whether open table formats — its own strategy — are eroding the first pillar faster than the AI pillar can compound[33].

Stated strategy vs. revealed strategy

What Databricks says: bet on open standards (Delta Lake, MLflow, Unity Catalog all open-sourced), run everywhere (AWS/Azure/GCP), and make data the foundation for AI[28]. What it does is mostly consistent — but the 2024 acquisition of Tabular, the company founded by Apache Iceberg's creators, is the tell: rather than win the format war outright, Databricks bought both sides of it and pledged interoperability via Delta UniForm[29]. It has also locked in distribution (first-party Azure since 2018[30]) and frontier-model supply (five-year Anthropic deal[31]; a $100M OpenAI partnership putting GPT-5 in Agent Bricks[32]).

We're thrilled to bring together the foremost leaders in open data lakehouse formats to make UniForm the best way to unify your data for every workload.
Ali Ghodsi · Co-founder & CEO, Databricks · Jun 2024

The moat-erosion case

The skeptics are pointed. Some analysts argue Apache Iceberg has won new adoptions and that Delta Lake, while open source, is "really a Databricks product" — i.e. governed by one vendor, unlike the more neutral Iceberg[33]. Others note both Databricks and Snowflake are "partly closed and partly open," and that Iceberg support inside Databricks is still less robust than Delta — a gap in the open-neutrality story[34]. If formats are interchangeable and the contest moves to the catalog layer, the storage-format moat that defined Databricks weakens.

SWOT

Strengths

  • Growth is accelerating at scale — ~50% → ~55% → >65% YoY past a $5.4B run-rate — with >140% net retention and positive free cash flow.[4][41]
  • Open Spark/Delta ecosystem and an installed base across thousands of enterprises; named a Gartner Leader in both Cloud DBMS and DSML in 2025.[24][52]
  • Multicloud neutrality plus a unique first-party Azure distribution channel with Microsoft since 2018.[30]
  • Frontier-AI partnerships (Anthropic, OpenAI) and Mosaic AI give a credible enterprise GenAI story; AI products already a $1.4B run-rate.[31][32][4]

Weaknesses

  • Gross margin appears to be compressing (~85% → ~80%, with AI products cited even lower) versus ~88% for pure-SaaS peers, as GPU/inference costs rise.[13][18]
  • Consumption pricing draws recurring complaints about unpredictable bills and the separate cloud-infrastructure charge.[17][48]
  • As a private company it discloses no audited GAAP financials; headline numbers are annualized run-rates.[39][40]
  • Roughly $3B+ of acquisitions (MosaicML, Tabular, Neon) brings integration debt and product overlap.[8]

Opportunities

  • AI/agent workloads (Agent Bricks, Genie) plus the Anthropic/OpenAI partnerships expand the addressable AI-platform market.[32][4]
  • Operational databases for AI agents via Lakebase (built on the $1B Neon acquisition) extend Databricks into OLTP.[6][4]
  • Data-warehousing share gains: Databricks SQL has already crossed a $1B run-rate against Snowflake.[3][26]
  • Rising demand for data sovereignty (EU residency, EU AI Act readiness) is a growth vector Databricks is courting.[51]

Threats

  • Microsoft Fabric and the hyperscalers can bundle competing data+AI platforms in ways a neutral vendor cannot match.[23][26]
  • Open table formats (Iceberg/Delta interoperability) commoditize a former moat and lower switching costs.[33][12]
  • Dependence on AWS/Azure/GCP — both suppliers and rivals — exposes cost and competitive position.[15]
  • An AI-spending pullback would hit demand; CEO Ali Ghodsi himself warns of a broad AI bubble.[47]
  • At a richer revenue multiple than Snowflake's, the $134B mark is 'priced for perfection' and vulnerable to any deceleration.[46]
🧭
Read even-handedly: the strengths are real and sourced, but so are the weaknesses and threats. A moat that depends on a format Databricks no longer uniquely controls is a thinner moat than the headline suggests.

The moat is durable

  • Large installed base and developer mindshare from open-sourced Spark/Delta/MLflow[28].
  • Unique first-party Azure distribution and frontier-model partnerships deepen lock-in[30][32].
  • Owning Tabular gives Databricks a foot in both format camps[29].

The moat is commoditizing

  • Iceberg's momentum and Delta's single-vendor perception weaken the open-format claim[33].
  • The contest is moving to the catalog/governance layer, where rivals are open-sourcing too[34].
  • Interoperability cuts both ways — it lowers switching costs out of Databricks as well[12].
Financials & Funding

Three mega-rounds in a year, and a 2.2× valuation in twelve months

The funding history, the revenue trajectory, what's disclosed versus estimated, and the still-deferred IPO.

Databricks raised three major rounds in roughly a year, lifting its mark from $62B (Dec 2024) to $134B (Dec 2025)[36][37]. Revenue is genuinely accelerating at scale and cash flow is positive[4] — but every headline figure is an annualized run-rate from a company that discloses no audited GAAP revenue.

Valuation trajectory

Reported post-money valuations (US$B). These are negotiated private round marks, not a market price; the 2024→2025 climb is steep and recent.

[Chart: reported valuation (US$B) at 2019, 2021, 2023, 2024, Sep'25 and Dec'25]

Funding history

Round | Date | Amount | Valuation
Series F | Oct 2019 | $400M | $6.2B[1]
Series G | Feb 2021 | $1.0B | $28B[1]
Series H | Aug 2021 | $1.6B | $38B[1]
Series I | Sep 2023 | >$500M | $43B[38]
Series J | Dec 2024 | $10B | $62B[36]
Series K | Sep 2025 | $1B | >$100B[35]
Series L | Dec 2025 | >$4B at announcement; >$7B at completion (≈$5B equity + ~$2B debt) | $134B[37]

Pre-2023 round letters/amounts are as reported and vary by source[1]; 2023–2025 rounds are from Databricks' own announcements[38][36][35].
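
The step-up between rounds follows directly from the reported marks. A minimal sketch using the valuations in the funding history above; the Series K mark is disclosed only as above $100B, so 100 is used here as a lower bound (an assumption), which makes the K step-up a floor and the L step-up a ceiling.

```python
# Reported post-money marks (US$B), from the funding history above.
marks = [
    ("Series F", 6.2),
    ("Series G", 28.0),
    ("Series H", 38.0),
    ("Series I", 43.0),
    ("Series J", 62.0),
    ("Series K", 100.0),  # disclosed as ">$100B"; lower bound used here
    ("Series L", 134.0),
]

# Ratio of each round's mark to the round before it.
step_ups = {
    later: value / prev_value
    for (_, prev_value), (later, value) in zip(marks, marks[1:])
}
for round_name, multiple in step_ups.items():
    print(f"{round_name}: {multiple:.2f}x the prior mark")

# Series J (Dec 2024) to Series L (Dec 2025): the twelve-month step-up.
j_to_l = 134.0 / 62.0
print(f"J to L: {j_to_l:.2f}x")
```

The J-to-L ratio is where the roughly 2.2× twelve-month figure comes from.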

Revenue and profitability

Disclosed run-rates: ~$1.6B revenue for FY2024[39], ~$3B by early 2025, then $4.0B (Sep 2025), $4.8B (Dec 2025) and $5.4B (Feb 2026) — with YoY growth rising from ~50% to ~55% to >65%[4]. Databricks says it has been free-cash-flow positive over the trailing twelve months[35], and subscription gross margins have been cited above 80%[39], though independent estimates show them drifting down as AI mix grows[13].

🧮
Run-rate ≠ GAAP revenue
The $5.4B figure is an annualized run-rate extrapolated from a recent quarter, not audited full-year revenue (FY2024 GAAP-style revenue was $1.6B)[39]. It is a fair growth signal but flatters any "revenue multiple" relative to a public company's trailing figure.
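
The distinction can be made concrete. A minimal sketch, assuming the run-rate is one quarter multiplied by four (the company's exact method is not disclosed) and using a hypothetical $1.35B quarter chosen only because it annualizes to $5.4B; the function names are illustrative.

```python
def annualized_run_rate(period_revenue: float, periods_per_year: int = 4) -> float:
    """Extrapolate one period's revenue to a full year."""
    return period_revenue * periods_per_year

def implied_year_ago(current: float, yoy_growth: float) -> float:
    """Run-rate one year earlier implied by a year-over-year growth rate."""
    return current / (1 + yoy_growth)

run_rate = annualized_run_rate(1.35)         # hypothetical quarter, US$B
year_ago = implied_year_ago(run_rate, 0.65)  # 65% is the disclosed floor

print(f"run-rate: ${run_rate:.1f}B")
print(f"implied year-ago run-rate: at most ${year_ago:.2f}B")
```

Because growth is stated as >65%, the implied year-ago figure is an upper bound. For a company growing this fast, trailing revenue is necessarily lower than the run-rate, since the earlier quarters were smaller.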

The deferred IPO

Databricks has been an IPO candidate for years and keeps deferring. CEO Ali Ghodsi said in 2024 that "the IPO market is not too open at the moment"[39]; reporting points to a possible S-1 in the second half of 2026, though none had been filed as of spring 2026[40]. Together, the >$7B raised (including ~$2B of debt) and positive cash flow mean there is no liquidity pressure forcing a listing[40].

the IPO market is not too open at the moment
Ali Ghodsi · Co-founder & CEO, Databricks · Mar 2024

The financial bull case

  • Growth accelerating past a $5B run-rate is exceptionally rare and underpins the multiple[41].
  • Positive free cash flow and >140% net retention show real, funded demand[4].
  • Blue-chip, oversubscribed rounds give years of runway and IPO optionality[37].

The financial bear case

  • Headline scale is a run-rate, not audited GAAP revenue[39].
  • Three rounds in a year mean meaningful dilution and a high bar for any IPO[40].
  • Margins are compressing as AI/GPU costs rise[13].
Peer Comparison

Similar revenue to Snowflake, about 1.6× the valuation

Databricks benchmarked against the public infrastructure-software companies investors use to price it.

On revenue, Databricks (~$5.4B run-rate) and Snowflake (~$4.5B) are close. On valuation they are not: $134B vs ~$84B, an implied multiple of ~25× versus Snowflake's ~19×[46]. The gap rests almost entirely on growth (>65% vs +29%), which is the bull case and the bear case at once.

Revenue (latest annualized, US$B)

Latest annualized revenue (Databricks figure is a run-rate)
  • Databricks: $5.4B
  • Palantir: $4.48B
  • Snowflake: $4.47B
  • Datadog: $3.43B
  • MongoDB: $2.46B

Valuation / market cap (US$B)

Valuation (Databricks is a private round mark; peers are market caps)
  • Palantir: $341B
  • Databricks: $134B
  • Datadog: $89B
  • Snowflake: $84B
  • MongoDB: $30B

Implied revenue multiple (×)

Valuation ÷ latest annualized revenue
  • Palantir: 76×
  • Databricks: 25×
  • Datadog: 26×
  • Snowflake: 19×
  • MongoDB: 12×

Multiples are computed from the figures above; Databricks uses a run-rate denominator, so its multiple is not directly comparable to peers' trailing-revenue multiples. Palantir trades far richer; MongoDB far cheaper.
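
The arithmetic behind the implied multiples is a single division. A minimal sketch using the valuation and revenue figures quoted in this section; the dictionary layout is illustrative, and the Databricks inputs are a private round mark and a run-rate rather than a market cap and trailing revenue.

```python
# name: (valuation or market cap in US$B, latest annualized revenue in US$B)
peers = {
    "Palantir": (341.0, 4.48),
    "Databricks": (134.0, 5.4),  # round mark / run-rate, not market data
    "Datadog": (89.0, 3.43),
    "Snowflake": (84.0, 4.47),
    "MongoDB": (30.0, 2.46),
}

multiples = {name: value / revenue for name, (value, revenue) in peers.items()}

# Print from richest to cheapest multiple.
for name, multiple in sorted(multiples.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {multiple:.1f}x")
```

Because market caps move daily and the Databricks denominator is a run-rate, the outputs are best read as a rough ranking rather than a precise comparison.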

Side by side

Company | Status | Revenue | Growth | Net retention | Profitability | Valuation
Databricks[4][13] | Private | $5.4B run-rate (est.) | >65% | >140% | FCF-positive (TTM); ~80% GM (est.) | $134B (Dec 2025 round)
Snowflake[19][20] | Public (SNOW) | $4.47B FY26 product | +29% | 125% | 11% non-GAAP op margin; GAAP loss | ~$84B mkt cap
Palantir[43][42] | Public (PLTR) | $4.48B (2025) | +56% | n/d | GAAP profitable | ~$341B mkt cap
Datadog[45] | Public (DDOG) | $3.43B (2025) | +28% | n/d | Profitable (non-GAAP) | ~$89B mkt cap
MongoDB[44] | Public (MDB) | $2.46B FY26 | +23% | n/d | Non-GAAP profitable | ~$30B mkt cap
📊
Snowflake is the truest like-for-like comp; Palantir and Datadog are valuation reference points (different businesses), and MongoDB anchors the low end of infra-software multiples. Confluent left the public set when IBM acquired it in March 2026[27].

Why the premium can be justified

  • Databricks grows roughly 2× faster than Snowflake with higher net retention (>140% vs 125%)[19][4].
  • Its AI run-rate ($1.4B) dwarfs Snowflake's, supporting a richer forward story[4].
  • Public AI comps like Palantir trade at far higher multiples, leaving headroom[42].

Why the premium is risky

  • The ~25× multiple is on a run-rate, so it overstates value versus a trailing figure[46].
  • Snowflake is profitable on a non-GAAP basis and "doesn't go away quietly"[19].
  • Any deceleration toward peer growth rates would compress the gap sharply[46].
Risks, Regulation & Sentiment

Priced for perfection in a market its own CEO calls a bubble

The valuation, execution, competitive and regulatory risks — and the practitioner sentiment, both critical and positive.

The biggest risks are not operational (Databricks executes well) but valuation and cycle: a $134B mark at a richer multiple than Snowflake's[46], riding an AI wave that CEO Ali Ghodsi himself calls a bubble[47]. Against that sit a Gartner-Leader platform[52], >140% retention and positive cash flow[4].

Valuation & cycle risk

At ~25× run-rate (not GAAP) revenue, Databricks is, in one analyst's phrase, "priced for perfection," so any miss on growth or margin "could result in a punishing sell-off"[46]. The macro backdrop is live: a February 2026 software selloff erased up to $1T of value[40]. Most strikingly, Ghodsi has openly warned of a broad AI bubble and "circular" financing in the very demand wave that Databricks' AI revenue rides.

Companies that are worth, you know, billions of dollars with zero revenue, that's clearly a bubble, right, and it's, like, insane.
Ali Ghodsi · Co-founder & CEO, Databricks · Dec 2025

Ghodsi frames Databricks as revenue-backed by contrast — but the warning underscores the systemic AI-spend risk to its own demand[47].

Execution & platform risk

Roughly $3B+ of acquisitions (MosaicML, Tabular, Neon) brings "significant integration debt" and product overlap that some customers report as field confusion[8]. Dependence on AWS/Azure/GCP exposes cost and competitive position[15], and open-format commoditization (Iceberg interoperability) lowers the switching costs that protect the base[33].

Practitioner sentiment

Sentiment is genuinely mixed — and the items below are sentiment, not fact. On Gartner Peer Insights the platform rates ~4.5/5, with recurring criticism that cost management and cluster right-sizing are ongoing operational work[50]. On Hacker News, practitioners argue many companies don't need a managed lakehouse and that DBU pricing carries a steep markup — while others defend the value over self-managed Spark[49]. FinOps write-ups echo the bill-predictability complaint[48].

Regulation

No major litigation surfaced in this research. Databricks' regulatory posture is largely a mitigation story: EU data-residency "Geos," early GDPR deletion tooling, and 14 EU cloud regions with 1,500+ regional staff, positioned to capture rising sovereign-cloud demand rather than facing an open liability[51]. Wider AI-specific regulation (such as the EU AI Act) is an emerging cost the whole sector must absorb.

Scenarios to watch

Three ways the next two years could break — possibilities to weigh, not a prediction this study endorses.

Bull

AI/agent workloads keep compounding, Databricks SQL takes warehouse share from Snowflake, and gross margin stabilizes as scale absorbs GPU cost. A 2026–27 IPO lands above $134B.

Watch: AI run-rate growth, Databricks SQL vs Snowflake, gross-margin trend.

Base

Growth decelerates gradually but stays well above peers; Databricks remains the lakehouse leader while open formats and Fabric cap pricing power. The valuation is digested rather than re-rated sharply.

Watch: Net retention holding >130%, share of customers on 6+ products, IPO timing.

Bear

AI spend cools, hyperscaler bundling and Iceberg commoditization compress price and margin, and growth drops toward 40%. At a richer multiple than Snowflake's, the private mark corrects on the way to (or at) IPO.

Watch: AI-budget signals, margin compression, any growth print below ~50%.

🚩
Where this case study may be wrong
Databricks is private: revenue and AI run-rates are company-disclosed annualized figures, gross margin is a third-party estimate (~74–80%), revenue mix is an estimate, and the valuation is a negotiated round mark — none are audited. Sentiment items (Hacker News, Gartner Peer Insights, FinOps blogs) are directional, not fact. Several secondary figures rely on single sources; market-size TAMs diverge widely. This is a point-in-time snapshot as of June 2026 and will go stale as Databricks reports, raises or files to go public.

Mitigants

  • Gartner Leader in both Cloud DBMS and DSML; strong third-party validation[52].
  • >140% retention, positive cash flow and >$7B of capital cushion a downturn[53][40].
  • Revenue-backed AI demand, unlike the zero-revenue names Ghodsi criticizes[47].

Live risks

  • $134B at a richer multiple than Snowflake's leaves little room for any deceleration[46].
  • AI-spend pullback would hit the fastest-growing, highest-narrative part of revenue[47].
  • Integration debt, margin compression and open-format commoditization all compound[8][33].
How this was made

Methodology & Limitations

What this study is, how it was researched, and — importantly — where it could be wrong.

As of June 2026

Method

Research proceeded by fan-out web search across eight question areas (overview, market, business model, competition, strategy & moats, financials, peer comparison, and risks/regulation/sentiment) and by directly fetching primary and reputable secondary sources — Databricks' own newsroom and research, Microsoft/Snowflake official materials, public-peer figures for Snowflake, Palantir, Datadog and MongoDB, and analysts such as Sacra and named industry figures where private numbers are not disclosed. Every URL cited was opened and read, and an automated link checker validated each one. Claims were transcribed into a structured manifest tagging each source with a tier (19 primary, 23 reputable secondary, 11 soft/sentiment), a confidence level, and a stance (19 supporting, 13 critical, 21 neutral). The load-bearing figures are the company-disclosed revenue run-rate ($4.8B in Dec 2025, $5.4B in Feb 2026), the negotiated valuation marks ($62B → $100B+ → $134B), and the estimated ~74–80% gross margin that drives the margin-compression read.

Frameworks used

The analysis applies Porter's Five Forces to read industry structure, an openness-vs-AI-native positioning map to place Databricks against Snowflake, Fabric, BigQuery, Redshift and Palantir, a unit-economics walk through the DBU/consumption model, an even-handed SWOT, peer benchmarking on revenue, valuation and implied multiple, a bull/base/bear scenario set, and a case-for/case-against ledger in every section so weaknesses and threats get the same scrutiny as strengths. A formal DCF or precise margin model was deliberately skipped because Databricks publishes no audited accounts.

Disclosed vs. estimated

Databricks discloses revenue run-rates (annualized), growth rates, net retention, customer counts and free-cash-flow status in its funding announcements; these are treated as reported, but they are not audited GAAP revenue (FY2024 GAAP-style revenue was $1.6B). Gross margin (~74–80%), the revenue mix (≈60/20/20), and the market-size TAMs are third-party estimates and are labeled as such. Valuations are negotiated round marks, not public market prices. The text flags which bucket each figure falls into wherever it matters.

🚧
Where this case study may be wrong
  • Private-company financials are run-rates/estimates. Databricks publishes no audited accounts; revenue is annualized run-rate, and gross margin and revenue mix are third-party estimates.
  • Valuations are negotiated marks. $134B is a private round price, not a public-market valuation, and it is roughly 2.2× the $62B mark set a year earlier.
  • Run-rate ≠ trailing revenue. Any "revenue multiple" using the $5.4B run-rate is not directly comparable to a public peer's trailing-revenue multiple.
  • Market-size TAMs diverge widely across report vendors and are used only as directional anchors.
  • Sentiment ≠ fact. Hacker News, Gartner Peer Insights and FinOps-blog complaints are representative sentiment, not adjudicated fact; Databricks' counterpoints are shown alongside.
  • Some early funding details (round letters/amounts pre-2023) vary by source.
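The valuation caveats above come down to a few divisions, sketched below with figures taken from this study's text and source list. The helper names are illustrative assumptions. The Snowflake ratio uses FY2026 product revenue as a stand-in for trailing revenue, which is itself an approximation.

```python
# Figures are taken from this study's text and source list; the helper
# names are illustrative assumptions, not part of any published model.

def step_up(old_mark_b: float, new_mark_b: float) -> float:
    """Return the multiple by which a valuation mark rose (2.0 = doubled)."""
    return new_mark_b / old_mark_b


def revenue_multiple(value_b: float, revenue_b: float) -> float:
    """Return valuation divided by a revenue figure (both in US$B)."""
    return value_b / revenue_b


# Negotiated round marks (US$B): Series J (Dec 2024) and Series L (Dec 2025).
SERIES_J_MARK = 62.0
SERIES_L_MARK = 134.0

# Databricks: company-disclosed annualized run-rate (Feb 2026), not audited.
DATABRICKS_RUN_RATE = 5.4

# Snowflake: market cap (June 3, 2026) and FY2026 reported product revenue.
SNOWFLAKE_MARKET_CAP = 83.6
SNOWFLAKE_PRODUCT_REVENUE = 4.472

year_step_up = step_up(SERIES_J_MARK, SERIES_L_MARK)
databricks_multiple = revenue_multiple(SERIES_L_MARK, DATABRICKS_RUN_RATE)
snowflake_multiple = revenue_multiple(SNOWFLAKE_MARKET_CAP, SNOWFLAKE_PRODUCT_REVENUE)

print(f"Series J -> Series L step-up: {year_step_up:.2f}x")
print(f"Databricks mark / run-rate: {databricks_multiple:.1f}x")
print(f"Snowflake cap / FY2026 product revenue: {snowflake_multiple:.1f}x")
```

The two multiples are not like for like: the Databricks denominator is a forward-looking annualized run-rate, while the Snowflake denominator is revenue already reported, so the gap between them should be read as directional rather than exact.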

Neutrality & independence

This is a compilation, not an argument: it is assembled to let a reader form their own view of Databricks, and each section deliberately pairs the case for with the case against. It is not investment advice and is not affiliated with or endorsed by Databricks. It is a point-in-time artifact dated June 2026; data and AI move quickly, so the figures will age.

🔍
Independent research artifact. Trademarks and figures belong to their owners. Corrections welcome — the value of a study like this is in being checkable.
Bibliography

Sources

Every cited source was fetched during the research run (June 2026). Tiers: 1 = primary/official, 2 = reputable press/filings, 3 = forums/sentiment or soft secondary.

53 sources
Tier 1: 19 · Tier 2: 23 · Tier 3: 11 · Supporting: 19 · Critical: 13 · Neutral: 21

Overview & Timeline

  1. [1]Databricks — Wikipedia T3 neutral
    Databricks was founded in 2013, growing out of the UC Berkeley AMPLab by the original creators of Apache Spark (Ali Ghodsi, Matei Zaharia, Ion Stoica, Reynold Xin, Patrick Wendell, Andy Konwinski and Arsalan Tavakoli-Shiraji).
  2. [2]Databricks is Raising a Series K Investment at a >$100 Billion Valuation T1 supporting
    Databricks announced a Series K at a >$100B valuation, stating that more than 15,000 customers worldwide use its Data Intelligence Platform.
  3. [3]Databricks Grows >55% YoY, Surpasses $4.8B Revenue Run-Rate, and is Raising >$4B Series L at $134B Valuation T1 supporting
    In December 2025 Databricks announced a >$4B Series L at a $134B valuation, disclosing a $4.8B revenue run-rate growing >55% YoY with >$1B AI and >$1B data-warehousing run-rates and positive trailing-12-month free cash flow.
  4. [4]Databricks Grows >65% YoY, Surpasses $5.4 Billion Revenue Run-Rate, Doubles Down on Lakebase and Genie T1 supporting
    By February 2026 Databricks reported a $5.4B revenue run-rate growing >65% YoY, with a $1.4B AI-products run-rate, >140% net revenue retention, >800 customers at $1M+ and >70 at $10M+ run-rate, and positive trailing-12-month free cash flow, at a $134B valuation.
  5. [5]Databricks picks up MosaicML, an OpenAI competitor, for $1.3B T2 neutral
    Databricks acquired MosaicML for $1.3 billion in June 2023, entering generative-AI model training; the unit became Mosaic AI.
  6. [6]Databricks adds Postgres database with $1B Neon acquisition T2 neutral
    Databricks acquired serverless-Postgres startup Neon for about $1 billion in 2025 to power Lakebase for AI agents.
  7. [7]Databricks Launches DBRX, A New Standard for Efficient Open Source Models T1 supporting
    Databricks launched DBRX, an open-source mixture-of-experts LLM, on March 27, 2024, built by its Mosaic AI team.
  8. [8]Databricks at $5.4B: The Architecture of AI Autonomy T3 critical
    Analysts argue the rapid assimilation of MosaicML, Tabular and Neon — roughly $3B+ of M&A — has created 'significant integration debt,' with customers reporting field confusion across overlapping products.

Market & Industry

  1. [9]Cloud Data Warehouse Market Share & Size 2031 Outlook — Mordor Intelligence T2 supporting
    The cloud data warehouse market is forecast to grow from ~$11.8B (2025) to ~$49.1B by 2031 at a 26.86% CAGR, with public-cloud deployments ~64% of 2025 revenue and AWS, Microsoft, Google Cloud and Snowflake ~68% of 2024 vendor revenue.
  2. [10]Big Data Analytics Market Size, Value & Share [2034] — Fortune Business Insights T3 neutral
    The global big-data-analytics market was estimated at ~$394.7B in 2025, projected to reach ~$1,176.6B by 2034 at a 12.8% CAGR.
  3. [11]Data Lakehouse Market Size & Share, Forecast 2025-2034 — Global Market Insights T3 neutral
    Estimates of the narrowly-defined data-lakehouse market vary very widely — from ~$4.75B (2025) growing ~11% CAGR to ~$14.2B (2025) growing ~25% CAGR — reflecting how unsettled the category's boundaries are.
  4. [12]Lakehouse Convergence: Delta Lake & Iceberg — Capital One Tech T2 critical
    Industry observers say storage formats are converging (Delta, Iceberg, Hudi all readable via open catalogs), shifting the competitive boundary up to the catalog/governance layer and reducing format lock-in.
  5. [13]Databricks revenue, valuation & funding — Sacra T2 neutral
    Sacra estimates that Databricks reached ~$5.4B annualized revenue by January 2026 (up from $4.0B in Q2 2025 and $4.8B in Q3 2025), that it grows roughly twice as fast as Snowflake did at comparable scale, and that its gross margin was ~80% in mid-2024, down from ~85% a year earlier.

Business Model

  1. [14]Databricks Pricing T1 neutral
    Databricks sells on pay-as-you-go consumption pricing billed per second, with Committed Use Contracts giving volume discounts that flex across clouds.
  2. [15]Databricks pricing guide (2026): Understanding DBU costs — Flexera T2 neutral
    A DBU (Databricks Unit) is a normalized unit of compute consumed per hour; customers pay Databricks for DBUs and the cloud provider separately for infrastructure, which 'routinely account[s] for 50 to 70 percent of total Databricks spend.' Premium-tier AWS rates run from $0.15/DBU (Jobs) to $0.70/DBU (SQL Serverless).
  3. [16]Databricks at $4.8B ARR — Sacra T2 supporting
    Sacra estimates the ~$4.8B revenue mix (late 2025) as ~60% core data platform, ~20% data warehousing (Databricks SQL) and ~20% AI/Mosaic AI, with Databricks growing roughly 2x faster than Snowflake and trading at ~28x revenue versus Snowflake's ~20x.
  4. [17]How Databricks Pricing Works: A 2026 Cost Breakdown — CloudZero T3 critical
    Critics note the consumption model produces unpredictable bills: teams underestimate total cost by 50-200% when they budget only DBUs and overlook the separate cloud-infrastructure bill, and interactive All-Purpose compute costs 2-3x more per DBU than Jobs Compute.
  5. [18]Databricks Raises $5B, But AI Products Squeeze Margins — byteiota T3 critical
    Independent analysis argues AI products squeeze Databricks' margins: gross margin reportedly slipped toward ~74% (vs ~88% for pure-SaaS peers) because real-time inference needs dedicated 24/7 GPUs and inference is most of production AI compute cost.

Competitive Landscape

  1. [19]Snowflake Q4 FY2026 slides: 30% revenue growth, margin expansion ahead — Investing.com T2 neutral
    Snowflake (NYSE: SNOW) reported FY2026 (ended Jan 31, 2026) product revenue of $4.472B (+29% YoY), Q4 product revenue $1.227B (+30%), net revenue retention 125%, RPO $9.772B (+42%), non-GAAP operating margin 11%, with FY2027 product-revenue guidance of $5.66B (+27%).
  2. [20]Snowflake (SNOW) Market Cap — StockAnalysis T2 neutral
    Snowflake's market capitalization was about $83.6B as of June 3, 2026.
  3. [21]Polaris Catalog Is Now Open Source — Snowflake T1 neutral
    Snowflake open-sourced its Polaris Catalog under Apache 2.0, implementing Apache Iceberg's REST catalog spec with interoperability across Spark, Flink, Trino, Dremio, Confluent, dbt, Google Cloud and Microsoft.
  4. [22]AI agents, open data and governance take center stage at Snowflake Summit — SiliconANGLE T2 neutral
    At Snowflake Summit (June 2026) Snowflake pushed an 'agentic enterprise' vision with Cortex AI, Iceberg v3, a Horizon Catalog integrating Apache Polaris with bidirectional interoperability, and a Kafka-compatible Datastream service — converging on Databricks' turf.
  5. [23]70% of the Fortune 500 already use Microsoft Fabric — VentureBeat T2 critical
    Microsoft markets Fabric in direct competition with Snowflake and Databricks, reporting 70% of the Fortune 500 (and 21,000+ organizations) as Fabric customers — even as Azure Databricks remains a first-party Azure service (a 'frenemy' dynamic).
  6. [24]Databricks Named a Leader in 2025 Gartner Magic Quadrant for Cloud DBMS — Databricks T1 supporting
    Databricks was named a Leader in the 2025 Gartner Magic Quadrant for Cloud Database Management Systems for the fifth consecutive year, scored for the first time on operational/OLTP criteria (via Lakebase).
  7. [25]A Leader in 2025 Gartner Magic Quadrant for CDBMS — Google Cloud T1 neutral
    The 2025 Gartner Magic Quadrant for Cloud DBMS Leaders quadrant includes AWS, Google, Microsoft, Oracle, Databricks, MongoDB, Snowflake, Alibaba Cloud and IBM — an unusually crowded leaders field.
  8. [26]Databricks adds $4B funding round; IPO could be next — TechTarget T2 critical
    Industry analyst Sanjeev Mohan says Databricks' $1B+ data-warehousing revenue lets it compete head-to-head with Snowflake, but that it 'needs a compelling answer for why enterprises should choose them over Microsoft Fabric.'
  9. [27]IBM Completes Acquisition of Confluent — IBM Newsroom T2 neutral
    IBM completed its ~$11B all-cash acquisition of Confluent ($31/share) on March 17, 2026, removing a public streaming-data comp from the market; Confluent's 2025 subscription revenue was ~$1.12B (+21%).

Strategy & Moats

  1. [28]Lakehouse: A New Generation of Open Platforms — Databricks Research T1 supporting
    Databricks formalized the 'lakehouse' architecture in its 2021 CIDR paper, arguing for one platform built on open, direct-access formats (Apache Parquet) that unifies data warehousing, data lakes and ML.
  2. [29]Databricks Agrees to Acquire Tabular, Founded by the Original Creators of Apache Iceberg — Databricks T1 supporting
    Databricks acquired Tabular — founded by Apache Iceberg's original creators — in June 2024, pledging to unify the leading open table formats (Delta, Iceberg, Hudi) via Delta UniForm and reduce format lock-in.
  3. [30]Databricks and Microsoft Extend Strategic Partnership for Azure Databricks — Databricks T1 supporting
    Azure Databricks has been a Microsoft first-party service since 2018; the partnership was extended multi-year in June 2025 with deeper Azure AI Foundry, Power Platform and SAP integrations — a distinctive hyperscaler distribution channel.
  4. [31]Databricks and Anthropic Sign Landmark Deal to Bring Claude Models to the Data Intelligence Platform — Databricks T1 supporting
    Databricks signed a five-year deal with Anthropic in March 2025 to make Claude models native across AWS, Azure and GCP, citing reach to over 10,000 companies.
  5. [32]Databricks and OpenAI Launch Partnership to Bring Frontier Intelligence to Enterprises with Agent Bricks — Databricks T1 supporting
    Databricks and OpenAI announced a $100M partnership in September 2025 making OpenAI models (including GPT-5) native in Databricks' Agent Bricks for 20,000+ customers.
  6. [33]CIOs are (still) closer than ever to their dream data lakehouse: Apache Iceberg has won — CIO T2 critical
    Some analysts argue Apache Iceberg has effectively won the open-table-format war and that Delta Lake, though open source, is 'really a Databricks product' — weakening Databricks' open-format neutrality claim.
  7. [34]Big-data dust-up: Why two AI giants are at war over who's more open — SiliconANGLE T2 critical
    Analysts say both Databricks and Snowflake are 'partly closed and partly open,' and that Databricks' 2024 open-sourcing of Unity Catalog was a thin release, while a Forrester analyst notes that Iceberg support inside Databricks remains less robust than its Delta Lake support.

Financials & Funding

  1. [35]Databricks Surpasses $4B Revenue Run-Rate, Exceeding $1B AI Revenue Run-Rate — Databricks T1 supporting
    In September 2025 Databricks announced a Series K at >$100B valuation, disclosing a $4B revenue run-rate (>50% YoY), a >$1B AI run-rate and positive trailing-12-month free cash flow.
  2. [36]Databricks is Raising $10B Series J Investment at $62B Valuation — Databricks T1 supporting
    In December 2024 Databricks announced a $10B Series J at a $62B valuation, led by Thrive Capital with co-leads including a16z, DST Global, GIC, Insight Partners and WCM; the company reported growth of >60% YoY at the time.
  3. [37]Databricks raises $4B at $134B valuation as its AI business heats up — TechCrunch T2 neutral
    The December 2025 Series L of >$4B at $134B marked Databricks' third major venture round in roughly a year: a +34% step-up from the >$100B September mark and about +116% from the $62B December-2024 round.
  4. [38]Databricks Raises Series I Investment at $43B Valuation — Databricks T1 supporting
    In September 2023 Databricks raised >$500M (at $73.50/share) at a $43B valuation, led by T. Rowe Price with new investors including Capital One Ventures, Ontario Teachers' and NVIDIA.
  5. [39]Databricks keeps marching forward with $1.6B in revenue — TechCrunch T2 neutral
    Databricks reported $1.6B revenue for the fiscal year ended Jan 2024 (>50% YoY) with subscription gross margins >80% and ~140% net expansion; CEO Ali Ghodsi said in March 2024 that 'the IPO market is not too open at the moment.'
  6. [40]Databricks IPO 2026: Valuation, Date & Investor Guide — Allied Venture Partners T3 critical
    As of April 2026 Databricks had filed no S-1, with an H2-2026 IPO seen as increasingly likely; a Feb-2026 SaaS selloff (the S&P software index fell ~13%) and the September-2025 departure of AI chief Naveen Rao are cited as risks, and the >$7B raised (incl. ~$2B debt) reduces pressure to list.

Peer Comparison

  1. [41]Databricks at $5.4B, Growing 65%. Is It the Most Unstoppable Company in B2B? — SaaStr T2 supporting
    Analysts note Databricks' growth is accelerating at scale — from ~50% to ~55% to >65% YoY past a $5B run-rate — which 'basically never happens,' alongside >140% NRR, positive FCF and >80% gross margins.
  2. [42]Palantir Technologies (PLTR) Market Cap — StockAnalysis T2 neutral
    Palantir (NASDAQ: PLTR) had a market capitalization of about $340.9B as of June 3, 2026 — a far richer public AI valuation than Snowflake.
  3. [43]Palantir Technologies (PLTR) Revenue 2018-2025 — StockAnalysis T2 neutral
    Palantir's 2025 revenue was $4.48B (+56% YoY), with Q4 2025 revenue of $1.41B (+70%).
  4. [44]MongoDB (MDB) Stock Overview — StockAnalysis T2 neutral
    MongoDB (NASDAQ: MDB) had a market cap of ~$29.6B as of June 2026 on ~$2.6B TTM revenue (~+24%); FY2026 revenue was $2.46B (+23%).
  5. [45]Datadog (DDOG) Stock Overview — StockAnalysis T2 neutral
    Datadog (NASDAQ: DDOG) had a market cap of ~$89.1B as of June 3, 2026 on ~$3.67B TTM revenue (~+30%); full-year 2025 revenue was $3.43B (+28%).
  6. [46]Databricks IPO 2026: $134B Valuation, $5.4B Revenue [Analysis] — Tech-Insider T3 critical
    At $134B on ~$5.4B of run-rate revenue, Databricks is valued at over 2x publicly-traded Snowflake on similar revenue; observers caution it is 'priced for perfection,' so any deceleration could trigger a sharp re-rating.

Risks, Regulation & Sentiment

  1. [47]Databricks CEO Ali Ghodsi: 'That's clearly a bubble' — Fortune T2 critical
    Databricks CEO Ali Ghodsi publicly called large parts of the AI market a bubble — pointing to billion-dollar zero-revenue companies and circular financing — and predicted conditions will get 'much, much, much worse' before correcting, underscoring systemic AI-spend risk to Databricks' own demand.
  2. [48]Databricks Pricing Guide 2026: Costs & Plans Broken Down — Mammoth T3 critical
    SENTIMENT (practitioner): vendor and FinOps write-ups say Databricks' dual billing structure (DBUs plus a separate cloud bill) 'catches almost everyone off guard,' and idle interactive clusters can waste a large share of spend.
  3. [49]Most companies do not need Snowflake or Databricks — Hacker News T3 critical
    SENTIMENT (Hacker News): in a widely-read thread, practitioners argue many companies don't need Databricks and that DBU pricing carries a steep markup over raw compute, while others defend its managed value over self-maintained Spark.
  4. [50]Databricks Reviews, Ratings & Features — Gartner Peer Insights T3 neutral
    SENTIMENT (Gartner Peer Insights): the Databricks platform rates ~4.5/5 across hundreds of DSML reviews; recurring criticisms cluster on constant platform change and on cost management/cluster-rightsizing being ongoing operational work.
  5. [51]Databricks' Commitment to European Sovereignty and Growth — Databricks T1 supporting
    Mitigation: Databricks positions for rising data-sovereignty demand with EU data-residency 'Geos,' early GDPR deletion tooling, and 14 EU cloud regions plus 1,500+ regional staff — framing compliance as a managed capability rather than an open liability.
  6. [52]Databricks Named a Leader in the 2025 Gartner Magic Quadrant for DSML Platforms — Databricks T1 supporting
    Counterweight: Databricks was named a Leader in the 2025 Gartner Magic Quadrant for Data Science and Machine Learning Platforms, positioned highest in Ability to Execute and furthest in Completeness of Vision.
  7. [53]Databricks Grows >65% YoY, Surpasses $5.4 Billion Revenue Run-Rate — Databricks T1 supporting
    Counterweight: Databricks reports >140% net revenue retention, positive trailing-12-month free cash flow, and >800 customers at $1M+ run-rate — evidence of funded, expanding demand behind the valuation.

Cross-checked at build time by an automated link checker; a few primary sources may be paywalled or bot-walled and were verified manually. See Methodology & Limitations.