Lakehouse Patterns on ��ҹѰ�� with Databricks

Blog

Lakehouse Patterns on ��ҹѰ�� with Databricks

May 26, 2026

Bryan Berezdivin

��ҹѰ�� and Databricks have completed a joint integration that lets enterprise lakehouses span an organization’s entire data estate. Wherever the data lives, on-premises, at the edge, or in any major cloud, Databricks can now read, write, and govern it through ��ҹѰ�� storage.

The lakehouse has become the foundation of modern data analytics by combining the openness of a data lake with the governance and reliability of a data warehouse. Organizations adopt it to escape vendor lock-in, lower costs, and unify their analytics and AI on a single platform. But until now, lakehouses have largely been confined to a single site or cloud provider’s object storage in a single region. This has served the industry well, but most enterprises have their raw and bronze level data everywhere: factory sensors, branch-office data, medical imagery, application logs, transactional records, all generated and retained across data centers, edge sites, and multiple clouds. IDC projects enterprise unstructured data growing at approximately 16% CAGR through 2028 to 10.5 ZB, fueled by sensor proliferation, IoT, and AI workloads. Bringing the lakehouse to all of that data, wherever it lives, is the next architectural step.

��ҹѰ�� is a software-defined data platform that runs across on-premises, edge, and cloud as a single global namespace. Together with Databricks, it enables the lakehouse to extend across the full data estate without requiring data to be copied or consolidated into one cloud bucket. The same governed tables can be queried by Databricks in one region, by a training job on-premises, and by a BI tool in another cloud, against one source of truth.

This post introduces three validated integration patterns between Databricks and ��ҹѰ�� to enable different lakehouse architectures: (1) run Databricks analytics and AI directly against data on ��ҹѰ��, with no re-platforming or migration; (2) bring ��ҹѰ��-resident tables under Unity Catalog governance, for a single governed view regardless of where the data lives; and (3) share ��ҹѰ�� data, read-only, with other Databricks workspaces, other clouds, and non-Databricks tools through OpenSharing. For step-by-step deployment guidance, see the

Figure 1. High Level Architecture of ��ҹѰ��-Databricks integration patterns

Benefits for Databricks teams

These patterns deliver three outcomes for organizations adopting Databricks:

Faster time to results. Existing data on ��ҹѰ�� from raw logs, imagery, telemetry, genomics, and application records is read by Databricks in place, eliminating bulk migration costs and time and per-request S3 API charges. Under validation test loads, ��ҹѰ�� saw API-related storage costs reduced by 60% or more and time-to-first-results compressed by 40% or more compared to equivalent workflows that staged data through cloud object storage first.
One copy, many consumers. The same data is used concurrently by Databricks in the cloud, by training jobs on-premises, by edge applications, and by other analytics and AI tools. Everyone works against a single source of truth instead of versions drifting apart across environments.
Unified governance without bulk migration. Tables on ��ҹѰ�� are governed through Unity Catalog, with permissions, audit, and lineage applied consistently to analysts and BI teams across notebooks and dashboards.

Three validated patterns

Each pattern positions the data and the governance differently, and most production deployments combine them. It should be noted that customers can deploy a combination of these integrations.

Pattern A. ��ҹѰ�� holds raw and historical data, Databricks holds the curated tables. Databricks compute reads source data on ��ҹѰ��, applies Silver and Gold transformations, and writes the curated Delta tables into Unity Catalog managed storage. Serverless SQL Warehouses query the Gold tables for BI and analytics. Best when an organization is starting with Databricks and wants its first curated layer governed natively by Unity Catalog.

Pattern B. All medallion tiers (Bronze, Silver, Gold) live on ��ҹѰ�� as Delta tables. The tables are registered in a Hive Metastore that Databricks federates into Unity Catalog. All-Purpose Compute reads and writes through this path; Serverless SQL Warehouses query through Unity Catalog. Best when an organization wants its full lakehouse to remain on ��ҹѰ�� while still benefiting from Unity Catalog governance, lineage, and audit.

Pattern C: With OpenSharing, ��ҹѰ�� can expose Delta tables read-only to Unity Catalog and other consumers. Databricks and other tools get short-lived access via the OpenSharing protocol, with no data copied to the consumer. This is best for sharing ��ҹѰ�� data across multiple consumers (other Databricks workspaces, Snowflake, BI tools) and across multiple clouds.

Step-by-step procedures for all three patterns are in the