Database vs Data Warehouse vs Data Lake 2026

Q: What is the difference between OLTP and OLAP, and how do I know which I need?

OLTP (online transactional processing) is the workload profile of an operational database — thousands of short-lived, highly concurrent transactions that read and write the current state of the business. Sub-50ms latency, full ACID semantics, narrow indexed lookups, and row-by-row updates dominate. OLAP (online analytical processing) is the workload profile of a data warehouse or lakehouse — long-running aggregations across billions of rows of history, sorting on conformed dimensions, with hundreds of concurrent analysts but no individual-row updates. The two profiles fight each other inside a single engine: running heavy aggregations against an OLTP database starves the transactional workload, and running thousands of point lookups against an OLAP warehouse wastes its columnstore advantage. The rule is simple: capture in an OLTP system (Azure SQL DB, Cosmos DB, Azure SQL MI, PostgreSQL Hyperscale); report and analyze in an OLAP system (Fabric Warehouse, Fabric Lakehouse, Synapse Dedicated SQL Pool); move data between them with Fabric Mirroring, Synapse Link, or Data Factory. Most enterprises need both, separated by clear architectural boundaries.

Q: When does Cosmos DB beat Azure SQL Database, and when is it the wrong choice?

Cosmos DB beats Azure SQL Database when the workload is globally distributed (more than one Azure region serving live traffic), requires single-digit-millisecond P99 latency across continents, is multi-model (document, graph, key-value, or column-family in one engine), or scales beyond the practical envelope of a relational engine (think gaming back-ends, ad-tech bidding, IoT command-and-control, or session stores at hundreds of thousands of operations per second). Cosmos is also the right answer when the data model is genuinely schema-flexible — heterogeneous documents that do not fit comfortably into a relational schema. Cosmos is the wrong choice when the workload fits inside a single Azure region with relational semantics, when SQL Server compatibility (cross-database queries, SQL Agent, SSIS) matters, when transactional history and complex joins dominate, or when the team has no operational experience tuning partition keys and RU/s. We have rewritten Cosmos workloads back onto Azure SQL three times in the last 24 months because the operational complexity outweighed the scale benefit. The honest 2026 default for new enterprise OLTP is Azure SQL DB or Azure SQL MI; Cosmos is the right choice when one of the four conditions above is genuinely true.

Q: What is the difference between a lakehouse and a data warehouse — and which do I need in 2026?

A data warehouse (Fabric Warehouse, Synapse Dedicated SQL Pool, traditional SQL Server with columnstore) is a modeled analytical engine — schema-on-write, conformed dimensions, governed metric definitions, ACID transactions on relational tables, optimized for high-concurrency aggregation. A lakehouse (Fabric Lakehouse on OneLake, Databricks Lakehouse with Unity Catalog) is a SQL surface layered over Delta Lake (Parquet + transaction log) — schema is enforced by Delta with evolution support, ACID transactions are guaranteed by the Delta protocol, multi-modal data (Spark, SQL, Python, R, ML) shares one storage substrate. The 2026 honest read: a lakehouse can do most of what a warehouse does, plus ML and unstructured data, at object-storage cost — which is why Microsoft is pushing Fabric Lakehouse as the architectural default for net-new platforms. A modeled warehouse still wins for highly regulated workloads where regulator-readable lineage and fine-grained metric governance dominate, for departmental BI workloads under ~1 TB where the lakehouse adds operational complexity without benefit, and for sub-second point-lookup workloads on small dimension tables where the warehouse columnstore + b-tree pattern outperforms Delta Parquet. Most Fortune 500 platforms we build today run both: Lakehouse for raw-through-curated and ML, Warehouse for board-grade conformed marts, with both feeding Power BI through Direct Lake.

Q: Can a Fabric Lakehouse really replace my enterprise data warehouse?

For most net-new enterprise data platforms in 2026 — yes. Fabric Lakehouse on OneLake offers ACID transactions via Delta Lake, RLS / CLS on the SQL endpoint, V-Order Parquet that lets Power BI Direct Lake read at Import-mode performance with zero refresh, native Purview lineage end-to-end, and the same Entra identity surface that governs the rest of the Microsoft estate. We have built lakehouses that retired traditional warehouses for healthcare, manufacturing, and professional services Fortune 500 clients. But the honest qualifier: lakehouse replacement of warehouse is not automatic. It requires (1) a disciplined medallion architecture (Bronze / Silver / Gold) with workspace boundaries, (2) certified semantic models on the Gold zone for board reporting, (3) Fabric capacity sized to your concurrency curve — sustained high-concurrency analytical workloads can require larger F-SKUs than the equivalent Synapse Dedicated SQL Pool DWU sizing, (4) Purview governance configured from day one — not retrofitted six months in, (5) clear cost-tagging because Fabric capacity is shared across workloads. Where these conditions are not met, the Warehouse + Lakehouse hybrid pattern is the lower-risk answer for another 12–24 months. EPC Group always runs a Fabric Unit of Capacity (FUC) sizing exercise before recommending lakehouse-only architectures for high-concurrency workloads.

Q: What is the difference between OneLake and ADLS Gen2?

ADLS Gen2 (Azure Data Lake Storage Gen2) is the hierarchical-namespace object storage layer that has powered Azure data platforms for years — POSIX-style ACLs, container + folder + file hierarchy, addressable by storage account URL, priced per GB stored. It is generic, mature, and tied to Azure storage account boundaries. OneLake is the Fabric-tenant-wide unified data lake — a single logical lake that spans all Fabric workspaces in your Microsoft tenant, with workspace and item permissions enforced by Microsoft Entra, with shortcuts that can virtualize external storage (ADLS Gen2, Amazon S3, Google Cloud Storage, Dataverse) without data copy, and with built-in support for Delta Parquet with V-Order optimization for Power BI Direct Lake. OneLake is not a replacement for ADLS Gen2 — under the hood, OneLake storage is itself built on Azure Storage. The practical difference for an enterprise: with ADLS Gen2 you manage storage accounts, containers, and ACLs as primary objects; with OneLake you manage Fabric workspaces, items, and shortcuts as primary objects, and storage management becomes implicit. For existing ADLS Gen2 lakes, OneLake shortcuts allow you to expose them inside Fabric without migration. For net-new platforms in 2026, OneLake is the right starting point.

Q: When do I use Power BI Import vs DirectQuery vs Direct Lake?

Import is the original Power BI consumption mode — data is loaded into the VertiPaq columnar in-memory engine on a scheduled refresh (typically 1×/day to 8×/day on Premium). Best performance, deterministic latency, full DAX coverage, and decoupled from source system load — but data is only as fresh as the last refresh, and very large semantic models can exceed Premium capacity memory limits. DirectQuery sends every Power BI report query as a live SQL query to the underlying source (Azure SQL, Synapse, Fabric Warehouse). Always-current data, no semantic model size limit, but performance depends on source throughput and source-side query load can become a bottleneck. Direct Lake is the 2026 Fabric-only mode that reads V-Order Parquet directly from OneLake into the VertiPaq engine — Import-mode performance with zero refresh, governed by Fabric capacity, requires Fabric Lakehouse or Fabric Warehouse as the source. The 2026 default for Fabric-anchored workloads is Direct Lake. Import remains correct when the source is non-Fabric (Azure SQL DB, Snowflake, Salesforce) and refresh latency is acceptable, or when complex M-language transformations are required. DirectQuery is correct when freshness must be real-time and the source can absorb the query load — generally a niche use case in 2026.

Q: Is Fabric Warehouse the same as Synapse Dedicated SQL Pool — and should I migrate?

They are architecturally distinct. Synapse Dedicated SQL Pool is the renamed Azure SQL Data Warehouse — a massively parallel processing (MPP) SQL warehouse with provisioned compute measured in Data Warehouse Units (DWU), running on the original Polybase-derived MPP engine. Fabric Warehouse is the 2026 Microsoft analytical warehouse, built on the same Polaris distributed query engine but storing data as Delta Parquet on OneLake (not in a proprietary internal format), priced through unified Fabric capacity (not DWU-hour), and natively integrated with Direct Lake to Power BI. Functionally, Fabric Warehouse delivers T-SQL parity for most workloads, with some Synapse-specific features (resource classes, MPP-style workload management isolation, certain hint syntaxes) absent or different. Microsoft has not announced an EOL for Synapse Dedicated SQL Pool as of June 2026, but net-new investment is concentrated on Fabric. Migration triggers: Power BI Direct Lake is the strongest, OneLake mirroring of operational sources is second, Copilot in Fabric is third. Stay on Synapse Dedicated SQL Pool when the workload is tightly tuned at high concurrency, the chargeback model is built on DWU sizing, or sovereign-region Fabric GA is not yet available. We cover the full Synapse → Fabric migration framework in our Azure Synapse Analytics Enterprise Guide.

Q: How do data lakes fail without governance — and how do we avoid the swamp?

Ungoverned data lakes fail in five recognizable ways. First, ingestion outpaces curation — terabytes land in raw zones and never get modeled, lineage breaks, and within a year nobody can tell which datasets are trustworthy. Second, schema evolves silently — a source system changes a field type, downstream Spark jobs and SQL queries break in unpredictable ways, and incident response becomes archaeology. Third, security is by obscurity — POSIX ACLs are applied inconsistently, sensitive data appears in the wrong containers, and HIPAA / PII exposure becomes an audit finding. Fourth, cost discipline collapses — pay-per-TB-scanned queries run repeatedly against poorly partitioned CSV files, costs spike, and FinOps cannot attribute spend. Fifth, BI consumers route around the lake — they extract data they need into Excel or rogue Power BI tenants, and the lake becomes a write-only system. The avoidance pattern is explicit and non-optional: medallion architecture (Bronze / Silver / Gold) with workspace boundaries, Delta Parquet for everything beyond Bronze, Purview catalog and sensitivity labels applied at ingestion, certified semantic models for BI consumption, Fabric capacity discipline, and senior-architect-led governance with quarterly steward reviews. EPC Group delivers this under our data governance engagement model.

What is the difference between a database, a data warehouse, and a data lake — and which does my enterprise need?

A database (Azure SQL DB, Cosmos DB, PostgreSQL Hyperscale) captures the operational state of the business — OLTP, ACID, sub-50ms latency. A data warehouse (Microsoft Fabric Warehouse, Synapse Dedicated SQL Pool) models history for governed analytics — OLAP, conformed dimensions, board-grade reporting. A data lake (Azure Data Lake Storage Gen2, OneLake) holds raw and unstructured data at any scale for ML, data science, and IoT. A lakehouse (Microsoft Fabric Lakehouse on OneLake) is the 2026 architectural convergence — Delta Parquet over object storage gives you warehouse-grade ACID + SQL + RLS over lake-grade economics + ML + multi-modal. Most Fortune 500 enterprises need at least three: a database for capture, a lakehouse for raw-through-curated and ML, and either a Fabric Warehouse or the Gold zone of the lakehouse for board-grade governed analytics. The honest decision is which workload lives where — not which platform "wins."

In 2026, Microsoft's data architecture defaults are: Azure SQL Database (or Cosmos DB) for OLTP capture, Microsoft Fabric Lakehouse on OneLake for raw-through-curated analytics and ML, and Microsoft Fabric Warehouse for board-grade conformed marts. The lakehouse pattern is converging warehouse and lake for net-new platforms, but a modeled warehouse still wins for regulator-readable lineage and high-concurrency conformed reporting. EPC Group has delivered this architecture for 70+ Fortune 500 clients under HIPAA, SOC 2, FedRAMP, FINRA, CMMC, and GxP regulatory baselines.

Key Facts

Database (OLTP) examples: Azure SQL Database, Azure SQL Managed Instance, Cosmos DB, Azure Database for PostgreSQL Flexible Server / Hyperscale (Citus)
Data warehouse (OLAP) examples: Microsoft Fabric Warehouse, Synapse Dedicated SQL Pool, Azure SQL DB with columnstore for departmental scale
Data lake examples: Azure Data Lake Storage Gen2 (ADLS Gen2), Microsoft OneLake — Fabric-tenant-wide unified lake with V-Order Delta Parquet
Lakehouse examples: Microsoft Fabric Lakehouse on OneLake, Databricks Lakehouse on Unity Catalog — Delta or Iceberg over object storage with ACID + SQL + ML
EPC Group: 216+ M&A tenant consolidations, 1,500+ Power BI implementations, 500+ Microsoft Fabric platforms across 70+ Fortune 500 clients
Microsoft Solutions Partner — Data & AI Designation plus five other current Solutions Partner Designations; senior-architect-led, fixed-fee delivery
Compliance-native baselines for HIPAA, SOC 2, FedRAMP, FINRA, CMMC, GxP regulated workloads across Azure Commercial and Azure Government
Engagement cadence: Data Architecture Assessment (2-week fixed fee) → Modernization Accelerator (90-day fixed fee) → Managed Fabric + Lakehouse (monthly retainer)

What These Four Things Actually Are

Half of the Microsoft data architecture confusion we encounter in CIO and CDO engagements is vocabulary — the words "database," "warehouse," "lake," and "lakehouse" are used interchangeably in vendor marketing, in trade press, and inside enterprise architecture teams. They are not the same. They are four distinct architectural surfaces with different workload profiles, different governance properties, and different cost models. Below is the working definition for each — the way EPC Group's senior architects use the terms in every Fabric and Synapse engagement we ship.

Database (OLTP)

Operational system of record optimized for high-concurrency reads and writes against current state.

A database in the enterprise sense is an OLTP (online transactional processing) engine — it captures business events as they happen. Sub-50ms read and write latency, ACID transactional guarantees, row-store or document-store layout, narrow indexes tuned for point lookups, and concurrency control built for thousands of simultaneous short-lived transactions. It holds the current state of the business — the customer record, the order line, the inventory count — not its history. Reporting is a secondary use case at best, and analytical queries that scan millions of rows will starve the transactional workload they share the engine with.

Microsoft mapping: Azure SQL Database, Azure SQL Managed Instance, Azure Database for PostgreSQL (Flexible Server and Hyperscale/Citus), Azure Cosmos DB, Azure SQL on IaaS VMs.

Data Warehouse (OLAP)

Modeled, governed analytical store optimized for low-latency aggregation across billions of rows.

A data warehouse is an OLAP (online analytical processing) engine — it answers analytical questions across history. Columnstore compression, massively parallel processing (MPP), star and snowflake schemas with conformed dimensions, write-once read-many workload profile, and concurrency tuned for hundreds-to-thousands of analysts running aggregation queries. Schema is defined on write — every column is typed, every relationship is modeled, every grain is conformed. The warehouse is the place where the business definition of "revenue", "customer", and "active user" lives in semantic stone, governed by a finance or data team.

Microsoft mapping: Microsoft Fabric Warehouse, Synapse Dedicated SQL Pool, Azure SQL DB with columnstore indexes for departmental warehousing.

Data Lake

Schema-on-read object storage that holds raw, semi-structured, and unstructured data at any scale.

A data lake is object storage — Parquet, CSV, JSON, Avro, ORC, images, audio, video, telemetry — addressed by file path or container URL. There is no engine intrinsic to a pure lake; the engine is whatever query layer (Spark, Synapse Serverless SQL, Fabric Lakehouse SQL Endpoint, Trino, Presto, Databricks SQL) you point at it. Schema is enforced at read time, which makes the lake cheap to ingest into and expensive to govern. Lakes hold data that you want to keep but do not yet know how to model — clickstream, IoT telemetry, application logs, ML training corpora, raw SaaS exports — and they hold data at petabyte scale at object-storage prices.

Microsoft mapping: Azure Data Lake Storage Gen2 (ADLS Gen2), Microsoft OneLake (the unified Fabric tenant lake), object storage backing both Fabric Lakehouse and Synapse Spark.

Lakehouse

A governed warehouse-like SQL surface layered directly over Delta + Parquet files in the lake.

A lakehouse is the architectural convergence of warehouse and lake — Delta Lake or Apache Iceberg layered over Parquet object storage, giving you ACID transactions, time travel, schema evolution, RLS / CLS, and a SQL query surface that behaves like a warehouse but writes to the lake. The 2026 Microsoft expression is Fabric Lakehouse on OneLake. The lakehouse is genuinely useful — it cuts duplication between lake and warehouse, lets Power BI Direct Lake read V-Order Parquet at near-warehouse latency, and enables Spark, SQL, Python, R, and Copilot to share one storage substrate. But it is not magic — fine-grained governance, point-lookup performance on small dimension tables, and audit-ready lineage for highly regulated workloads remain areas where a properly-modeled warehouse still wins.

Microsoft mapping: Microsoft Fabric Lakehouse on OneLake (Delta Parquet with V-Order), Databricks Lakehouse with Unity Catalog, Synapse Spark + Delta Lake.

The Four Workload Archetypes — When Each Storage Type Wins

The honest decision framework is not "database vs warehouse vs lake." It is "what is the workload archetype, and which Microsoft service profile fits it?" Below are the four archetypes we use in every Data Architecture Assessment. Most Fortune 500 enterprises run at least three of the four in parallel — separated by clear architectural boundaries, connected by Fabric Mirroring, Synapse Link, or Data Factory.

Archetype 1

Transactional (OLTP) — the operational system of record

Capture the business as it happens.

Workload characteristics

Thousands of concurrent short-lived transactions, ACID semantics required, sub-50ms read / write latency, indexed point lookups dominate, working set fits in memory for hot tables, single-row updates are the unit of work. Backed by an ERP, CRM, e-commerce checkout, EHR, banking core, or IoT command-and-control surface.

The Microsoft answer

Azure SQL Managed Instance for SQL Server lift-and-shift with full SQL Agent + cross-database queries; Azure SQL Database for net-new cloud OLTP with serverless and Hyperscale tiers; Azure Database for PostgreSQL Flexible Server for modern open-source workloads, with Hyperscale (Citus) when sharding is required; Azure Cosmos DB for globally distributed, multi-model, single-digit-ms latency at any scale.

Archetype 2

Operational analytics — low-latency reads off the operational store

Answer analytical questions without leaving the transactional surface.

Workload characteristics

Real-time dashboards, in-app analytics, embedded reporting inside the operational application, "show me my customer's last 90 days" panels, A/B test surfaces, fraud scoring at point of authorization. Read-mostly, sub-second latency, must not impact OLTP throughput, schema is the operational schema.

The Microsoft answer

Azure Cosmos DB with the analytical store + Synapse Link for HTAP without ETL; Azure SQL DB with non-clustered columnstore indexes on the operational tables (HTAP within one engine); Azure SQL MI with read-scale-out replicas for offloaded reporting; Fabric Mirroring of Azure SQL DB, Cosmos DB, or Azure SQL MI into OneLake for near-real-time Power BI without source-side load.

Archetype 3

Enterprise warehousing — modeled, governed, board-grade analytics

One conformed truth for the business.

Workload characteristics

Star or snowflake schema with conformed dimensions, hundreds to thousands of concurrent analytical queries, governed metric definitions ("revenue", "customer", "ARR" mean one specific thing), regulator-readable lineage, SCD Type 2 history, certified semantic models for Power BI, finance close and board reporting depend on it.

The Microsoft answer

Microsoft Fabric Warehouse on OneLake — strategic platform for net-new investment, columnstore over Delta Parquet, Direct Lake to Power BI, native Purview lineage; Synapse Dedicated SQL Pool — supported through 2026 for existing high-concurrency MPP investments; Azure SQL DB with columnstore for departmental warehouses under ~1 TB compressed.

Archetype 4

Data lake / lakehouse — multi-modal, ML / AI, raw-through-refined at scale

Keep everything. Model what matters. Train on the rest.

Workload characteristics

Heterogeneous data formats (Parquet, CSV, JSON, Avro, images, audio, video, telemetry), petabyte-class storage economics required, schema is partial or evolves, Spark / Python / Copilot / ML training are first-class workloads alongside SQL, medallion pattern (Bronze → Silver → Gold) curates raw to refined, lineage and governance applied progressively as data matures.

The Microsoft answer

Microsoft Fabric Lakehouse on OneLake — Delta Parquet with V-Order, shortcuts to ADLS Gen2 / S3 / GCS / Dataverse, native Spark 3.5 / Delta 3.2 with the Native Execution Engine, Direct Lake to Power BI, integrated with Copilot in Fabric and AI Skills; Synapse Spark + ADLS Gen2 for existing investments; Databricks Lakehouse for organizations where heavy Spark, Delta Live Tables, or Unity Catalog depth dominate the decision.

Architectural Comparison — Database vs Warehouse vs Lake vs Lakehouse

The fourteen-criterion comparison below is the working architectural reference EPC Group's senior architects use in client whiteboards. It is deliberately direct — the differences between these four surfaces are not subtle, and pretending they are is how enterprise data programs get built around the wrong default.

Criterion	Database (OLTP)	Data Warehouse	Data Lake	Lakehouse
Primary use	Operational transactions, system of record, sub-50ms reads and writes	Modeled analytics, conformed metrics, board and regulator reporting	Raw retention, ML training corpora, semi-structured and unstructured data	Unified analytics across raw and modeled data on one storage substrate
Query type	OLTP — point lookups, short-lived transactions, narrow indexed reads	OLAP — aggregations, scans across billions of rows, star-schema joins	Batch and exploratory — Spark / Python / SQL over files, ML feature pipelines	Mixed — SQL aggregations, Spark transformations, ML, and Direct Lake to BI
Typical latency	< 50 ms for indexed reads / writes	Seconds for aggregations over billions of rows; sub-second on hot aggregates	Seconds to minutes — engine-dependent; no intrinsic latency guarantee	Sub-second on Direct Lake reads, seconds to minutes on Spark transforms
Data structure	Structured (relational rows) or document (Cosmos DB)	Structured, modeled — star / snowflake, conformed dimensions	Structured + semi-structured + unstructured — any file type	Structured + semi-structured — Delta Parquet with optional unstructured shortcuts
Schema enforcement	Schema-on-write — enforced at insert time	Schema-on-write — enforced at load, governed by data team	Schema-on-read — enforced (if at all) at query time	Schema-on-write via Delta with evolution support; lake-style shortcuts read-on-schema
ACID transactions	Yes — full ACID, single-row and multi-statement	Yes — ACID transactions on Fabric Warehouse and Synapse Dedicated SQL Pool	No — eventually consistent object storage, file-level only	Yes — Delta Lake / Iceberg provide ACID over Parquet files
Scale unit	vCore / DTU / RU per second; vertical scale with Hyperscale or horizontal shard (Citus, Cosmos)	Fabric Capacity Units (CUs) for Fabric Warehouse; DWUs for Synapse Dedicated SQL Pool	Object-storage byte; effectively unlimited; pay-per-GB stored	Fabric Capacity Units for compute; OneLake byte for storage
Governance surface	Engine-native RLS / CLS / TDE; Purview catalog for discovery	Engine-native RLS / CLS / sensitivity labels; deep Purview lineage; certified semantic models	Storage ACLs (ADLS POSIX, OneLake roles); Purview for catalog; lineage uneven without curation	OneLake roles + Fabric workspace roles + RLS / CLS on SQL endpoint + Purview lineage
Cost model	Per-vCore-hour, per-RU/s, per-DTU; storage typically separate	Capacity CUs (Fabric) or DWU-hour (Synapse); columnstore compression keeps storage low	Pay-per-GB stored + per-TB scanned at query time; cheapest at-rest tier	Capacity CUs for compute + OneLake storage; Delta compaction matters for cost discipline
Power BI integration	Import / DirectQuery; not designed for high-concurrency BI	Direct Lake (Fabric Warehouse) or Import / DirectQuery (Synapse, Azure SQL)	Import after Spark transform; DirectQuery via serverless SQL pool	Direct Lake on V-Order Parquet — Import-mode performance with zero refresh
Copilot / AI grounding	Limited — Copilot in Power Apps / Dataverse for app workloads	Copilot in Fabric Warehouse (T-SQL generation); AI Skills on certified semantic models	Copilot in Fabric Notebooks for Spark / Python; raw lake not directly groundable for BI Copilot	Copilot in Fabric Lakehouse, Notebooks, Data Factory; AI Skills natural-language analytics
Multi-modal data (images, audio, video, telemetry)	No — pure relational / document workloads	No — structured, modeled data only	Yes — first-class for any file type	Yes — Delta tables for structured, shortcuts and unmanaged files for unstructured
ML / AI training	Only for feature serving (low-latency feature lookup at inference time)	Limited — featurization through SQL, training typically happens elsewhere	Native — Spark + Python + PyTorch / TensorFlow on raw and curated data	Native — Fabric Notebooks (Spark / Python / R) + ML feature engineering on Delta tables
Real-time ingestion	Native — applications write transactional events directly	Via Fabric Mirroring (Azure SQL, Cosmos DB, Snowflake) or Data Factory micro-batching	Via Event Hubs / Stream Analytics / Spark Structured Streaming into ADLS / OneLake	Native — Fabric Real-Time Intelligence (KQL) + Eventstream + Lakehouse continuous ingest

Comparison reflects Microsoft data platform state as of June 2026. Sources: Microsoft Learn documentation, Microsoft Fabric public roadmap, and EPC Group production deployments across 500+ Fabric platforms, 1,500+ Power BI implementations, and a comparable Azure SQL and Cosmos DB customer base.

The Lakehouse — The Convergence Story (and Its Honest Pitfalls)

The lakehouse is the most consequential architectural shift in enterprise data platforms since the cloud data warehouse emerged a decade ago. The mechanics are simple: Delta Lake (or Apache Iceberg) layers a transaction log over Parquet files in object storage. The transaction log gives you ACID — atomic commits, isolation across readers and writers, durability, time-travel queries. Parquet gives you columnar compression and predicate pushdown. Object storage gives you petabyte economics and decoupled compute. A SQL engine on top (Fabric Lakehouse SQL endpoint, Databricks SQL warehouse, Synapse Serverless SQL pool) gives you warehouse-like query semantics. The result is a single storage substrate that serves SQL, Spark, Python, R, ML, and Copilot — instead of the legacy pattern of moving data between a lake and a warehouse twice a day.

For Microsoft-anchored enterprises, the 2026 expression of this pattern is Microsoft Fabric Lakehouse on OneLake. OneLake is the tenant-wide unified lake — one logical store across all Fabric workspaces, with V-Order Parquet that lets Power BI Direct Lake read at Import-mode performance with zero refresh, and shortcuts that virtualize ADLS Gen2, Amazon S3, Google Cloud Storage, and Dataverse in place. Lakehouse, Warehouse, Notebook, Real-Time Intelligence, and Power BI are first-class items inside the same workspace identity model. The architectural advantage is real, and for many net-new platforms the lakehouse is the right default.

Where the lakehouse genuinely wins

✓Multi-modal workloads — SQL + Spark + Python + ML + Copilot — on one storage substrate
✓Object-storage economics for raw and historical data at petabyte scale
✓Direct Lake to Power BI eliminates the refresh-vs-DirectQuery tradeoff for Fabric customers
✓Schema evolution support via Delta — additive changes propagate without breaking consumers
✓Time travel and audit history at the Delta transaction-log level — useful for forensic and regulatory review

Honest pitfalls to plan around

!Governance maturity gaps — Delta + Purview lineage is improving quarterly, but regulator-readable end-to-end lineage on lakehouses is not yet at the level of a properly-modeled warehouse
!Fine-grained performance on small dimension tables can be slower than a tuned columnstore + b-tree warehouse for sub-second BI workloads
!Without strict medallion discipline (Bronze / Silver / Gold), a lakehouse becomes a swamp inside six months
!Fabric capacity is shared across the workspace — a runaway Spark job can throttle a board report unless capacity is isolated by domain
!Highly regulated workloads (FedRAMP High, HIPAA, GxP) require a senior-architect-led governance design from day one — not retrofitted after launch

Six Real Scenarios — Mapped to the Microsoft Answer

The six scenarios below are the most common "which data service do we use" questions EPC Group receives in CIO and CDO engagements. For each, we name the Microsoft target architecture and the honest reasoning behind the choice.

Scenario 1

ERP transactional layer — finance, HR, supply chain backing a Dynamics 365 or SAP S/4HANA on Azure deployment

Recommended Microsoft architecture

Azure SQL Managed Instance or Azure Database for PostgreSQL Hyperscale

OLTP workload, ACID requirements, full SQL Server compatibility for D365 F&O extensions or SAP DB layer, transparent migration from on-prem SQL Server with SQL Agent + cross-database queries preserved. PostgreSQL Hyperscale (Citus) is the right answer when sharding by customer or tenant is intrinsic to the data model. Push analytical workloads off through Fabric Mirroring rather than running BI directly against the OLTP store.

Scenario 2

Global product catalog or session store with sub-50ms reads across five continents

Recommended Microsoft architecture

Azure Cosmos DB (multi-region, multi-write)

Cosmos DB is the only Azure data service that offers single-digit-millisecond P99 reads and writes across multiple regions with five well-defined consistency levels. The cost is operational complexity around partition key design, RU/s sizing, and consistency model choice — but for global edge workloads, nothing else in the Microsoft estate matches it. Pair with Synapse Link or Fabric Mirroring to stream operational data into OneLake without affecting the transactional surface.

Scenario 3

Enterprise BI on modeled facts and dimensions serving 500 analysts and the board

Recommended Microsoft architecture

Microsoft Fabric Warehouse + OneLake + Power BI Direct Lake

Fabric Warehouse is the 2026 strategic Microsoft analytical platform. Columnstore over Delta Parquet, native ACID, conformed-dimension friendly, certified semantic models on Direct Lake deliver Import-mode performance with zero refresh. For high-concurrency board and regulator reporting where one definition of "revenue" matters, this is the right place to enforce it. Cross-reference our full Microsoft Fabric expertise hub for governance, capacity sizing, and Power BI patterns.

Microsoft Fabric expertise hub →

Scenario 4

Data science, ML feature engineering, and IoT telemetry at petabyte scale

Recommended Microsoft architecture

Microsoft Fabric Lakehouse on OneLake + Fabric Notebooks (Spark 3.5)

Lakehouse is the right architecture for multi-modal, schema-evolving, ML-heavy workloads. Delta Parquet gives you ACID and time travel without leaving object storage economics. Fabric Notebooks deliver Spark 3.5, Python, R, and Copilot in one workspace. OneLake shortcuts can virtualize existing ADLS Gen2 lakes in place, so the migration to Fabric is incremental rather than wholesale.

Scenario 5

Healthcare analytics across EHR, claims, and clinical operations under HIPAA

Recommended Microsoft architecture

Microsoft Fabric Lakehouse + Purview + sensitivity labels + Entra conditional access

Healthcare is exactly the workload where lake economics and warehouse governance must coexist. Fabric Lakehouse holds raw HL7 / FHIR / claims data in Bronze, conformed encounter and patient tables in Silver, certified analytical marts in Gold — all governed by Purview sensitivity labels propagated from source to Power BI report. HIPAA-eligible service coverage, customer-managed keys via Azure Key Vault, private endpoints, and U.S.-region-pinned OneLake satisfy the regulatory baseline.

Healthcare HIPAA Microsoft consulting →

Scenario 6

Federal mission analytics in Azure Government under FedRAMP High and CMMC 2.0

Recommended Microsoft architecture

Azure Government + Synapse (today) → Fabric (as regional GA matures)

In Azure Government Cloud, Synapse remains the production answer for FedRAMP High and CMMC 2.0 Level 2 analytics today because Fabric capacity SKU GA and feature parity in Government regions is staged. For greenfield workloads in commercial Azure with federal partner-facing requirements, Fabric is GA and appropriate. Plan a quarterly review against the Microsoft Fabric Government Cloud roadmap and migrate when GA parity supports your workload.

Federal FedRAMP / CMMC consulting →

The Five Most Common Mistakes We See

Across 216+ M&A tenant consolidations and 11,000+ Microsoft engagements over 29 years, the same five data architecture mistakes recur. None of them are technical exotica — they are predictable failure modes that show up when teams pick a storage default without mapping it to a workload archetype.

Running BI directly against the OLTP database

The single most common operational analytics failure mode we see in 216+ tenant consolidations. A finance team builds Power BI reports against the ERP database in DirectQuery mode, joins five large tables at report load, and starves the OLTP workload during month-end close. The fix is Fabric Mirroring or Synapse Link from Cosmos DB / Azure SQL DB to OneLake — operational data flows to the analytical surface near-real-time, with zero source-side load. The OLTP database goes back to doing what it is designed for.

Treating the data lake as the warehouse

Teams land everything in ADLS Gen2, write Spark notebooks to transform it, and try to serve hundreds of Power BI analysts through Synapse Serverless SQL pool DirectQuery. Concurrency collapses, costs spike, and lineage is incomplete. The fix is a modeled layer — either a Fabric Warehouse with conformed dimensions or a Fabric Lakehouse Gold zone with certified semantic models on Direct Lake. The lake holds raw and curated; the warehouse / Gold zone holds governed metrics.

Skipping the medallion (Bronze / Silver / Gold) discipline in the lakehouse

Without a medallion architecture, the lakehouse becomes a swamp inside six months. Raw ingestion mixes with curated tables, schema evolution is uncoordinated, lineage breaks, and the Power BI team can no longer tell which tables are board-grade. The fix is explicit Bronze (raw, immutable, schema-on-read), Silver (cleansed, conformed, schema-on-write Delta), Gold (modeled facts and dimensions certified for BI), with workspace permissions and Purview classifications enforced at the zone boundary.

Choosing Cosmos DB when Azure SQL would have worked

Cosmos DB is exceptional for globally distributed, multi-model, single-digit-ms workloads — and operationally complex for everyone else. Teams adopt Cosmos because "NoSQL scales", then spend the next 18 months wrestling partition key design, RU/s capacity, and cross-partition query cost. For most enterprise workloads that fit comfortably inside a single Azure region with relational semantics, Azure SQL DB (or Azure SQL MI) is the lower-TCO answer. We have rewritten Cosmos workloads back onto Azure SQL three times in the last 24 months.

Forgetting that Fabric capacity is shared across the workspace

Synapse made sizing easy — DWU per pool, vCore-hour per Spark pool, per-pipeline activity cost — so chargeback was a fixed-cost calculation per workload. Fabric capacity is shared across Lakehouse, Warehouse, Data Factory, Spark, Real-Time Intelligence, and Power BI Premium. A poorly-tuned Spark job in one workspace can throttle a board report in another. The fix is FinOps tagging at the workspace level, capacity smoothing analysis, and discrete Fabric capacities per critical workload domain — not a single tenant-wide F-SKU.

Governance — Same Controls, Different Mechanisms

For regulated enterprises, the governance baseline is non-negotiable. The encouraging 2026 reality is that the same Microsoft governance surface — Entra identity, Purview catalog and lineage, sensitivity labels, RLS / CLS, customer-managed keys, private endpoints — applies across database, warehouse, lake, and lakehouse. The mechanisms differ, the controls do not. Below is the six-control matrix EPC Group uses for HIPAA, SOC 2, FedRAMP High, FINRA, CMMC 2.0, and GxP engagements.

Control	Database	Warehouse	Lake	Lakehouse
Row-level security (RLS)	Native — SQL Server security predicates; Cosmos DB via partition-key + RBAC	Native — Fabric Warehouse RLS; Synapse Dedicated SQL Pool RLS	Not native — enforced at query engine (Synapse Serverless SQL, Spark)	Native — Fabric Lakehouse SQL endpoint RLS; Delta Lake security predicates
Column-level security (CLS) and dynamic data masking	Native — GRANT / DENY at column level; built-in masking functions	Native — Fabric Warehouse and Synapse CLS + dynamic data masking	Not native — applied at query layer or via Purview classification	Native via Fabric SQL endpoint; column masking propagated from sensitivity labels
Sensitivity labels (Microsoft Purview Information Protection)	Applied via Purview discovery; labels surface in Information Protection reports	Applied at table / column; flow into Power BI semantic models and reports	Applied at file or container via Purview scans; honored by Fabric on read	First-class — labels flow from Lakehouse table to semantic model to Power BI report
Lineage (Microsoft Purview)	Discovery-grade — connector lineage from SQL DB / MI to consumers	Deep lineage — schema, stored procedures, semantic models, Power BI artifacts	Uneven — file-level lineage depends on engine; manual annotation often required	End-to-end inside Fabric — Lakehouse → SQL endpoint → semantic model → report
Identity (Microsoft Entra)	Entra-authenticated principals + SQL logins (disable legacy where possible)	Entra-only for Fabric Warehouse; Entra + SQL for Synapse Dedicated SQL Pool	Entra-based POSIX ACLs on ADLS Gen2; Entra workspace roles on OneLake	Entra throughout — workspace roles, OneLake roles, Fabric item-level permissions
Customer-managed keys (CMK)	Yes — TDE with Azure Key Vault / Managed HSM for Azure SQL DB / MI	Yes — Fabric tenant CMK; Synapse Dedicated SQL Pool TDE with CMK	Yes — ADLS Gen2 and OneLake support tenant-level CMK with Key Vault	Inherited from OneLake tenant CMK; per-workspace key scoping in roadmap

EPC Group delivers data governance under Purview-anchored engagements mapped to the named regulatory baseline. Cross-reference our healthcare HIPAA and federal FedRAMP / CMMC consulting hubs for industry-specific control mapping.

EPC Group's Data Architecture Credential Stack

The framework above is not theoretical. It is the working model behind every Microsoft data platform EPC Group has shipped — across 70+ Fortune 500 clients, 216+ M&A tenant consolidations, 1.83 million users migrated, and a 29-year operating history on the Microsoft cloud since 1997.

11,000+

Microsoft engagements over 29 years

1,500+

Power BI implementations

500+

Microsoft Fabric platforms

216+

M&A tenant consolidations

Microsoft Press authorship on Power BI and Microsoft data platforms

Founder & Chief AI Architect Errin O'Connor is a four-time Microsoft Press bestselling author — Power BI, SharePoint, Azure, and large-scale enterprise migrations — the same architectural patterns that inform every Fabric, Synapse, and Power BI engagement we deliver. Power BI expertise hub →

All six current Microsoft Solutions Partner Designations

Data & AI, Modern Work, Infrastructure, Security, Digital & App Innovation, Business Applications — full coverage of the Microsoft cloud stack with no subcontracting. The Data & AI Designation is the named credential behind every Fabric, Lakehouse, and Cosmos engagement we deliver.

The EPC Group Lifecycle — Assess → Modernize → Govern → Operate → Enable

The named, single-accountable-partner delivery model that lets the same senior architects own a data platform engagement from board roadmap through year-two managed operations. No phase-to-phase team rotation. See the full Lifecycle →

Compliance-native delivery across HIPAA, SOC 2, FedRAMP High, FINRA, CMMC, GxP

Database, warehouse, lake, and lakehouse engagements delivered with documented control mapping to the named regulatory baseline. Zero audit findings across the regulated industries EPC Group serves. U.S.-citizen-only delivery teams available for federal and CMMC 2.0 engagements.

Three Engagement Models for Data Architecture Modernization

EPC Group structures Microsoft data architecture engagements three ways. Most customers enter through a Data Architecture Assessment, then move into a Modernization Accelerator if the diagnosis supports it, then into Managed Fabric + Lakehouse for steady-state operations. Assessment and Accelerator pricing is fixed-fee — known before delivery starts.

2-week fixed fee

Data Architecture Assessment

Senior-architect-led discovery of the existing data estate — every OLTP database, every analytical workload, every lake or warehouse, every Power BI tenant. We map workload-by-workload onto the four archetypes, identify mismatches (BI-on-OLTP, lake-as-warehouse, Cosmos-when-SQL), and deliver a fixed-fee Microsoft target architecture with sized Fabric capacity, identified migration sequence, and a 12-month investment plan.

Best for: CIOs and CDOs who inherited a sprawling data estate (Azure SQL + Synapse + ADLS Gen2 + Cosmos + on-prem SQL Server + Databricks + Snowflake) and need an honest read before committing to a modernization program.

90-day fixed fee

Modernization Accelerator

Five-phase modernization delivery — Assess (workload inventory + target mapping), Sizing (Fabric capacity, Cosmos RU/s, Azure SQL service tier), Mirror (OneLake shortcuts and Fabric Mirroring of operational sources), Refactor (lakehouse medallion, Direct Lake semantic models, Pipelines on Fabric Data Factory), and Cutover (parallel-run validation, lineage handoff, runbook). Compliance-native delivery aligned to your regulatory baseline — HIPAA, SOC 2, FedRAMP High, FINRA, CMMC 2.0, GxP.

Best for: Organizations ready to retire a legacy data warehouse, consolidate a sprawling lake, or migrate Synapse → Fabric inside a calendar quarter, with a single fixed-fee program and a senior architect as named owner.

Monthly retainer

Managed Fabric + Lakehouse

24/7 co-managed operations across the Fabric estate — capacity smoothing analysis, Lakehouse medallion stewardship, Direct Lake semantic model maintenance, Cosmos DB / Azure SQL operational oversight, Purview lineage health, Entra access reviews, named senior architect on retainer, and monthly executive reporting that ties data platform spend to business outcomes.

Best for: Production-scale Fabric and lakehouse customers who need senior-architect bench depth without hiring it full-time, with single-vendor accountability across the full Microsoft data stack.

Database, Warehouse, Lake, and Lakehouse Under the EPC Group Lifecycle

Every Microsoft data architecture engagement EPC Group delivers runs under the EPC Group Lifecycle — Assess, Modernize, Govern, Operate, Enable — so the same senior architects move with the platform from roadmap through year-two managed operations. One contract. One escalation path. One named owner for the data platform.

See the full Lifecycle Microsoft Fabric expertise

Frequently Asked Questions

What is the difference between OLTP and OLAP, and how do I know which I need?

OLTP (online transactional processing) is the workload profile of an operational database — thousands of short-lived, highly concurrent transactions that read and write the current state of the business. Sub-50ms latency, full ACID semantics, narrow indexed lookups, and row-by-row updates dominate. OLAP (online analytical processing) is the workload profile of a data warehouse or lakehouse — long-running aggregations across billions of rows of history, sorting on conformed dimensions, with hundreds of concurrent analysts but no individual-row updates. The two profiles fight each other inside a single engine: running heavy aggregations against an OLTP database starves the transactional workload, and running thousands of point lookups against an OLAP warehouse wastes its columnstore advantage. The rule is simple: capture in an OLTP system (Azure SQL DB, Cosmos DB, Azure SQL MI, PostgreSQL Hyperscale); report and analyze in an OLAP system (Fabric Warehouse, Fabric Lakehouse, Synapse Dedicated SQL Pool); move data between them with Fabric Mirroring, Synapse Link, or Data Factory. Most enterprises need both, separated by clear architectural boundaries.

When does Cosmos DB beat Azure SQL Database, and when is it the wrong choice?

Cosmos DB beats Azure SQL Database when the workload is globally distributed (more than one Azure region serving live traffic), requires single-digit-millisecond P99 latency across continents, is multi-model (document, graph, key-value, or column-family in one engine), or scales beyond the practical envelope of a relational engine (think gaming back-ends, ad-tech bidding, IoT command-and-control, or session stores at hundreds of thousands of operations per second). Cosmos is also the right answer when the data model is genuinely schema-flexible — heterogeneous documents that do not fit comfortably into a relational schema. Cosmos is the wrong choice when the workload fits inside a single Azure region with relational semantics, when SQL Server compatibility (cross-database queries, SQL Agent, SSIS) matters, when transactional history and complex joins dominate, or when the team has no operational experience tuning partition keys and RU/s. We have rewritten Cosmos workloads back onto Azure SQL three times in the last 24 months because the operational complexity outweighed the scale benefit. The honest 2026 default for new enterprise OLTP is Azure SQL DB or Azure SQL MI; Cosmos is the right choice when one of the four conditions above is genuinely true.

What is the difference between a lakehouse and a data warehouse — and which do I need in 2026?

A data warehouse (Fabric Warehouse, Synapse Dedicated SQL Pool, traditional SQL Server with columnstore) is a modeled analytical engine — schema-on-write, conformed dimensions, governed metric definitions, ACID transactions on relational tables, optimized for high-concurrency aggregation. A lakehouse (Fabric Lakehouse on OneLake, Databricks Lakehouse with Unity Catalog) is a SQL surface layered over Delta Lake (Parquet + transaction log) — schema is enforced by Delta with evolution support, ACID transactions are guaranteed by the Delta protocol, multi-modal data (Spark, SQL, Python, R, ML) shares one storage substrate. The 2026 honest read: a lakehouse can do most of what a warehouse does, plus ML and unstructured data, at object-storage cost — which is why Microsoft is pushing Fabric Lakehouse as the architectural default for net-new platforms. A modeled warehouse still wins for highly regulated workloads where regulator-readable lineage and fine-grained metric governance dominate, for departmental BI workloads under ~1 TB where the lakehouse adds operational complexity without benefit, and for sub-second point-lookup workloads on small dimension tables where the warehouse columnstore + b-tree pattern outperforms Delta Parquet. Most Fortune 500 platforms we build today run both: Lakehouse for raw-through-curated and ML, Warehouse for board-grade conformed marts, with both feeding Power BI through Direct Lake.

Can a Fabric Lakehouse really replace my enterprise data warehouse?

For most net-new enterprise data platforms in 2026 — yes. Fabric Lakehouse on OneLake offers ACID transactions via Delta Lake, RLS / CLS on the SQL endpoint, V-Order Parquet that lets Power BI Direct Lake read at Import-mode performance with zero refresh, native Purview lineage end-to-end, and the same Entra identity surface that governs the rest of the Microsoft estate. We have built lakehouses that retired traditional warehouses for healthcare, manufacturing, and professional services Fortune 500 clients. But the honest qualifier: lakehouse replacement of warehouse is not automatic. It requires (1) a disciplined medallion architecture (Bronze / Silver / Gold) with workspace boundaries, (2) certified semantic models on the Gold zone for board reporting, (3) Fabric capacity sized to your concurrency curve — sustained high-concurrency analytical workloads can require larger F-SKUs than the equivalent Synapse Dedicated SQL Pool DWU sizing, (4) Purview governance configured from day one — not retrofitted six months in, (5) clear cost-tagging because Fabric capacity is shared across workloads. Where these conditions are not met, the Warehouse + Lakehouse hybrid pattern is the lower-risk answer for another 12–24 months. EPC Group always runs a Fabric Unit of Capacity (FUC) sizing exercise before recommending lakehouse-only architectures for high-concurrency workloads.

What is the difference between OneLake and ADLS Gen2?

ADLS Gen2 (Azure Data Lake Storage Gen2) is the hierarchical-namespace object storage layer that has powered Azure data platforms for years — POSIX-style ACLs, container + folder + file hierarchy, addressable by storage account URL, priced per GB stored. It is generic, mature, and tied to Azure storage account boundaries. OneLake is the Fabric-tenant-wide unified data lake — a single logical lake that spans all Fabric workspaces in your Microsoft tenant, with workspace and item permissions enforced by Microsoft Entra, with shortcuts that can virtualize external storage (ADLS Gen2, Amazon S3, Google Cloud Storage, Dataverse) without data copy, and with built-in support for Delta Parquet with V-Order optimization for Power BI Direct Lake. OneLake is not a replacement for ADLS Gen2 — under the hood, OneLake storage is itself built on Azure Storage. The practical difference for an enterprise: with ADLS Gen2 you manage storage accounts, containers, and ACLs as primary objects; with OneLake you manage Fabric workspaces, items, and shortcuts as primary objects, and storage management becomes implicit. For existing ADLS Gen2 lakes, OneLake shortcuts allow you to expose them inside Fabric without migration. For net-new platforms in 2026, OneLake is the right starting point.

When do I use Power BI Import vs DirectQuery vs Direct Lake?

Import is the original Power BI consumption mode — data is loaded into the VertiPaq columnar in-memory engine on a scheduled refresh (typically 1×/day to 8×/day on Premium). Best performance, deterministic latency, full DAX coverage, and decoupled from source system load — but data is only as fresh as the last refresh, and very large semantic models can exceed Premium capacity memory limits. DirectQuery sends every Power BI report query as a live SQL query to the underlying source (Azure SQL, Synapse, Fabric Warehouse). Always-current data, no semantic model size limit, but performance depends on source throughput and source-side query load can become a bottleneck. Direct Lake is the 2026 Fabric-only mode that reads V-Order Parquet directly from OneLake into the VertiPaq engine — Import-mode performance with zero refresh, governed by Fabric capacity, requires Fabric Lakehouse or Fabric Warehouse as the source. The 2026 default for Fabric-anchored workloads is Direct Lake. Import remains correct when the source is non-Fabric (Azure SQL DB, Snowflake, Salesforce) and refresh latency is acceptable, or when complex M-language transformations are required. DirectQuery is correct when freshness must be real-time and the source can absorb the query load — generally a niche use case in 2026.

Is Fabric Warehouse the same as Synapse Dedicated SQL Pool — and should I migrate?

They are architecturally distinct. Synapse Dedicated SQL Pool is the renamed Azure SQL Data Warehouse — a massively parallel processing (MPP) SQL warehouse with provisioned compute measured in Data Warehouse Units (DWU), running on the original Polybase-derived MPP engine. Fabric Warehouse is the 2026 Microsoft analytical warehouse, built on the same Polaris distributed query engine but storing data as Delta Parquet on OneLake (not in a proprietary internal format), priced through unified Fabric capacity (not DWU-hour), and natively integrated with Direct Lake to Power BI. Functionally, Fabric Warehouse delivers T-SQL parity for most workloads, with some Synapse-specific features (resource classes, MPP-style workload management isolation, certain hint syntaxes) absent or different. Microsoft has not announced an EOL for Synapse Dedicated SQL Pool as of June 2026, but net-new investment is concentrated on Fabric. Migration triggers: Power BI Direct Lake is the strongest, OneLake mirroring of operational sources is second, Copilot in Fabric is third. Stay on Synapse Dedicated SQL Pool when the workload is tightly tuned at high concurrency, the chargeback model is built on DWU sizing, or sovereign-region Fabric GA is not yet available. We cover the full Synapse → Fabric migration framework in our Azure Synapse Analytics Enterprise Guide.

How do data lakes fail without governance — and how do we avoid the swamp?

Ungoverned data lakes fail in five recognizable ways. First, ingestion outpaces curation — terabytes land in raw zones and never get modeled, lineage breaks, and within a year nobody can tell which datasets are trustworthy. Second, schema evolves silently — a source system changes a field type, downstream Spark jobs and SQL queries break in unpredictable ways, and incident response becomes archaeology. Third, security is by obscurity — POSIX ACLs are applied inconsistently, sensitive data appears in the wrong containers, and HIPAA / PII exposure becomes an audit finding. Fourth, cost discipline collapses — pay-per-TB-scanned queries run repeatedly against poorly partitioned CSV files, costs spike, and FinOps cannot attribute spend. Fifth, BI consumers route around the lake — they extract data they need into Excel or rogue Power BI tenants, and the lake becomes a write-only system. The avoidance pattern is explicit and non-optional: medallion architecture (Bronze / Silver / Gold) with workspace boundaries, Delta Parquet for everything beyond Bronze, Purview catalog and sensitivity labels applied at ingestion, certified semantic models for BI consumption, Fabric capacity discipline, and senior-architect-led governance with quarterly steward reviews. EPC Group delivers this under our data governance engagement model.

Related Resources

Talk to a Data Architect

A 60-minute call with a senior architect — not a sales lead. We will give you an honest read on which Microsoft storage layer fits each of your workloads (database, warehouse, lake, lakehouse), what your modernization sequence looks like across a 12-month horizon, and where Fabric, Cosmos DB, Azure SQL, and Synapse each belong in your target architecture. If a non-Microsoft platform is the right answer for a specific workload, we will say so on the call.

Schedule a fit-call 888-381-9725

Errin O'Connor

Founder & Chief AI Architect · Chief AI Architect

contact@epcgroup.net

Headquarters

4900 Woodway Drive, Suite 830, Houston, TX 77056

What is the difference between a database, a data warehouse, and a data lake — and which does my enterprise need?

Key Facts

Database (OLTP) examples: Azure SQL Database, Azure SQL Managed Instance, Cosmos DB, Azure Database for PostgreSQL Flexible Server / Hyperscale (Citus)
Data warehouse (OLAP) examples: Microsoft Fabric Warehouse, Synapse Dedicated SQL Pool, Azure SQL DB with columnstore for departmental scale
Data lake examples: Azure Data Lake Storage Gen2 (ADLS Gen2), Microsoft OneLake — Fabric-tenant-wide unified lake with V-Order Delta Parquet
Lakehouse examples: Microsoft Fabric Lakehouse on OneLake, Databricks Lakehouse on Unity Catalog — Delta or Iceberg over object storage with ACID + SQL + ML
EPC Group: 216+ M&A tenant consolidations, 1,500+ Power BI implementations, 500+ Microsoft Fabric platforms across 70+ Fortune 500 clients
Microsoft Solutions Partner — Data & AI Designation plus five other current Solutions Partner Designations; senior-architect-led, fixed-fee delivery
Compliance-native baselines for HIPAA, SOC 2, FedRAMP, FINRA, CMMC, GxP regulated workloads across Azure Commercial and Azure Government
Engagement cadence: Data Architecture Assessment (2-week fixed fee) → Modernization Accelerator (90-day fixed fee) → Managed Fabric + Lakehouse (monthly retainer)

What These Four Things Actually Are

Database (OLTP)

Operational system of record optimized for high-concurrency reads and writes against current state.

Microsoft mapping: Azure SQL Database, Azure SQL Managed Instance, Azure Database for PostgreSQL (Flexible Server and Hyperscale/Citus), Azure Cosmos DB, Azure SQL on IaaS VMs.

Data Warehouse (OLAP)

Modeled, governed analytical store optimized for low-latency aggregation across billions of rows.

Microsoft mapping: Microsoft Fabric Warehouse, Synapse Dedicated SQL Pool, Azure SQL DB with columnstore indexes for departmental warehousing.

Data Lake

Schema-on-read object storage that holds raw, semi-structured, and unstructured data at any scale.

Architectural Comparison — Database vs Warehouse vs Lake vs Lakehouse

Criterion	Database (OLTP)	Data Warehouse	Data Lake	Lakehouse
Primary use	Operational transactions, system of record, sub-50ms reads and writes	Modeled analytics, conformed metrics, board and regulator reporting	Raw retention, ML training corpora, semi-structured and unstructured data	Unified analytics across raw and modeled data on one storage substrate
Query type	OLTP — point lookups, short-lived transactions, narrow indexed reads	OLAP — aggregations, scans across billions of rows, star-schema joins	Batch and exploratory — Spark / Python / SQL over files, ML feature pipelines	Mixed — SQL aggregations, Spark transformations, ML, and Direct Lake to BI
Typical latency	< 50 ms for indexed reads / writes	Seconds for aggregations over billions of rows; sub-second on hot aggregates	Seconds to minutes — engine-dependent; no intrinsic latency guarantee	Sub-second on Direct Lake reads, seconds to minutes on Spark transforms
Data structure	Structured (relational rows) or document (Cosmos DB)	Structured, modeled — star / snowflake, conformed dimensions	Structured + semi-structured + unstructured — any file type	Structured + semi-structured — Delta Parquet with optional unstructured shortcuts
Schema enforcement	Schema-on-write — enforced at insert time	Schema-on-write — enforced at load, governed by data team	Schema-on-read — enforced (if at all) at query time	Schema-on-write via Delta with evolution support; lake-style shortcuts read-on-schema
ACID transactions	Yes — full ACID, single-row and multi-statement	Yes — ACID transactions on Fabric Warehouse and Synapse Dedicated SQL Pool	No — eventually consistent object storage, file-level only	Yes — Delta Lake / Iceberg provide ACID over Parquet files
Scale unit	vCore / DTU / RU per second; vertical scale with Hyperscale or horizontal shard (Citus, Cosmos)	Fabric Capacity Units (CUs) for Fabric Warehouse; DWUs for Synapse Dedicated SQL Pool	Object-storage byte; effectively unlimited; pay-per-GB stored	Fabric Capacity Units for compute; OneLake byte for storage
Governance surface	Engine-native RLS / CLS / TDE; Purview catalog for discovery	Engine-native RLS / CLS / sensitivity labels; deep Purview lineage; certified semantic models	Storage ACLs (ADLS POSIX, OneLake roles); Purview for catalog; lineage uneven without curation	OneLake roles + Fabric workspace roles + RLS / CLS on SQL endpoint + Purview lineage
Cost model	Per-vCore-hour, per-RU/s, per-DTU; storage typically separate	Capacity CUs (Fabric) or DWU-hour (Synapse); columnstore compression keeps storage low	Pay-per-GB stored + per-TB scanned at query time; cheapest at-rest tier	Capacity CUs for compute + OneLake storage; Delta compaction matters for cost discipline
Power BI integration	Import / DirectQuery; not designed for high-concurrency BI	Direct Lake (Fabric Warehouse) or Import / DirectQuery (Synapse, Azure SQL)	Import after Spark transform; DirectQuery via serverless SQL pool	Direct Lake on V-Order Parquet — Import-mode performance with zero refresh
Copilot / AI grounding	Limited — Copilot in Power Apps / Dataverse for app workloads	Copilot in Fabric Warehouse (T-SQL generation); AI Skills on certified semantic models	Copilot in Fabric Notebooks for Spark / Python; raw lake not directly groundable for BI Copilot	Copilot in Fabric Lakehouse, Notebooks, Data Factory; AI Skills natural-language analytics
Multi-modal data (images, audio, video, telemetry)	No — pure relational / document workloads	No — structured, modeled data only	Yes — first-class for any file type	Yes — Delta tables for structured, shortcuts and unmanaged files for unstructured
ML / AI training	Only for feature serving (low-latency feature lookup at inference time)	Limited — featurization through SQL, training typically happens elsewhere	Native — Spark + Python + PyTorch / TensorFlow on raw and curated data	Native — Fabric Notebooks (Spark / Python / R) + ML feature engineering on Delta tables
Real-time ingestion	Native — applications write transactional events directly	Via Fabric Mirroring (Azure SQL, Cosmos DB, Snowflake) or Data Factory micro-batching	Via Event Hubs / Stream Analytics / Spark Structured Streaming into ADLS / OneLake	Native — Fabric Real-Time Intelligence (KQL) + Eventstream + Lakehouse continuous ingest

The Lakehouse — The Convergence Story (and Its Honest Pitfalls)

Where the lakehouse genuinely wins

✓Multi-modal workloads — SQL + Spark + Python + ML + Copilot — on one storage substrate
✓Object-storage economics for raw and historical data at petabyte scale
✓Direct Lake to Power BI eliminates the refresh-vs-DirectQuery tradeoff for Fabric customers
✓Schema evolution support via Delta — additive changes propagate without breaking consumers
✓Time travel and audit history at the Delta transaction-log level — useful for forensic and regulatory review

Honest pitfalls to plan around

!Governance maturity gaps — Delta + Purview lineage is improving quarterly, but regulator-readable end-to-end lineage on lakehouses is not yet at the level of a properly-modeled warehouse
!Fine-grained performance on small dimension tables can be slower than a tuned columnstore + b-tree warehouse for sub-second BI workloads
!Without strict medallion discipline (Bronze / Silver / Gold), a lakehouse becomes a swamp inside six months
!Fabric capacity is shared across the workspace — a runaway Spark job can throttle a board report unless capacity is isolated by domain
!Highly regulated workloads (FedRAMP High, HIPAA, GxP) require a senior-architect-led governance design from day one — not retrofitted after launch

Six Real Scenarios — Mapped to the Microsoft Answer

Scenario 1

ERP transactional layer — finance, HR, supply chain backing a Dynamics 365 or SAP S/4HANA on Azure deployment

Recommended Microsoft architecture

Azure SQL Managed Instance or Azure Database for PostgreSQL Hyperscale

Scenario 2

Global product catalog or session store with sub-50ms reads across five continents

Recommended Microsoft architecture

Azure Cosmos DB (multi-region, multi-write)

Scenario 3

Enterprise BI on modeled facts and dimensions serving 500 analysts and the board

Recommended Microsoft architecture

Microsoft Fabric Warehouse + OneLake + Power BI Direct Lake

Microsoft Fabric expertise hub →

Scenario 4

Data science, ML feature engineering, and IoT telemetry at petabyte scale

Recommended Microsoft architecture

Microsoft Fabric Lakehouse on OneLake + Fabric Notebooks (Spark 3.5)

Scenario 5

Healthcare analytics across EHR, claims, and clinical operations under HIPAA

Recommended Microsoft architecture

Microsoft Fabric Lakehouse + Purview + sensitivity labels + Entra conditional access

Healthcare HIPAA Microsoft consulting →

Scenario 6

Federal mission analytics in Azure Government under FedRAMP High and CMMC 2.0

Recommended Microsoft architecture

Azure Government + Synapse (today) → Fabric (as regional GA matures)

Federal FedRAMP / CMMC consulting →

The Five Most Common Mistakes We See

Running BI directly against the OLTP database

Treating the data lake as the warehouse

Skipping the medallion (Bronze / Silver / Gold) discipline in the lakehouse

Choosing Cosmos DB when Azure SQL would have worked

Forgetting that Fabric capacity is shared across the workspace

Governance — Same Controls, Different Mechanisms

Control	Database	Warehouse	Lake	Lakehouse
Row-level security (RLS)	Native — SQL Server security predicates; Cosmos DB via partition-key + RBAC	Native — Fabric Warehouse RLS; Synapse Dedicated SQL Pool RLS	Not native — enforced at query engine (Synapse Serverless SQL, Spark)	Native — Fabric Lakehouse SQL endpoint RLS; Delta Lake security predicates
Column-level security (CLS) and dynamic data masking	Native — GRANT / DENY at column level; built-in masking functions	Native — Fabric Warehouse and Synapse CLS + dynamic data masking	Not native — applied at query layer or via Purview classification	Native via Fabric SQL endpoint; column masking propagated from sensitivity labels
Sensitivity labels (Microsoft Purview Information Protection)	Applied via Purview discovery; labels surface in Information Protection reports	Applied at table / column; flow into Power BI semantic models and reports	Applied at file or container via Purview scans; honored by Fabric on read	First-class — labels flow from Lakehouse table to semantic model to Power BI report
Lineage (Microsoft Purview)	Discovery-grade — connector lineage from SQL DB / MI to consumers	Deep lineage — schema, stored procedures, semantic models, Power BI artifacts	Uneven — file-level lineage depends on engine; manual annotation often required	End-to-end inside Fabric — Lakehouse → SQL endpoint → semantic model → report
Identity (Microsoft Entra)	Entra-authenticated principals + SQL logins (disable legacy where possible)	Entra-only for Fabric Warehouse; Entra + SQL for Synapse Dedicated SQL Pool	Entra-based POSIX ACLs on ADLS Gen2; Entra workspace roles on OneLake	Entra throughout — workspace roles, OneLake roles, Fabric item-level permissions
Customer-managed keys (CMK)	Yes — TDE with Azure Key Vault / Managed HSM for Azure SQL DB / MI	Yes — Fabric tenant CMK; Synapse Dedicated SQL Pool TDE with CMK	Yes — ADLS Gen2 and OneLake support tenant-level CMK with Key Vault	Inherited from OneLake tenant CMK; per-workspace key scoping in roadmap

EPC Group's Data Architecture Credential Stack

11,000+

Microsoft engagements over 29 years

1,500+

Power BI implementations

500+

Microsoft Fabric platforms

216+

M&A tenant consolidations

Microsoft Press authorship on Power BI and Microsoft data platforms

All six current Microsoft Solutions Partner Designations

The EPC Group Lifecycle — Assess → Modernize → Govern → Operate → Enable

Compliance-native delivery across HIPAA, SOC 2, FedRAMP High, FINRA, CMMC, GxP

Three Engagement Models for Data Architecture Modernization

2-week fixed fee

Data Architecture Assessment

90-day fixed fee

Modernization Accelerator

Monthly retainer

Managed Fabric + Lakehouse

Database, Warehouse, Lake, and Lakehouse Under the EPC Group Lifecycle

See the full Lifecycle Microsoft Fabric expertise

Frequently Asked Questions

What is the difference between OLTP and OLAP, and how do I know which I need?

When does Cosmos DB beat Azure SQL Database, and when is it the wrong choice?

What is the difference between a lakehouse and a data warehouse — and which do I need in 2026?

Can a Fabric Lakehouse really replace my enterprise data warehouse?

What is the difference between OneLake and ADLS Gen2?

When do I use Power BI Import vs DirectQuery vs Direct Lake?

Is Fabric Warehouse the same as Synapse Dedicated SQL Pool — and should I migrate?

How do data lakes fail without governance — and how do we avoid the swamp?

Related Resources

Talk to a Data Architect

Schedule a fit-call 888-381-9725

Errin O'Connor

Founder & Chief AI Architect · Chief AI Architect

contact@epcgroup.net

Headquarters

4900 Woodway Drive, Suite 830, Houston, TX 77056

Database vs Data Warehouse vs Data Lake: The Microsoft Decision Framework (2026)

Key Facts

What These Four Things Actually Are

Database (OLTP)

Data Warehouse (OLAP)

Data Lake

Lakehouse

The Four Workload Archetypes — When Each Storage Type Wins

Transactional (OLTP) — the operational system of record

Operational analytics — low-latency reads off the operational store

Enterprise warehousing — modeled, governed, board-grade analytics

Data lake / lakehouse — multi-modal, ML / AI, raw-through-refined at scale

Architectural Comparison — Database vs Warehouse vs Lake vs Lakehouse

The Lakehouse — The Convergence Story (and Its Honest Pitfalls)

Where the lakehouse genuinely wins

Honest pitfalls to plan around

Six Real Scenarios — Mapped to the Microsoft Answer

ERP transactional layer — finance, HR, supply chain backing a Dynamics 365 or SAP S/4HANA on Azure deployment

Global product catalog or session store with sub-50ms reads across five continents

Enterprise BI on modeled facts and dimensions serving 500 analysts and the board

Data science, ML feature engineering, and IoT telemetry at petabyte scale

Healthcare analytics across EHR, claims, and clinical operations under HIPAA

Federal mission analytics in Azure Government under FedRAMP High and CMMC 2.0

The Five Most Common Mistakes We See

Running BI directly against the OLTP database

Treating the data lake as the warehouse

Skipping the medallion (Bronze / Silver / Gold) discipline in the lakehouse

Choosing Cosmos DB when Azure SQL would have worked

Forgetting that Fabric capacity is shared across the workspace

Governance — Same Controls, Different Mechanisms

EPC Group's Data Architecture Credential Stack

Microsoft Press authorship on Power BI and Microsoft data platforms

All six current Microsoft Solutions Partner Designations

The EPC Group Lifecycle — Assess → Modernize → Govern → Operate → Enable

Compliance-native delivery across HIPAA, SOC 2, FedRAMP High, FINRA, CMMC, GxP

Three Engagement Models for Data Architecture Modernization

Data Architecture Assessment

Modernization Accelerator

Managed Fabric + Lakehouse

Database, Warehouse, Lake, and Lakehouse Under the EPC Group Lifecycle

Frequently Asked Questions

Related Resources

Talk to a Data Architect

Database vs Data Warehouse vs Data Lake: The Microsoft Decision Framework (2026)

Key Facts

What These Four Things Actually Are

Database (OLTP)

Data Warehouse (OLAP)

Data Lake

Lakehouse

The Four Workload Archetypes — When Each Storage Type Wins

Transactional (OLTP) — the operational system of record

Operational analytics — low-latency reads off the operational store

Enterprise warehousing — modeled, governed, board-grade analytics

Data lake / lakehouse — multi-modal, ML / AI, raw-through-refined at scale

Architectural Comparison — Database vs Warehouse vs Lake vs Lakehouse

The Lakehouse — The Convergence Story (and Its Honest Pitfalls)

Where the lakehouse genuinely wins

Honest pitfalls to plan around

Six Real Scenarios — Mapped to the Microsoft Answer

ERP transactional layer — finance, HR, supply chain backing a Dynamics 365 or SAP S/4HANA on Azure deployment

Global product catalog or session store with sub-50ms reads across five continents

Enterprise BI on modeled facts and dimensions serving 500 analysts and the board

Data science, ML feature engineering, and IoT telemetry at petabyte scale

Healthcare analytics across EHR, claims, and clinical operations under HIPAA

Federal mission analytics in Azure Government under FedRAMP High and CMMC 2.0

The Five Most Common Mistakes We See

Running BI directly against the OLTP database

Treating the data lake as the warehouse

Skipping the medallion (Bronze / Silver / Gold) discipline in the lakehouse

Choosing Cosmos DB when Azure SQL would have worked

Forgetting that Fabric capacity is shared across the workspace

Governance — Same Controls, Different Mechanisms

EPC Group's Data Architecture Credential Stack

Microsoft Press authorship on Power BI and Microsoft data platforms

All six current Microsoft Solutions Partner Designations

The EPC Group Lifecycle — Assess → Modernize → Govern → Operate → Enable

Compliance-native delivery across HIPAA, SOC 2, FedRAMP High, FINRA, CMMC, GxP

Three Engagement Models for Data Architecture Modernization

Data Architecture Assessment

Database vs Data Warehouse vs Data Lake:
The Microsoft Decision Framework (2026)

Database vs Data Warehouse vs Data Lake:
The Microsoft Decision Framework (2026)