Microsoft data architecture decision framework · updated June 2026
Database vs Data Warehouse vs Data Lake: The Microsoft Decision Framework (2026)
OLTP, operational analytics, modeled warehousing, and lakehouse — mapped to Azure SQL, Cosmos DB, Fabric Warehouse, Fabric Lakehouse, and OneLake. From the team behind 216+ tenant consolidations, 1,500+ Power BI implementations, and 500+ Fabric platforms across 70+ Fortune 500 clients.
What is the difference between a database, a data warehouse, and a data lake — and which does my enterprise need?
A database (Azure SQL DB, Cosmos DB, PostgreSQL Hyperscale) captures the operational state of the business — OLTP, ACID, sub-50ms latency. A data warehouse (Microsoft Fabric Warehouse, Synapse Dedicated SQL Pool) models history for governed analytics — OLAP, conformed dimensions, board-grade reporting. A data lake (Azure Data Lake Storage Gen2, OneLake) holds raw and unstructured data at any scale for ML, data science, and IoT. A lakehouse (Microsoft Fabric Lakehouse on OneLake) is the 2026 architectural convergence — Delta Parquet over object storage gives you warehouse-grade ACID + SQL + RLS over lake-grade economics + ML + multi-modal. Most Fortune 500 enterprises need at least three: a database for capture, a lakehouse for raw-through-curated and ML, and either a Fabric Warehouse or the Gold zone of the lakehouse for board-grade governed analytics. The honest decision is which workload lives where — not which platform "wins."
In 2026, Microsoft's data architecture defaults are: Azure SQL Database (or Cosmos DB) for OLTP capture, Microsoft Fabric Lakehouse on OneLake for raw-through-curated analytics and ML, and Microsoft Fabric Warehouse for board-grade conformed marts. The lakehouse pattern is converging warehouse and lake for net-new platforms, but a modeled warehouse still wins for regulator-readable lineage and high-concurrency conformed reporting. EPC Group has delivered this architecture for 70+ Fortune 500 clients under HIPAA, SOC 2, FedRAMP, FINRA, CMMC, and GxP regulatory baselines.
Data warehouse (OLAP) examples: Microsoft Fabric Warehouse, Synapse Dedicated SQL Pool, Azure SQL DB with columnstore for departmental scale
Data lake examples: Azure Data Lake Storage Gen2 (ADLS Gen2), Microsoft OneLake — Fabric-tenant-wide unified lake with V-Order Delta Parquet
Lakehouse examples: Microsoft Fabric Lakehouse on OneLake, Databricks Lakehouse on Unity Catalog — Delta or Iceberg over object storage with ACID + SQL + ML
EPC Group: 216+ M&A tenant consolidations, 1,500+ Power BI implementations, 500+ Microsoft Fabric platforms across 70+ Fortune 500 clients
Microsoft Solutions Partner — Data & AI Designation plus five other current Solutions Partner Designations; senior-architect-led, fixed-fee delivery
Compliance-native baselines for HIPAA, SOC 2, FedRAMP, FINRA, CMMC, GxP regulated workloads across Azure Commercial and Azure Government
Half of the Microsoft data architecture confusion we encounter in CIO and CDO engagements is vocabulary — the words "database," "warehouse," "lake," and "lakehouse" are used interchangeably in vendor marketing, in trade press, and inside enterprise architecture teams. They are not the same. They are four distinct architectural surfaces with different workload profiles, different governance properties, and different cost models. Below is the working definition for each — the way EPC Group's senior architects use the terms in every Fabric and Synapse engagement we ship.
Database (OLTP)
Operational system of record optimized for high-concurrency reads and writes against current state.
A database in the enterprise sense is an OLTP (online transactional processing) engine — it captures business events as they happen. Sub-50ms read and write latency, ACID transactional guarantees, row-store or document-store layout, narrow indexes tuned for point lookups, and concurrency control built for thousands of simultaneous short-lived transactions. It holds the current state of the business — the customer record, the order line, the inventory count — not its history. Reporting is a secondary use case at best, and analytical queries that scan millions of rows will starve the transactional workload they share the engine with.
Microsoft mapping: Azure SQL Database, Azure SQL Managed Instance, Azure Database for PostgreSQL (Flexible Server and Hyperscale/Citus), Azure Cosmos DB, Azure SQL on IaaS VMs.
Data Warehouse (OLAP)
Modeled, governed analytical store optimized for low-latency aggregation across billions of rows.
A data warehouse is an OLAP (online analytical processing) engine — it answers analytical questions across history. Columnstore compression, massively parallel processing (MPP), star and snowflake schemas with conformed dimensions, write-once read-many workload profile, and concurrency tuned for hundreds-to-thousands of analysts running aggregation queries. Schema is defined on write — every column is typed, every relationship is modeled, every grain is conformed. The warehouse is the place where the business definition of "revenue", "customer", and "active user" lives in semantic stone, governed by a finance or data team.
Microsoft mapping: Microsoft Fabric Warehouse, Synapse Dedicated SQL Pool, Azure SQL DB with columnstore indexes for departmental warehousing.
Data Lake
Schema-on-read object storage that holds raw, semi-structured, and unstructured data at any scale.
A data lake is object storage — Parquet, CSV, JSON, Avro, ORC, images, audio, video, telemetry — addressed by file path or container URL. There is no engine intrinsic to a pure lake; the engine is whatever query layer (Spark, Synapse Serverless SQL, Fabric Lakehouse SQL Endpoint, Trino, Presto, Databricks SQL) you point at it. Schema is enforced at read time, which makes the lake cheap to ingest into and expensive to govern. Lakes hold data that you want to keep but do not yet know how to model — clickstream, IoT telemetry, application logs, ML training corpora, raw SaaS exports — and they hold data at petabyte scale at object-storage prices.
Microsoft mapping: Azure Data Lake Storage Gen2 (ADLS Gen2), Microsoft OneLake (the unified Fabric tenant lake), object storage backing both Fabric Lakehouse and Synapse Spark.
Lakehouse
A governed warehouse-like SQL surface layered directly over Delta + Parquet files in the lake.
A lakehouse is the architectural convergence of warehouse and lake — Delta Lake or Apache Iceberg layered over Parquet object storage, giving you ACID transactions, time travel, schema evolution, RLS / CLS, and a SQL query surface that behaves like a warehouse but writes to the lake. The 2026 Microsoft expression is Fabric Lakehouse on OneLake. The lakehouse is genuinely useful — it cuts duplication between lake and warehouse, lets Power BI Direct Lake read V-Order Parquet at near-warehouse latency, and enables Spark, SQL, Python, R, and Copilot to share one storage substrate. But it is not magic — fine-grained governance, point-lookup performance on small dimension tables, and audit-ready lineage for highly regulated workloads remain areas where a properly-modeled warehouse still wins.
Microsoft mapping: Microsoft Fabric Lakehouse on OneLake (Delta Parquet with V-Order), Databricks Lakehouse with Unity Catalog, Synapse Spark + Delta Lake.
The Four Workload Archetypes — When Each Storage Type Wins
The honest decision framework is not "database vs warehouse vs lake." It is "what is the workload archetype, and which Microsoft service profile fits it?" Below are the four archetypes we use in every Data Architecture Assessment. Most Fortune 500 enterprises run at least three of the four in parallel — separated by clear architectural boundaries, connected by Fabric Mirroring, Synapse Link, or Data Factory.
Archetype 1
Transactional (OLTP) — the operational system of record
Capture the business as it happens.
Workload characteristics
Thousands of concurrent short-lived transactions, ACID semantics required, sub-50ms read / write latency, indexed point lookups dominate, working set fits in memory for hot tables, single-row updates are the unit of work. Backed by an ERP, CRM, e-commerce checkout, EHR, banking core, or IoT command-and-control surface.
The Microsoft answer
Azure SQL Managed Instance for SQL Server lift-and-shift with full SQL Agent + cross-database queries; Azure SQL Database for net-new cloud OLTP with serverless and Hyperscale tiers; Azure Database for PostgreSQL Flexible Server for modern open-source workloads, with Hyperscale (Citus) when sharding is required; Azure Cosmos DB for globally distributed, multi-model, single-digit-ms latency at any scale.
Archetype 2
Operational analytics — low-latency reads off the operational store
Answer analytical questions without leaving the transactional surface.
Workload characteristics
Real-time dashboards, in-app analytics, embedded reporting inside the operational application, "show me my customer's last 90 days" panels, A/B test surfaces, fraud scoring at point of authorization. Read-mostly, sub-second latency, must not impact OLTP throughput, schema is the operational schema.
The Microsoft answer
Azure Cosmos DB with the analytical store + Synapse Link for HTAP without ETL; Azure SQL DB with non-clustered columnstore indexes on the operational tables (HTAP within one engine); Azure SQL MI with read-scale-out replicas for offloaded reporting; Fabric Mirroring of Azure SQL DB, Cosmos DB, or Azure SQL MI into OneLake for near-real-time Power BI without source-side load.
Star or snowflake schema with conformed dimensions, hundreds to thousands of concurrent analytical queries, governed metric definitions ("revenue", "customer", "ARR" mean one specific thing), regulator-readable lineage, SCD Type 2 history, certified semantic models for Power BI, finance close and board reporting depend on it.
The Microsoft answer
Microsoft Fabric Warehouse on OneLake — strategic platform for net-new investment, columnstore over Delta Parquet, Direct Lake to Power BI, native Purview lineage; Synapse Dedicated SQL Pool — supported through 2026 for existing high-concurrency MPP investments; Azure SQL DB with columnstore for departmental warehouses under ~1 TB compressed.
Archetype 4
Data lake / lakehouse — multi-modal, ML / AI, raw-through-refined at scale
Keep everything. Model what matters. Train on the rest.
Workload characteristics
Heterogeneous data formats (Parquet, CSV, JSON, Avro, images, audio, video, telemetry), petabyte-class storage economics required, schema is partial or evolves, Spark / Python / Copilot / ML training are first-class workloads alongside SQL, medallion pattern (Bronze → Silver → Gold) curates raw to refined, lineage and governance applied progressively as data matures.
The Microsoft answer
Microsoft Fabric Lakehouse on OneLake — Delta Parquet with V-Order, shortcuts to ADLS Gen2 / S3 / GCS / Dataverse, native Spark 3.5 / Delta 3.2 with the Native Execution Engine, Direct Lake to Power BI, integrated with Copilot in Fabric and AI Skills; Synapse Spark + ADLS Gen2 for existing investments; Databricks Lakehouse for organizations where heavy Spark, Delta Live Tables, or Unity Catalog depth dominate the decision.
Architectural Comparison — Database vs Warehouse vs Lake vs Lakehouse
The fourteen-criterion comparison below is the working architectural reference EPC Group's senior architects use in client whiteboards. It is deliberately direct — the differences between these four surfaces are not subtle, and pretending they are is how enterprise data programs get built around the wrong default.
Criterion
Database (OLTP)
Data Warehouse
Data Lake
Lakehouse
Primary use
Operational transactions, system of record, sub-50ms reads and writes
Modeled analytics, conformed metrics, board and regulator reporting
Raw retention, ML training corpora, semi-structured and unstructured data
Unified analytics across raw and modeled data on one storage substrate
Query type
OLTP — point lookups, short-lived transactions, narrow indexed reads
OLAP — aggregations, scans across billions of rows, star-schema joins
Batch and exploratory — Spark / Python / SQL over files, ML feature pipelines
Mixed — SQL aggregations, Spark transformations, ML, and Direct Lake to BI
Typical latency
< 50 ms for indexed reads / writes
Seconds for aggregations over billions of rows; sub-second on hot aggregates
Seconds to minutes — engine-dependent; no intrinsic latency guarantee
Sub-second on Direct Lake reads, seconds to minutes on Spark transforms
Data structure
Structured (relational rows) or document (Cosmos DB)
Structured, modeled — star / snowflake, conformed dimensions
Structured + semi-structured + unstructured — any file type
Structured + semi-structured — Delta Parquet with optional unstructured shortcuts
Schema enforcement
Schema-on-write — enforced at insert time
Schema-on-write — enforced at load, governed by data team
Schema-on-read — enforced (if at all) at query time
Schema-on-write via Delta with evolution support; lake-style shortcuts read-on-schema
ACID transactions
Yes — full ACID, single-row and multi-statement
Yes — ACID transactions on Fabric Warehouse and Synapse Dedicated SQL Pool
No — eventually consistent object storage, file-level only
Yes — Delta Lake / Iceberg provide ACID over Parquet files
Scale unit
vCore / DTU / RU per second; vertical scale with Hyperscale or horizontal shard (Citus, Cosmos)
Fabric Capacity Units (CUs) for Fabric Warehouse; DWUs for Synapse Dedicated SQL Pool
Comparison reflects Microsoft data platform state as of June 2026. Sources: Microsoft Learn documentation, Microsoft Fabric public roadmap, and EPC Group production deployments across 500+ Fabric platforms, 1,500+ Power BI implementations, and a comparable Azure SQL and Cosmos DB customer base.
The Lakehouse — The Convergence Story (and Its Honest Pitfalls)
The lakehouse is the most consequential architectural shift in enterprise data platforms since the cloud data warehouse emerged a decade ago. The mechanics are simple: Delta Lake (or Apache Iceberg) layers a transaction log over Parquet files in object storage. The transaction log gives you ACID — atomic commits, isolation across readers and writers, durability, time-travel queries. Parquet gives you columnar compression and predicate pushdown. Object storage gives you petabyte economics and decoupled compute. A SQL engine on top (Fabric Lakehouse SQL endpoint, Databricks SQL warehouse, Synapse Serverless SQL pool) gives you warehouse-like query semantics. The result is a single storage substrate that serves SQL, Spark, Python, R, ML, and Copilot — instead of the legacy pattern of moving data between a lake and a warehouse twice a day.
For Microsoft-anchored enterprises, the 2026 expression of this pattern is Microsoft Fabric Lakehouse on OneLake. OneLake is the tenant-wide unified lake — one logical store across all Fabric workspaces, with V-Order Parquet that lets Power BI Direct Lake read at Import-mode performance with zero refresh, and shortcuts that virtualize ADLS Gen2, Amazon S3, Google Cloud Storage, and Dataverse in place. Lakehouse, Warehouse, Notebook, Real-Time Intelligence, and Power BI are first-class items inside the same workspace identity model. The architectural advantage is real, and for many net-new platforms the lakehouse is the right default.
Where the lakehouse genuinely wins
✓Multi-modal workloads — SQL + Spark + Python + ML + Copilot — on one storage substrate
✓Object-storage economics for raw and historical data at petabyte scale
✓Direct Lake to Power BI eliminates the refresh-vs-DirectQuery tradeoff for Fabric customers
✓Schema evolution support via Delta — additive changes propagate without breaking consumers
✓Time travel and audit history at the Delta transaction-log level — useful for forensic and regulatory review
Honest pitfalls to plan around
!Governance maturity gaps — Delta + Purview lineage is improving quarterly, but regulator-readable end-to-end lineage on lakehouses is not yet at the level of a properly-modeled warehouse
!Fine-grained performance on small dimension tables can be slower than a tuned columnstore + b-tree warehouse for sub-second BI workloads
!Without strict medallion discipline (Bronze / Silver / Gold), a lakehouse becomes a swamp inside six months
!Fabric capacity is shared across the workspace — a runaway Spark job can throttle a board report unless capacity is isolated by domain
!Highly regulated workloads (FedRAMP High, HIPAA, GxP) require a senior-architect-led governance design from day one — not retrofitted after launch
Six Real Scenarios — Mapped to the Microsoft Answer
The six scenarios below are the most common "which data service do we use" questions EPC Group receives in CIO and CDO engagements. For each, we name the Microsoft target architecture and the honest reasoning behind the choice.
Scenario 1
ERP transactional layer — finance, HR, supply chain backing a Dynamics 365 or SAP S/4HANA on Azure deployment
Recommended Microsoft architecture
Azure SQL Managed Instance or Azure Database for PostgreSQL Hyperscale
OLTP workload, ACID requirements, full SQL Server compatibility for D365 F&O extensions or SAP DB layer, transparent migration from on-prem SQL Server with SQL Agent + cross-database queries preserved. PostgreSQL Hyperscale (Citus) is the right answer when sharding by customer or tenant is intrinsic to the data model. Push analytical workloads off through Fabric Mirroring rather than running BI directly against the OLTP store.
Scenario 2
Global product catalog or session store with sub-50ms reads across five continents
Recommended Microsoft architecture
Azure Cosmos DB (multi-region, multi-write)
Cosmos DB is the only Azure data service that offers single-digit-millisecond P99 reads and writes across multiple regions with five well-defined consistency levels. The cost is operational complexity around partition key design, RU/s sizing, and consistency model choice — but for global edge workloads, nothing else in the Microsoft estate matches it. Pair with Synapse Link or Fabric Mirroring to stream operational data into OneLake without affecting the transactional surface.
Scenario 3
Enterprise BI on modeled facts and dimensions serving 500 analysts and the board
Recommended Microsoft architecture
Microsoft Fabric Warehouse + OneLake + Power BI Direct Lake
Fabric Warehouse is the 2026 strategic Microsoft analytical platform. Columnstore over Delta Parquet, native ACID, conformed-dimension friendly, certified semantic models on Direct Lake deliver Import-mode performance with zero refresh. For high-concurrency board and regulator reporting where one definition of "revenue" matters, this is the right place to enforce it. Cross-reference our full Microsoft Fabric expertise hub for governance, capacity sizing, and Power BI patterns.
Data science, ML feature engineering, and IoT telemetry at petabyte scale
Recommended Microsoft architecture
Microsoft Fabric Lakehouse on OneLake + Fabric Notebooks (Spark 3.5)
Lakehouse is the right architecture for multi-modal, schema-evolving, ML-heavy workloads. Delta Parquet gives you ACID and time travel without leaving object storage economics. Fabric Notebooks deliver Spark 3.5, Python, R, and Copilot in one workspace. OneLake shortcuts can virtualize existing ADLS Gen2 lakes in place, so the migration to Fabric is incremental rather than wholesale.
Scenario 5
Healthcare analytics across EHR, claims, and clinical operations under HIPAA
Healthcare is exactly the workload where lake economics and warehouse governance must coexist. Fabric Lakehouse holds raw HL7 / FHIR / claims data in Bronze, conformed encounter and patient tables in Silver, certified analytical marts in Gold — all governed by Purview sensitivity labels propagated from source to Power BI report. HIPAA-eligible service coverage, customer-managed keys via Azure Key Vault, private endpoints, and U.S.-region-pinned OneLake satisfy the regulatory baseline.
Federal mission analytics in Azure Government under FedRAMP High and CMMC 2.0
Recommended Microsoft architecture
Azure Government + Synapse (today) → Fabric (as regional GA matures)
In Azure Government Cloud, Synapse remains the production answer for FedRAMP High and CMMC 2.0 Level 2 analytics today because Fabric capacity SKU GA and feature parity in Government regions is staged. For greenfield workloads in commercial Azure with federal partner-facing requirements, Fabric is GA and appropriate. Plan a quarterly review against the Microsoft Fabric Government Cloud roadmap and migrate when GA parity supports your workload.
Across 216+ M&A tenant consolidations and 11,000+ Microsoft engagements over 29 years, the same five data architecture mistakes recur. None of them are technical exotica — they are predictable failure modes that show up when teams pick a storage default without mapping it to a workload archetype.
1
Running BI directly against the OLTP database
The single most common operational analytics failure mode we see in 216+ tenant consolidations. A finance team builds Power BI reports against the ERP database in DirectQuery mode, joins five large tables at report load, and starves the OLTP workload during month-end close. The fix is Fabric Mirroring or Synapse Link from Cosmos DB / Azure SQL DB to OneLake — operational data flows to the analytical surface near-real-time, with zero source-side load. The OLTP database goes back to doing what it is designed for.
2
Treating the data lake as the warehouse
Teams land everything in ADLS Gen2, write Spark notebooks to transform it, and try to serve hundreds of Power BI analysts through Synapse Serverless SQL pool DirectQuery. Concurrency collapses, costs spike, and lineage is incomplete. The fix is a modeled layer — either a Fabric Warehouse with conformed dimensions or a Fabric Lakehouse Gold zone with certified semantic models on Direct Lake. The lake holds raw and curated; the warehouse / Gold zone holds governed metrics.
3
Skipping the medallion (Bronze / Silver / Gold) discipline in the lakehouse
Without a medallion architecture, the lakehouse becomes a swamp inside six months. Raw ingestion mixes with curated tables, schema evolution is uncoordinated, lineage breaks, and the Power BI team can no longer tell which tables are board-grade. The fix is explicit Bronze (raw, immutable, schema-on-read), Silver (cleansed, conformed, schema-on-write Delta), Gold (modeled facts and dimensions certified for BI), with workspace permissions and Purview classifications enforced at the zone boundary.
4
Choosing Cosmos DB when Azure SQL would have worked
Cosmos DB is exceptional for globally distributed, multi-model, single-digit-ms workloads — and operationally complex for everyone else. Teams adopt Cosmos because "NoSQL scales", then spend the next 18 months wrestling partition key design, RU/s capacity, and cross-partition query cost. For most enterprise workloads that fit comfortably inside a single Azure region with relational semantics, Azure SQL DB (or Azure SQL MI) is the lower-TCO answer. We have rewritten Cosmos workloads back onto Azure SQL three times in the last 24 months.
5
Forgetting that Fabric capacity is shared across the workspace
Synapse made sizing easy — DWU per pool, vCore-hour per Spark pool, per-pipeline activity cost — so chargeback was a fixed-cost calculation per workload. Fabric capacity is shared across Lakehouse, Warehouse, Data Factory, Spark, Real-Time Intelligence, and Power BI Premium. A poorly-tuned Spark job in one workspace can throttle a board report in another. The fix is FinOps tagging at the workspace level, capacity smoothing analysis, and discrete Fabric capacities per critical workload domain — not a single tenant-wide F-SKU.
Governance — Same Controls, Different Mechanisms
For regulated enterprises, the governance baseline is non-negotiable. The encouraging 2026 reality is that the same Microsoft governance surface — Entra identity, Purview catalog and lineage, sensitivity labels, RLS / CLS, customer-managed keys, private endpoints — applies across database, warehouse, lake, and lakehouse. The mechanisms differ, the controls do not. Below is the six-control matrix EPC Group uses for HIPAA, SOC 2, FedRAMP High, FINRA, CMMC 2.0, and GxP engagements.
Control
Database
Warehouse
Lake
Lakehouse
Row-level security (RLS)
Native — SQL Server security predicates; Cosmos DB via partition-key + RBAC
Native — Fabric Warehouse RLS; Synapse Dedicated SQL Pool RLS
Not native — enforced at query engine (Synapse Serverless SQL, Spark)
The framework above is not theoretical. It is the working model behind every Microsoft data platform EPC Group has shipped — across 70+ Fortune 500 clients, 216+ M&A tenant consolidations, 1.83 million users migrated, and a 29-year operating history on the Microsoft cloud since 1997.
11,000+
Microsoft engagements over 29 years
1,500+
Power BI implementations
500+
Microsoft Fabric platforms
216+
M&A tenant consolidations
Microsoft Press authorship on Power BI and Microsoft data platforms
Founder and CEO Errin O'Connor is a four-time Microsoft Press bestselling author — Power BI, SharePoint, Azure, and large-scale enterprise migrations — the same architectural patterns that inform every Fabric, Synapse, and Power BI engagement we deliver. Power BI expertise hub →
All six current Microsoft Solutions Partner Designations
Data & AI, Modern Work, Infrastructure, Security, Digital & App Innovation, Business Applications — full coverage of the Microsoft cloud stack with no subcontracting. The Data & AI Designation is the named credential behind every Fabric, Lakehouse, and Cosmos engagement we deliver.
The EPC Group Lifecycle — Assess → Modernize → Govern → Operate → Enable
The named, single-accountable-partner delivery model that lets the same senior architects own a data platform engagement from board roadmap through year-two managed operations. No phase-to-phase team rotation. See the full Lifecycle →
Database, warehouse, lake, and lakehouse engagements delivered with documented control mapping to the named regulatory baseline. Zero audit findings across the regulated industries EPC Group serves. U.S.-citizen-only delivery teams available for federal and CMMC 2.0 engagements.
Three Engagement Models for Data Architecture Modernization
EPC Group structures Microsoft data architecture engagements three ways. Most customers enter through a Data Architecture Assessment, then move into a Modernization Accelerator if the diagnosis supports it, then into Managed Fabric + Lakehouse for steady-state operations. Assessment and Accelerator pricing is fixed-fee — known before delivery starts.
2-week fixed fee
Data Architecture Assessment
Senior-architect-led discovery of the existing data estate — every OLTP database, every analytical workload, every lake or warehouse, every Power BI tenant. We map workload-by-workload onto the four archetypes, identify mismatches (BI-on-OLTP, lake-as-warehouse, Cosmos-when-SQL), and deliver a fixed-fee Microsoft target architecture with sized Fabric capacity, identified migration sequence, and a 12-month investment plan.
Best for: CIOs and CDOs who inherited a sprawling data estate (Azure SQL + Synapse + ADLS Gen2 + Cosmos + on-prem SQL Server + Databricks + Snowflake) and need an honest read before committing to a modernization program.
90-day fixed fee
Modernization Accelerator
Five-phase modernization delivery — Assess (workload inventory + target mapping), Sizing (Fabric capacity, Cosmos RU/s, Azure SQL service tier), Mirror (OneLake shortcuts and Fabric Mirroring of operational sources), Refactor (lakehouse medallion, Direct Lake semantic models, Pipelines on Fabric Data Factory), and Cutover (parallel-run validation, lineage handoff, runbook). Compliance-native delivery aligned to your regulatory baseline — HIPAA, SOC 2, FedRAMP High, FINRA, CMMC 2.0, GxP.
Best for: Organizations ready to retire a legacy data warehouse, consolidate a sprawling lake, or migrate Synapse → Fabric inside a calendar quarter, with a single fixed-fee program and a senior architect as named owner.
Monthly retainer
Managed Fabric + Lakehouse
24/7 co-managed operations across the Fabric estate — capacity smoothing analysis, Lakehouse medallion stewardship, Direct Lake semantic model maintenance, Cosmos DB / Azure SQL operational oversight, Purview lineage health, Entra access reviews, named senior architect on retainer, and monthly executive reporting that ties data platform spend to business outcomes.
Best for: Production-scale Fabric and lakehouse customers who need senior-architect bench depth without hiring it full-time, with single-vendor accountability across the full Microsoft data stack.
Database, Warehouse, Lake, and Lakehouse Under the EPC Group Lifecycle
Every Microsoft data architecture engagement EPC Group delivers runs under the EPC Group Lifecycle — Assess, Modernize, Govern, Operate, Enable — so the same senior architects move with the platform from roadmap through year-two managed operations. One contract. One escalation path. One named owner for the data platform.
What is the difference between OLTP and OLAP, and how do I know which I need?
OLTP (online transactional processing) is the workload profile of an operational database — thousands of short-lived, highly concurrent transactions that read and write the current state of the business. Sub-50ms latency, full ACID semantics, narrow indexed lookups, and row-by-row updates dominate. OLAP (online analytical processing) is the workload profile of a data warehouse or lakehouse — long-running aggregations across billions of rows of history, sorting on conformed dimensions, with hundreds of concurrent analysts but no individual-row updates. The two profiles fight each other inside a single engine: running heavy aggregations against an OLTP database starves the transactional workload, and running thousands of point lookups against an OLAP warehouse wastes its columnstore advantage. The rule is simple: capture in an OLTP system (Azure SQL DB, Cosmos DB, Azure SQL MI, PostgreSQL Hyperscale); report and analyze in an OLAP system (Fabric Warehouse, Fabric Lakehouse, Synapse Dedicated SQL Pool); move data between them with Fabric Mirroring, Synapse Link, or Data Factory. Most enterprises need both, separated by clear architectural boundaries.
When does Cosmos DB beat Azure SQL Database, and when is it the wrong choice?
Cosmos DB beats Azure SQL Database when the workload is globally distributed (more than one Azure region serving live traffic), requires single-digit-millisecond P99 latency across continents, is multi-model (document, graph, key-value, or column-family in one engine), or scales beyond the practical envelope of a relational engine (think gaming back-ends, ad-tech bidding, IoT command-and-control, or session stores at hundreds of thousands of operations per second). Cosmos is also the right answer when the data model is genuinely schema-flexible — heterogeneous documents that do not fit comfortably into a relational schema. Cosmos is the wrong choice when the workload fits inside a single Azure region with relational semantics, when SQL Server compatibility (cross-database queries, SQL Agent, SSIS) matters, when transactional history and complex joins dominate, or when the team has no operational experience tuning partition keys and RU/s. We have rewritten Cosmos workloads back onto Azure SQL three times in the last 24 months because the operational complexity outweighed the scale benefit. The honest 2026 default for new enterprise OLTP is Azure SQL DB or Azure SQL MI; Cosmos is the right choice when one of the four conditions above is genuinely true.
What is the difference between a lakehouse and a data warehouse — and which do I need in 2026?
A data warehouse (Fabric Warehouse, Synapse Dedicated SQL Pool, traditional SQL Server with columnstore) is a modeled analytical engine — schema-on-write, conformed dimensions, governed metric definitions, ACID transactions on relational tables, optimized for high-concurrency aggregation. A lakehouse (Fabric Lakehouse on OneLake, Databricks Lakehouse with Unity Catalog) is a SQL surface layered over Delta Lake (Parquet + transaction log) — schema is enforced by Delta with evolution support, ACID transactions are guaranteed by the Delta protocol, multi-modal data (Spark, SQL, Python, R, ML) shares one storage substrate. The 2026 honest read: a lakehouse can do most of what a warehouse does, plus ML and unstructured data, at object-storage cost — which is why Microsoft is pushing Fabric Lakehouse as the architectural default for net-new platforms. A modeled warehouse still wins for highly regulated workloads where regulator-readable lineage and fine-grained metric governance dominate, for departmental BI workloads under ~1 TB where the lakehouse adds operational complexity without benefit, and for sub-second point-lookup workloads on small dimension tables where the warehouse columnstore + b-tree pattern outperforms Delta Parquet. Most Fortune 500 platforms we build today run both: Lakehouse for raw-through-curated and ML, Warehouse for board-grade conformed marts, with both feeding Power BI through Direct Lake.
Can a Fabric Lakehouse really replace my enterprise data warehouse?
For most net-new enterprise data platforms in 2026 — yes. Fabric Lakehouse on OneLake offers ACID transactions via Delta Lake, RLS / CLS on the SQL endpoint, V-Order Parquet that lets Power BI Direct Lake read at Import-mode performance with zero refresh, native Purview lineage end-to-end, and the same Entra identity surface that governs the rest of the Microsoft estate. We have built lakehouses that retired traditional warehouses for healthcare, manufacturing, and professional services Fortune 500 clients. But the honest qualifier: lakehouse replacement of warehouse is not automatic. It requires (1) a disciplined medallion architecture (Bronze / Silver / Gold) with workspace boundaries, (2) certified semantic models on the Gold zone for board reporting, (3) Fabric capacity sized to your concurrency curve — sustained high-concurrency analytical workloads can require larger F-SKUs than the equivalent Synapse Dedicated SQL Pool DWU sizing, (4) Purview governance configured from day one — not retrofitted six months in, (5) clear cost-tagging because Fabric capacity is shared across workloads. Where these conditions are not met, the Warehouse + Lakehouse hybrid pattern is the lower-risk answer for another 12–24 months. EPC Group always runs a Fabric Unit of Capacity (FUC) sizing exercise before recommending lakehouse-only architectures for high-concurrency workloads.
What is the difference between OneLake and ADLS Gen2?
ADLS Gen2 (Azure Data Lake Storage Gen2) is the hierarchical-namespace object storage layer that has powered Azure data platforms for years — POSIX-style ACLs, container + folder + file hierarchy, addressable by storage account URL, priced per GB stored. It is generic, mature, and tied to Azure storage account boundaries. OneLake is the Fabric-tenant-wide unified data lake — a single logical lake that spans all Fabric workspaces in your Microsoft tenant, with workspace and item permissions enforced by Microsoft Entra, with shortcuts that can virtualize external storage (ADLS Gen2, Amazon S3, Google Cloud Storage, Dataverse) without data copy, and with built-in support for Delta Parquet with V-Order optimization for Power BI Direct Lake. OneLake is not a replacement for ADLS Gen2 — under the hood, OneLake storage is itself built on Azure Storage. The practical difference for an enterprise: with ADLS Gen2 you manage storage accounts, containers, and ACLs as primary objects; with OneLake you manage Fabric workspaces, items, and shortcuts as primary objects, and storage management becomes implicit. For existing ADLS Gen2 lakes, OneLake shortcuts allow you to expose them inside Fabric without migration. For net-new platforms in 2026, OneLake is the right starting point.
When do I use Power BI Import vs DirectQuery vs Direct Lake?
Import is the original Power BI consumption mode — data is loaded into the VertiPaq columnar in-memory engine on a scheduled refresh (typically 1×/day to 8×/day on Premium). Best performance, deterministic latency, full DAX coverage, and decoupled from source system load — but data is only as fresh as the last refresh, and very large semantic models can exceed Premium capacity memory limits. DirectQuery sends every Power BI report query as a live SQL query to the underlying source (Azure SQL, Synapse, Fabric Warehouse). Always-current data, no semantic model size limit, but performance depends on source throughput and source-side query load can become a bottleneck. Direct Lake is the 2026 Fabric-only mode that reads V-Order Parquet directly from OneLake into the VertiPaq engine — Import-mode performance with zero refresh, governed by Fabric capacity, requires Fabric Lakehouse or Fabric Warehouse as the source. The 2026 default for Fabric-anchored workloads is Direct Lake. Import remains correct when the source is non-Fabric (Azure SQL DB, Snowflake, Salesforce) and refresh latency is acceptable, or when complex M-language transformations are required. DirectQuery is correct when freshness must be real-time and the source can absorb the query load — generally a niche use case in 2026.
Is Fabric Warehouse the same as Synapse Dedicated SQL Pool — and should I migrate?
They are architecturally distinct. Synapse Dedicated SQL Pool is the renamed Azure SQL Data Warehouse — a massively parallel processing (MPP) SQL warehouse with provisioned compute measured in Data Warehouse Units (DWU), running on the original Polybase-derived MPP engine. Fabric Warehouse is the 2026 Microsoft analytical warehouse, built on the same Polaris distributed query engine but storing data as Delta Parquet on OneLake (not in a proprietary internal format), priced through unified Fabric capacity (not DWU-hour), and natively integrated with Direct Lake to Power BI. Functionally, Fabric Warehouse delivers T-SQL parity for most workloads, with some Synapse-specific features (resource classes, MPP-style workload management isolation, certain hint syntaxes) absent or different. Microsoft has not announced an EOL for Synapse Dedicated SQL Pool as of June 2026, but net-new investment is concentrated on Fabric. Migration triggers: Power BI Direct Lake is the strongest, OneLake mirroring of operational sources is second, Copilot in Fabric is third. Stay on Synapse Dedicated SQL Pool when the workload is tightly tuned at high concurrency, the chargeback model is built on DWU sizing, or sovereign-region Fabric GA is not yet available. We cover the full Synapse → Fabric migration framework in our Azure Synapse Analytics Enterprise Guide.
How do data lakes fail without governance — and how do we avoid the swamp?
Ungoverned data lakes fail in five recognizable ways. First, ingestion outpaces curation — terabytes land in raw zones and never get modeled, lineage breaks, and within a year nobody can tell which datasets are trustworthy. Second, schema evolves silently — a source system changes a field type, downstream Spark jobs and SQL queries break in unpredictable ways, and incident response becomes archaeology. Third, security is by obscurity — POSIX ACLs are applied inconsistently, sensitive data appears in the wrong containers, and HIPAA / PII exposure becomes an audit finding. Fourth, cost discipline collapses — pay-per-TB-scanned queries run repeatedly against poorly partitioned CSV files, costs spike, and FinOps cannot attribute spend. Fifth, BI consumers route around the lake — they extract data they need into Excel or rogue Power BI tenants, and the lake becomes a write-only system. The avoidance pattern is explicit and non-optional: medallion architecture (Bronze / Silver / Gold) with workspace boundaries, Delta Parquet for everything beyond Bronze, Purview catalog and sensitivity labels applied at ingestion, certified semantic models for BI consumption, Fabric capacity discipline, and senior-architect-led governance with quarterly steward reviews. EPC Group delivers this under our data governance engagement model.
A 60-minute call with a senior architect — not a sales lead. We will give you an honest read on which Microsoft storage layer fits each of your workloads (database, warehouse, lake, lakehouse), what your modernization sequence looks like across a 12-month horizon, and where Fabric, Cosmos DB, Azure SQL, and Synapse each belong in your target architecture. If a non-Microsoft platform is the right answer for a specific workload, we will say so on the call.