EPC Group - Enterprise Microsoft AI, SharePoint, Power BI, and Azure Consulting

About EPC Group

EPC Group is a Microsoft consulting firm founded in 1997 (originally Enterprise Project Consulting, renamed EPC Group in 2005), with 29 years of enterprise Microsoft consulting experience. EPC Group held the distinction of being the oldest continuous Microsoft Gold Partner in North America from 2016 until the program's retirement. When Microsoft retired the Gold/Silver tiering framework, EPC Group transitioned to the modern Microsoft Solutions Partner ecosystem and currently holds the core Microsoft Solutions Partner designations.

Headquartered at 4900 Woodway Drive, Suite 830, Houston, TX 77056. Public clients include NASA, FBI, Federal Reserve, Pentagon, United Airlines, PepsiCo, Nike, and Northrop Grumman. 6,500+ SharePoint implementations, 1,500+ Power BI deployments, 500+ Microsoft Fabric implementations, 70+ Fortune 500 organizations served, 11,000+ enterprise engagements, 200+ Microsoft Power BI and Microsoft 365 consultants on staff.

About Errin O'Connor

Errin O'Connor is the Founder, CEO, and Chief AI Architect of EPC Group. Microsoft MVP multiple years, first awarded 2003. 4× Microsoft Press bestselling author of Windows SharePoint Services 3.0 Inside Out (MS Press 2007), Microsoft SharePoint Foundation 2010 Inside Out (MS Press 2011), SharePoint 2013 Field Guide (Sams/Pearson 2014), and Microsoft Power BI Dashboards Step by Step (MS Press 2018).

Original SharePoint Beta Team member (Project Tahoe). Original Power BI Beta Team member (Project Crescent). FedRAMP framework contributor. Worked with U.S. CIO Vivek Kundra on the Obama administration's 25-Point Plan to reform federal IT, and with NASA CIO Chris Kemp as Lead Architect on the NASA Nebula Cloud project. Speaker at Microsoft Ignite, SharePoint Conference, KMWorld, and DATAVERSITY.

© 2026 EPC Group. All rights reserved. Microsoft, SharePoint, Power BI, Azure, Microsoft 365, Microsoft Copilot, Microsoft Fabric, and Microsoft Dynamics 365 are trademarks of the Microsoft group of companies.


Last updated: 2026 · Read time: ~10 min

Key Facts

  • Microsoft Fabric combines six products: Data Engineering (Synapse Spark), Data Warehouse (Synapse SQL), Data Science (notebooks + MLflow), Real-Time Intelligence, Data Factory (orchestration), and Power BI.
  • OneLake is the storage foundation — one copy of data serves all Fabric workloads. No ETL between components.
  • Direct Lake mode: Power BI reads directly from OneLake Delta Parquet files. Import-mode performance without imported data storage cost.
  • Fabric Pipelines vs Azure Data Factory: Fabric Pipelines have native Fabric integration, OneLake as default storage, capacity-based billing (Fabric CUs), and no self-hosted integration runtime management for cloud sources.
  • Medallion architecture: Bronze (raw) → Silver (cleansed) → Gold (business-ready). All three layers stored in OneLake as Delta tables.

Fabric Data Engineering: Lakehouse Guide 2026

Microsoft Fabric data engineering guide: lakehouse architecture, data pipelines, Spark notebooks, OneLake, and medallion architecture.


Expert Insight from Errin O'Connor

29 years Microsoft consulting | 4x Microsoft Press bestselling author | Former NASA Lead Architect | 50+ enterprise Microsoft Fabric implementations with lakehouse architectures processing PB-scale data

Errin O'Connor, CEO & Chief AI Architect
February 23, 2026 · 26 min read

Quick Answer

Microsoft Fabric unifies data engineering, data science, data warehousing, and business intelligence on a single platform built on OneLake, a shared data lake storing everything in open Delta Lake format.

The enterprise data engineering architecture uses the medallion pattern: Bronze lakehouses ingest raw data via data pipelines and Dataflows Gen2, Silver lakehouses apply business rules through Spark notebooks, and Gold lakehouses provide business-ready dimensional models consumed by Power BI Direct Lake datasets for zero-copy analytics.

Organizations migrating from Azure Synapse to Fabric achieve 30-40% TCO reduction while eliminating data silos, and OneLake shortcuts enable federation with existing Azure, AWS, and GCP data lakes without data movement.

Table of Contents

  1. Microsoft Fabric Platform Overview
  2. OneLake: The Unified Data Foundation
  3. Fabric Lakehouse Architecture
  4. Spark Notebooks for Data Engineering
  5. Data Pipelines and Orchestration
  6. Medallion Architecture Implementation
  7. Fabric Data Warehouse vs. Lakehouse
  8. Real-Time Analytics
  9. Implementation Roadmap
  10. Frequently Asked Questions

Microsoft Fabric Data Engineering Guide 2026


Microsoft Fabric data engineering covers lakehouses, data pipelines, Spark notebooks, OneLake, and the medallion architecture. This guide explains how Fabric unifies six previously separate Azure services (Synapse, Data Factory, Power BI, and more) into a single SaaS experience, and includes a Fabric Pipelines vs ADF comparison, medallion architecture patterns, and an explanation of Direct Lake mode mechanics.


What Microsoft Fabric is

Microsoft Fabric is a SaaS analytics platform that combines six previously separate Azure services into one experience. All workloads share the same data lake (OneLake), the same governance layer (Microsoft Purview), and the same billing meter (Fabric Capacity Units).

The six products in one platform:

  • Data Engineering — Synapse Spark: notebooks, pipelines, Delta Lake.
  • Data Warehouse — Synapse SQL: T-SQL warehouse with auto-scaling.
  • Data Science — notebooks, MLflow experiment tracking, model registry.
  • Real-Time Intelligence — formerly Synapse Real-Time Analytics + Data Activator. Streaming analytics and event-driven actions.
  • Data Factory — pipeline orchestration. Copy activities, Dataflows Gen2, scheduled and triggered pipelines.
  • Power BI — semantic models, reports, dashboards, Direct Lake mode.

A data engineer writes to a lakehouse table. A Power BI analyst queries it immediately via Direct Lake — no ETL, no data copy, no permission reconfiguration.

OneLake architecture

OneLake is the single storage foundation. Every Fabric tenant gets one OneLake — automatically provisioned, no separate storage account setup required.

Key architectural characteristics:

  • Hierarchical organization: tenants → workspaces → lakehouses → folders.
  • Native Delta Lake (Parquet) format for all structured and semi-structured data.
  • One copy of data serves all workloads — no ETL between Data Engineering, Data Warehouse, Data Science, and Power BI.
  • ADLS Gen2 API compatibility — existing tools and scripts work without modification.
  • Built-in governance via Microsoft Purview integration — sensitivity labels, lineage, data catalog.
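Because OneLake speaks the same ADLS Gen2 addressing scheme, a lakehouse table can be referenced with a familiar `abfss://` URI. The sketch below builds that URI shape in plain Python; the workspace, lakehouse, and table names are illustrative, and the exact URI layout should be confirmed against current OneLake documentation.

```python
def onelake_table_uri(workspace: str, lakehouse: str, table: str) -> str:
    """Build an ABFS-style URI for a OneLake lakehouse table.

    OneLake exposes the same abfss:// addressing scheme as ADLS Gen2,
    with the workspace acting as the container and a fixed OneLake
    endpoint. Names here are hypothetical examples.
    """
    return (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Tables/{table}"
    )

# Example: a Gold-layer 'sales' table in a 'Finance' workspace.
uri = onelake_table_uri("Finance", "gold_lakehouse", "sales")
print(uri)
```

This compatibility is why existing ADLS-aware tools and scripts can point at OneLake without modification: only the endpoint and path change.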

Medallion architecture in Fabric

The medallion architecture organizes OneLake data into three quality layers:

  • Bronze layer — raw ingested data. No transformation. Source-faithful. Data lands here from pipelines, streaming, or file upload.
  • Silver layer — cleansed and conformed data. Null handling, deduplication, type casting, schema standardization applied. This is the analytical baseline.
  • Gold layer — business-ready data. Aggregations, business logic, and dimensional models applied. Direct Lake semantic models are built on Gold layer tables.

All three layers live in OneLake as Delta tables. Power BI Direct Lake connects to Gold layer tables without importing data.
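The layer responsibilities above can be sketched in plain Python. In Fabric these steps would run as PySpark notebooks writing Delta tables; here lists of dicts stand in for DataFrames so the Bronze → Silver → Gold logic is easy to follow, and all records are made up for illustration.

```python
bronze = [  # raw, source-faithful: duplicates, nulls, strings for numbers
    {"order_id": "1", "region": "US", "amount": "100.0"},
    {"order_id": "1", "region": "US", "amount": "100.0"},   # duplicate
    {"order_id": "2", "region": "EU", "amount": None},      # bad record
    {"order_id": "3", "region": "US", "amount": "50.5"},
]

def to_silver(rows):
    """Silver: drop null amounts, deduplicate on order_id, cast types."""
    seen, silver = set(), []
    for r in rows:
        if r["amount"] is None or r["order_id"] in seen:
            continue
        seen.add(r["order_id"])
        silver.append({"order_id": int(r["order_id"]),
                       "region": r["region"],
                       "amount": float(r["amount"])})
    return silver

def to_gold(rows):
    """Gold: business-ready revenue aggregated by region."""
    gold = {}
    for r in rows:
        gold[r["region"]] = gold.get(r["region"], 0.0) + r["amount"]
    return gold

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)   # {'US': 150.5}
```

The same shape holds at scale: Bronze never mutates source data, Silver is where quality rules live, and Gold is the only layer semantic models should see.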

Fabric Pipelines vs Azure Data Factory

Fabric Pipelines and Azure Data Factory share the same pipeline canvas and activity types, but they differ in five key ways:

  1. Native Fabric integration — Fabric Pipelines natively orchestrate Fabric activities: Spark notebooks, Dataflows Gen2, SQL scripts, and semantic model refreshes. No linked services or external connections required.
  2. OneLake as default storage — Copy activities write directly to OneLake lakehouse tables. No storage linked service configuration required.
  3. Simplified monitoring — Pipeline runs are visible in the Fabric Monitoring Hub alongside all other Fabric workload executions in a single pane.
  4. Capacity-based billing — Pipeline execution consumes Fabric Capacity Units (CUs) rather than separate ADF billing meters. Simpler cost management.
  5. No self-hosted integration runtime management for cloud sources — Fabric handles connectivity to Azure, Microsoft 365, and 150+ cloud data sources natively. Self-hosted IR is only needed for on-premises sources.
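Under the visual canvas, a pipeline is just a dependency graph of activities executed in order. A stdlib sketch with hypothetical activity names shows the scheduling idea: each activity lists its predecessors, and a topological sort yields a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: activity name -> activities it depends on,
# mirroring how a Fabric pipeline chains a Copy activity into notebook
# runs and a final semantic model refresh.
pipeline = {
    "copy_to_bronze":         [],
    "silver_notebook":        ["copy_to_bronze"],
    "gold_notebook":          ["silver_notebook"],
    "refresh_semantic_model": ["gold_notebook"],
}

run_order = list(TopologicalSorter(pipeline).static_order())
print(run_order)
```

In a real pipeline the engine also handles retries, parallel branches, and failure paths; the ordering constraint is the part this sketch captures.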

Spark notebooks in Fabric

Fabric notebooks run on Spark. They are the primary data engineering tool for complex transformations, ML feature engineering, and large-scale data processing.

  • Native Python, Scala, R, and Spark SQL kernel support.
  • Native Delta Lake read/write without Spark configuration — Delta is the default table format in Fabric.
  • Native OneLake access — no ADLS connection string or storage account key needed.
  • MLflow integration — experiment tracking, model logging, and model registry built into the Fabric Data Science experience.
  • Notebook scheduling via Fabric Pipelines — notebooks run as pipeline activities with input/output parameter passing.

Fabric vs Azure Synapse Analytics

Azure Synapse Analytics was an integration of separate Azure services — each with independent storage and billing. Fabric provides a unified experience where all workloads share the same data lake, security model, governance layer, and billing meter.

The four "no more" differences for Synapse-to-Fabric migrations:

  • No more ETL pipelines between the data lake and data warehouse — all workloads read from the same OneLake.
  • No more data synchronization issues — data written by Spark is immediately queryable by SQL, Power BI, and Data Science workloads.
  • No more security model inconsistencies between different storage systems.
  • No more duplicate storage costs for the same data in multiple formats.

Direct Lake mode

Direct Lake is the Power BI connection mode exclusive to Microsoft Fabric. It reads Delta Parquet files directly from OneLake — without importing data into the semantic model.

  • Import-mode query performance — column-store indexing provides sub-second response on large tables.
  • No scheduled refresh — data in OneLake is always current. Power BI reads the latest snapshot automatically.
  • No storage overhead — semantic model does not store a copy of the data. OneLake is the single source of truth.
  • Available from Fabric F64+ for standard tables. F2+ for small tables with row limits.

Frequently asked questions

What is the medallion architecture in Microsoft Fabric?

Medallion architecture organizes OneLake into three quality layers: Bronze (raw source data), Silver (cleansed and conformed data), and Gold (business-ready aggregations and dimensional models). All three layers are Delta tables in OneLake. Power BI Direct Lake semantic models are built on Gold layer tables.

What is Direct Lake mode?

Direct Lake mode is a Power BI connection method exclusive to Microsoft Fabric. It reads Delta Parquet files from OneLake directly — no data import, no scheduled refresh. It delivers Import-mode query performance without storing a copy of data in the semantic model. Available from Fabric F64+.

How is Fabric Pipelines different from Azure Data Factory?

Same canvas, different context. Fabric Pipelines have native Fabric activity types (Spark notebooks, Dataflows Gen2, SQL scripts, semantic model refreshes), use OneLake as default storage, appear in the Fabric Monitoring Hub, and consume Fabric Capacity Units instead of separate ADF billing meters. No self-hosted IR required for cloud sources.

Should we migrate from Azure Synapse to Microsoft Fabric?

For Microsoft 365-anchored enterprises using Power BI as the analytics consumption layer — yes. Fabric eliminates data synchronization overhead, storage duplication, and security model inconsistencies that Synapse's architecture requires. The TCO advantage is clearest at F64+ where Power BI Copilot is included without additional Azure OpenAI cost.

What languages do Fabric notebooks support?

Python, Scala, R, and SQL. Python is the most common for data engineering and ML workloads. All notebooks have native Delta Lake and OneLake access — no external connection configuration required. MLflow experiment tracking is built into the Fabric Data Science experience for all notebook types.

Start a Fabric data engineering engagement

Talk to an EPC Group Microsoft Fabric architect about your data engineering platform. Call (888) 381-9725 or request a discovery call.

Frequently Asked Questions (In Depth)

What is Microsoft Fabric and how does it differ from Azure Synapse Analytics?

Microsoft Fabric is a unified analytics platform that brings together data engineering, data science, data warehousing, real-time analytics, and business intelligence into a single SaaS product built on a shared data foundation called OneLake. While Azure Synapse Analytics was an integration of separate Azure services (Synapse SQL Pools, Spark Pools, Pipelines) each with independent storage and billing, Fabric provides a truly unified experience where all workloads share the same data lake (OneLake), the same security model, the same governance layer, and the same billing meter (Fabric Capacity Units). This means a table created by a data engineer in a lakehouse is immediately queryable by a Power BI analyst through Direct Lake mode without copying or moving data. Fabric also eliminates the infrastructure management complexity of Synapse: no provisioning SQL pools, no managing Spark cluster sizes, no configuring storage accounts. Everything is managed by the Fabric platform and scales automatically within your capacity. EPC Group has migrated 50+ organizations from Azure Synapse to Microsoft Fabric, typically reducing total cost of ownership by 30-40% while improving data platform team productivity by 50%.

What is the Fabric lakehouse and how does it compare to a traditional data warehouse?

The Fabric lakehouse combines the flexibility of a data lake (storing raw files in any format) with the structure and query performance of a data warehouse (SQL-based analysis with ACID transactions). Data is stored in OneLake in open Delta Lake format (Parquet files with a transaction log), which supports both Spark-based processing (Python, Scala, SQL) and T-SQL querying through the SQL analytics endpoint. Compared to a traditional data warehouse, the lakehouse offers: schema-on-read flexibility (store data first, define schema later), support for unstructured data (images, PDFs, logs) alongside structured tables, lower storage costs (OneLake uses Azure Data Lake pricing at ~$0.023/GB/month vs. $5-23/TB/month for dedicated SQL pools), and open format portability (Delta Lake is open-source, not proprietary). The tradeoff is that the lakehouse SQL analytics endpoint is optimized for analytical queries but does not support the full T-SQL surface area (no stored procedures, no triggers). For organizations needing full T-SQL compatibility, Fabric also provides a separate data warehouse engine. EPC Group recommends a hybrid approach for most enterprises: lakehouse for data engineering and staging, warehouse for governed semantic layers and complex T-SQL workloads.

How does OneLake work and what are shortcuts?

OneLake is the unified data lake that underpins all Microsoft Fabric workloads. Think of it as the "OneDrive for data": every Fabric tenant has a single OneLake, and every workspace, lakehouse, and warehouse stores data within it. OneLake is built on Azure Data Lake Storage Gen2 and uses the Delta Lake format for structured tables. All Fabric workloads (Spark, SQL, Power BI, Real-Time Analytics) read from and write to OneLake, eliminating data silos and copy operations. Shortcuts are OneLake virtual pointers to data stored elsewhere: in another OneLake location, in external Azure Data Lake Storage Gen2, in Amazon S3, or in Google Cloud Storage. Shortcuts appear as tables or folders within a lakehouse but do not copy or move the data. When a Spark notebook or SQL query accesses a shortcut, it reads from the original location in real time. This enables organizations to:

  1. Federate data across organizational boundaries without data movement.
  2. Access existing data lakes (Azure, AWS, GCP) from Fabric without migration.
  3. Share data between Fabric workspaces without duplication.

EPC Group uses shortcuts extensively for hybrid architectures where organizations are migrating to Fabric incrementally, enabling Fabric analytics on existing data lake investments without upfront migration.
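Conceptually, a shortcut is a named pointer resolved at read time. The sketch below models that resolution in plain Python: reads through a shortcut are served from the target URI with no copy made. All shortcut names and URIs here are illustrative, not real Fabric API calls.

```python
# Hypothetical shortcut table for one lakehouse: shortcut name -> target URI.
shortcuts = {
    "s3_clickstream": "s3://example-bucket/clickstream/",
    "adls_finance":   "abfss://data@examplelake.dfs.core.windows.net/finance/",
}

def resolve(path: str) -> str:
    """Map a lakehouse-relative path through its shortcut to the real URI."""
    root, _, rest = path.partition("/")
    if root in shortcuts:
        return shortcuts[root] + rest     # read served from the target store
    return "onelake://" + path            # not a shortcut: data lives in OneLake

print(resolve("s3_clickstream/2026/02/events.parquet"))
```

The key property the sketch shows: the consumer's path never changes whether the bytes live in OneLake, S3, or ADLS, which is what makes incremental migration practical.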

What is the medallion architecture and how do you implement it in Fabric?

The medallion architecture (Bronze, Silver, Gold) is a data engineering pattern that organizes data by quality and readiness for consumption. The Bronze layer stores raw data as ingested from source systems with minimal transformation, preserving the source of truth. The Silver layer applies business rules, data quality checks, deduplication, and standardized schemas. The Gold layer provides business-ready aggregations, metrics, and dimensional models optimized for reporting and analytics. In Fabric, the medallion architecture is implemented using lakehouses: create separate Bronze, Silver, and Gold lakehouses, or layers within a single lakehouse using Delta table naming conventions. Bronze ingestion uses Fabric data pipelines or Dataflows Gen2 to land raw data as Delta tables. Silver transformation uses Spark notebooks to apply cleaning, deduplication, joins, and business rules, writing results to Silver Delta tables. Gold aggregation uses Spark notebooks or SQL views to create star schemas, pre-computed metrics, and business-ready tables consumed by Power BI Direct Lake datasets. EPC Group implements the medallion architecture for every enterprise Fabric deployment, with standardized notebook templates, automated quality checks between layers, and lineage tracking through Microsoft Purview.

How do Fabric data pipelines compare to Azure Data Factory?

Fabric data pipelines are the evolution of Azure Data Factory (ADF) within the Fabric platform. They share the same visual pipeline designer, the same activity types (Copy, Dataflow, ForEach, If Condition, Web, etc.), and the same integration runtime architecture. If you know ADF, you already know Fabric pipelines. The key differences are:

  1. Native Fabric integration: pipelines natively orchestrate Fabric activities including Spark notebooks, Dataflows Gen2, SQL scripts, and semantic model refreshes without requiring linked services or external connections.
  2. OneLake as default storage: Copy activities write directly to OneLake lakehouse tables without configuring storage linked services.
  3. Simplified monitoring: pipeline runs are visible in the Fabric monitoring hub alongside all other workload executions.
  4. Capacity-based billing: pipeline execution consumes Fabric Capacity Units rather than separate ADF billing meters, simplifying cost management.
  5. No self-hosted integration runtime management for cloud sources: Fabric handles connectivity to Azure, Microsoft 365, and 150+ cloud data sources natively.

For existing ADF users, migration to Fabric pipelines is straightforward: EPC Group has migrated 100+ ADF pipelines to Fabric for enterprise clients, typically completing migration in 2-4 weeks with zero downtime.

What Fabric capacity size does an enterprise need?

Fabric capacity is measured in Capacity Units (CUs) which provide compute power for all Fabric workloads. The minimum capacity is F2 (2 CUs, approximately $260/month) suitable for proof-of-concept and small team development. For enterprise production workloads, EPC Group recommends starting with F64 (64 CUs, approximately $8,300/month) which supports: 20-30 concurrent data pipeline executions, 10-15 concurrent Spark notebook sessions, 50-100 concurrent Power BI Direct Lake queries, and daily processing of 500GB-1TB of incremental data. For large enterprises with heavy Spark workloads, F128 (approximately $16,600/month) or F256 (approximately $33,200/month) provides additional concurrency and faster processing. The key capacity sizing factors are: concurrent workloads (how many pipelines, notebooks, and queries run simultaneously), data volume (GB/TB processed daily), query complexity (simple aggregations vs. complex joins and ML workloads), and user count (number of concurrent Power BI viewers hitting Direct Lake datasets). Fabric supports capacity auto-scaling and bursting, allowing temporary capacity increases for peak processing windows. EPC Group conducts capacity sizing assessments for enterprise clients, right-sizing capacity to workload patterns and configuring auto-scale policies to minimize cost while ensuring performance SLAs.
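One useful sanity check when sizing: the list prices quoted above scale roughly linearly per Capacity Unit. The snippet below derives the effective monthly cost per CU from the approximate figures in this answer (they are the article's numbers, not authoritative pricing, which varies by region and commitment).

```python
# Approximate monthly list prices quoted in this article (USD, pay-as-you-go).
SKU_MONTHLY_USD = {"F2": 260, "F64": 8_300, "F128": 16_600, "F256": 33_200}

def per_cu_cost(sku: str) -> float:
    """Effective monthly cost per Capacity Unit for a given F SKU."""
    cus = int(sku[1:])          # SKU name encodes its CU count, e.g. F64 -> 64
    return SKU_MONTHLY_USD[sku] / cus

for sku in SKU_MONTHLY_USD:
    print(sku, round(per_cu_cost(sku), 2))
```

Since cost per CU is roughly flat (about $130/CU on these figures), the sizing decision is driven by concurrency and data volume, not by per-unit price breaks.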

How does real-time analytics work in Microsoft Fabric?

Fabric Real-Time Analytics (formerly Azure Data Explorer/Kusto in Fabric) provides sub-second query performance on streaming and time-series data. It uses KQL (Kusto Query Language) databases optimized for high-velocity data ingestion (millions of events per second) and real-time querying. The architecture includes:

  • Eventstreams for ingesting real-time data from Azure Event Hubs, Azure IoT Hub, Kafka, custom applications, and Change Data Capture from databases.
  • KQL databases for storing and querying time-series data with sub-second latency.
  • Real-Time dashboards for visualizing streaming data with automatic refresh.
  • Integration with lakehouses through OneLake shortcuts, enabling historical analysis of real-time data alongside batch-processed data.

Enterprise use cases include IoT telemetry monitoring (manufacturing, energy, healthcare devices), application performance monitoring (web app logs, API latency, error rates), security event analysis (SIEM data, network logs, authentication events), and financial market data analysis (trade execution, price feeds, risk metrics). EPC Group implements real-time analytics for enterprises needing sub-second insights from high-velocity data, with event processing pipelines handling 1M+ events per second and query response times under 500 milliseconds.


About Errin O'Connor

CEO & Chief AI Architect, EPC Group

Errin O'Connor is the founder and Chief AI Architect of EPC Group, bringing 29 years of Microsoft ecosystem expertise. As a 4x Microsoft Press bestselling author and former NASA Lead Architect, Errin has led 50+ enterprise Microsoft Fabric implementations with lakehouse architectures processing petabyte-scale data across healthcare, finance, and government sectors.


Related Articles

  • Microsoft Fabric Consulting Services
  • Power BI Consulting Services
  • Microsoft Fabric vs Databricks: Enterprise Comparison

Ready to Build Your Enterprise Data Lakehouse with Microsoft Fabric?

Our team has implemented Fabric lakehouse architectures for 50+ enterprises with 30-40% TCO reduction and 50% productivity improvement. Schedule a free Fabric Assessment today.

Schedule a Free Fabric Assessment or call 1-888-381-9725.