Fabric Data Engineering: Lakehouse Guide 2026
Expert Insight from Errin O'Connor
29 years Microsoft consulting | 4x Microsoft Press bestselling author | Former NASA Lead Architect | 50+ enterprise Microsoft Fabric implementations with lakehouse architectures processing PB-scale data
Quick Answer
Microsoft Fabric brings together data engineering, data science, data warehousing, and business intelligence. It operates on a single platform that uses OneLake, a shared data lake.
OneLake stores tables in the open Delta Lake format, and files of any format can sit alongside them.
The enterprise data engineering architecture follows the medallion pattern. It consists of three layers:
- Bronze lakehouses: These ingest raw data using data pipelines and Dataflows Gen2.
- Silver lakehouses: These apply business rules through Spark notebooks.
- Gold lakehouses: These offer business-ready dimensional models that Power BI semantic models read in Direct Lake mode, enabling zero-copy analytics.
In EPC Group's experience, organizations migrating from Azure Synapse to Fabric typically achieve a 30-40% reduction in total cost of ownership (TCO). The move also helps eliminate data silos.
Additionally, OneLake shortcuts allow for federation with existing data lakes in Azure, AWS, and GCP without the need for data movement.
Microsoft Fabric Data Engineering Guide 2026
Last updated: 2026
Microsoft Fabric data engineering includes lakehouses, data pipelines, Spark notebooks, OneLake, and the medallion architecture. This guide explains how Fabric merges six Microsoft services, including Synapse, Data Factory, and Power BI, into a single SaaS experience. It also covers:
- Fabric Pipelines vs ADF comparison
- Medallion architecture patterns
- Direct Lake mode mechanics
Key facts
- Microsoft Fabric combines six products: Data Engineering (Synapse Spark), Data Warehouse (Synapse SQL), Data Science (notebooks + MLflow), Real-Time Intelligence, Data Factory (orchestration), and Power BI.
- OneLake is the storage foundation — one copy of data serves all Fabric workloads. No ETL between components.
- Direct Lake mode: Power BI reads directly from OneLake Delta Parquet files. Import-mode performance without imported data storage cost.
- Fabric Pipelines vs Azure Data Factory: Fabric Pipelines have native Fabric integration, OneLake as default storage, capacity-based billing (Fabric CUs), and no self-hosted integration runtime management for cloud sources.
- Medallion architecture: Bronze (raw) → Silver (cleansed) → Gold (business-ready). All three layers stored in OneLake as Delta tables.
What Microsoft Fabric is
Microsoft Fabric is a SaaS analytics platform that combines six Microsoft analytics services into one experience. All workloads use the same data lake (OneLake), the same governance layer (Microsoft Purview), and the same billing meter (Fabric Capacity Units).
The six products in one platform:
- Data Engineering — Synapse Spark: notebooks, pipelines, Delta Lake.
- Data Warehouse — Synapse SQL: T-SQL warehouse with auto-scaling.
- Data Science — notebooks, MLflow experiment tracking, model registry.
- Real-Time Intelligence — formerly Synapse Real-Time Analytics + Data Activator. Streaming analytics and event-driven actions.
- Data Factory — pipeline orchestration. Copy activities, Dataflows Gen2, scheduled and triggered pipelines.
- Power BI — semantic models, reports, dashboards, Direct Lake mode.
A data engineer writes to a lakehouse table. Once that table is part of a Direct Lake semantic model, a Power BI analyst can query the new data without an ETL step or a data copy.
OneLake architecture
OneLake is the single storage foundation. Every Fabric tenant gets one OneLake — automatically provisioned, no separate storage account setup required.
Key architectural characteristics:
- Hierarchical organization: tenants → workspaces → items (such as lakehouses and warehouses) → folders.
- Native Delta Lake (Parquet) format for all structured and semi-structured data.
- One copy of data serves all workloads — no ETL between Data Engineering, Data Warehouse, Data Science, and Power BI.
- ADLS Gen2 API compatibility: existing ADLS-aware tools and scripts work once they are pointed at the OneLake endpoint (onelake.dfs.fabric.microsoft.com).
- Built-in governance via Microsoft Purview integration — sensitivity labels, lineage, data catalog.
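Because OneLake speaks the ADLS Gen2 API, each level of the workspace → item → folder hierarchy maps onto an ABFS URI. A minimal sketch of building those URIs, assuming the `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.Lakehouse/Tables|Files/...` form; the workspace, lakehouse, and table names are hypothetical, and names with spaces or special characters may need workspace/item GUIDs instead.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"

def onelake_path(workspace: str, lakehouse: str, area: str, relative: str = "") -> str:
    """Build an ABFS URI for a lakehouse's Tables or Files area in OneLake.

    The workspace plays the role of the ADLS "container"; the lakehouse item
    carries a ".Lakehouse" suffix; Tables holds Delta tables, Files holds raw files.
    """
    if area not in ("Tables", "Files"):
        raise ValueError("area must be 'Tables' or 'Files'")
    path = f"{lakehouse}.Lakehouse/{area}"
    if relative:
        path += "/" + relative.strip("/")  # tolerate leading/trailing slashes
    return f"abfss://{workspace}@{ONELAKE_HOST}/{path}"

print(onelake_path("Sales", "Bronze", "Tables", "orders"))
print(onelake_path("Sales", "Bronze", "Files", "/raw/2026/"))
```

Inside a Fabric Spark notebook, a URI like the first one could be passed to `spark.read.format("delta").load(...)`; for the notebook's default lakehouse, relative paths such as `Tables/orders` are usually enough.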
Medallion architecture in Fabric
The medallion architecture organizes OneLake data into three quality layers:
- Bronze layer — raw ingested data. No transformation. Source-faithful. Data lands here from pipelines, streaming, or file upload.
- Silver layer — cleansed and conformed data. Null handling, deduplication, type casting, schema standardization applied. This is the analytical baseline.
- Gold layer — business-ready data. Aggregations, business logic, and dimensional models applied. Direct Lake semantic models are built on Gold layer tables.
All three layers live in OneLake as Delta tables. Power BI Direct Lake connects to Gold layer tables without importing data.
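The division of work between the layers can be illustrated with a plain-Python sketch. In a Fabric notebook these steps would normally be PySpark DataFrame operations writing Delta tables (for example `dropDuplicates` for Silver and `groupBy().agg()` for Gold). Here, lists of dicts stand in for tables so the logic runs anywhere, and the order data and column names are hypothetical.

```python
from collections import defaultdict

# Bronze: raw, source-faithful rows (hypothetical order feed). Amounts arrive as
# strings, one order is duplicated, and one row is missing its customer.
bronze = [
    {"order_id": "1", "customer": "ACME ", "amount": "120.50"},
    {"order_id": "2", "customer": "globex", "amount": "80.00"},
    {"order_id": "2", "customer": "globex", "amount": "80.00"},  # duplicate
    {"order_id": "3", "customer": None, "amount": "15.00"},      # null key
    {"order_id": "4", "customer": "acme", "amount": "9.50"},
]

def to_silver(rows):
    """Cleanse and conform: drop null customers, dedupe on order_id, cast types."""
    seen = set()
    silver = []
    for row in rows:
        if row["customer"] is None or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().lower(),  # standardize names
            "amount": float(row["amount"]),
        })
    return silver

def to_gold(rows):
    """Business-ready aggregate: total revenue per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Bronze is never modified in place; each layer is derived from the one below it, which is what lets Silver and Gold be rebuilt from the raw data when business rules change.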
Fabric Pipelines vs Azure Data Factory
Fabric Pipelines and Azure Data Factory share the same pipeline canvas and activity types, but they differ in five key ways:
- Native Fabric integration — Fabric Pipelines natively orchestrate Fabric activities: Spark notebooks, Dataflows Gen2, SQL scripts, and semantic model refreshes. No linked services or external connections required.
- OneLake as default storage — Copy activities write directly to OneLake lakehouse tables. No storage linked service configuration required.
- Simplified monitoring — Pipeline runs are visible in the Fabric Monitoring Hub alongside all other Fabric workload executions in a single pane.
- Capacity-based billing — Pipeline execution consumes Fabric Capacity Units (CUs) rather than separate ADF billing meters. Simpler cost management.
- No self-hosted integration runtime management for cloud sources: Fabric handles connectivity to Azure, Microsoft 365, and 150+ cloud data sources natively. On-premises sources are reached through an on-premises data gateway rather than a self-hosted IR.
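A pipeline that orchestrates Fabric activities is essentially a dependency graph, for example Copy activities landing Bronze data, then notebooks building Silver and Gold, then a semantic model refresh. The toy model below is not the Fabric pipeline JSON schema; it uses hypothetical activity names and Python's `graphlib.TopologicalSorter` (Python 3.9+) to show which activities could run in parallel and which must wait on "on success" dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical activities, each mapped to the activities it depends on.
activities = {
    "copy_orders_to_bronze": set(),
    "copy_customers_to_bronze": set(),
    "notebook_build_silver": {"copy_orders_to_bronze", "copy_customers_to_bronze"},
    "notebook_build_gold": {"notebook_build_silver"},
    "refresh_semantic_model": {"notebook_build_gold"},
}

def run_order(graph):
    """Return one valid sequential order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())

def parallel_stages(graph):
    """Group activities into stages whose members have no unmet dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # pretend every activity in the stage succeeded
    return stages

for number, stage in enumerate(parallel_stages(activities), start=1):
    print(number, stage)
```

Both Copy activities land in the first stage, while the notebooks and the refresh run strictly one after another, mirroring how a pipeline would chain them.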
Spark notebooks in Fabric
Fabric notebooks run on Spark. They are the primary data engineering tool for complex transformations, ML feature engineering, and large-scale data processing.
- PySpark (Python), Spark (Scala), SparkR (R), and Spark SQL language support.
- Native Delta Lake read/write without Spark configuration — Delta is the default table format in Fabric.
- Native OneLake access — no ADLS connection string or storage account key needed.
- MLflow integration — experiment tracking, model logging, and model registry built into the Fabric Data Science experience.
- Notebook scheduling via Fabric Pipelines — notebooks run as pipeline activities with input/output parameter passing.
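The parameter-passing pattern can be sketched as follows, assuming Fabric's approach of a designated parameters cell (whose defaults the pipeline's Notebook activity overrides) plus `notebookutils.notebook.exit()` to hand a string back to the pipeline. The parameter names, sample rows, and summary fields are hypothetical, and the try/except exists only so the sketch also runs outside Fabric.

```python
import json

# Parameters cell: in Fabric this cell would be toggled as the notebook's
# parameters cell; values passed by the pipeline override these defaults.
run_date = "2026-01-31"
source_table = "bronze_orders"

# Hypothetical input rows standing in for the source table.
rows = [
    {"order_id": 1, "order_date": "2026-01-30"},
    {"order_id": 2, "order_date": "2026-01-31"},
    {"order_id": 3, "order_date": "2026-01-31"},
]

def process(run_date, source_table, rows):
    """Stand-in transformation: select the run date's rows and summarize them."""
    selected = [r for r in rows if r["order_date"] == run_date]
    return {"run_date": run_date, "source": source_table, "rows_processed": len(selected)}

summary = process(run_date, source_table, rows)
exit_value = json.dumps(summary)  # exit values are strings, so serialize to JSON

# Return the summary to the calling pipeline. notebookutils is provided by the
# Fabric runtime; outside Fabric the name is undefined, so print instead.
try:
    notebookutils.notebook.exit(exit_value)  # noqa: F821
except NameError:
    print(exit_value)
```

Serializing to JSON keeps the output contract explicit: downstream pipeline activities receive a single string and can parse out whichever fields they need.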
Fabric vs Azure Synapse Analytics
Azure Synapse Analytics combined various Azure services, each with its own storage and billing. In contrast, Fabric provides a unified experience.
Now, all workloads share:
- the same data lake
- a common security model
- a governance layer
- one billing meter
The four "no more" differences for Synapse-to-Fabric migrations:
- No more ETL pipelines between the data lake and data warehouse — all workloads read from the same OneLake.
- No more data synchronization issues: data written by Spark becomes queryable by SQL, Power BI, and Data Science workloads without a copy step, once the SQL analytics endpoint's automatic metadata sync picks it up.
- No more security model inconsistencies between different storage systems.
- No more duplicate storage costs for the same data in multiple formats.
Direct Lake mode
Direct Lake is the Power BI connection mode exclusive to Microsoft Fabric. It reads Delta Parquet files directly from OneLake — without importing data into the semantic model.
- Query performance comparable to Import mode: columns are loaded on demand into the in-memory VertiPaq column store, so queries on large tables are often sub-second.
- No scheduled import refresh: with automatic updates enabled (the default), the model picks up the latest Delta table version when data in OneLake changes.
- No storage overhead — semantic model does not store a copy of the data. OneLake is the single source of truth.
- Available on all Fabric F SKUs, including F2. Each SKU enforces Direct Lake guardrails (such as a maximum number of rows per table), so very large tables typically need F64 or above.
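What makes scheduled import unnecessary is framing: a Direct Lake model is pinned to a specific version of each Delta table, and a reframe (triggered automatically or by a refresh) repoints it to the latest version, touching only metadata. The toy model below uses hypothetical classes, not a Fabric API, to show that behavior. Commits append versions to a table's log, and the model keeps reading its framed version until it reframes.

```python
from dataclasses import dataclass, field

@dataclass
class DeltaTable:
    """Toy Delta table: version N holds the list of Parquet files live at commit N."""
    versions: list = field(default_factory=lambda: [[]])  # version 0 is empty

    def commit(self, add=(), remove=()):
        """Record a new version from the previous live files plus/minus changes."""
        live = [f for f in self.versions[-1] if f not in remove] + list(add)
        self.versions.append(live)
        return len(self.versions) - 1

@dataclass
class DirectLakeModel:
    """Toy semantic model that reads whichever table version it is framed on."""
    table: DeltaTable
    framed_version: int = 0

    def reframe(self):
        """Point at the latest committed version; no data is copied."""
        self.framed_version = len(self.table.versions) - 1

    def visible_files(self):
        return self.table.versions[self.framed_version]

orders = DeltaTable()
orders.commit(add=["part-000.parquet"])
model = DirectLakeModel(orders)
model.reframe()                          # framed on version 1
orders.commit(add=["part-001.parquet"])  # new data lands in OneLake
before = model.visible_files()           # still version 1 until reframed
model.reframe()                          # an automatic update would do this
after = model.visible_files()
print(before, after)
```

Pinning to a version means every query in the model sees a consistent snapshot, even while new files are being committed to the table.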
Frequently asked questions
What is the medallion architecture in Microsoft Fabric?
Medallion architecture structures OneLake into three quality layers:
- Bronze: raw source data
- Silver: cleansed and conformed data
- Gold: business-ready aggregations and dimensional models
All three layers are Delta tables in OneLake. Power BI Direct Lake semantic models are built on Gold layer tables.
What is Direct Lake mode?
Direct Lake mode is a special connection method for Power BI in Microsoft Fabric. It enables users to read Delta Parquet files directly from OneLake. This method eliminates the need for data import or scheduled refresh.
Direct Lake mode offers query performance comparable to Import mode without keeping a copy of the data in the semantic model. It is available on all Fabric F SKUs, with larger SKUs supporting larger tables.
How is Fabric Pipelines different from Azure Data Factory?
Same canvas, different context. Fabric Pipelines include several native Fabric activity types:
- Spark notebooks
- Dataflows Gen2
- SQL scripts
- Semantic model refreshes
OneLake is the default storage destination, and pipeline runs appear in the Fabric Monitoring Hub. Pipeline execution consumes Fabric Capacity Units instead of separate ADF billing meters.
Furthermore, there is no need for a self-hosted Integration Runtime (IR) for cloud sources.
Should we migrate from Azure Synapse to Microsoft Fabric?
Enterprises using Microsoft 365 with Power BI benefit from Fabric in several important ways. It removes the need for data synchronization and cuts down on storage duplication. Furthermore, it addresses security model inconsistencies found in Synapse's architecture.
The total cost of ownership (TCO) advantage is most evident at F64 and above, where Copilot in Power BI is included in the capacity with no separate Azure OpenAI cost.
What languages do Fabric notebooks support?
Fabric notebooks support Python (PySpark), Scala, R (SparkR), and Spark SQL, the core languages for data engineering and machine learning. Python is the most popular choice.
All notebooks offer native access to Delta Lake and OneLake. This feature removes the need for setting up external connections.
Additionally, MLflow experiment tracking is integrated into the Fabric Data Science experience for all notebook types.
Start a Fabric data engineering engagement
Talk to an EPC Group Microsoft Fabric architect about your data engineering platform. Call (888) 381-9725 or request a discovery call.
Frequently Asked Questions
What is Microsoft Fabric and how does it differ from Azure Synapse Analytics?
Microsoft Fabric is a unified analytics platform that brings together data engineering, data science, data warehousing, real-time analytics, and business intelligence into a single SaaS product built on a shared data foundation called OneLake. While Azure Synapse Analytics was an integration of separate Azure services (Synapse SQL Pools, Spark Pools, Pipelines) each with independent storage and billing, Fabric provides a truly unified experience where all workloads share the same data lake (OneLake), the same security model, the same governance layer, and the same billing meter (Fabric Capacity Units). This means a table created by a data engineer in a lakehouse is immediately queryable by a Power BI analyst through Direct Lake mode without copying or moving data. Fabric also eliminates the infrastructure management complexity of Synapse: no provisioning SQL pools, no managing Spark cluster sizes, no configuring storage accounts. Everything is managed by the Fabric platform and scales automatically within your capacity. EPC Group has migrated 50+ organizations from Azure Synapse to Microsoft Fabric, typically reducing total cost of ownership by 30-40% while improving data platform team productivity by 50%.
What is the Fabric lakehouse and how does it compare to a traditional data warehouse?
The Fabric lakehouse combines the flexibility of a data lake (storing raw files in any format) with the structure and query performance of a data warehouse (SQL-based analysis with ACID transactions). Data is stored in OneLake in open Delta Lake format (Parquet files with a transaction log), which supports both Spark-based processing (Python, Scala, SQL) and T-SQL querying through the SQL analytics endpoint. Compared to a traditional data warehouse, the lakehouse offers: schema-on-read flexibility (store data first, define schema later), support for unstructured data (images, PDFs, logs) alongside structured tables, lower storage costs (OneLake storage is billed at roughly $0.023/GB/month, about $23/TB/month, separately from compute), and open format portability (Delta Lake is open-source, not proprietary). The tradeoff is that the lakehouse SQL analytics endpoint is read-only over lakehouse tables: it is optimized for analytical queries, but INSERT, UPDATE, and DELETE must be done through Spark rather than T-SQL. For T-SQL-centric workloads that need to write data, Fabric also provides a separate data warehouse engine. EPC Group recommends a hybrid approach for most enterprises: lakehouse for data engineering and staging, warehouse for governed semantic layers and complex T-SQL workloads.
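Storage cost is simple to estimate because it scales linearly with volume. A back-of-envelope helper, assuming the ~$0.023/GB/month OneLake storage figure quoted in this answer and treating 1 TB as 1,024 GB; the rate is an assumption to check against current regional pricing, and capacity (compute) charges are not included.

```python
def monthly_storage_cost(gb: float, usd_per_gb_month: float = 0.023) -> float:
    """Estimated monthly OneLake storage cost in USD (storage only, no compute)."""
    return gb * usd_per_gb_month

for tb in (1, 10, 100):
    gb = tb * 1024  # assumption: 1 TB = 1,024 GB
    print(f"{tb:>4} TB -> ${monthly_storage_cost(gb):,.0f}/month")
```

At this rate 1 TB comes to roughly $24 per month, so for most lakehouses the capacity SKU, not storage, dominates the bill.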
How does OneLake work and what are shortcuts?
OneLake is the unified data lake that underpins all Microsoft Fabric workloads. Think of it as the "OneDrive for data": every Fabric tenant has a single OneLake, and every workspace, lakehouse, and warehouse stores data within it. OneLake is built on Azure Data Lake Storage Gen2 and uses the Delta Lake format for structured tables. All Fabric workloads (Spark, SQL, Power BI, Real-Time Analytics) read from and write to OneLake, eliminating data silos and copy operations. Shortcuts are OneLake virtual pointers to data stored elsewhere, either in another OneLake location, in external Azure Data Lake Storage Gen2, in Amazon S3, or in Google Cloud Storage. Shortcuts appear as tables or folders within a lakehouse but do not copy or move the data. When a Spark notebook or SQL query accesses a shortcut, it reads from the original location in real time. This enables organizations to: (1) Federate data across organizational boundaries without data movement, (2) Access existing data lakes (Azure, AWS, GCP) from Fabric without migration, (3) Share data between Fabric workspaces without duplication. EPC Group uses shortcuts extensively for hybrid architectures where organizations are migrating to Fabric incrementally, enabling Fabric analytics on existing data lake investments without upfront migration.
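Shortcuts can also be created programmatically. The sketch below builds (but does not send) a create-shortcut request for an ADLS Gen2 target, based on our reading of the Fabric REST API (`POST /v1/workspaces/{workspaceId}/items/{itemId}/shortcuts` with a `path`, `name`, and `target.adlsGen2` body). Verify field names against the current API reference before use. The IDs, token, storage account, and paths are placeholders.

```python
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def build_adls_shortcut_request(workspace_id, lakehouse_id, name, location,
                                subpath, connection_id, token, path="Tables"):
    """Build a POST request that would create an ADLS Gen2 shortcut in a lakehouse."""
    body = {
        "path": path,  # where the shortcut appears inside the lakehouse
        "name": name,
        "target": {
            "adlsGen2": {
                "location": location,          # storage account DFS endpoint
                "subpath": subpath,            # /container/folder to point at
                "connectionId": connection_id, # existing cloud connection
            }
        },
    }
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )

req = build_adls_shortcut_request(
    workspace_id="WORKSPACE_ID",
    lakehouse_id="LAKEHOUSE_ID",
    name="legacy_orders",
    location="https://contosolake.dfs.core.windows.net",
    subpath="/sales/curated/orders",
    connection_id="CONNECTION_ID",
    token="ACCESS_TOKEN",
)
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` with a real Microsoft Entra access token. No data moves either way: the request only registers a pointer.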
What is the medallion architecture and how do you implement it in Fabric?
The medallion architecture (Bronze, Silver, Gold) is a data engineering pattern that organizes data by quality and readiness for consumption. The Bronze layer stores raw data as ingested from source systems with minimal transformation (preserving the source of truth). The Silver layer applies business rules, data quality checks, deduplication, and standardized schemas. The Gold layer provides business-ready aggregations, metrics, and dimensional models optimized for reporting and analytics. In Fabric, the medallion architecture is implemented using lakehouses: create separate Bronze, Silver, and Gold lakehouses (or layers within a single lakehouse using Delta table naming conventions). Bronze ingestion uses Fabric data pipelines or Dataflows Gen2 to land raw data as Delta tables. Silver transformation uses Spark notebooks to apply cleaning, deduplication, joins, and business rules, writing results to Silver Delta tables. Gold aggregation uses Spark notebooks or SQL views to create star schemas, pre-computed metrics, and business-ready tables consumed by Power BI Direct Lake semantic models. EPC Group implements the medallion architecture for every enterprise Fabric deployment, with standardized notebook templates, automated quality checks between layers, and lineage tracking through Microsoft Purview (or Unity Catalog where Azure Databricks is also part of the platform).
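The quality checks between layers are not spelled out here, so the following is only a generic sketch of one such gate. It counts duplicate business keys and rows missing required columns in a Silver table and promotes to Gold only when both are within thresholds. The column names and the zero-tolerance default are hypothetical choices.

```python
def quality_report(rows, key, required):
    """Summarize duplicate business keys and rows missing required columns."""
    seen = set()
    duplicates = missing = 0
    for row in rows:
        value = row.get(key)
        if value in seen:
            duplicates += 1
        seen.add(value)
        if any(row.get(col) is None for col in required):
            missing += 1
    return {"rows": len(rows), "duplicate_keys": duplicates, "rows_missing_required": missing}

def can_promote(report, max_missing_ratio=0.0):
    """Gate Silver -> Gold: no duplicate keys and missing values within budget."""
    if report["rows"] == 0:
        return False  # an empty load is treated as a failure, not a pass
    missing_ratio = report["rows_missing_required"] / report["rows"]
    return report["duplicate_keys"] == 0 and missing_ratio <= max_missing_ratio

silver_ok = [{"order_id": 1, "customer": "acme"}, {"order_id": 2, "customer": "globex"}]
silver_bad = silver_ok + [{"order_id": 2, "customer": None}]

print(can_promote(quality_report(silver_ok, "order_id", ["customer"])))   # True
print(can_promote(quality_report(silver_bad, "order_id", ["customer"])))  # False
```

Returning a report rather than a bare boolean lets a pipeline log the counts for each run, so a blocked promotion shows why it was blocked.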
How do Fabric data pipelines compare to Azure Data Factory?
Fabric data pipelines are the evolution of Azure Data Factory (ADF) within the Fabric platform. They share the same visual pipeline designer, the same activity types (Copy, Dataflow, ForEach, If Condition, Web, etc.), and many of the same connectors. If you know ADF, you already know Fabric pipelines. The key differences are: (1) Native Fabric integration: Fabric pipelines natively orchestrate Fabric activities including Spark notebooks, Dataflows Gen2, SQL scripts, and semantic model refreshes without requiring linked services or external connections. (2) OneLake as default storage: Copy activities write directly to OneLake lakehouse tables without configuring storage linked services. (3) Simplified monitoring: Pipeline runs are visible in the Fabric monitoring hub alongside all other workload executions. (4) Capacity-based billing: Pipeline execution consumes Fabric Capacity Units rather than separate ADF billing meters, simplifying cost management. (5) No self-hosted integration runtime management for cloud sources: Fabric handles connectivity to Azure, Microsoft 365, and 150+ cloud data sources natively, and on-premises sources are reached through an on-premises data gateway. For existing ADF users, migration to Fabric pipelines is straightforward: EPC Group has migrated 100+ ADF pipelines to Fabric for enterprise clients, typically completing migration in 2-4 weeks with zero downtime.
What Fabric capacity size does an enterprise need?
Fabric capacity is measured in Capacity Units (CUs), which provide compute power for all Fabric workloads. The minimum capacity is F2 (2 CUs, approximately $260/month), suitable for proof-of-concept and small team development. For enterprise production workloads, EPC Group recommends starting with F64 (64 CUs, approximately $8,300/month), which supports: 20-30 concurrent data pipeline executions, 10-15 concurrent Spark notebook sessions, 50-100 concurrent Power BI Direct Lake queries, and daily processing of 500GB-1TB of incremental data. For large enterprises with heavy Spark workloads, F128 (approximately $16,600/month) or F256 (approximately $33,200/month) provides additional concurrency and faster processing. The key capacity sizing factors are: concurrent workloads (how many pipelines, notebooks, and queries run simultaneously), data volume (GB/TB processed daily), query complexity (simple aggregations vs. complex joins and ML workloads), and user count (number of concurrent Power BI viewers hitting Direct Lake datasets). Fabric absorbs short peaks through bursting and smoothing, and capacity can be scaled up for peak processing windows and back down afterward. EPC Group conducts capacity sizing assessments for enterprise clients, right-sizing capacity to workload patterns and planning scale-up for peak windows to minimize cost while ensuring performance SLAs.
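The price points above scale linearly with Capacity Units: the F2 figure of ~$260 for 2 CUs works out to roughly $130 per CU per month, and the F64, F128, and F256 figures follow the same ratio. A small helper makes that arithmetic explicit. The per-CU rate is inferred from this article's figures, not an official price, and reservations or regional pricing would change it.

```python
import re

PER_CU_MONTH_USD = 130.0  # inferred from ~$260/month for F2 (2 CUs)

def sku_capacity_units(sku: str) -> int:
    """Parse an F SKU name into its Capacity Unit count, e.g. 'F64' -> 64."""
    match = re.fullmatch(r"F(\d+)", sku.strip().upper())
    if not match:
        raise ValueError(f"not an F SKU: {sku!r}")
    return int(match.group(1))

def estimate_monthly_cost(sku: str, per_cu_month: float = PER_CU_MONTH_USD) -> float:
    """Linear monthly estimate: CUs x per-CU monthly rate."""
    return sku_capacity_units(sku) * per_cu_month

for sku in ("F2", "F64", "F128", "F256"):
    print(f"{sku:>5}: {sku_capacity_units(sku):>4} CUs, about ${estimate_monthly_cost(sku):,.0f}/month")
```

Because cost is linear in CUs, doubling a SKU doubles the bill. Sizing decisions therefore come down to whether the workload factors listed above actually need the extra concurrency.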
How does real-time analytics work in Microsoft Fabric?
Fabric Real-Time Intelligence (formerly Real-Time Analytics, built on Azure Data Explorer/Kusto technology) provides sub-second query performance on streaming and time-series data. It uses KQL (Kusto Query Language) databases optimized for high-velocity data ingestion (millions of events per second) and real-time querying. The architecture includes: Eventstreams for ingesting real-time data from Azure Event Hubs, Azure IoT Hub, Kafka, custom applications, and Change Data Capture from databases; KQL databases for storing and querying time-series data with sub-second latency; Real-Time dashboards for visualizing streaming data with automatic refresh; and integration with lakehouses through OneLake shortcuts, enabling historical analysis of real-time data alongside batch-processed data. Enterprise use cases include: IoT telemetry monitoring (manufacturing, energy, healthcare devices), application performance monitoring (web app logs, API latency, error rates), security event analysis (SIEM data, network logs, authentication events), and financial market data analysis (trade execution, price feeds, risk metrics). EPC Group implements real-time analytics for enterprises needing sub-second insights from high-velocity data, with event processing pipelines handling 1M+ events per second and query response times under 500 milliseconds.
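KQL queries over streaming data commonly aggregate events into fixed time windows, for example `summarize count() by bin(Timestamp, 1m)`. To keep this guide's examples in one language, here is a plain-Python equivalent of that tumbling-window count. It illustrates the logic only and says nothing about how a KQL database executes queries. The sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def bin_time(ts: datetime, size: timedelta) -> datetime:
    """Round a timestamp down to the start of its window, like KQL's bin()."""
    epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)
    return ts - (ts - epoch) % size

def events_per_window(timestamps, size=timedelta(minutes=1)):
    """Count events per tumbling window, ordered by window start."""
    counts = Counter(bin_time(ts, size) for ts in timestamps)
    return dict(sorted(counts.items()))

events = [
    datetime(2026, 1, 1, 12, 0, 5),
    datetime(2026, 1, 1, 12, 0, 45),
    datetime(2026, 1, 1, 12, 1, 10),
]
counts = events_per_window(events)
for window, n in counts.items():
    print(window.isoformat(), n)
```

Tumbling windows never overlap, so each event is counted exactly once, which is what makes per-minute counts like these safe to chart on a Real-Time dashboard.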
About Errin O'Connor
CEO & Chief AI Architect, EPC Group
Errin O'Connor is the founder and Chief AI Architect of EPC Group. He has 29 years of experience in the Microsoft ecosystem. Errin is also a bestselling author with four Microsoft Press titles.
Additionally, he has served as a Lead Architect at NASA.
He has successfully led over 50 enterprise Microsoft Fabric implementations. These projects feature lakehouse architectures that handle petabyte-scale data in various sectors, including:
- Healthcare
- Finance
- Government
