Enterprise Guide 2026: Lakehouse architecture, Spark notebooks, data pipelines, Delta Lake, and governance best practices for production-grade data engineering.
Microsoft Fabric Data Engineering is the Spark-based workload inside Microsoft Fabric for building enterprise data pipelines, lakehouses, and Delta Lake tables. This guide covers Lakehouse architecture, Notebooks, medallion architecture, Delta Lake, data governance best practices, and cost optimization strategies for enterprise teams in 2026.
Data engineering in Microsoft Fabric is a unified discipline. It combines lakehouse architecture, Apache Spark notebooks, Data Factory pipelines, and Delta Lake storage into one SaaS platform.
Fabric removes the need to connect different Azure services. Data engineers benefit from:
All these features are available in a single experience with zero infrastructure management.
Microsoft Fabric has changed how enterprise data engineering teams build, manage, and oversee data platforms. In the past, creating an enterprise data pipeline on Azure required setting up and integrating at least four different services:
Each service had its own security model, billing structure, and operational overhead.
Fabric simplifies the entire data stack into a single, capacity-based SaaS experience. It features a Lakehouse that merges the flexibility of a data lake with the organization of a data warehouse.
This setup eliminates data movement between services, separate access control configurations, and cluster management.
This guide addresses the key areas of Fabric data engineering that enterprise teams need to understand in 2026: Lakehouse architecture, Notebooks, medallion architecture, Delta Lake, data governance, and cost optimization.
Whether you are migrating from Azure Synapse, Databricks, or starting from scratch, this is your complete reference.
The Fabric Lakehouse is the foundation of data engineering in Fabric. It combines the benefits of both a traditional data warehouse and a data lake.
Key features include:
Every Lakehouse in Fabric automatically creates two endpoints. One is a Spark endpoint for notebook-based processing. The other is a SQL analytics endpoint for T-SQL queries. Because both endpoints read the same Delta tables in OneLake, there is no second copy of the data to maintain and no ETL step between the Spark and SQL engines.
This dual-endpoint architecture allows:
Single data lake for the entire organization. All Fabric workspaces share OneLake, eliminating data silos and redundant copies across teams.
All Lakehouse tables use Delta format — ACID transactions, time travel, schema evolution, and Z-order optimization built into every table.
Every Lakehouse exposes Spark and SQL analytics endpoints simultaneously. No data movement — one table, two access patterns.
Store unstructured files (CSV, JSON, images) in the Files section and structured Delta tables in the Tables section — same Lakehouse.
EPC Group suggests using a multi-Lakehouse pattern for enterprise deployments. This approach includes creating one Lakehouse for each domain. Key domains include:
All of these will be organized within a shared workspace.
Key benefits of this approach include:
Cross-Lakehouse references can be created using shortcuts.
Each domain team is responsible for managing their own Lakehouse lifecycle. This includes:
These responsibilities do not affect other teams.
Fabric Notebooks are the primary tool for data transformation. They run Apache Spark on Microsoft-managed compute pools. This eliminates the need for:
When you open a notebook and run a cell, Fabric automatically allocates Spark resources from your capacity. These resources are released when the session ends. This serverless model helps reduce the operational overhead that often makes Spark difficult to manage in traditional environments.
Notebooks support PySpark, Spark SQL, Scala, and R. For most enterprise data engineering tasks, PySpark and Spark SQL handle 95% of use cases.
Fabric also allows for notebook parameterization. This feature lets you call notebooks from pipelines with runtime parameters. As a result, you can use the same transformation logic to process:
This approach prevents code duplication.
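The parameterization idea can be sketched in plain Python so that it runs outside Fabric. This is an illustration only: `DEFAULT_PARAMS`, `resolve_params`, and `transform` are hypothetical names, the defaults stand in for values a notebook would declare itself, and the overrides stand in for values a pipeline would pass at run time.

```python
from datetime import date

# Hypothetical defaults, standing in for values declared inside the notebook.
DEFAULT_PARAMS = {
    "source_table": "bronze_sales",
    "run_date": "2026-01-01",
    "region": "ALL",
}


def resolve_params(overrides=None):
    """Merge runtime overrides (as a pipeline might pass them) over the defaults."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise KeyError(f"Unknown parameter(s): {sorted(unknown)}")
    params = {**DEFAULT_PARAMS, **overrides}
    # Fail early if the date is not a valid ISO date (raises ValueError).
    date.fromisoformat(params["run_date"])
    return params


def transform(rows, params):
    """One transformation, reused for any table, date, or region."""
    return [
        {**row, "processed_for": params["run_date"]}
        for row in rows
        if params["region"] in ("ALL", row["region"])
    ]
```

Calling `transform(rows, resolve_params({"region": "EU"}))` and `transform(rows, resolve_params())` runs the same logic against different slices of data, which is the point of parameterizing rather than copying the notebook.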
Fabric Data Factory pipelines are the orchestration layer for enterprise data engineering. They offer a visual, drag-and-drop interface for creating multi-step data workflows. These workflows can:
For teams already using Azure Data Factory, Fabric pipelines are architecturally similar. They share most activity types and the same expression language, but Fabric uses connections in place of ADF linked services and datasets.
The main difference between standalone ADF and Fabric is the level of integration. Fabric pipelines can directly reference:
All of these are within the same workspace. This means there are no connection strings or linked services needed.
A pipeline can:
All of this happens in a single orchestrated workflow with complete lineage tracking.
Ingest from SQL Server, Oracle, Salesforce, SAP, REST APIs, file systems, and cloud storage — same connector library as Azure Data Factory.
Chain activities with success/failure/completion dependencies. Build complex DAGs with parallel branches and conditional logic.
Schedule pipelines on cron expressions, tumbling windows, or event triggers. Support for parameterized schedules across environments.
For enterprise-scale orchestration, EPC Group recommends using a hub-and-spoke pipeline pattern. This method includes a master pipeline that oversees child pipelines for each data domain. The data domains are:
Each child pipeline manages the full bronze-silver-gold transformation for its domain. The master pipeline oversees cross-domain dependencies and gives combined alerts on success or failure.
This setup scales easily as new data domains are added. It also simplifies debugging by isolating failures to specific domains.
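A minimal sketch of the hub-and-spoke idea, written in plain Python rather than as a Fabric pipeline definition. The domain names and the functions `run_master`, `summarize`, `sales_pipeline`, and `finance_pipeline` are hypothetical, and cross-domain dependency handling is left out to keep the example short. It shows only two properties described above: a failure stays isolated to its domain, and the master produces one combined status.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DomainResult:
    domain: str
    succeeded: bool
    detail: str


def run_master(children: Dict[str, Callable[[], str]]) -> List[DomainResult]:
    """Run every child pipeline; a failure in one domain does not stop the others."""
    results = []
    for domain, child in children.items():
        try:
            results.append(DomainResult(domain, True, child()))
        except Exception as exc:  # isolate the failure to this domain
            results.append(DomainResult(domain, False, f"{type(exc).__name__}: {exc}"))
    return results


def summarize(results: List[DomainResult]) -> str:
    """Build the single combined alert message the master would send."""
    failed = [r.domain for r in results if not r.succeeded]
    if failed:
        return f"FAILED domains: {', '.join(failed)}"
    return "All domains succeeded"


# Hypothetical child pipelines, each standing in for a bronze-silver-gold run.
def sales_pipeline() -> str:
    return "bronze->silver->gold complete"


def finance_pipeline() -> str:
    raise ValueError("schema drift in silver")
```

Adding a new domain means adding one entry to the dictionary passed to `run_master`; the master logic itself does not change.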
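Shortcuts are among Fabric's most powerful and often underused features for data engineering. A shortcut acts as a reference pointer to data stored outside your Lakehouse. This data can live in another OneLake location or in external storage such as ADLS Gen2, Amazon S3, Google Cloud Storage, or Dataverse.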
The data shows up as a native table or folder in your Lakehouse. You can query it using Spark and SQL.
Importantly, the data is never physically copied to OneLake. This means you incur no OneLake storage fees for shortcut data.
Shortcuts address three key challenges in enterprise data engineering:
Shortcuts make it easier to share data across workspaces and tenants. For instance, a finance team can:
Every table in a Fabric Lakehouse is a Delta Lake table. Delta Lake provides important features such as ACID transactions, time travel, schema evolution, and Z-order optimization.
For data engineers, this means you will achieve warehouse-grade reliability on lake-scale storage. You will avoid issues such as:
The medallion architecture is the recommended way to organize Delta tables in a Fabric Lakehouse. It gives a clear, traceable lineage that follows data from raw ingestion to business-ready analytics.
Bronze layer: Ingest raw data exactly as received from source systems. Append-only, no transformations. Add ingestion metadata columns (source_system, ingestion_timestamp, batch_id). This layer serves as the immutable audit trail, so you can always reprocess from bronze if silver or gold logic changes.
Silver layer: Apply cleansing, deduplication, type casting, null handling, and business key resolution. Enforce schema with Delta schema enforcement. Use merge (upsert) patterns for slowly changing dimensions. Silver tables are the single source of truth for conformed enterprise data.
Gold layer: Produce aggregated, denormalized, business-ready tables optimized for Power BI Direct Lake mode. Use star schema design with fact and dimension tables, V-Order optimized for maximum query performance. Gold tables serve analysts, reports, and AI/ML feature engineering.
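The three layers can be illustrated with a small plain-Python stand-in that runs without Spark or Delta Lake. It is a sketch under stated assumptions: the column names `order_id`, `customer`, and `amount`, the "last record wins" deduplication rule, and the function names are all hypothetical, and in a real Lakehouse each step would read and write Delta tables instead of Python lists.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_bronze(raw_rows, source_system, batch_id):
    """Bronze: keep the payload untouched and add ingestion metadata columns."""
    ts = datetime.now(timezone.utc).isoformat()
    return [
        {**row, "source_system": source_system, "ingestion_timestamp": ts, "batch_id": batch_id}
        for row in raw_rows
    ]


def to_silver(bronze_rows):
    """Silver: null handling, type casting, and deduplication on the business key."""
    latest = {}
    for row in bronze_rows:
        if row.get("order_id") is None or row.get("amount") in (None, ""):
            continue  # drop rows that are missing the key or the amount
        key = str(row["order_id"]).strip()
        latest[key] = {  # a later row with the same key replaces the earlier one
            "order_id": key,
            "customer": str(row["customer"]).strip().upper(),
            "amount": float(row["amount"]),
        }
    return list(latest.values())


def to_gold(silver_rows):
    """Gold: aggregate to a business-ready shape (total amount per customer)."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)
```

Because bronze keeps every raw row, the silver and gold functions can be changed and rerun from bronze at any time without going back to the source system.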
Fabric data engineering performance relies on three key factors:
Getting all three right can mean the difference between a platform that provides sub-second dashboards and one that frustrates analysts with delays.
Governance in Fabric data engineering is crucial and integrated at all levels of the platform. OneLake provides automatic lineage tracking, which includes:
Additionally, Microsoft Purview works seamlessly with Fabric to:
For enterprise data engineering teams, governance is important in four main areas: access control, data classification, lineage, and data quality.
Fabric addresses the first three areas natively. However, data quality requires engineering discipline through validation notebooks and monitoring.
Access control: Workspace roles (Admin, Member, Contributor, Viewer) control Lakehouse access. Row-level security and object-level security provide fine-grained table protection.
Data classification: Purview automatically scans Lakehouse tables for PII, financial data, and health information. Sensitivity labels propagate from source to downstream artifacts.
Lineage: Automatic lineage runs from pipeline ingestion through notebook transformations to Power BI reports. No manual documentation is required; Fabric tracks every dependency.
Data quality: Build validation notebooks that check row counts, null rates, schema conformance, and business rule compliance after each pipeline run.
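The four data quality checks named above (row counts, null rates, schema conformance, business rules) can be sketched as one plain-Python function. The function name `validate`, its parameters, and the thresholds used in the usage note are hypothetical; a validation notebook would run equivalent checks against Lakehouse tables rather than lists of dictionaries.

```python
def validate(rows, expected_schema, min_rows, max_null_rate, rules):
    """Return a list of human-readable failures; an empty list means the batch passed."""
    failures = []

    # Row count check.
    if len(rows) < min_rows:
        failures.append(f"row count {len(rows)} below minimum {min_rows}")

    for col, col_type in expected_schema.items():
        values = [r.get(col) for r in rows]

        # Null rate check.
        nulls = sum(v is None for v in values)
        if rows and nulls / len(rows) > max_null_rate:
            failures.append(
                f"{col}: null rate {nulls / len(rows):.0%} exceeds {max_null_rate:.0%}"
            )

        # Schema conformance check (non-null values must match the expected type).
        if any(v is not None and not isinstance(v, col_type) for v in values):
            failures.append(f"{col}: expected {col_type.__name__}")

    # Business rule checks: each rule is a function that returns True for a valid row.
    for name, rule in rules.items():
        bad = sum(not rule(r) for r in rows)
        if bad:
            failures.append(f"rule '{name}' violated by {bad} row(s)")

    return failures
```

For example, with `expected_schema = {"order_id": str, "amount": float}` and a rule such as `lambda r: r["amount"] is None or r["amount"] >= 0`, a pipeline step could fail the run whenever `validate(...)` returns a non-empty list.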
Fabric has a pricing model based on capacity. You buy Capacity Units (CUs) that are shared across all Fabric workloads within a capacity. Data engineering uses CUs in the following ways:
OneLake storage is billed separately at about $0.023 per GB per month, which is the same as the ADLS Gen2 hot tier.
One common cost mistake in Fabric data engineering is over-provisioning capacity for development environments. A single F64 capacity costs $4,096 per month when reserved.
This capacity is enough for a 10-person data engineering team. It can effectively handle development and testing workloads.
For production workloads that require heavy Spark processing, consider using F128 or F256. However, EPC Group recommends starting with F64.
Scale your capacity based on actual usage metrics rather than estimated workload projections.
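The arithmetic behind these figures can be worked through in a few lines. The two prices below are the ones quoted in this guide, not live pricing. The utilization threshold and the function names are hypothetical, chosen only to illustrate scaling on measured usage rather than projections.

```python
# Figures quoted in this guide; check current pricing before budgeting.
F64_RESERVED_USD_PER_MONTH = 4096.0
ONELAKE_USD_PER_GB_MONTH = 0.023


def monthly_cost(stored_gb, capacity_usd=F64_RESERVED_USD_PER_MONTH):
    """Capacity charge plus separately billed OneLake storage, in USD per month."""
    storage = stored_gb * ONELAKE_USD_PER_GB_MONTH
    return round(capacity_usd + storage, 2)


def cost_per_engineer(stored_gb, team_size):
    """Spread the monthly bill across the team sharing the capacity."""
    return round(monthly_cost(stored_gb) / team_size, 2)


def sustained_high_utilization(samples, threshold=0.8, min_fraction=0.75):
    """True when at least min_fraction of utilization samples exceed threshold.

    Both defaults are illustrative assumptions, not Fabric recommendations.
    """
    if not samples:
        return False
    above = sum(s > threshold for s in samples)
    return above / len(samples) >= min_fraction
```

With 5,000 GB stored, `monthly_cost(5000)` comes to 4,096 + 115 = 4,211 USD per month, or about 421 USD per engineer for a 10-person team. A check like `sustained_high_utilization` over real capacity metrics is one way to decide when moving beyond F64 is justified.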
Data engineering in Microsoft Fabric is a unified discipline that combines lakehouse architecture, Apache Spark notebooks, data pipelines, and Delta Lake storage into a single SaaS platform. Fabric data engineers use OneLake as the centralized storage layer, write transformations in PySpark or Spark SQL notebooks, orchestrate workflows with Data Factory pipelines, and leverage shortcuts to connect external data sources — all without managing infrastructure. The result is a modern data engineering experience that eliminates the complexity of stitching together separate Azure services like Synapse, Data Factory, and ADLS Gen2.
The Fabric Lakehouse is a combined data lake and data warehouse that stores data in open Delta Lake (Parquet) format on OneLake. It supports both SQL analytics and Spark-based data engineering on the same data without duplication. You create tables that are automatically registered in the SQL analytics endpoint for T-SQL queries and simultaneously accessible via Spark notebooks for transformations. The Lakehouse eliminates the traditional choice between a data lake (flexible but ungoverned) and a data warehouse (structured but rigid) — delivering both capabilities on a single copy of data.
Fabric Notebooks run Apache Spark on Microsoft-managed compute — no cluster provisioning or configuration required. They support PySpark, Spark SQL, Scala, and R with built-in visualization and collaboration features. Databricks Notebooks offer more advanced features like MLflow integration, Databricks Connect for local IDE development, and more granular cluster control. For standard data engineering workloads (ETL, data cleansing, aggregation), Fabric Notebooks are equally capable with significantly lower operational overhead. For advanced ML engineering and custom Spark tuning, Databricks retains an edge.
Fabric Shortcuts are pointers to external data sources that make data appear as if it lives in your Lakehouse without physically copying it. Shortcuts support ADLS Gen2, Amazon S3, Google Cloud Storage, and Dataverse. Use shortcuts when: (1) data must remain in its source system for compliance, (2) you want to avoid storage duplication costs, (3) you need to federate data across organizational boundaries, or (4) you are migrating incrementally and want to reference legacy storage during transition. Shortcuts are read-only by default and respect the security policies of the source system.
The medallion architecture (bronze-silver-gold) is the recommended data organization pattern in Fabric Lakehouse. Bronze layer ingests raw data from source systems with minimal transformation — preserving the original format for auditability. Silver layer applies cleansing, deduplication, schema enforcement, and business logic to create conformed datasets. Gold layer produces aggregated, business-ready tables optimized for reporting and analytics. In Fabric, each layer is a set of Delta tables in the Lakehouse, with Spark notebooks or Data Factory pipelines orchestrating the transformations between layers.
Fabric uses Data Factory pipelines for orchestration — a visual, code-free interface for scheduling and sequencing data movement and transformation activities. Pipelines support 90+ connectors for ingesting data from cloud and on-premises sources. You can chain Spark notebook executions, stored procedures, Dataflow Gen2 transformations, and copy activities into multi-step workflows with dependency management, retry logic, and alerting. Pipelines also support parameterization, allowing the same pipeline to process different datasets or environments dynamically.
Key cost strategies include: (1) Use reserved capacity (F64+) for predictable workloads — saves 40% vs pay-as-you-go. (2) Implement workspace-level capacity assignment to isolate cost by team or project. (3) Use Spark session timeouts to prevent idle compute consumption. (4) Leverage V-Order optimization on Delta tables to reduce query compute. (5) Schedule heavy pipelines during off-peak hours to smooth capacity utilization. (6) Use shortcuts instead of data copies to eliminate redundant storage. (7) Monitor capacity metrics in the Fabric admin portal and set alerts for sustained high utilization.
Fabric governance is built on Microsoft Purview integration and OneLake security. Data engineers benefit from: automatic data lineage tracking across notebooks and pipelines, sensitivity labels that propagate from source to downstream tables, workspace-level access control with Entra ID, row-level and object-level security on Lakehouse tables, and endorsement workflows (certified/promoted) for dataset quality signaling. All data in OneLake is encrypted at rest and in transit. Purview Data Catalog automatically discovers and classifies Lakehouse tables, enabling data stewards to manage the entire engineering lifecycle from a single governance plane.
Yes, existing Azure Synapse Analytics and Azure Data Factory workloads can be migrated to Fabric, and Microsoft provides migration paths for both. Synapse Spark pools map directly to Fabric Spark notebooks with minimal code changes. ADF pipelines can be migrated to Fabric Data Factory with the pipeline migration wizard; most activities transfer directly. Synapse SQL dedicated pools require more effort, as Fabric uses a different SQL engine. EPC Group recommends a phased migration: start with new workloads on Fabric, migrate existing ADF pipelines next, then gradually transition Synapse workloads as Fabric capabilities mature. We have completed 50+ Fabric migrations for enterprise clients.
Enterprise Fabric implementation, migration, and optimization services from EPC Group.
Comprehensive overview of Microsoft Fabric capabilities, licensing, and adoption strategy.
Head-to-head comparison of Microsoft Fabric and Databricks for enterprise data platforms.
EPC Group has successfully completed over 50 Microsoft Fabric implementations for enterprise clients. Our certified Fabric engineers specialize in:
Schedule a free data engineering assessment today.
Microsoft Fabric Data Engineering is the Spark-based workload within Microsoft Fabric. It helps build enterprise data pipelines, lakehouses, and Delta Lake tables.
This guide includes:
The Fabric Lakehouse is the primary data storage and processing item for enterprise data engineering. It stores data in Delta Lake format on OneLake.
Medallion architecture organizes data into three layers. Each layer adds quality and semantic structure to the raw source data.
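Shortcuts reference external data from inside a Lakehouse without physically moving it into OneLake. Use shortcuts when data must remain in its source system for compliance, when you want to avoid storage duplication costs, when you need to federate data across organizational boundaries, or when you are migrating incrementally and need to reference legacy storage during the transition.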
Governance for data engineering teams covers four areas. Build all four into the architecture — not as afterthoughts.
Seven strategies to reduce Fabric compute costs for enterprise data engineering workloads.
Fabric Data Engineering is the Spark-based workload in Microsoft Fabric. It helps users build data pipelines, lakehouses, and Delta Lake tables. Users can create and manage projects using Notebooks and Spark Job Definitions.
Data written to the Lakehouse is immediately available to all other Fabric workloads — Warehouse (T-SQL), Real-Time Intelligence (KQL), and Power BI (Direct Lake) — without copying.
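Medallion architecture structures a Fabric Lakehouse into three Delta Lake table layers: Bronze (raw data as ingested), Silver (cleansed and conformed data), and Gold (aggregated, business-ready tables).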
Power BI semantic models usually access Gold tables through Direct Lake mode for optimal performance.
Both options store data in OneLake and are accessible to Power BI. However, there is a key difference:
Warehouse is built on T-SQL. It is designed for SQL-first analytics teams and uses familiar database query patterns.
Data in Warehouse and Lakehouse can be accessed in two ways:
Here are seven strategies for optimizing costs:
EPC Group conducts Fabric cost optimization reviews as part of its managed services engagements.
Fabric Real-Time Intelligence is a separate but integrated workload. It uses KQL databases and Fabric eventstreams. This combination enables fast ingestion and querying of streaming data.
Data in KQL databases is accessible from Fabric Notebooks via a connector, and can be materialized into Delta Lake tables for batch analytics use cases.
EPC Group designs and implements enterprise Fabric data engineering platforms — from OneLake architecture through Medallion build and governance. Call (888) 381-9725 or schedule a discovery call.