Dataflow Gen1 to Microsoft Fabric Migration: The Complete Enterprise Playbook
By Errin O'Connor | Published April 15, 2026 | 12 min read
Microsoft has announced the deprecation timeline for Dataflow Gen1. If your enterprise relies on dozens — or hundreds — of Gen1 dataflows for ETL, this playbook gives you the step-by-step framework EPC Group uses to migrate clients to Dataflow Gen2 in Microsoft Fabric without disrupting production pipelines.
Why Dataflow Gen1 Migration Cannot Wait
Dataflow Gen1 was introduced as Power BI's self-service ETL tool — a way for business analysts to prepare data without writing code. It worked well for departmental use cases, but enterprises quickly outgrew its limitations: no staging layer, limited orchestration, dataset-only output, and gateway-bound connectivity.
Microsoft Fabric removes those limits. Dataflow Gen2 is more than an upgrade; it is a complete re-architecture. Gen2 dataflows:
- Run on Fabric compute
- Output to multiple destinations, including Fabric Lakehouse, Warehouse, and KQL Database, as well as Azure SQL
- Support staging lakehouses for intermediate transformations
- Integrate natively with Fabric pipelines for enterprise orchestration
The business case is straightforward. Gen1 dataflows will eventually lose support. Gen2 can deliver 2–5x better performance on workloads that benefit from query folding and fast copy. Fabric's unified capacity model lets a single Fabric capacity cover workloads that previously required separate Power BI Premium Per User or Per Capacity licensing. Organizations that migrate early also gain access to Copilot in Data Factory, enhanced monitoring, and the full Fabric governance stack through Microsoft Purview integration.
Phase 1: Migration Assessment and Inventory
Every successful migration starts with a complete inventory. EPC Group's assessment framework catalogs every Gen1 dataflow across your tenant, classifies migration complexity, and builds the dependency map that drives sequencing.
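As a rough illustration of the cataloging step, the sketch below walks the Power BI REST API's workspace and dataflow listings and flattens them into inventory rows. The function names, the injected `get_json` callable, and the row layout are assumptions made for this example; token acquisition, pagination, and throttling are deliberately left out, and the `groups` listing only returns workspaces the caller can access.

```python
import json
import urllib.request
from typing import Callable

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def http_get_json(url: str, token: str) -> dict:
    """Fetch a URL with a bearer token and decode the JSON body."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_dataflows(get_json: Callable[[str], dict]) -> list[dict]:
    """Return one inventory row per dataflow in every visible workspace.

    `get_json` is injected so the walk can be exercised without network
    access (hypothetical design choice for this sketch).
    """
    inventory = []
    for workspace in get_json(f"{API_ROOT}/groups").get("value", []):
        url = f"{API_ROOT}/groups/{workspace['id']}/dataflows"
        for dataflow in get_json(url).get("value", []):
            inventory.append(
                {
                    "workspace": workspace["name"],
                    "dataflow_id": dataflow["objectId"],
                    "dataflow_name": dataflow["name"],
                }
            )
    return inventory
```

In a live run the callable would be something like `lambda url: http_get_json(url, token)`, with `token` obtained through whatever authentication flow your tenant uses.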
Dataflow Inventory Checklist
- Total dataflow count — Use the Power BI REST API (Groups/Dataflows endpoint) to enumerate all dataflows across all workspaces. Do not rely on manual workspace audits.
- Data source inventory — Catalog every data source connection: SQL Server, Oracle, REST APIs, SharePoint lists, Excel files, OData feeds. Flag any sources using on-premises data gateways.
- M query complexity scoring — Parse each dataflow's M queries for custom functions, nested let expressions, error handling patterns, and Value.NativeQuery calls. Score each as Low (direct migration), Medium (minor refactoring), or High (rewrite required).
- Downstream dependency mapping — Identify every Power BI dataset, report, and dashboard that consumes each dataflow. This determines migration sequencing — you cannot migrate a dataflow until its consumers are ready.
- Refresh schedule and SLA documentation — Record current refresh frequencies, windows, and business SLAs. Gen2 refresh times may differ, so baseline metrics are essential for post-migration validation.
- Incremental refresh configurations — Document every dataflow using RangeStart/RangeEnd parameters. These require specific migration handling in Gen2.
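The complexity scoring item in the checklist can be approximated with simple pattern matching. The sketch below is a heuristic only: the weights, thresholds, and function name are assumptions chosen for illustration, not EPC Group's actual scoring model, and plain regular expressions will also match keywords that appear inside string literals or comments.

```python
import re

# Hypothetical weights: patterns that usually need more rework score higher.
PATTERNS = {
    "native_query": (re.compile(r"Value\.NativeQuery"), 3),
    "custom_function": (re.compile(r"=>"), 2),
    "error_handling": (re.compile(r"\btry\b"), 1),
    "let_expression": (re.compile(r"\blet\b"), 1),
}


def score_m_query(m_code: str) -> tuple[int, str]:
    """Return (score, band) for one M query, where band is Low/Medium/High."""
    score = 0
    for name, (pattern, weight) in PATTERNS.items():
        hits = len(pattern.findall(m_code))
        if name == "let_expression":
            # The first `let` is the top-level expression; only nested ones count.
            hits = max(hits - 1, 0)
        score += hits * weight

    # Assumed thresholds for the three bands named in the checklist.
    if score == 0:
        band = "Low"
    elif score <= 3:
        band = "Medium"
    else:
        band = "High"
    return score, band
```

A scan like this is best treated as a triage aid: it sorts dataflows into review queues, and a person still confirms each High score before effort is estimated.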
EPC Group delivers this assessment as a 2-week engagement with a detailed migration plan, effort estimate, and risk register. For organizations with complex Power BI environments, the assessment alone prevents costly surprises during execution.
Phase 2: Dataflow Gen2 Architecture Design
Gen2 is not a 1:1 replacement for Gen1. The architecture is fundamentally different, and a successful migration requires deliberate design decisions before writing a single M query.
Key Architecture Decisions
Output Destination: Lakehouse vs. Warehouse vs. Semantic Model
Gen1 dataflows write to Power BI-managed storage (or a linked Azure Data Lake Storage Gen2 account), and in practice their output is consumed by Power BI datasets. Gen2 lets you choose the destination. For raw/bronze data, output to a Fabric Lakehouse (Delta tables). For curated/gold data consumed by SQL analysts, output to a Fabric Warehouse. For direct BI consumption, output to a Power BI semantic model. Most enterprises use a tiered approach: the dataflow writes to a Lakehouse, a downstream pipeline transforms the data into a Warehouse, and Fabric's Direct Lake mode connects Power BI to the gold layer.
Staging Lakehouse Strategy
Gen2 introduces staging lakehouses for dataflow processing. These lakehouses act as temporary storage for high-volume dataflows, particularly those with over 1 million rows. This feature enhances fast copying and improves query folding.
However, it is advisable to disable staging for small-volume, low-latency dataflows. The overhead in these cases may not be justified.
We recommend designing a naming convention for your lakehouses:
- Staging lakehouses: stg_df_[domain]_[source]
- Output lakehouses: lh_[domain]_[layer]
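A naming convention is easier to enforce when names are generated rather than typed. The helper below is a minimal sketch of the two patterns above; the function names and the normalization rule (lowercase, with underscores replacing other characters) are assumptions for this example.

```python
import re


def _slug(value: str) -> str:
    """Lowercase the value and collapse non-alphanumeric runs to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", value.strip().lower()).strip("_")


def staging_lakehouse_name(domain: str, source: str) -> str:
    """Build a name following stg_df_[domain]_[source]."""
    return f"stg_df_{_slug(domain)}_{_slug(source)}"


def output_lakehouse_name(domain: str, layer: str) -> str:
    """Build a name following lh_[domain]_[layer]."""
    return f"lh_{_slug(domain)}_{_slug(layer)}"
```

For example, a Sales dataflow reading from SQL Server would stage into `stg_df_sales_sql_server` and publish to `lh_sales_bronze`.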
Workspace and Capacity Planning
Organize Gen2 dataflows into Fabric workspaces by domain. Examples include Sales, Finance, and Operations. This method is more effective than organizing by source system.
Each workspace links to a Fabric capacity, which supports both cost allocation and performance isolation.
Begin with a shared development capacity (F64) and create isolated production capacities for each domain. During the pilot phase, monitor CU consumption using the Fabric Capacity Metrics app before moving to production SKUs.
Phase 3: M Query Conversion and Testing
The core migration work is converting M queries from Gen1 to Gen2. While most M code is compatible, several patterns require attention.
Common M Query Migration Patterns
- Linked entities — Gen1 supports linked entities (referencing another dataflow's output). Gen2 replaces this with staging lakehouses. Convert linked entity references to queries that read the table from the staging lakehouse through the Lakehouse connector (Lakehouse.Contents in M).
- Computed entities — Similar to linked entities, computed entities in Gen1 are replaced by reading from the staging lakehouse in Gen2. Refactor computed entity queries to read from Delta tables.
- Custom connectors — Gen2 supports a growing but not yet complete set of connectors. Audit your custom connectors against the Gen2 compatibility list. For unsupported connectors, use a Fabric pipeline with a Web activity or Azure Function as a bridge.
- Gateway-dependent sources — Gen2 supports on-premises data gateways, but the configuration differs. Re-bind gateway connections in the Gen2 dataflow settings. Test connectivity before migrating M queries.
- Error handling patterns — M queries using try/otherwise patterns and custom error tables migrate directly. However, Gen2's error handling in the output configuration (row-level error capture) provides a better alternative. Consider refactoring error handling to use native Gen2 capabilities.
Testing Methodology
EPC Group follows a three-stage testing process for every migrated dataflow:
- Row count validation: The Gen2 output row count must match Gen1 within 0.01%.
- Hash-based data comparison: Generate SHA-256 hashes of key columns in both Gen1 and Gen2 output and compare them.
- Refresh performance benchmarking: Gen2 refresh time must be no more than 120% of the Gen1 baseline. Most migrated dataflows finish in 50–80% of the Gen1 time.
Document the results in a migration validation matrix and obtain business owner sign-off before cutover.
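The three checks can be sketched as small functions. This is an illustration under stated assumptions, not the validation tooling itself: rows are assumed to arrive as dictionaries, the function names are hypothetical, and the hash comparison uses sets, so duplicate rows collapse and should be caught by the row count check instead.

```python
import hashlib


def row_counts_match(gen1_rows: int, gen2_rows: int, tolerance: float = 0.0001) -> bool:
    """Check that Gen2 is within 0.01% of the Gen1 row count."""
    if gen1_rows == 0:
        return gen2_rows == 0
    return abs(gen2_rows - gen1_rows) / gen1_rows <= tolerance


def row_hash(row: dict, key_columns: list[str]) -> str:
    """SHA-256 over the key column values, joined with a unit separator."""
    joined = "\x1f".join(str(row[column]) for column in key_columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def hash_diff(gen1: list[dict], gen2: list[dict], key_columns: list[str]):
    """Return (only_in_gen1, only_in_gen2) as sets of row hashes."""
    gen1_hashes = {row_hash(row, key_columns) for row in gen1}
    gen2_hashes = {row_hash(row, key_columns) for row in gen2}
    return gen1_hashes - gen2_hashes, gen2_hashes - gen1_hashes


def refresh_within_budget(gen1_seconds: float, gen2_seconds: float, limit: float = 1.20) -> bool:
    """Check that Gen2 refresh time is at most 120% of the Gen1 baseline."""
    return gen2_seconds <= gen1_seconds * limit
```

Comparing hashes rather than raw values keeps the comparison order-insensitive and avoids moving full rows between environments. Type differences between the two outputs (for example `1` versus `1.0`) will show up as mismatches, which is usually what you want during validation.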
Phase 4: Incremental Refresh Migration
Incremental refresh is a complex topic in migration. In Gen1, you set the RangeStart and RangeEnd parameters. Power BI then manages partition creation automatically.
In Gen2, the method varies based on your output destination.
Key difference: If your Gen2 dataflow outputs to a Lakehouse, incremental refresh is managed through Fabric pipeline scheduling. This uses date-parameterized M queries. If you are outputting to a semantic model, you can use the built-in incremental refresh policy. This is similar to Gen1 but is configured at the semantic model level, not the dataflow level.
EPC Group uses a sliding-window pattern for Lakehouse destinations. The M query accepts StartDate and EndDate parameters. The Fabric pipeline passes dynamic date values during each scheduled run. The Lakehouse table is set up for merge (upsert) instead of overwrite. This approach provides true incremental loading and full control over the refresh window.
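To make the sliding-window pattern concrete, the sketch below computes the StartDate and EndDate values a pipeline might pass on each run, and models the merge step with an in-memory dictionary. The seven-day default, the half-open window (StartDate inclusive, EndDate exclusive), and both function names are assumptions for this example; the real merge happens in the Lakehouse table, not in Python.

```python
from datetime import date, timedelta


def sliding_window(run_date: date, window_days: int = 7) -> dict[str, str]:
    """Return ISO-formatted StartDate/EndDate for the trailing window."""
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    start = run_date - timedelta(days=window_days)
    return {"StartDate": start.isoformat(), "EndDate": run_date.isoformat()}


def upsert(table: dict, rows: list[dict], key: str) -> dict:
    """Model merge semantics: replace rows with a matching key, add the rest."""
    merged = dict(table)
    for row in rows:
        merged[row[key]] = row
    return merged
```

Because each run reloads the whole trailing window and merges by key, late-arriving corrections inside the window are picked up without a full reload. Rows older than the window are left untouched.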
Phase 5: Capacity Planning and Performance Optimization
Fabric capacity planning for Dataflow Gen2 workloads requires understanding CU consumption patterns. Unlike Gen1, where refresh capacity was part of your Power BI Premium SKU, Gen2 dataflows compete for CUs with every other Fabric workload in the same capacity.
Capacity Sizing Guidelines
| Workload Size | Dataflow Count | Recommended SKU | Estimated Monthly Cost |
|---|---|---|---|
| Small | 10–30 | F32 | $4,200 |
| Medium | 30–100 | F64 | $8,400 |
| Large | 100–300 | F128 | $16,800 |
| Enterprise | 300+ | F256+ | $33,600+ |
These are initial estimates. Actual CU consumption varies based on several factors:
- Data volume
- Transformation complexity
- Connector type (fast copy vs. standard)
- Concurrency
EPC Group's capacity planning engagement includes a 2-week pilot with detailed CU consumption analysis and recommendations for right-sizing.
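The sizing table can be encoded as a simple lookup for first-pass estimates. The sketch below assumes that a count sitting on a shared boundary (30, 100, or 300) falls into the smaller tier, that counts below 10 also map to F32, and that the CU count matches the number in the SKU name; none of these rules are stated in the table itself.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    label: str
    max_dataflows: Optional[int]  # None means no upper bound
    sku: str
    capacity_units: int
    monthly_cost_usd: int


# Values copied from the sizing table; boundary handling is an assumption.
TIERS = [
    Tier("Small", 30, "F32", 32, 4200),
    Tier("Medium", 100, "F64", 64, 8400),
    Tier("Large", 300, "F128", 128, 16800),
    Tier("Enterprise", None, "F256+", 256, 33600),
]


def recommend_tier(dataflow_count: int) -> Tier:
    """Return the first tier whose upper bound covers the dataflow count."""
    if dataflow_count < 0:
        raise ValueError("dataflow_count cannot be negative")
    for tier in TIERS:
        if tier.max_dataflows is None or dataflow_count <= tier.max_dataflows:
            return tier
    raise AssertionError("unreachable: the last tier has no upper bound")


def cost_per_cu(tier: Tier) -> float:
    """Monthly cost divided by capacity units, for comparing tiers."""
    return tier.monthly_cost_usd / tier.capacity_units
```

Dividing each estimate by its CU count gives the same figure for every row, $131.25 per CU per month, which shows the table scales cost linearly with capacity. Dataflow count is therefore only a starting proxy, and the pilot's measured CU consumption should drive the final choice.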
Performance Optimization Checklist
- Enable staging for dataflows processing over 1 million rows to leverage fast copy
- Maximize query folding — push filters, column selections, and joins to the source system
- Avoid Table.Buffer() and List.Buffer() in M queries — these prevent query folding and consume memory
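- Use native query (Value.NativeQuery) for SQL sources where steps do not fold, to ensure server-side execution; where the connector supports it, set the EnableFolding option so later steps can still fold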
- Schedule high-volume dataflows during off-peak hours to avoid CU contention
- Monitor using the Fabric Capacity Metrics app — set alerts for CU utilization above 80%
Cutover Strategy: Parallel Run and Decommission
EPC Group does not recommend a big-bang cutover. We prefer a parallel-run approach. In this method, Gen1 and Gen2 dataflows operate at the same time for 2–4 weeks.
During this period, we compare outputs daily using automated validation scripts. We switch downstream consumers in waves. We only decommission the Gen1 dataflow after:
- All consumers are migrated
- Validation passes for 5 consecutive business days
This approach increases capacity costs during the parallel period. However, it removes the risk of data disruption. For regulated industries such as healthcare, financial services, and government, data accuracy SLAs are contractual. Therefore, a parallel run is not optional; it is a requirement.
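The two decommission conditions lend themselves to an automated gate. The sketch below is one possible shape for that check, with several assumptions: business days are Monday through Friday with no holiday calendar, a validation result is recorded for every business day, and the function and parameter names are hypothetical.

```python
from datetime import date


def ready_to_decommission(
    consumers_migrated: dict[str, bool],
    daily_results: dict[date, bool],
    required_streak: int = 5,
) -> bool:
    """Return True when every consumer is migrated and the most recent
    `required_streak` business days all passed validation."""
    if not all(consumers_migrated.values()):
        return False

    # Keep Monday-Friday results only; weekend runs do not count either way.
    business_days = sorted(day for day in daily_results if day.weekday() < 5)
    recent = business_days[-required_streak:]

    # Assumes no business day is missing from the recorded results.
    return len(recent) == required_streak and all(daily_results[day] for day in recent)
```

A single failed day inside the window resets the count in effect, because the gate only opens once the latest five recorded business days are all passes.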
Frequently Asked Questions
What is the difference between Dataflow Gen1 and Dataflow Gen2 in Microsoft Fabric?
Dataflow Gen1 runs inside Power BI Service and outputs to Power BI datasets. Dataflow Gen2 runs inside Microsoft Fabric and can output to Lakehouses, Warehouses, KQL databases, or Azure SQL — in addition to Power BI semantic models. Gen2 supports staging lakehouses for intermediate data, uses Fabric compute (CU-based), and integrates with Fabric pipelines for orchestration. Gen2 also adds fast copy for high-volume connectors and improved M query performance through query folding enhancements.
How long does a typical Dataflow Gen1 to Fabric migration take?
For an enterprise with 50–200 dataflows, EPC Group typically completes migration in 6–12 weeks. The timeline depends on M query complexity, custom connector usage, incremental refresh configurations, and downstream dependency mapping. A simple lift-and-shift of compatible dataflows can happen in 2–4 weeks, but enterprises with complex transformations, gateway dependencies, or regulatory requirements should plan for the full 12-week cycle including UAT and parallel-run validation.
Will my existing M queries work in Dataflow Gen2 without changes?
Most standard M queries migrate without modification. However, certain patterns require updates: custom connectors not yet supported in Gen2, queries whose Value.NativeQuery calls depend on gateway-specific connection settings, and dataflows that rely on linked or computed entities, which are replaced by staging lakehouses in Gen2. EPC Group runs an automated compatibility scan on every M query before migration to identify breaking changes and estimate remediation effort.
How do I migrate incremental refresh from Gen1 to Gen2?
Incremental refresh in Gen2 works differently. In Gen1, incremental refresh is configured at the dataflow level with RangeStart/RangeEnd parameters. In Gen2, incremental refresh is handled through Fabric pipeline scheduling with watermark columns or through the built-in incremental refresh settings in the dataflow output configuration. You need to re-implement your refresh logic using Gen2's approach — typically by adding date-based filters in the M query and configuring the pipeline schedule to pass dynamic date parameters.
What capacity (SKU) do I need for Dataflow Gen2 in Fabric?
Dataflow Gen2 workloads consume Fabric Capacity Units (CUs). For enterprises running 50–100 dataflows with moderate complexity, an F64 capacity (equivalent to P1) is the starting recommendation — providing 64 CUs for shared use across Data Factory, warehousing, and BI workloads. EPC Group recommends running a 2-week capacity pilot: migrate your top 10 highest-volume dataflows, monitor CU consumption via the Fabric Capacity Metrics app, and extrapolate to size the production SKU. Over-provisioning is safer than under-provisioning during migration.
Ready to Migrate Your Dataflows to Microsoft Fabric?
EPC Group has successfully migrated hundreds of enterprise Dataflow Gen1 environments to Fabric. Our 2-week assessment provides:
- A complete migration plan
- An effort estimate
- A risk register
- Capacity sizing
This ensures you can move with confidence. Call us at (888) 381-9725 or request an assessment below.
Request a Dataflow Migration Assessment