
The complete enterprise guide to migrating Power BI Dataflow Gen1 to Gen2 in Microsoft Fabric. Retirement timeline, breaking changes, step-by-step process, and rollback strategies.
How do you migrate from Dataflow Gen1 to Gen2 in Microsoft Fabric? The migration follows six phases: Discovery & Inventory (catalog all Gen1 dataflows and dependencies), Complexity Assessment (score each dataflow and identify breaking changes), Environment Preparation (provision Fabric workspace, Lakehouse, and gateway connections), Migration Execution (recreate dataflows in Gen2 with explicit data destinations), Testing & Validation (parallel-run Gen1 and Gen2 for 2-4 weeks), and Cutover & Decommission (redirect downstream reports, disable Gen1, archive definitions). The most critical breaking change: Gen2 does not write to CDM folders — all outputs must target a Fabric Lakehouse, Warehouse, or KQL Database. Plan for 4-16 weeks depending on dataflow complexity and volume.
Microsoft's announcement that Dataflow Gen1 is entering deprecation has created urgency for every organization relying on Power BI dataflows for ETL and data preparation. Dataflow Gen2, built natively for Microsoft Fabric, delivers 20-40% faster refresh performance, native Lakehouse integration, and Copilot-assisted query authoring — but the migration is not a simple toggle. Breaking changes around CDM folder outputs, linked entities, and incremental refresh require careful planning.
EPC Group has migrated enterprise dataflow estates with hundreds of Gen1 dataflows across healthcare, financial services, and government organizations. Our structured methodology ensures zero data loss, minimal downstream disruption, and a clean rollback path at every stage. This guide provides the complete framework we use for enterprise clients — from assessment through decommission.
Action Required: Microsoft has placed Dataflow Gen1 in maintenance-only mode. No new features, no new connectors, and no performance improvements will be delivered to Gen1. Organizations should complete migration to Gen2 by Q3 2026 to avoid disruption during the final retirement window. Do not wait for the end-of-life date — migrate proactively while you have the luxury of parallel testing.
Gen2 is a ground-up rebuild for Microsoft Fabric — not an incremental update. Understanding these differences is critical for migration planning.
| Feature | Dataflow Gen1 | Dataflow Gen2 | Status |
|---|---|---|---|
| Compute Engine | Legacy Power Query mashup engine | Enhanced Spark-based compute engine | Improved |
| Data Output | CDM folders in ADLS Gen2 or internal storage | Lakehouse, Warehouse, KQL Database (explicit destinations) | Improved |
| Staging | No staging — all in-memory | Auto-created staging lakehouse for intermediate results | Improved |
| Fast Copy | Not available | High-throughput ingestion for large datasets | New |
| Linked Entities | Supported (CDM-based entity references) | Use Lakehouse tables as shared data sources | Changed |
| Incremental Refresh | Supported (partition-based) | Supported (updated partition management) | Changed |
| Monitoring | Power BI refresh history only | Fabric Monitor hub + detailed Spark job logs | Improved |
| Pipeline Integration | Limited (Power BI pipeline API) | Native Fabric data pipeline orchestration | Improved |
| AI/Copilot | Not available | Copilot-assisted query authoring | New |
| Refresh Performance | Baseline | 20-40% faster for equivalent workloads | Improved |
Dataflow Gen1 was built for Power BI Service as a standalone data preparation tool. It predates Microsoft Fabric and was never designed for the unified lakehouse architecture that Fabric provides. Microsoft's retirement of Gen1 reflects three strategic imperatives:
Gen2 is native to Fabric, writing directly to Lakehouse and Warehouse — eliminating the CDM folder intermediary that Gen1 required. This enables a single data copy shared across all Fabric workloads.
Gen2 leverages Spark-based compute with staging lakehouse persistence, fast copy for high-volume ingestion, and parallel query execution — capabilities impossible to retrofit into Gen1's legacy mashup engine.
Gen2 integrates Copilot-assisted query authoring and positions dataflows as a first-class citizen in Fabric's AI-powered analytics stack. Gen1 cannot support these capabilities.
The retirement is not punitive — Gen2 is genuinely superior in every measurable dimension. But the migration requires deliberate effort, especially for organizations with complex dataflow dependencies, linked entities, and incremental refresh configurations. Treating this as a “lift and shift” will result in broken pipelines and data gaps.
- **Maintenance-only mode (now):** No new features, connectors, or performance improvements. Security patches continue.
- **Deprecation of new Gen1 creation:** Microsoft begins recommending Gen2 for all new dataflow development. Gen1 creation is still available but flagged.
- **Parallel availability:** Both Gen1 and Gen2 operate side by side. This is the optimal migration window, with full parallel-testing capability.
- **Recommended migration deadline:** Complete all Gen1-to-Gen2 migrations by Q3 2026 to allow buffer before the anticipated final retirement.
- **Final retirement:** Microsoft is expected to announce the final Gen1 shutdown date. Organizations still on Gen1 will face forced migration under time pressure.
Every Gen1 dataflow falls into one of three complexity tiers. Accurate classification drives realistic timelines and resource allocation.
- **Simple** (1-2 hours per dataflow): Use "Save as Gen2" if available, or quickly recreate in the Gen2 authoring experience.
- **Moderate** (4-8 hours per dataflow): Recreate in Gen2, configure Lakehouse destinations, and test transformation parity.
- **Complex** (1-3 days per dataflow): Full dependency analysis, Lakehouse table intermediaries, and extended parallel testing.
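Classification at scale benefits from a repeatable scoring rule. The sketch below is a hypothetical heuristic — the weights and thresholds are our illustrative assumptions, not Microsoft guidance — that maps the factors discussed above (query count, linked entities, incremental refresh, downstream consumers) onto the three tiers:

```python
# Hypothetical complexity-scoring heuristic for triaging Gen1 dataflows.
# Weights and thresholds are illustrative assumptions; calibrate against
# your own estate during the assessment phase.

def score_dataflow(query_count: int, has_linked_entities: bool,
                   has_incremental_refresh: bool, downstream_datasets: int) -> str:
    """Classify a Gen1 dataflow as simple, moderate, or complex."""
    score = query_count
    score += 10 if has_linked_entities else 0    # linked entities must be redesigned
    score += 8 if has_incremental_refresh else 0  # refresh policy must be rebuilt
    score += 2 * downstream_datasets              # each consumer needs redirection
    if score <= 5:
        return "simple"
    if score <= 20:
        return "moderate"
    return "complex"

# Example triage
print(score_dataflow(3, False, False, 1))  # simple
print(score_dataflow(5, True, True, 4))    # complex
```

A rule like this makes tier assignments auditable and lets you re-score the inventory quickly as dependency mapping uncovers new consumers.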
These are the issues that break production pipelines when organizations attempt a “quick migration” without proper assessment. Each requires specific remediation.
**CDM folder outputs.** Gen2 does not write to CDM folders. All outputs must target a Fabric Lakehouse, Warehouse, or KQL Database. *Remediation:* Configure explicit data destinations in Gen2. If downstream systems consume CDM folders directly, create a Lakehouse shortcut or pipeline to replicate the data.
**Linked entities.** Gen1 linked entities (references between dataflows) are not automatically migrated. Gen2 does not use the same CDM-based entity linking. *Remediation:* Write the source dataflow output to a Lakehouse table, then reference that table from dependent Gen2 dataflows. This is more performant than Gen1 linked entities.
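Because each dependent dataflow now reads a Lakehouse table that its source must populate first, migration order matters: sources before consumers. A minimal sketch of deriving that order with a topological sort (the dataflow names and dependency map are hypothetical examples):

```python
from graphlib import TopologicalSorter

# Migrate source dataflows before the dataflows that consumed them as linked
# entities in Gen1, so each dependent can point at an already-populated
# Lakehouse table. The dependency map below is a hypothetical example:
# key = dataflow, value = set of dataflows it depends on.
dependencies = {
    "Sales_Enriched": {"Sales_Raw", "Customers_Raw"},
    "Finance_Summary": {"Sales_Enriched"},
}

migration_order = list(TopologicalSorter(dependencies).static_order())
print(migration_order)  # sources first, e.g. Sales_Raw and Customers_Raw before Sales_Enriched
```

The same ordering also tells you which migrations can proceed in parallel: any dataflows with no path between them can be migrated by separate teams simultaneously.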
**Incremental refresh.** Gen2 uses updated partition management. Existing Gen1 incremental refresh policies must be reconfigured in Gen2. *Remediation:* Recreate incremental refresh parameters (RangeStart, RangeEnd) in Gen2 and configure the refresh policy. Test partition creation across several refresh cycles.
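During those test cycles it helps to know in advance which partitions a policy should produce. The sketch below is our illustrative model of monthly partition windows derived from RangeStart/RangeEnd — it is not how Gen2 computes partitions internally, just a reference list to compare against what the Fabric Monitor hub shows:

```python
from datetime import date

# Sketch: enumerate the monthly [start, end) windows that a monthly
# incremental-refresh policy over RangeStart/RangeEnd should cover.
# This models full-month alignment as an assumption for validation purposes.
def monthly_partitions(range_start: date, range_end: date) -> list[tuple[date, date]]:
    """Return [start, end) month boundaries covering the refresh window."""
    partitions = []
    current = date(range_start.year, range_start.month, 1)
    while current < range_end:
        nxt = (date(current.year + 1, 1, 1) if current.month == 12
               else date(current.year, current.month + 1, 1))
        partitions.append((current, min(nxt, range_end)))
        current = nxt
    return partitions

for p in monthly_partitions(date(2025, 11, 15), date(2026, 2, 1)):
    print(p)  # three monthly windows from 2025-11-01 to 2026-02-01
```

If the partitions Gen2 actually creates diverge from the expected list across several refresh cycles, revisit the refresh policy configuration before cutover.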
**DirectQuery redirection.** Power BI datasets using DirectQuery to Gen1 dataflow entities must be redirected to Gen2 data destinations. *Remediation:* After the Gen2 dataflow writes to Lakehouse/Warehouse, update the dataset connection to point to the new Lakehouse SQL endpoint or Warehouse.
**Connector behavior.** Some Power Query connectors behave differently under the Gen2 enhanced compute engine. Query folding patterns may change. *Remediation:* Test each connector under Gen2 compute. Monitor the staging lakehouse for unexpected query materialization. Adjust query folding hints if needed.
**Refresh scheduling.** Gen2 refresh timing may differ from Gen1 due to the enhanced compute engine and staging lakehouse writes. *Remediation:* Benchmark Gen2 refresh durations during parallel testing. Adjust downstream dependency schedules if Gen2 refreshes complete faster or slower than Gen1.
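The benchmarking itself can be a simple comparison of parallel-run timings against the Gen1 baseline. The durations below are illustrative placeholders; in practice, pull them from Power BI refresh history and the Fabric Monitor hub:

```python
# Compare Gen2 parallel-run refresh durations (minutes) against Gen1 baselines
# and flag dataflows whose timing shifted enough to affect downstream schedules.
# The 15% tolerance is an assumed threshold, not an official figure.
gen1_baseline = {"Sales_Daily": 42.0, "Finance_Monthly": 115.0}
gen2_observed = {"Sales_Daily": 28.5, "Finance_Monthly": 96.0}

for name, baseline in gen1_baseline.items():
    ratio = gen2_observed[name] / baseline
    note = ("review downstream schedules" if abs(1 - ratio) > 0.15
            else "no schedule change needed")
    print(f"{name}: {ratio:.0%} of Gen1 baseline -> {note}")
```

A dataflow that finishes much earlier can be just as disruptive as one that finishes late, if downstream datasets are scheduled against the old completion time.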
Beyond the mandatory migration, Gen2 delivers substantial performance gains that justify early adoption. These improvements are not incremental — they reflect a fundamentally different compute architecture.
The enhanced Spark-based compute engine processes transformations significantly faster than Gen1's legacy mashup engine. Organizations with large-volume dataflows (1M+ rows) see the most dramatic improvements, with some reporting 50%+ reduction in refresh duration.
Gen2 introduces Fast Copy mode for data ingestion scenarios where minimal transformation is needed. Fast Copy bypasses row-by-row processing and uses high-throughput bulk ingestion, reducing load times for large datasets from hours to minutes.
Gen2 auto-creates a staging lakehouse that persists intermediate query results. In complex dataflows with multiple transformation steps, this eliminates redundant re-computation. Subsequent queries reference staged data instead of re-executing upstream transformations.
Gen2 writes directly to Delta tables in Fabric Lakehouse or Warehouse, eliminating the CDM serialization/deserialization overhead that Gen1 required. This reduces both write time and storage costs while enabling direct consumption by Spark notebooks, SQL endpoints, and Power BI DirectLake mode.
Gen2 surfaces detailed Spark job logs, query execution plans, and resource utilization metrics through the Fabric Monitor hub. This visibility enables targeted performance optimization that was impossible with Gen1's limited refresh history.
The most compelling reason to migrate is not performance — it is Fabric integration. Gen2 dataflows become first-class citizens in the Fabric ecosystem, unlocking capabilities that Gen1 cannot access.
Gen2 outputs land directly in OneLake, making them immediately available to every Fabric workload — notebooks, SQL endpoints, real-time analytics, and Power BI DirectLake mode. No data movement or duplication required.
Gen2 dataflows integrate natively with Fabric data pipelines, enabling sophisticated orchestration with notebooks, stored procedures, and other dataflows in a single pipeline with dependency management.
Gen2 data destinations inherit Microsoft Purview governance — automatic data classification, sensitivity labels, lineage tracking, and access policies. Gen1 CDM folders were outside this governance perimeter.
Gen2 leverages Fabric workspace roles and Entra ID integration for fine-grained access control. Data destinations support row-level security through Lakehouse SQL endpoints and Warehouse views.
EPC Group Recommendation: Organizations already using or planning to adopt Microsoft Fabric should treat the Gen1-to-Gen2 migration as a Fabric onboarding workstream — not a standalone project. This ensures dataflow migration aligns with Lakehouse design, governance policies, and capacity planning.
Parallel testing is non-negotiable for enterprise migrations. Every migrated dataflow must pass these validation gates before Gen1 is decommissioned.
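The core validation gate is output parity between the Gen1 and Gen2 versions of each entity. A minimal sketch of that check — row counts plus an order-independent content hash — using illustrative in-memory rows (in practice, read them from the CDM folder output and the Lakehouse table):

```python
import hashlib

# Minimal parity check between Gen1 and Gen2 outputs for the same entity:
# compare row counts and an order-independent content hash, so row ordering
# differences between the two engines do not cause false failures.
def content_hash(rows: list[tuple]) -> str:
    """Order-independent hash of a result set."""
    digests = sorted(hashlib.sha256(repr(r).encode()).hexdigest() for r in rows)
    return hashlib.sha256("".join(digests).encode()).hexdigest()

gen1_rows = [(1, "Contoso", 120.50), (2, "Fabrikam", 99.00)]
gen2_rows = [(2, "Fabrikam", 99.00), (1, "Contoso", 120.50)]  # same data, different order

assert len(gen1_rows) == len(gen2_rows), "row count mismatch"
assert content_hash(gen1_rows) == content_hash(gen2_rows), "content mismatch"
print("parity check passed")
```

Run a check like this daily during the parallel-run window and log the results; the log becomes the evidence that validation gates passed before Gen1 is disabled.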
Every enterprise migration requires a clean rollback path. EPC Group's methodology preserves Gen1 dataflows throughout the migration and validation period, ensuring zero-risk cutover.
- **Never delete Gen1 dataflows during migration.** Disable Gen1 refresh schedules after cutover, but keep the dataflow definitions intact for a minimum of 30 days post-migration.
- **Maintain dual refresh schedules during validation.** Run both Gen1 and Gen2 dataflows for 2-4 weeks. Compare outputs daily. Only disable Gen1 after all validation gates pass.
- **Preserve downstream connection strings.** Document all Gen1 dataset connection strings before redirecting to Gen2. Store these in a migration runbook for instant rollback.
- **Archive Gen1 dataflow M code.** Export the Power Query M code from every Gen1 dataflow before migration. Store it in version control (Git) for an audit trail and rollback recreation.
- **Define rollback triggers.** Establish clear criteria for initiating rollback: data accuracy below 99.9%, refresh duration exceeding 2x the Gen1 baseline, or any downstream report failures.
- **Test the rollback procedure.** Before final cutover, practice rolling back one migrated dataflow to Gen1. Verify the rollback completes within your RTO (recovery time objective).
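Rollback triggers are most useful when they are evaluated mechanically against each day's parallel-run metrics rather than debated after the fact. A minimal sketch encoding the three trigger criteria named above (the metric values are examples):

```python
# Evaluate the rollback triggers against one day of parallel-run metrics.
# Thresholds mirror the criteria in the text: accuracy below 99.9%,
# Gen2 refresh exceeding 2x the Gen1 baseline, or any report failures.
def should_roll_back(accuracy: float, gen2_minutes: float,
                     gen1_baseline_minutes: float, report_failures: int) -> list[str]:
    """Return the list of fired rollback triggers (empty list = keep going)."""
    reasons = []
    if accuracy < 0.999:
        reasons.append(f"data accuracy {accuracy:.2%} below 99.9%")
    if gen2_minutes > 2 * gen1_baseline_minutes:
        reasons.append("Gen2 refresh exceeds 2x Gen1 baseline")
    if report_failures > 0:
        reasons.append(f"{report_failures} downstream report failure(s)")
    return reasons

print(should_roll_back(accuracy=0.9995, gen2_minutes=50,
                       gen1_baseline_minutes=42, report_failures=0))
# -> [] (no trigger fired; continue validation)
```

Any non-empty result initiates the documented rollback procedure; the returned reasons go straight into the migration runbook.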
**What is the migration process from Dataflow Gen1 to Gen2?** Migration from Dataflow Gen1 to Gen2 follows a structured process: 1) Inventory all existing Gen1 dataflows and document their refresh schedules, data sources, and downstream dependencies. 2) Use the built-in "Save as Gen2" option in Power BI Service for simple dataflows. 3) For complex dataflows, recreate queries in the Gen2 authoring experience, mapping Power Query transformations to the new enhanced compute engine. 4) Update data destination configurations — Gen2 uses explicit data destinations (Lakehouse, Warehouse, KQL Database) instead of Gen1 CDM folders. 5) Test refresh performance and data accuracy against Gen1 baselines. 6) Redirect downstream reports and datasets to point to Gen2 outputs. 7) Decommission Gen1 dataflows after validation. EPC Group provides a migration assessment and execution service for enterprise-scale dataflow estates.
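Step 1, the inventory, can be partially automated with the Power BI REST API (`GET /v1.0/myorg/groups/{groupId}/dataflows`). The sketch below leaves authentication and the HTTP call as a stub and shows only the flattening of a response into runbook rows; the sample payload and field selection are illustrative assumptions:

```python
import json

# Sketch of the Discovery & Inventory step: flatten a Power BI REST API
# dataflows response (GET /v1.0/myorg/groups/{groupId}/dataflows) into
# inventory rows for the migration runbook. Fetching the payload (with a
# valid Entra ID access token) is out of scope for this sketch.

def inventory(payload: dict, workspace: str) -> list[dict]:
    """Flatten the API response's 'value' array into inventory rows."""
    return [
        {"workspace": workspace, "dataflow_id": df["objectId"], "name": df["name"]}
        for df in payload.get("value", [])
    ]

# Illustrative sample payload in the shape the API returns.
sample = json.loads("""
{"value": [
  {"objectId": "a1b2c3", "name": "Sales_Raw"},
  {"objectId": "d4e5f6", "name": "Finance_Summary"}
]}
""")

for row in inventory(sample, workspace="Analytics"):
    print(row)
```

Looping this over every workspace in the tenant yields the complete Gen1 inventory that the dependency-mapping and complexity-scoring steps build on.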
**When will Dataflow Gen1 be retired?** Microsoft announced that Dataflow Gen1 will reach end of support in phases. New Gen1 dataflow creation is being deprecated throughout 2025-2026, with existing Gen1 dataflows entering maintenance-only mode. Microsoft has stated that Gen1 will not receive new features and will eventually be retired entirely, with the final retirement date expected by late 2026 or early 2027. Organizations should begin migration planning immediately — EPC Group recommends completing all Gen1-to-Gen2 migrations by Q3 2026 to avoid disruption during the final retirement window.
**What is the difference between Dataflow Gen1 and Gen2?** Dataflow Gen1 operates within Power BI Service and writes output to CDM (Common Data Model) folders in Azure Data Lake Storage Gen2 or to Power BI internal storage. Gen2, built natively for Microsoft Fabric, introduces several major improvements: 1) Direct data destinations — output goes to Fabric Lakehouse, Warehouse, or KQL Database instead of CDM folders. 2) Enhanced compute engine with faster Spark-based processing. 3) Staging lakehouse for intermediate query results (auto-created). 4) Fast copy for high-volume data ingestion. 5) Native integration with Fabric data pipelines and notebooks. 6) Improved monitoring through Fabric Monitor hub. Gen2 is approximately 20-40% faster than Gen1 for equivalent workloads.
**Will existing Gen1 dataflows continue to work?** Existing Gen1 dataflows will continue to function during the deprecation period, but they are in maintenance-only mode — Microsoft will not add new features, connectors, or performance improvements. Critical bug fixes and security patches will continue until the final retirement date. However, organizations should not rely on indefinite Gen1 availability. Microsoft deprecation timelines for Power BI features typically give 12-18 months notice before final shutdown. EPC Group recommends treating Gen1 as end-of-life now and prioritizing migration to avoid last-minute disruption to production refresh schedules.
**What breaking changes should we expect during migration?** Common breaking changes during Gen1-to-Gen2 migration: 1) CDM folder outputs are not supported in Gen2 — you must configure explicit data destinations (Lakehouse, Warehouse). 2) Linked entities between Gen1 dataflows do not automatically transfer — you must recreate entity references in Gen2 or use Lakehouse tables as shared data sources. 3) Incremental refresh configuration differs — Gen2 uses a different partition management approach. 4) Some Power Query connectors may behave differently under the Gen2 enhanced compute engine. 5) Downstream Power BI datasets referencing Gen1 dataflow entities via DirectQuery must be redirected to Gen2 data destinations. 6) Scheduled refresh timing may shift due to Gen2 performance differences. EPC Group migration assessment identifies all breaking changes before migration begins.
**How long does a Gen1-to-Gen2 migration take?** Migration duration depends on the complexity and number of dataflows: Small environments (1-10 simple dataflows): 1-2 weeks including testing and validation. Medium environments (10-50 dataflows with moderate transformations): 3-6 weeks including dependency mapping, migration, testing, and cutover. Large enterprise environments (50+ dataflows with complex transformations, linked entities, incremental refresh, and many downstream dependencies): 8-16 weeks following EPC Group structured methodology. The most time-consuming phase is not the technical migration itself — it is dependency mapping and downstream impact analysis. Organizations with undocumented dataflow dependencies typically require 40-60% more time for migration.
**Can Gen1 and Gen2 dataflows run side by side?** Yes, Gen1 and Gen2 dataflows can run simultaneously in the same Power BI workspace. This is the recommended approach for enterprise migration — run Gen2 dataflows in parallel with Gen1 for a validation period (typically 2-4 weeks) before decommissioning Gen1. During parallel operation, compare data outputs, refresh performance, and downstream report accuracy between Gen1 and Gen2 versions. This approach eliminates risk by providing a clean rollback path — if Gen2 produces unexpected results, Gen1 remains fully operational. EPC Group migration methodology always includes a parallel-run validation phase.
**What performance improvements does Gen2 deliver?** Dataflow Gen2 delivers significant performance improvements over Gen1: 1) Enhanced compute engine — Spark-based processing replaces the legacy mashup engine for supported transformations, delivering 20-40% faster refresh times. 2) Fast copy — high-throughput data ingestion for large datasets bypasses row-by-row processing. 3) Staging lakehouse — intermediate query results are persisted to a staging lakehouse, reducing redundant computation in multi-step transformations. 4) Native Fabric integration — data writes directly to Lakehouse/Warehouse Delta tables, eliminating the CDM serialization overhead of Gen1. 5) Parallel query execution — Gen2 parallelizes independent queries more aggressively than Gen1. Organizations with large-volume dataflows (millions of rows) typically see the most dramatic improvements.
**Does EPC Group offer a migration assessment?** Yes. EPC Group offers a complimentary Dataflow Gen1-to-Gen2 Migration Assessment for organizations with 10 or more dataflows. The assessment includes: 1) Complete inventory of all Gen1 dataflows across your Power BI tenant. 2) Dependency mapping — downstream datasets, reports, and dashboards affected by each dataflow. 3) Complexity scoring for each dataflow (simple, moderate, complex) based on transformations, linked entities, and incremental refresh usage. 4) Breaking change identification — specific issues that will require remediation during migration. 5) Estimated migration timeline and resource requirements. 6) Risk assessment and recommended rollback strategy. Contact EPC Group at inquiry@epcgroup.net or call 888-381-9725 to schedule your assessment.
EPC Group will inventory your Gen1 dataflows, map downstream dependencies, identify breaking changes, and deliver a prioritized migration plan with effort estimates — at no cost for organizations with 10+ dataflows. Our Microsoft-certified Fabric architects have migrated enterprise dataflow estates for healthcare, financial services, and government clients.