Power BI Dataflows: The Enterprise Guide to Self-Service Data Preparation, Gen2, and Shared Dataflows
Power BI Dataflows are the foundation of scalable enterprise analytics, enabling centralized, governed, reusable data preparation that eliminates duplicate transformation logic across reports. This guide covers the complete Dataflow architecture -- Gen1 vs. Gen2, incremental refresh configuration, linked and computed entities, Microsoft Fabric lakehouse integration, governance frameworks, and implementation strategies -- based on 150+ enterprise deployments by EPC Group.
Power BI Dataflows Enterprise Guide 2026
Last updated: 2026 · Read time: 10 min
Power BI Dataflows are the centralized data preparation layer between source systems and Power BI semantic models. Dataflows Gen2 in Microsoft Fabric extends this with OneLake output, Fabric compute, and 150+ connectors. This guide covers Gen1 vs. Gen2 architecture, incremental refresh, linked entities, governance, and EPC Group's implementation patterns from 150+ deployments.
Key facts
- EPC Group: 150+ enterprise Power BI Dataflow implementations across healthcare, financial services, education, and government.
- Microsoft Gold Partner 2003–2022 (the oldest continuous Gold Partner in North America). Now a Microsoft Solutions Partner.
- Incremental refresh reduces average enterprise Dataflow refresh times from 35 minutes to 4 minutes (88% reduction).
- Incremental refresh reduces source system query load by 90%.
- Dataflows Gen2 outputs to any Fabric destination: lakehouse, warehouse, or KQL database — not just CDM folders.
What Power BI Dataflows do
Without Dataflows, every Power BI report author connects directly to source systems and applies their own ad-hoc transformations. This creates inconsistency, duplicated logic, and source system overload.
With Dataflows, a central team extracts data from sources, applies standardized business logic, and stores the results in a certified location. Any report then consumes the pre-transformed, certified data — not the raw source.
Gen1 vs. Gen2: key differences
The critical difference is destination flexibility. Choose your generation based on your current platform (Power BI Premium vs. Microsoft Fabric).
- Gen1 output — writes to CDM (Common Data Model) folders. Consumed by Power BI datasets via import.
- Gen2 output — writes directly to Fabric lakehouse Delta tables. Available immediately via Spark notebooks, SQL analytics endpoint, and Power BI Direct Lake.
- Gen2 compute — uses Fabric compute engines for faster transformation performance.
- Gen2 orchestration — integrates with Fabric data pipelines for complex dependency management.
- Gen2 connectors — 150+ data connectors including all Gen1 sources plus Fabric-native sources.
Incremental refresh: the enterprise performance lever
Incremental refresh is the highest-impact Dataflow configuration for enterprise deployments. Configure it for any Dataflow that processes more than 1 GB of data or refreshes more than twice daily.
- Refresh time reduction: from 35 minutes to 4 minutes average (88% reduction based on EPC Group deployments).
- Source system load reduction: 90% reduction in source system queries per refresh cycle.
- Refresh frequency: incremental refresh enables schedules as frequent as every 30 minutes for near-real-time scenarios.
- Configuration: define a rolling window (e.g., keep 3 years of data, refresh only the last 14 days on each run).
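The rolling-window configuration can be sketched in Power Query (M). This is a minimal, hypothetical example — the source, database, and `OrderDate` column are illustrative, and `RangeStart`/`RangeEnd` are the date/time parameters the incremental refresh settings bind to partition boundaries at refresh time:

```powerquery
let
    // Hypothetical SQL source; replace with your actual connection.
    Source = Sql.Database("sql-server-name", "SalesDb"),
    SalesTable = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // Filter on the RangeStart/RangeEnd parameters so only rows inside
    // the refresh window are re-queried. Use >= on one boundary and < on
    // the other to avoid duplicating rows at partition edges.
    Filtered = Table.SelectRows(
        SalesTable,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
    )
in
    Filtered
```

With a 3-year archive period and a 14-day refresh period, only the partitions covering the last 14 days are reprocessed on each run; historical partitions stay untouched.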
Linked entities and shared Dataflows
Linked entities let multiple Dataflows consume the output of a single "certified" Dataflow without re-running the transformation. This is the Dataflow equivalent of a certified semantic model.
- Certified Dataflow — master transformation owned by the CoE. Refreshes once daily from source.
- Linked Dataflow — department-level Dataflow that reads from the certified Dataflow and applies department-specific filters or calculations without re-querying the source.
- Computed entities — transform linked entity output further using Power Query in a child Dataflow. Enables 3-tier architectures (Bronze → Silver → Gold).
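As a sketch of the pattern, a department-level computed entity simply references the certified entity by name and applies its own filter in M. Entity and column names here are hypothetical:

```powerquery
let
    // #"Certified Sales" is a linked entity pointing at the certified
    // Dataflow's output. Referencing it inside this Dataflow makes this
    // query a computed entity, evaluated by the enhanced compute engine
    // on Premium/Fabric capacity rather than re-querying the source.
    Source = #"Certified Sales",
    // Department-specific logic layered on top of the certified data.
    FinanceOnly = Table.SelectRows(Source, each [Department] = "Finance")
in
    FinanceOnly
```

The source system is hit once (by the certified Dataflow's refresh); every downstream linked or computed entity works from that stored output.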
Dataflows governance framework
Ungoverned Dataflows create the same problems as ungoverned Power BI reports — duplication, inconsistency, and orphaned transformations with no owner.
- Every Dataflow must have a named owner and a documented refresh schedule.
- Use workspace separation — Certified Dataflows in a governed workspace, experimental Dataflows in a dev workspace.
- Enable endorsement — mark production Dataflows as "Certified" in the Power BI service.
- Quarterly audit — identify Dataflows with no downstream dependents and decommission them.
- Connect to Microsoft Purview — Dataflow lineage surfaces in the Purview Data Catalog for regulated environments.
Dataflows vs. Azure Data Factory
Both tools move and transform data. The choice depends on complexity and persona.
- Power BI Dataflows — designed for Power BI-centric transformations. Power Query UI. No code required. Best for analysts who own their data preparation.
- Azure Data Factory (ADF) — designed for enterprise-scale ETL with complex orchestration, error handling, and multi-system dependencies. Best for data engineering teams building production pipelines.
- Fabric Data Factory — the 2026 replacement for ADF in Fabric environments. Same capabilities, unified with OneLake governance.
Many enterprise deployments use both: ADF for source-to-landing-zone pipelines, and Dataflows Gen2 for the landing-zone-to-semantic-model transformation layer.
Frequently asked questions
What is the difference between Power BI Dataflows and datasets?
Dataflows prepare and store data in a shared data store (CDM folder or OneLake). Datasets (semantic models) consume that prepared data to build the analytical layer (measures, relationships, RLS). Dataflows are the ETL layer. Datasets are the analytical layer.
Do I need Dataflows if I already have Azure Data Factory?
You can use both. ADF handles source system extraction and complex orchestration. Dataflows handle the final transformation into Power Query-compatible format. Many enterprises use ADF to land data in Azure SQL or OneLake, then Dataflows for the Power Query transformations consumed by semantic models.
What is Dataflows Gen2 and do I need Fabric to use it?
Dataflows Gen2 is available in Microsoft Fabric. It requires a Fabric capacity (F2+). Gen1 Dataflows continue to work in Power BI Premium workspaces without Fabric. Move to Gen2 when you need to output data directly to a Fabric lakehouse or warehouse.
How do I handle schema changes in source systems?
Configure data type detection to "Do not detect column types" when connecting to dynamic schemas. Add explicit type conversion steps in Power Query after source connection. Monitor Dataflow refresh failure logs — schema changes appear as type mismatch errors before they break reports.
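That advice can be sketched in M: skip automatic type detection and declare types explicitly, so a renamed or re-typed source column fails fast with a clear error instead of silently propagating bad types. File path and column names below are illustrative:

```powerquery
let
    // Load without automatic column type detection.
    Source = Csv.Document(File.Contents("C:\data\orders.csv"), [Delimiter = ","]),
    Promoted = Table.PromoteHeaders(Source),
    // Explicit typing: if the source schema drifts, the refresh fails
    // here with a visible type/column error in the refresh log, before
    // downstream reports are affected.
    Typed = Table.TransformColumnTypes(
        Promoted,
        {
            {"OrderId", Int64.Type},
            {"OrderDate", type datetime},
            {"Amount", type number}
        }
    )
in
    Typed
```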
Schedule a Dataflows architecture review
EPC Group has completed 150+ enterprise Power BI Dataflow implementations. Talk to an architect about Dataflows Gen1 vs. Gen2 migration, incremental refresh configuration, or governance framework design. Call (888) 381-9725 or request a 30-minute discovery call.
In-depth FAQ
What are Power BI Dataflows and why should enterprises use them?
Power BI Dataflows are a self-service data preparation technology that enables business analysts and data engineers to extract, transform, and load (ETL) data using Power Query Online without writing code. Dataflows store transformed data in Azure Data Lake Storage Gen2 (Common Data Model format) or in a Microsoft Fabric lakehouse, making the data reusable across multiple Power BI datasets, reports, and other Azure services. For enterprises, Dataflows solve a critical problem: without them, every Power BI report author duplicates data transformation logic, leading to inconsistent business definitions, wasted processing resources, and maintenance nightmares. With Dataflows, you define the transformation once and every downstream report consumes the same certified data. EPC Group has implemented enterprise Dataflow architectures for 150+ organizations, typically reducing data preparation effort by 60% and eliminating business logic inconsistencies across reports.
What is the difference between Dataflows Gen1 and Dataflows Gen2?
Dataflows Gen1 is the original Power BI service feature that transforms data using Power Query Online and stores results in Azure Data Lake Storage Gen2 in CDM format. Dataflows Gen2 is the next-generation version in Microsoft Fabric that adds significant capabilities: output to any Fabric destination (lakehouse, warehouse, KQL database), faster performance through Fabric compute engines, data pipeline integration for orchestration, staging lakehouse for intermediate data, and 150+ data connectors. The key architectural difference is destination flexibility: Gen1 outputs only to CDM folders, while Gen2 lands data directly in Fabric lakehouse Delta tables for immediate availability via Spark notebooks, SQL analytics endpoints, and Power BI Direct Lake datasets. EPC Group recommends Gen2 for all new implementations and provides migration services for organizations moving from Gen1 to Gen2.
How does incremental refresh work in Power BI Dataflows?
Incremental refresh optimizes data refresh by only processing new or changed data rather than reloading the entire dataset. The Dataflow partitions data by a date/time column and maintains a rolling window: during refresh, only recent partitions (the refresh period) are reprocessed while historical partitions (the archive period) remain untouched. Configuration involves defining RangeStart and RangeEnd parameters in Power Query, filtering the source query using these parameters, and specifying archive and refresh periods. For example, a Dataflow with 3 years of sales data and a 7-day refresh period only refreshes the last 7 days on each run, reducing refresh time from 45 minutes to 3 minutes. EPC Group implements incremental refresh for all enterprise Dataflows processing more than 1 million rows, typically reducing refresh times by 80-95%.
What are linked entities and computed entities in Power BI Dataflows?
Linked entities and computed entities are enterprise features (requiring Premium or Fabric capacity) that enable data reuse without duplication. A linked entity references an entity from another Dataflow, consuming its output without re-extracting from the source system. A computed entity references other entities within the same Dataflow for further transformation, using the enhanced compute engine (SQL-backed) for dramatically faster joins, aggregations, and complex operations. The enterprise pattern is: Layer 1 Dataflows extract raw data, Layer 2 Dataflows use linked entities to reference Layer 1 and apply business transformations, and Layer 3 Dataflows use computed entities for final metrics. EPC Group implements this layered architecture to reduce total processing time by 50% and create a reusable data preparation layer.
How do Power BI Dataflows integrate with Microsoft Fabric lakehouses?
In Microsoft Fabric, Dataflows Gen2 output directly to lakehouse Delta tables. The Dataflow connects to source systems via Power Query Online, applies transformations, and writes results as Delta format to the lakehouse. The data is immediately available via SQL analytics endpoints for T-SQL queries, Spark notebooks for data science, and Power BI Direct Lake mode for zero-copy analytics. This integration enables business analysts to contribute to the enterprise lakehouse without learning Spark or Python. EPC Group designs hybrid architectures where Dataflows Gen2 handle structured business data ingestion while Spark notebooks handle complex data engineering, creating a unified lakehouse serving all analytics needs.
How much do Power BI Dataflows cost and what licensing is required?
Basic Dataflows Gen1 are available with Power BI Pro ($10/user/month) but with limitations: no linked entities, no computed entities, no enhanced compute engine. Enterprise features require Power BI Premium ($4,995/month for P1 capacity) or Premium Per User ($20/user/month). Dataflows Gen2 in Fabric require Fabric capacity (F2 starting at approximately $260/month, F64 at approximately $8,300/month). For a 500-user enterprise analytics team, EPC Group typically recommends Fabric F64 ($8,300/month) providing compute for 50-100 Dataflows with daily refresh. Total annual cost: approximately $100,000 for capacity plus $60,000 for Pro licenses. Implementation services range from $50,000 to $150,000 depending on data source complexity.
