Power BI Dataflows: The Enterprise Guide to Self-Service Data Preparation, Gen2, and Shared Dataflows
Power BI Dataflows are the foundation of scalable enterprise analytics, enabling centralized, governed, reusable data preparation that eliminates duplicate transformation logic across reports. This guide covers the complete Dataflow architecture -- Gen1 vs. Gen2, incremental refresh configuration, linked and computed entities, Microsoft Fabric lakehouse integration, governance frameworks, and implementation strategies -- based on 150+ enterprise deployments by EPC Group.
Why Enterprise Analytics Needs Dataflows
Every enterprise Power BI deployment hits the same wall. Report authors across the organization connect to the same source systems -- ERP, CRM, data warehouses, SharePoint lists, Excel files -- and apply their own version of business logic. Finance calculates revenue one way. Sales calculates it differently. The executive dashboard uses a third definition. When the CEO asks why three reports show three different numbers, nobody can explain the discrepancy because the transformation logic is buried in individual .pbix files scattered across the organization.
Power BI Dataflows solve this by centralizing data preparation into reusable, governed, shared ETL processes. Instead of every report author connecting directly to source systems and applying ad-hoc transformations, Dataflows extract data from sources, apply standardized business logic, and store results in a centralized location where any report consumes the pre-transformed, certified data. This is the foundation of scalable analytics and a core capability that EPC Group's Power BI consulting practice implements for every enterprise client.
The Cost of Not Using Dataflows
- Duplicate source queries: 50 report authors each querying the same SQL Server database means 50 redundant connections during refresh, straining the source system and creating refresh failures during peak hours.
- Inconsistent metrics: Without centralized business logic, "revenue" means different things in different reports. This erodes trust in analytics and wastes executive time reconciling conflicting numbers.
- Maintenance burden: When a source system schema changes (new column, renamed table), every .pbix file referencing that source must be updated individually. With Dataflows, you update the extraction logic once.
- No reusability: Complex transformations (customer segmentation, fiscal calendar calculations, currency conversion) are rebuilt from scratch in every report instead of defined once and shared.
Dataflows Gen1 vs. Gen2 Comparison
Understanding the differences between Gen1 and Gen2 is critical for enterprise architecture decisions. Both use Power Query Online as the transformation engine, but they differ significantly in output destinations, performance, and integration.
| Capability | Dataflows Gen1 | Dataflows Gen2 (Fabric) |
|---|---|---|
| Output destinations | CDM folders in ADLS Gen2 | Lakehouse, Warehouse, KQL DB, CDM |
| Data connectors | 90+ | 150+ |
| Orchestration | PBI service scheduling | Fabric data pipelines |
| Staging lakehouse | No | Yes (intermediate storage) |
| Direct Lake support | No | Yes (zero-copy analytics) |
| Fast copy | No | Yes (bulk data movement) |
| Licensing | PBI Pro / Premium | Fabric capacity (F SKU) |
Strategic Direction: Gen2 Is the Future
Microsoft is investing heavily in Dataflows Gen2 within Microsoft Fabric while Gen1 receives maintenance updates only. If your organization is planning new Dataflow development, build on Gen2. If you have existing Gen1 Dataflows, plan migration to Gen2 within 6-12 months to take advantage of lakehouse integration, improved performance, and pipeline orchestration. EPC Group provides structured Gen1-to-Gen2 migration services, typically completing migration in 2-4 weeks.
Three-Layer Enterprise Dataflow Architecture
The enterprise Dataflow architecture follows a layered pattern that separates extraction, transformation, and consumption into distinct, independently manageable stages. This is the foundation of every Dataflow implementation EPC Group delivers, refined across 150+ enterprise deployments. It aligns with the medallion architecture (Bronze, Silver, Gold) used in modern data platforms including Microsoft Fabric.
Enterprise Dataflow Architecture (Medallion Pattern)
+-----------------------------------------------------+
| Layer 1: Extraction (Bronze) |
| +-- DF_Extract_ERP_Sales |
| +-- DF_Extract_CRM_Customers |
| +-- DF_Extract_HR_Employees |
| +-- DF_Extract_Finance_GL |
| Config: Incremental refresh, minimal transformation |
+-------------------------+---------------------------+
| Linked Entities
+-------------------------v---------------------------+
| Layer 2: Transformation (Silver) |
| +-- DF_Transform_Sales_Revenue (business rules) |
| +-- DF_Transform_Customer_Segments (segmentation) |
| +-- DF_Transform_Finance_Metrics (calculations) |
| Config: Computed entities, enhanced compute engine |
+-------------------------+---------------------------+
| Output to Lakehouse / CDM
+-------------------------v---------------------------+
| Layer 3: Consumption (Gold) |
| +-- Star schema tables (facts + dimensions) |
| +-- Power BI Direct Lake datasets |
| +-- Reports and dashboards |
+-----------------------------------------------------+
Layer 1: Extraction Dataflows (Bronze)
Extraction Dataflows connect to source systems and pull raw data with minimal transformation. The goal is a faithful copy of source data, decoupling downstream consumers from direct source system access. Transformations at this layer are limited to data type conversions, column selection, and source-specific handling. Incremental refresh is configured here to minimize source system impact. Each source system has its own extraction Dataflow to isolate failure domains.
Layer 2: Transformation Dataflows (Silver)
Transformation Dataflows use linked entities to reference Layer 1 and apply business logic. This is where standardized business definitions are enforced: revenue calculations, customer segmentation rules, date intelligence, status code mappings, and data quality filters. Computed entities handle joins between linked entities and aggregations, accelerated by the enhanced compute engine. This layer is owned by the analytics team or Center of Excellence and is the authoritative source of business metrics.
Layer 3: Consumption Dataflows (Gold)
Consumption Dataflows prepare data in the exact shape needed for reporting: star schemas with fact tables, dimension tables, pre-calculated measures, and role-playing dimensions. In Fabric implementations, this layer outputs to lakehouse tables consumed by Direct Lake datasets for zero-copy, high-performance analytics. Report authors consume Gold layer data exclusively -- they never query source systems directly.
Incremental Refresh Deep Dive
Incremental refresh is the most impactful performance optimization for enterprise Dataflows. Without it, every refresh reloads all data from source systems regardless of changes. For Dataflows processing millions of rows, this means 30-60 minute refresh times and heavy source system load.
Configuration Steps
- Identify the partition column: Choose a date/time column that monotonically increases and is indexed in the source (OrderDate, CreatedDate, ModifiedDate). The column must have a source-side index for efficient range queries.
- Create RangeStart and RangeEnd parameters: Define two DateTime parameters in the Dataflow. Filter the source query using these parameters. During incremental refresh, Power BI substitutes the appropriate date range for each partition.
- Set archive and refresh periods: Archive period defines historical data retention (e.g., 3 years). Refresh period defines recent data to reload on each refresh (e.g., 7 days). Only the refresh period is reprocessed; archived data remains untouched.
- Enable detect data changes: For sources supporting it, enable this option using a LastModifiedDate column to identify changed archived rows and refresh only those specific partitions. Essential for retroactively updated records.
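The rolling-window mechanics behind these steps can be sketched in Python. This is illustrative only -- the Power BI service manages partitions internally, and the monthly partition granularity and the `plan_refresh` helper are assumptions for the sake of the sketch, not the service's actual implementation:

```python
from datetime import date, timedelta

def plan_refresh(today, archive_years=3, refresh_days=7):
    """Illustrative rolling-window plan: one monthly partition per month of
    the archive period; only partitions overlapping the refresh window are
    reprocessed, mirroring the RangeStart/RangeEnd substitution described above."""
    archive_start = date(today.year - archive_years, today.month, 1)
    refresh_start = today - timedelta(days=refresh_days)
    partitions = []
    year, month = archive_start.year, archive_start.month
    while (year, month) <= (today.year, today.month):
        p_start = date(year, month, 1)
        # Exclusive upper bound: first day of the next month.
        p_end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        partitions.append({
            "range_start": p_start,              # plays the role of RangeStart
            "range_end": p_end,                  # plays the role of RangeEnd
            "reprocess": p_end > refresh_start,  # overlaps the refresh window?
        })
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return partitions

plan = plan_refresh(date(2024, 6, 15))
reprocessed = [p for p in plan if p["reprocess"]]
print(f"{len(plan)} partitions, {len(reprocessed)} reprocessed")
```

With a 3-year archive and 7-day refresh period, only the current month's partition overlaps the refresh window -- the other three years of partitions stay untouched, which is exactly why refresh times drop so sharply.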
EPC Group's implementation data shows incremental refresh reduces average enterprise Dataflow refresh times from 35 minutes to 4 minutes (88% reduction), reduces source system query load by 90%, and enables refresh schedules as frequent as every 30 minutes for near-real-time scenarios.
Linked Entities and Computed Entities
Linked entities and computed entities transform Dataflows from a simple ETL tool into an enterprise data preparation platform. They require Power BI Premium or Fabric capacity but deliver disproportionate value.
Linked Entity Pattern
A linked entity references an entity from another Dataflow, consuming its output without re-executing source extraction. When the upstream Dataflow refreshes, the linked entity sees updated data on its next refresh. This creates a dependency chain managed through refresh scheduling: upstream Dataflows must complete before downstream Dataflows begin. In Fabric, data pipelines manage these dependencies natively.
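The dependency-chain scheduling can be made concrete with a small Python sketch. The Dataflow names below are hypothetical (borrowed from the architecture diagram), and the wave-grouping helper is an illustration of the ordering constraint, not a Power BI API:

```python
# Hypothetical dependency map: each Dataflow -> the upstream Dataflows
# it consumes via linked entities.
deps = {
    "DF_Extract_ERP_Sales": [],
    "DF_Extract_CRM_Customers": [],
    "DF_Transform_Sales_Revenue": ["DF_Extract_ERP_Sales"],
    "DF_Transform_Customer_Segments": ["DF_Extract_CRM_Customers"],
    "DF_Consume_SalesStar": ["DF_Transform_Sales_Revenue",
                             "DF_Transform_Customer_Segments"],
}

def refresh_waves(deps):
    """Group Dataflows into waves: every Dataflow in wave N depends only on
    Dataflows in earlier waves, so the members of a wave can refresh in parallel."""
    remaining = dict(deps)
    done, waves = set(), []
    while remaining:
        ready = [d for d, ups in remaining.items() if all(u in done for u in ups)]
        if not ready:
            raise ValueError("Circular linked-entity dependency")
        waves.append(sorted(ready))
        done.update(ready)
        for d in ready:
            del remaining[d]
    return waves

for i, wave in enumerate(refresh_waves(deps), 1):
    print(f"Wave {i}: {wave}")
```

The three waves that fall out of this dependency map are exactly the three architecture layers: extraction first, transformation second, consumption last. This is the ordering a Fabric data pipeline enforces natively, and the ordering you must reproduce with staggered schedules on Gen1.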
Computed Entity Pattern
Computed entities reference other entities within the same Dataflow and process transformations using the enhanced compute engine, which translates Power Query operations to SQL for execution against a managed SQL endpoint. This is 3-5x faster than the mashup engine for joins, group-by aggregations, and pivots. EPC Group enables the enhanced compute engine for all enterprise Dataflows and structures queries to maximize computed entity usage.
Microsoft Fabric Lakehouse Integration
The integration between Dataflows Gen2 and Microsoft Fabric lakehouses represents the most significant evolution of enterprise data preparation in the Microsoft ecosystem. For organizations adopting Fabric, Dataflows Gen2 become the self-service data ingestion layer that feeds the enterprise lakehouse, complementing data engineering pipelines and Spark notebooks. This is central to our Microsoft Fabric consulting practice.
- Lakehouse output: Each Dataflow entity maps to a Delta table in the lakehouse, immediately queryable via SQL analytics endpoints, Spark notebooks, and Power BI Direct Lake datasets.
- Staging lakehouse: Gen2 introduces a staging lakehouse for intermediate data, enabling the enhanced compute engine to operate on staged data rather than re-querying sources, improving performance 2-4x for complex transformations.
- Direct Lake: When Dataflow output lands in a lakehouse, Power BI reads data directly from Delta files via Direct Lake mode, eliminating dataset refresh entirely and reducing capacity consumption by 50-70% compared to import mode.
- Medallion alignment: The three-layer Dataflow architecture aligns with medallion: Bronze (extraction), Silver (transformation), Gold (consumption-ready lakehouse tables). Business analysts manage Bronze and Silver with Power Query while data engineers add complex processing at the Gold layer using Spark.
Data Transformation Best Practices
Enterprise Dataflows require disciplined transformation practices for performance, maintainability, and reliability at scale. The following practices are derived from EPC Group's 150+ implementations.
- Preserve query folding: Structure transformations to maintain query folding (pushing operations to the source system). Filter rows before adding columns, apply compatible type conversions, and avoid folding-breaking operations like Table.Buffer. Use "View Native Query" to verify folding at each step.
- Parameterize connections: Use Dataflow parameters for server names, database names, file paths, and filter values. This enables the same definition across Dev, Test, and Production environments without modification.
- Handle errors explicitly: Configure error handling for every entity: replace errors with defaults for non-critical columns, redirect error rows to a dedicated error table, and alert when error counts exceed thresholds. Never allow silent errors.
- Optimize column selection: Remove unnecessary columns as early as possible. This reduces memory consumption and processing time by 30-50% for wide tables with many unused columns.
- Standardize date handling: Create a dedicated Date dimension Dataflow generating a complete date table with fiscal year mappings, holiday flags, and business day calculations. All other Dataflows reference this standard date table.
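A minimal sketch of the standardized date table follows, in Python for illustration (the real artifact would be a Power Query entity). The July fiscal-year start and the weekend-only business-day rule are assumptions -- substitute your organization's fiscal calendar and holiday list:

```python
from datetime import date, timedelta

def build_date_table(start, end, fiscal_start_month=7):
    """Minimal date-dimension sketch: one row per day with a fiscal-year
    mapping and a business-day flag (weekends only; holiday flags omitted)."""
    rows, d = [], start
    while d <= end:
        # Assumed convention: fiscal year is labeled by the year it ends in.
        fy = d.year + 1 if d.month >= fiscal_start_month else d.year
        rows.append({
            "Date": d,
            "Year": d.year,
            "Month": d.month,
            "FiscalYear": fy,
            "IsBusinessDay": d.weekday() < 5,  # Mon-Fri; add regional holidays
        })
        d += timedelta(days=1)
    return rows

table = build_date_table(date(2024, 6, 28), date(2024, 7, 2))
for r in table:
    print(r["Date"].isoformat(), r["FiscalYear"], r["IsBusinessDay"])
```

Generating the table once in a shared Dataflow, rather than per report, is the point: every downstream model inherits the same fiscal boundaries and business-day logic.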
Governance and Monitoring
Enterprise Dataflow governance ensures data quality, prevents proliferation of ungoverned data preparation, and maintains regulatory compliance. EPC Group's data governance team implements comprehensive governance frameworks for every Dataflow deployment.
- Naming conventions: Standardize names: DF_Extract_[SourceSystem]_[Entity] for extraction, DF_Transform_[Domain]_[Entity] for transformation, DF_Consume_[UseCase] for consumption. Consistent naming enables automated monitoring and impact analysis.
- Workspace strategy: Separate workspaces per layer: DataPrep_Bronze, DataPrep_Silver, Analytics_Gold. Apply workspace-level permissions: data engineers get contributor access to Bronze, the analytics team gets contributor access to Silver, and report authors get viewer access to all layers.
- Endorsement and certification: Mark validated Dataflows as "Certified" (Center of Excellence approved) or "Promoted" (team recommended). Report authors should prefer certified Dataflows over creating direct source connections.
- Lineage tracking: Use Power BI lineage view and Fabric lineage to trace data from sources through Dataflows to reports. Assess downstream impact before modifying any entity. Document all transformations in Dataflow descriptions.
- Refresh monitoring: Build automated monitoring using Power BI REST API or Fabric monitoring hub. Alert on refresh failures, duration exceeding 2x baseline, unexpected row count changes (more than 10% variance), and error rows exceeding thresholds.
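Because the naming convention above is machine-checkable, it can feed the automated monitoring directly. A sketch of a compliance check follows -- the character classes are assumptions, so tighten them to match your actual source-system and domain names:

```python
import re

# Patterns for the layer naming convention described above (sketch only;
# adjust the allowed character classes to your own naming rules).
PATTERNS = [
    re.compile(r"^DF_Extract_[A-Za-z0-9]+_[A-Za-z0-9]+$"),
    re.compile(r"^DF_Transform_[A-Za-z0-9]+_[A-Za-z0-9]+$"),
    re.compile(r"^DF_Consume_[A-Za-z0-9]+$"),
]

def is_compliant(name):
    """True if the Dataflow name matches one of the three layer patterns."""
    return any(p.match(name) for p in PATTERNS)

for n in ["DF_Extract_ERP_Sales", "DF_Consume_ExecDashboard", "SalesDataflowFinal"]:
    print(n, "OK" if is_compliant(n) else "NON-COMPLIANT")
```

Running a check like this against a workspace inventory (e.g., pulled via the Power BI REST API) surfaces ungoverned Dataflows before they accumulate downstream dependencies.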
Performance Optimization and Troubleshooting
Enterprise Dataflow performance directly impacts data freshness, source system health, and user trust in analytics. EPC Group has developed a systematic approach to Dataflow performance optimization based on the most common issues observed across 150+ deployments.
- Refresh duration monitoring: Track Dataflow refresh durations over time. A gradual increase indicates growing data volume without corresponding optimization (incremental refresh not configured, query folding broken, or source system degradation). A sudden spike indicates schema changes, network issues, or source system performance problems. Set alerts for refreshes exceeding 2x the 30-day average duration.
- Enhanced compute engine: For Premium and Fabric capacities, enable the enhanced compute engine on all workspaces containing Dataflows with computed entities. The engine translates Power Query operations to SQL, providing 3-10x performance improvement for joins, aggregations, and group-by operations. Verify it is active by checking the workspace settings -- it is not enabled by default.
- Parallel table loading: Dataflow entities without dependencies load in parallel by default. Structure Dataflows to maximize parallelism: independent extraction entities should not reference each other. If Entity A does not depend on Entity B, they will load simultaneously, reducing total refresh time.
- Gateway optimization: For on-premises data sources accessed through the On-Premises Data Gateway, ensure the gateway VM has sufficient CPU and memory (minimum 8 vCPU, 16 GB RAM for enterprise workloads). Monitor gateway CPU and memory utilization; sustained utilization above 80% causes query timeouts and refresh failures. Deploy gateway clusters (2-3 nodes) for high availability and load distribution.
- Source query optimization: The most impactful optimization is often at the source. Ensure source database tables have appropriate indexes on filter columns used in Dataflow queries. For SQL Server sources, review execution plans for Dataflow-generated queries and add missing indexes. A single missing index can turn a 5-second query into a 30-minute table scan.
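The alert thresholds described above (2x the trailing baseline, more than 10% row-count variance) reduce to a few lines of logic. This Python sketch assumes you already collect refresh durations and row counts -- for example via the Power BI REST API or the Fabric monitoring hub:

```python
from statistics import mean

def refresh_alerts(history_minutes, latest_minutes, prev_rows, latest_rows):
    """Alerting sketch for the thresholds above: flag a refresh whose duration
    exceeds 2x the trailing baseline, or whose row count moved more than 10%."""
    alerts = []
    baseline = mean(history_minutes)  # e.g. trailing 30-day average
    if latest_minutes > 2 * baseline:
        alerts.append(
            f"duration {latest_minutes}m exceeds 2x baseline ({baseline:.1f}m)")
    if prev_rows and abs(latest_rows - prev_rows) / prev_rows > 0.10:
        alerts.append("row count variance exceeds 10%")
    return alerts

print(refresh_alerts([4, 5, 4, 5], 12, prev_rows=1_000_000, latest_rows=1_250_000))
```

A 12-minute refresh against a 4.5-minute baseline trips the duration alert, and a 25% row-count jump trips the variance alert -- both fire here, which is the point of alerting on the first anomaly rather than waiting for a pattern of failures.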
Common Pitfall: Breaking Query Folding
Query folding is the most common source of performance problems in enterprise Dataflows. When folding breaks, Power Query processes the transformation locally instead of pushing it to the source database. The most frequent folding-breaking operations are: Table.Buffer (caches data locally), custom columns with complex M expressions, merging foldable queries with non-foldable queries, and adding columns before filtering rows. Always verify query folding status by right-clicking the last step in Power Query and checking if "View Native Query" is available. If it is grayed out, folding is broken. EPC Group audits query folding status for every entity during Dataflow code reviews.
Cost Analysis and Licensing
Understanding Dataflow costs requires analyzing both licensing costs and compute consumption. The total cost depends on the number of Dataflows, refresh frequency, data volume, and capacity tier.
| Deployment Size | Capacity | Monthly Cost | Supports |
|---|---|---|---|
| Small (50 users) | Fabric F2 or PPU | $260-$1,000 | 5-10 Dataflows, daily refresh |
| Medium (200 users) | Fabric F16 or P1 | $2,000-$5,000 | 20-50 Dataflows, 4x daily refresh |
| Large (500+ users) | Fabric F64 or P2 | $5,000-$10,000 | 50-100+ Dataflows, hourly refresh |
Implementation Roadmap: 8-Week Enterprise Deployment
- Week 1-2: Discovery and Design. Inventory existing data sources, transformation logic, and Power BI datasets. Identify duplicate transformations and inconsistent business definitions. Design the three-layer architecture and define the 10-15 core entities for initial deployment.
- Week 3-4: Build Extraction Layer. Create extraction Dataflows for primary source systems. Configure incremental refresh for all large entities (>100K rows). Validate data completeness against source systems. Configure error handling and monitoring.
- Week 5-6: Build Transformation and Consumption. Create transformation Dataflows with linked entities. Implement standardized business definitions. Build consumption-ready star schemas. Configure Fabric lakehouse output for Gen2 implementations.
- Week 7-8: Migration and Governance. Migrate existing Power BI datasets from direct source connections to Dataflow consumption. Validate report accuracy. Implement governance framework: naming conventions, endorsement, monitoring. Train the analytics team on development and maintenance.
Common Enterprise Dataflow Mistakes
After reviewing 150+ enterprise Dataflow deployments, EPC Group has identified the most common mistakes that undermine Dataflow value and create operational problems.
- Flat architecture: Creating dozens of independent Dataflows that each connect directly to source systems, duplicating extraction logic and overwhelming source databases during refresh. The fix is the three-layer architecture described above -- extract once, reuse everywhere.
- Ignoring refresh scheduling: Scheduling all Dataflows to refresh at the same time (e.g., 6:00 AM) creates capacity contention and refresh failures. Stagger refreshes: extraction Dataflows first, transformation 30 minutes later, consumption 30 minutes after that.
- No error monitoring: Discovering a Dataflow has been failing for weeks because nobody monitors refresh status. Build automated alerting on the first failure, not after a pattern of failures.
- Over-transforming in extraction: Applying complex business logic in the extraction layer, making it brittle and hard to debug. Keep extraction minimal; apply business rules in the transformation layer.
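The staggering fix for the scheduling mistake above is mechanical: offset each layer's start time by a fixed gap. A minimal sketch, assuming a 30-minute gap comfortably covers each layer's refresh duration (verify against your own baselines):

```python
from datetime import datetime, timedelta

def staggered_schedule(base_start, layers, gap_minutes=30):
    """Sketch of the staggering rule above: each layer starts a fixed gap
    after the previous one, so upstream refreshes finish before downstream begins."""
    return {
        layer: base_start + timedelta(minutes=i * gap_minutes)
        for i, layer in enumerate(layers)
    }

sched = staggered_schedule(
    datetime(2024, 1, 1, 6, 0),
    ["Extraction (Bronze)", "Transformation (Silver)", "Consumption (Gold)"],
)
for layer, start in sched.items():
    print(layer, start.strftime("%H:%M"))
```

On Gen2, prefer a Fabric data pipeline that triggers each layer on actual completion of the previous one; fixed-offset schedules are the fallback for Gen1, where the gap must be sized to the slowest upstream refresh.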
Partner with EPC Group
EPC Group is a Microsoft Gold Partner with over 150 enterprise Power BI Dataflow implementations across healthcare, financial services, education, and government. Our Power BI consulting team delivers end-to-end Dataflow solutions -- from assessment and architecture design through implementation, governance, and ongoing optimization. Our clients achieve 60% reduction in data preparation effort, eliminate business metric inconsistencies, and unlock self-service analytics that scales to thousands of users.
Frequently Asked Questions
What are Power BI Dataflows and why should enterprises use them?
Power BI Dataflows are a self-service data preparation technology that enables business analysts and data engineers to extract, transform, and load (ETL) data using Power Query Online without writing code. Dataflows store transformed data in Azure Data Lake Storage Gen2 (Common Data Model format) or in a Microsoft Fabric lakehouse, making the data reusable across multiple Power BI datasets, reports, and other Azure services. For enterprises, Dataflows solve a critical problem: without them, every Power BI report author duplicates data transformation logic, leading to inconsistent business definitions, wasted processing resources, and maintenance nightmares. With Dataflows, you define the transformation once and every downstream report consumes the same certified data. EPC Group has implemented enterprise Dataflow architectures for 150+ organizations, typically reducing data preparation effort by 60% and eliminating business logic inconsistencies across reports.
What is the difference between Dataflows Gen1 and Dataflows Gen2?
Dataflows Gen1 is the original Power BI service feature that transforms data using Power Query Online and stores results in Azure Data Lake Storage Gen2 in CDM format. Dataflows Gen2 is the next-generation version in Microsoft Fabric that adds significant capabilities: output to any Fabric destination (lakehouse, warehouse, KQL database), faster performance through Fabric compute engines, data pipeline integration for orchestration, staging lakehouse for intermediate data, and 150+ data connectors. The key architectural difference is destination flexibility: Gen1 outputs only to CDM folders, while Gen2 lands data directly in Fabric lakehouse Delta tables for immediate availability via Spark notebooks, SQL analytics endpoints, and Power BI Direct Lake datasets. EPC Group recommends Gen2 for all new implementations and provides migration services for organizations moving from Gen1 to Gen2.
How does incremental refresh work in Power BI Dataflows?
Incremental refresh optimizes data refresh by only processing new or changed data rather than reloading the entire dataset. The Dataflow partitions data by a date/time column and maintains a rolling window: during refresh, only recent partitions (the refresh period) are reprocessed while historical partitions (the archive period) remain untouched. Configuration involves defining RangeStart and RangeEnd parameters in Power Query, filtering the source query using these parameters, and specifying archive and refresh periods. For example, a Dataflow with 3 years of sales data and a 7-day refresh period only refreshes the last 7 days on each run, reducing refresh time from 45 minutes to 3 minutes. EPC Group implements incremental refresh for all enterprise Dataflows processing more than 1 million rows, typically reducing refresh times by 80-95%.
What are linked entities and computed entities in Power BI Dataflows?
Linked entities and computed entities are enterprise features (requiring Premium or Fabric capacity) that enable data reuse without duplication. A linked entity references an entity from another Dataflow, consuming its output without re-extracting from the source system. A computed entity references other entities within the same Dataflow for further transformation, using the enhanced compute engine (SQL-backed) for dramatically faster joins, aggregations, and complex operations. The enterprise pattern is: Layer 1 Dataflows extract raw data, Layer 2 Dataflows use linked entities to reference Layer 1 and apply business transformations, and Layer 3 Dataflows use computed entities for final metrics. EPC Group implements this layered architecture to reduce total processing time by 50% and create a reusable data preparation layer.
How do Power BI Dataflows integrate with Microsoft Fabric lakehouses?
In Microsoft Fabric, Dataflows Gen2 output directly to lakehouse Delta tables. The Dataflow connects to source systems via Power Query Online, applies transformations, and writes results as Delta format to the lakehouse. The data is immediately available via SQL analytics endpoints for T-SQL queries, Spark notebooks for data science, and Power BI Direct Lake mode for zero-copy analytics. This integration enables business analysts to contribute to the enterprise lakehouse without learning Spark or Python. EPC Group designs hybrid architectures where Dataflows Gen2 handle structured business data ingestion while Spark notebooks handle complex data engineering, creating a unified lakehouse serving all analytics needs.
How much do Power BI Dataflows cost and what licensing is required?
Basic Dataflows Gen1 are available with Power BI Pro ($10/user/month) but with limitations: no linked entities, no computed entities, no enhanced compute engine. Enterprise features require Power BI Premium ($4,995/month for P1 capacity) or Premium Per User ($20/user/month). Dataflows Gen2 in Fabric require Fabric capacity (F2 starting at approximately $260/month, F64 at approximately $8,300/month). For a 500-user enterprise analytics team, EPC Group typically recommends Fabric F64 ($8,300/month) providing compute for 50-100 Dataflows with daily refresh. Total annual cost: approximately $100,000 for capacity plus $60,000 for Pro licenses. Implementation services range from $50,000 to $150,000 depending on data source complexity.