
Enterprise guide to data transformation, M language, custom functions, query folding, parameters, and best practices for scalable Power BI data pipelines.
Quick Answer: Power Query is the data transformation engine built into Power BI that connects to 150+ data sources, cleans and reshapes data, and loads it into your data model. It uses the M language to create repeatable, auditable data pipelines that re-execute on every refresh. Think of Power Query as your ETL (Extract, Transform, Load) layer — it handles everything between your raw data sources and your clean, analysis-ready data model.
Every Power BI report starts with data, and every enterprise dataset requires transformation. Raw data from ERP systems, databases, APIs, and spreadsheets is never in the shape you need for analysis. Column names are inconsistent, data types are wrong, tables need joining, duplicates need removing, and business logic needs applying. Power Query solves all of this.
Power Query is not just a feature of Power BI — it is the foundation of every enterprise data pipeline. Across 25+ years of consulting engagements, EPC Group has found that roughly 70% of Power BI project time goes into data preparation. Organizations that master Power Query cut that time in half and build data models that are dramatically easier to maintain.
The Power Query engine is shared across Power BI Desktop, Power BI Service (Dataflows), Excel, Azure Data Factory, and Microsoft Fabric. Skills you build in Power Query transfer across the entire Microsoft data stack — making it one of the highest-ROI skills for any data professional.
Access the Power Query Editor in Power BI Desktop via Home > Transform Data. Understanding its five key areas accelerates enterprise development.
Lists all queries organized in groups. Use folders to separate staging queries, dimension tables, fact tables, and function libraries. Right-click to enable/disable loading, create references, and manage dependencies.
Pro Tip: Create groups: "1-Staging", "2-Dimensions", "3-Facts", "4-Functions" for clean organization.
Shows a preview of the current query results. Column headers display data types (ABC for text, 123 for number, calendar for date). Green bars under headers indicate data quality — gaps mean nulls or errors.
Pro Tip: Column profiling (View > Column Quality/Distribution/Profile) reveals data quality issues immediately.
Every transformation is recorded as a named step. Steps execute sequentially — each one transforms the output of the previous step. You can click any step to see intermediate results, insert steps, or delete steps.
Pro Tip: Rename every step descriptively. "Filtered to active customers" beats "Filtered Rows" when debugging.
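As an illustration of descriptively named steps (the server, table, and column names here are hypothetical), a renamed query reads like documentation in the Advanced Editor:

```m
let
    // Each step name doubles as documentation for future maintainers
    Source = Sql.Database("server", "SalesDB"),
    SalesTable = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    FilteredToActiveCustomers = Table.SelectRows(SalesTable, each [IsActive] = true),
    RemovedAuditColumns = Table.RemoveColumns(FilteredToActiveCustomers, {"CreatedBy", "ModifiedBy"})
in
    RemovedAuditColumns
```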
Displays the M code for the currently selected step. Edit M directly here for simple changes. For complex edits, use the Advanced Editor (Home > Advanced Editor) which shows the full query as a let/in expression.
Pro Tip: Toggle the formula bar via View > Formula Bar. Always keep it visible during development.
Contains transformation commands organized by tab: Home (common operations), Transform (column-level changes), Add Column (new calculated columns), View (display settings). Most GUI actions generate M code automatically.
Pro Tip: Right-click column headers for the most common transformations — faster than navigating the ribbon.
Shows the query name, description, and "All Properties" including whether the query loads to the data model. Set "Enable load" to false for staging queries that other queries reference but that should not appear in the model.
Pro Tip: Add descriptions to every query explaining its purpose, source, and refresh behavior.
| Operation | M Code | Use Case | Folds? |
|---|---|---|---|
| Remove Duplicates | Table.Distinct(source, {"Column1"}) | Deduplication of customer records, transaction IDs | Yes (SQL DISTINCT) |
| Replace Values | Table.ReplaceValue(source, "old", "new", Replacer.ReplaceText, {"Column1"}) | Standardize status codes, fix known data quality issues | Yes (SQL REPLACE) |
| Remove Errors | Table.RemoveRowsWithErrors(source, {"Column1"}) | Clean failed type conversions, null lookups | No |
| Trim & Clean | Table.TransformColumns(source, {{"Name", Text.Trim}}) | Remove whitespace from imported text fields | Partial (LTRIM/RTRIM) |
| Operation | M Code | Use Case | Folds? |
|---|---|---|---|
| Unpivot Columns | Table.UnpivotOtherColumns(source, {"ID"}, "Attribute", "Value") | Convert crosstab/matrix layouts to normalized rows | No |
| Pivot Column | Table.Pivot(source, List.Distinct(source[Attribute]), "Attribute", "Value") | Convert row data to column headers (denormalize) | No |
| Group By | Table.Group(source, {"Category"}, {{"Total", each List.Sum([Amount])}}) | Pre-aggregate data to reduce model size | Yes (SQL GROUP BY) |
| Split Column | Table.SplitColumn(source, "FullName", Splitter.SplitTextByDelimiter(" ")) | Parse combined fields (name, address, codes) | No |
| Operation | M Code | Use Case | Folds? |
|---|---|---|---|
| Merge Queries (Join) | Table.NestedJoin(source, {"Key"}, other, {"Key"}, "Joined", JoinKind.LeftOuter) | Lookup values from reference tables, enrich records | Yes (SQL JOIN) |
| Append Queries (Union) | Table.Combine({table1, table2, table3}) | Stack data from multiple sources with same schema | Yes (SQL UNION ALL) |
| Expand Nested Tables | Table.ExpandTableColumn(source, "Joined", {"Column1", "Column2"}) | Flatten results from merge operations, JSON arrays | Yes (with foldable merge) |
| Combine Files from Folder | Folder.Files("path") then invoke custom function | Process hundreds of CSVs/Excel files automatically | N/A (file system) |
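The folder-combine pattern in the last row can be sketched as follows, assuming a folder of CSV files with identical schemas (the path is a placeholder):

```m
let
    // List all files in the folder, keep only CSVs
    Files = Folder.Files("C:\Data\Sales"),
    CsvOnly = Table.SelectRows(Files, each Text.Lower([Extension]) = ".csv"),
    // Parse each file's binary content and promote the header row
    Parsed = Table.AddColumn(CsvOnly, "Data",
        each Table.PromoteHeaders(Csv.Document([Content]))),
    // Stack the parsed tables into one combined table
    Combined = Table.Combine(Parsed[Data])
in
    Combined
```

In practice the parsing step is usually extracted into a custom function so every file is processed identically.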
| Operation | M Code | Use Case | Folds? |
|---|---|---|---|
| Change Type | Table.TransformColumnTypes(source, {{"Date", type date}, {"Amount", type number}}) | Enforce data types for model correctness | Yes (SQL CAST) |
| Add Custom Column | Table.AddColumn(source, "NewCol", each [Price] * [Qty], type number) | Calculated fields, business logic, derived values | Sometimes (depends on expression) |
| Rename Columns | Table.RenameColumns(source, {{"OldName", "NewName"}}) | Standardize column names across sources | Yes (SQL AS alias) |
| Reorder Columns | Table.ReorderColumns(source, {"Col1", "Col2", "Col3"}) | Organize columns for readability and consistency | Yes (SQL SELECT order) |
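Several of the operations above typically chain together in a single query. A minimal sketch (source query and column names are illustrative placeholders):

```m
let
    Source = OrdersStaging, // a staging query; name is hypothetical
    // Enforce types first so downstream steps operate on correct data
    Typed = Table.TransformColumnTypes(Source,
        {{"OrderDate", type date}, {"Amount", type number}}),
    Deduped = Table.Distinct(Typed, {"OrderID"}),
    Renamed = Table.RenameColumns(Deduped, {{"cust_id", "CustomerID"}}),
    // Derived business value as a typed custom column
    LineTotal = Table.AddColumn(Renamed, "LineTotal", each [Price] * [Qty], type number)
in
    LineTotal
```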
Every Power Query transformation generates M code. For enterprise use, understanding and writing M directly is essential — GUI-only transformations hit limitations quickly when you need custom functions, API pagination, dynamic logic, or advanced error handling.
Custom functions are the most powerful enterprise feature in Power Query. They enable reuse, consistency, and advanced patterns that the GUI cannot achieve.
// Custom function: Standardize text columns
// Usage: invoke on any text column for consistent cleaning
(inputText as text) as text =>
let
    Trimmed = Text.Trim(inputText),
    Cleaned = Text.Clean(Trimmed),
    ProperCase = Text.Proper(Cleaned),
    Result = Text.Replace(ProperCase, "  ", " ")
in
    Result

This function trims whitespace, removes non-printable characters, applies proper casing, and collapses double spaces. Invoke it across any text column for standardized data quality.
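Assuming the function is saved as a query named fxCleanText (a hypothetical name), it can be applied across multiple columns in one step:

```m
// Apply the cleaning function to several text columns at once
// "fxCleanText" and the column names are illustrative placeholders
Cleaned = Table.TransformColumns(
    Source,
    {
        {"CustomerName", fxCleanText, type text},
        {"City", fxCleanText, type text}
    }
)
```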
Most REST APIs return paginated results. This M pattern handles pagination automatically, critical for enterprise integrations.
// Recursive pagination function
let
    GetPage = (url as text) as record =>
        let
            Response = Json.Document(Web.Contents(url)),
            Data = Response[results],
            NextUrl = try Response[next] otherwise null,
            Result = [Data = Data, Next = NextUrl]
        in
            Result,
    GetAllPages = (url as text) as list =>
        let
            // Generate pages until the "next" URL is null; the conditional
            // next-step ensures the final page's data is not dropped
            Pages = List.Generate(
                () => GetPage(url),
                each _ <> null,
                each if [Next] <> null then GetPage([Next]) else null,
                each [Data]
            ),
            Combined = List.Combine(Pages)
        in
            Combined
in
    GetAllPages

List.Generate fetches pages until the "next" URL is null. EPC Group uses this pattern for CRM, EHR, and SaaS API integrations where datasets span thousands of pages.
Query folding is the single most important concept for Power Query performance at scale. When transformations fold, the source database does the heavy lifting. When they do not fold, Power Query loads raw data into memory and processes it locally — which is catastrophically slow for large datasets.
EPC Group Rule: Structure every query so that all foldable operations (filter, select, rename, join, group) execute first, followed by non-foldable operations (custom columns, pivoting, text transformations) at the end. This maximizes the work done by the database and minimizes the data loaded into Power Query memory. For a 100M row table where you need 1M rows after filtering, this is the difference between a 10-second refresh and a 30-minute refresh.
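The rule can be sketched as follows — foldable steps first, non-foldable steps last (server, table, and column names are placeholders):

```m
let
    Source = Sql.Database("server", "dw"),
    Sales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // Foldable steps first: these become WHERE / SELECT / GROUP BY in SQL
    Filtered = Table.SelectRows(Sales, each [OrderDate] >= #date(2025, 1, 1)),
    Selected = Table.SelectColumns(Filtered, {"OrderDate", "Region", "Amount"}),
    Grouped = Table.Group(Selected, {"Region"},
        {{"Total", each List.Sum([Amount]), type number}}),
    // Non-foldable step last: runs locally on the already-reduced rows
    Labeled = Table.AddColumn(Grouped, "Label", each "Region: " & [Region], type text)
in
    Labeled
```

Right-clicking the Grouped step should still show "View Native Query"; the Labeled step will not, which is acceptable because it operates on the small aggregated result.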
One of the most common enterprise mistakes is using DAX for work that belongs in Power Query, or vice versa. Here is the definitive decision framework.
| Task | Use Power Query (M) | Use DAX | Why |
|---|---|---|---|
| Data type conversion | ✓ | — | Power Query handles type casting during load, reducing model processing |
| Text cleanup (trim, proper case) | ✓ | — | Cleanse once at load time, not on every query execution |
| Joining/merging tables | ✓ | — | Power Query merges fold to SQL JOINs; DAX LOOKUPVALUE is row-by-row |
| Removing duplicates | ✓ | — | Deduplication should happen before data enters the model |
| Date table creation | ✓ | — | Power Query date tables are more flexible and load faster than CALENDAR() |
| Running totals / YTD / MTD | — | ✓ | Time intelligence requires filter context that only DAX provides |
| Dynamic calculations | — | ✓ | Measures respond to slicer selections; Power Query is static after refresh |
| Percentage of total | — | ✓ | Requires CALCULATE to modify filter context for denominator |
| Conditional aggregation | — | ✓ | CALCULATE + FILTER enables context-dependent aggregation |
| Row-level security logic | — | ✓ | RLS is defined in DAX and evaluated at query time per user |
For a deeper dive into data model architecture, see our Power BI Data Modeling Best Practices Guide.
Power Query connects to 150+ data sources natively. Enterprise environments typically use 5-15 sources simultaneously. Understanding connector capabilities — especially which support query folding — is critical for architecture decisions.
SQL Server, Azure SQL, PostgreSQL, Oracle, MySQL, Snowflake, Amazon Redshift, Google BigQuery
Folding: Full query folding support
Best performance. EPC Group default recommendation for enterprise data warehouses.
Azure Blob, ADLS Gen2, SharePoint, OneDrive, S3, local files (CSV, Excel, JSON, Parquet)
Folding: No query folding (file-based)
Use Parquet format over CSV for 5-10x faster loading. Filter files by folder path and date.
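Loading a Parquet file is a single function call; a minimal sketch (the path is a placeholder):

```m
// Parquet stores column types and compresses well, so loads are
// typically far faster than parsing the equivalent CSV
Source = Parquet.Document(File.Contents("C:\Data\sales.parquet"))
```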
REST APIs (Web.Contents), OData feeds, Web pages, GraphQL (via custom M)
Folding: OData supports partial folding
Requires custom M for pagination, authentication, and error handling. Rate limiting is critical.
Dataverse, Dynamics 365, Salesforce, SAP HANA, SAP BW, ServiceNow, Jira
Folding: Varies by connector
Use application-specific connectors over generic ODBC for better folding and schema support.
Lakehouse, Warehouse, KQL Database, Dataflow Gen2, Semantic Model
Folding: Full folding (Lakehouse/Warehouse)
Fabric connectors are optimized for DirectLake mode. EPC Group recommends Fabric as the primary data platform for new deployments.
Power Query SDK custom connectors for proprietary systems
Folding: Depends on implementation
EPC Group has built custom connectors for EHR systems (Epic, Cerner), trading platforms, and government databases.
In production environments, data pipelines fail. APIs return 500 errors, schemas drift, source systems change column names, and type conversions hit unexpected values. Enterprise Power Query pipelines must handle every failure gracefully — never silently dropping data or halting refreshes without notification.
The fundamental error handling construct in M. Wrap any expression that can fail and provide a fallback value.
// Safe type conversion with error capture
Table.AddColumn(source, "ParsedDate", each
    try Date.FromText([RawDate]) otherwise null,
    type nullable date
)

Instead of dropping errors, capture them in a separate table for monitoring and investigation.
// Separate clean rows from error rows
let
    Source = YourDataQuery,
    ErrorRows = Table.SelectRowsWithErrors(Source, {"Amount"}),
    CleanRows = Table.RemoveRowsWithErrors(Source, {"Amount"}),
    ErrorLog = Table.AddColumn(ErrorRows, "ErrorTimestamp",
        each DateTime.LocalNow(), type datetime)
in
    CleanRows // Load clean data; ErrorLog feeds monitoring dashboard

Detect when source systems change column names or types before they break downstream queries.
// Validate expected columns exist before transforming
let
    // Navigate to a specific table (schema and table names are placeholders)
    Source = Sql.Database("server", "db"){[Schema = "dbo", Item = "Customers"]}[Data],
    ExpectedColumns = {"CustomerID", "Name", "Email", "Amount"},
    ActualColumns = Table.ColumnNames(Source),
    Missing = List.Difference(ExpectedColumns, ActualColumns),
    Result = if List.Count(Missing) > 0
        then error "Schema drift detected. Missing: "
            & Text.Combine(Missing, ", ")
        else Source
in
    Result

Structure transformations so foldable steps (filter, select, rename, join, group) come first. Check folding by right-clicking each step and looking for "View Native Query." If it is grayed out for a step, that step (or one just before it) broke folding. Move non-foldable steps to the end of the pipeline.
Apply row filters and column removal as the first steps after connecting to a source. Every row and column you carry through subsequent transformations costs memory and time. For a 50M row table where you need 5M rows, filtering first makes every subsequent step 10x faster.
Server names, database names, file paths, API endpoints, date ranges, environment flags — all should be parameters. This enables one-click environment switching, automated deployment pipelines, and incremental refresh. Never hardcode connection strings or paths in M code.
Build a staging layer: queries that connect to sources, apply basic cleaning, and are marked "Enable load = false." Then reference these staging queries in downstream transformations. This prevents duplicate source connections, enables reuse, and keeps the dependency tree clean.
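A downstream query that references a staging query (names hypothetical) is just an ordinary query whose source is the staging query; the staging query itself has "Enable load" unchecked:

```m
// References the staging query "Staging_Customers" instead of
// opening a second connection to the source system
let
    Source = Staging_Customers,
    ActiveOnly = Table.SelectRows(Source, each [Status] = "Active")
in
    ActiveOnly
```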
Extract repeating logic into custom functions stored in a dedicated "Functions" group. Common examples: date parsing, text standardization, API pagination, file processing. A library of 10-20 functions can standardize hundreds of queries and reduce maintenance by 80%.
Wrap API calls and type conversions in try/otherwise blocks. Create error capture tables that log failed rows with metadata. Set up Power Automate alerts for refresh failures. Silent data quality degradation is worse than a visible error — make failures loud and traceable.
Rename every Applied Step to describe what it does (not "Changed Type" but "Enforce date and currency types"). Add M code comments (// single line or /* block */). Future developers (including yourself in 6 months) will thank you. Enterprise queries without documentation become unmaintainable.
When complex transformations cannot fold through the GUI, use Value.NativeQuery to pass raw SQL directly to the database. This gives you full SQL optimization (CTEs, window functions, temp tables) while keeping the pipeline within Power Query governance. Always parameterize SQL to prevent injection.
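A hedged sketch of the pattern (server, query text, and parameter names are placeholders). Passing values through the parameter record rather than string concatenation prevents injection, and the EnableFolding option lets subsequent steps fold on top of the native query where the connector supports it:

```m
let
    Source = Sql.Database("server", "dw"),
    Result = Value.NativeQuery(
        Source,
        "SELECT Region, SUM(Amount) AS Total
         FROM dbo.FactSales
         WHERE OrderDate >= @StartDate
         GROUP BY Region",
        [StartDate = #date(2025, 1, 1)],   // parameterized, not concatenated
        [EnableFolding = true]
    )
in
    Result
```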
Parameters transform Power Query from a desktop tool into an enterprise deployment pipeline. Instead of hardcoded server names, file paths, and date ranges, parameters centralize configuration — enabling environment promotion (dev to staging to production) without editing a single line of M code.
Create a "ServerName" parameter with values for Dev, Staging, and Production SQL Servers. All queries reference the parameter instead of hardcoded server names. Promoting from dev to production is a single parameter change.
ServerName = "prod-sql-server.database.windows.net"
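Queries then reference the parameter instead of a literal (the database name is a placeholder):

```m
// ServerName is a Power Query parameter; changing its value
// repoints every query that references it
Source = Sql.Database(ServerName, "SalesDB")
```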
Power BI requires RangeStart and RangeEnd datetime parameters for incremental refresh. Power Query filters the date column using these parameters, and Power BI automatically manages the date ranges during scheduled refresh.
RangeStart = #datetime(2026, 1, 1, 0, 0, 0)
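The date column is then filtered against both parameters. Note the half-open range (>= and <) so no row lands in two partitions, and that the column should be datetime-typed to match the parameters (the column name is a placeholder):

```m
Filtered = Table.SelectRows(Source,
    each [OrderDateTime] >= RangeStart and [OrderDateTime] < RangeEnd)
```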
Store folder paths as parameters to switch between local development folders and production Azure Blob Storage or SharePoint paths. Combine with Folder.Files for batch file processing across environments.
DataFolder = "https://storage.blob.core.windows.net/data/"
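With DataFolder holding a file-system or UNC path, the batch pattern looks like this (a sketch; for Azure Blob URLs the AzureStorage.Blobs connector replaces Folder.Files):

```m
let
    Files = Folder.Files(DataFolder),
    // Keep only recent extracts, based on file-system metadata
    Recent = Table.SelectRows(Files,
        each [Date modified] >= #datetime(2026, 1, 1, 0, 0, 0))
in
    Recent
```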
Centralize API base URLs, version numbers, and non-sensitive configuration as parameters. Combine with Web.Contents for clean, maintainable API integration code. Never store API keys as parameters — use data source credentials instead.
ApiBaseUrl = "https://api.example.com/v2"
Power Query is the data transformation and preparation engine built into Power BI (and Excel). It connects to 150+ data sources, lets you clean, reshape, merge, and enrich data — all before it reaches your data model. You access it via the Power Query Editor in Power BI Desktop (Home > Transform Data). Every transformation you apply is recorded as an M language step, creating a repeatable, auditable pipeline that re-executes on every dataset refresh. EPC Group uses Power Query as the foundation of every enterprise Power BI deployment because it separates data preparation from data modeling, which is critical for maintainability at scale.
M (officially "Power Query Formula Language") is the functional programming language behind every Power Query transformation. When you click buttons in the Power Query Editor, it generates M code behind the scenes. For enterprise use, writing M directly is essential — it enables custom functions, parameterized queries, conditional logic, API pagination, and advanced error handling that the GUI cannot achieve. M is a case-sensitive, expression-based language where each query is a series of let/in steps. EPC Group engineers write custom M code for 80%+ of enterprise data pipelines because GUI-only transformations hit limitations quickly in complex environments.
Query folding is when Power Query translates your transformation steps into native source queries (SQL, OData, etc.) that execute on the source database rather than in the Power Query engine. This is critical for performance: filtering 100M rows at the database level is thousands of times faster than loading all 100M rows into Power Query memory and filtering there. Not all steps fold — custom M columns, pivoting, and certain merge types break folding. You can check by right-clicking a step and selecting "View Native Query." EPC Group designs every Power Query pipeline to maximize query folding, especially for DirectQuery and incremental refresh scenarios where folding is mandatory.
Power Query parameters are named values that can be referenced across multiple queries and changed without editing M code. Enterprise uses include: 1) Environment switching (dev/staging/production database connections via a single parameter change), 2) Incremental refresh (RangeStart/RangeEnd date parameters are required by Power BI), 3) Dynamic file paths (point to different folders per environment), 4) API keys and endpoints (centralized configuration), 5) Threshold values for business logic. EPC Group deploys parameterized Power Query pipelines for every enterprise client because they enable deployment automation and eliminate manual editing during environment promotion.
Power Query and DAX serve different purposes in the Power BI stack. Power Query (M language) runs BEFORE data reaches the model — it extracts, transforms, and loads data. DAX runs AFTER data is in the model — it creates calculated measures, columns, and tables for analysis. Think of Power Query as your ETL layer and DAX as your analytics layer. A common anti-pattern is using DAX calculated columns for transformations that should be in Power Query (e.g., text cleanup, date parsing, data type conversion). EPC Group rule: if a transformation can be done in Power Query, it should be — this keeps the data model clean and leverages query folding for performance.
Enterprise error handling in Power Query requires: 1) try/otherwise expressions around every step that can fail (API calls, type conversions, lookups), 2) Error tables that capture failed rows with metadata (source row, error message, timestamp) instead of silently dropping them, 3) Row-level error replacement using Table.ReplaceErrorValues for non-critical fields, 4) Refresh failure alerting via Power Automate or data-driven alerts, 5) Data quality checks as explicit steps (row count validation, null percentage thresholds, schema drift detection). EPC Group builds error monitoring dashboards for every enterprise Power Query pipeline — silent failures are the biggest risk in automated data refresh.
Power Query supports 150+ data source connectors natively in Power BI, including: SQL Server, Azure SQL, PostgreSQL, Oracle, MySQL (relational databases), Azure Blob Storage, SharePoint, OneDrive, ADLS Gen2 (file/cloud storage), REST APIs, OData feeds, Web pages (API/web), Dataverse, Dynamics 365, Salesforce (business applications), Excel, CSV, JSON, XML, Parquet (file formats), and Fabric Lakehouse, Warehouse, and Dataflow Gen2 (Microsoft Fabric). You can also build custom connectors using the Power Query SDK for proprietary systems. EPC Group has built custom connectors for healthcare EHR systems, financial trading platforms, and government databases that lack native support.
Power Query custom functions are reusable M code blocks that accept parameters and return results — essentially stored procedures for your ETL layer. Enterprise use cases: 1) Processing multiple files with identical structure (invoke function for each file in a folder), 2) API pagination (function that takes a page number, returns results), 3) Standardized data cleansing (trim, proper case, remove special characters applied consistently), 4) Complex business logic (fiscal year calculation, custom date hierarchies), 5) Reusable merge patterns (standard lookup function used across queries). EPC Group creates function libraries for enterprise clients — typically 10-20 reusable functions that standardize transformations across hundreds of queries.
Power Query performance optimization for enterprise datasets: 1) Maximize query folding — every foldable step runs on the source database, not in memory, 2) Filter early — apply WHERE clauses in the first steps, not after joins, 3) Remove unnecessary columns immediately — do not carry extra columns through transformations, 4) Avoid Table.Buffer unless specifically needed for many-to-many merges, 5) Use Table.Partition for parallel processing of large tables, 6) Disable "include in report refresh" for staging queries, 7) Use native SQL queries for complex transformations that cannot fold, 8) Set query evaluation timeout appropriately for long-running source queries. EPC Group has optimized Power Query pipelines processing 500M+ rows, reducing refresh times from hours to minutes through systematic folding and staging optimization.
For enterprise environments, EPC Group recommends a tiered approach: Dataflow Gen2 in Microsoft Fabric for shared, governed, reusable data preparation that feeds multiple reports and datasets. Embedded Power Query in Power BI Desktop for report-specific transformations that are not shared. The decision framework: if 2+ reports need the same transformed data, use a Dataflow. If the transformation is unique to one report, embed it. Dataflows provide centralized governance, reuse, incremental refresh at the query level, and separation of ETL from reporting — which is essential for enterprise data mesh architectures. EPC Group deploys Dataflow Gen2 as the standard ETL layer for Fabric-enabled enterprises.
EPC Group designs and builds enterprise Power Query pipelines that scale — parameterized, error-handled, fully-folded, and documented. From single-source dashboards to 50-source data platforms processing hundreds of millions of rows.