
Understanding Azure Data Factory Pricing for Serverless Data Integration

Errin O'Connor
December 2025
8 min read

Azure Data Factory (ADF) is Microsoft's cloud-native ETL/ELT service for building serverless data integration pipelines at enterprise scale. With 100+ built-in connectors, a visual pipeline designer, and pay-per-use pricing, ADF enables organizations to orchestrate data movement and transformation across on-premises databases, cloud data stores, SaaS applications, and big data platforms without managing infrastructure. Understanding ADF's pricing model is essential for budgeting and cost optimization: serverless billing has multiple dimensions that can surprise teams unfamiliar with the model. EPC Group has architected ADF pipelines for enterprise data platforms in healthcare, finance, and manufacturing, optimizing both performance and cost.

How Azure Data Factory Pricing Works

ADF uses a consumption-based pricing model with four billing dimensions. Understanding each is critical for accurate cost forecasting:

  • Pipeline Orchestration (Activity Runs): Each activity execution within a pipeline costs approximately $1.00 per 1,000 runs (cloud) or $1.50 per 1,000 runs (on-premises via SHIR). Activities include Copy, Lookup, Get Metadata, Web, ForEach, If Condition, and custom activities. High-frequency pipelines with many activities can accumulate significant orchestration charges.
  • Data Movement (Copy Activity): Charged per DIU-hour (Data Integration Unit). One DIU represents a unit of CPU, memory, and network bandwidth. Default is 4 DIUs; auto-detected up to 256 DIUs for large data movements. Cost is approximately $0.25 per DIU-hour. Larger datasets benefit from higher DIU counts (parallel copy), but the per-DIU-hour cost remains constant.
  • Data Flow Execution (Mapping Data Flows): The most expensive component. Data Flows run on managed Spark clusters. Charged per vCore-hour at approximately $0.274 (general purpose) or $0.337 (memory optimized). Minimum cluster size is 8 vCores with a 5-minute startup time. Compute-optimized data flows are available at a lower price for transformation-heavy workloads.
  • Data Factory Operations: Entity read/write and monitoring operations are billed at a nominal per-operation rate (charged per 50,000 operations). Under the current (v2) pricing model, inactive pipelines themselves cost almost nothing, but operation charges are worth noting for environments with hundreds of pipelines and heavy monitoring queries.
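To see how these dimensions combine, here is a rough monthly cost estimator. The rates are the approximate figures quoted above (they vary by region, so check the current Azure pricing page), and the workload numbers are purely illustrative:

```python
# Rough monthly ADF cost estimator built from the rates quoted above.
# Rates are approximate and region-dependent; verify against the Azure
# pricing page before budgeting against them.

ACTIVITY_RUN_RATE = 1.00 / 1000   # $ per activity run on the Azure IR
DIU_HOUR_RATE = 0.25              # $ per DIU-hour (Copy Activity)
DATAFLOW_VCORE_HOUR = 0.274       # $ per vCore-hour (general purpose)

def estimate_monthly_cost(activity_runs, diu_hours, dataflow_vcore_hours):
    """Return (orchestration, movement, data_flow, total) in dollars."""
    orchestration = activity_runs * ACTIVITY_RUN_RATE
    movement = diu_hours * DIU_HOUR_RATE
    data_flow = dataflow_vcore_hours * DATAFLOW_VCORE_HOUR
    return orchestration, movement, data_flow, orchestration + movement + data_flow

# Illustrative daily pipeline: 10 activities per run, 0.5 DIU-hours of
# copying per run, and a 30-minute Data Flow on the minimum 8-vCore cluster.
runs = 10 * 30               # 300 activity runs per month
diu_hours = 0.5 * 30         # 15 DIU-hours per month
vcore_hours = 8 * 0.5 * 30   # 120 vCore-hours per month
print(estimate_monthly_cost(runs, diu_hours, vcore_hours))
```

Even in this small example, the Data Flow line (roughly $33 of a $37 total) dominates, which is why cost optimization efforts usually start with Spark cluster usage.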

Cost Optimization Strategies

EPC Group implements several strategies to keep ADF costs under control while maintaining performance:

  • Minimize Data Flow Usage: Mapping Data Flows are powerful but expensive due to Spark cluster costs. For simple transformations (column mapping, type conversion, filtering), use Copy Activity with column mapping or stored procedure activities instead. Reserve Data Flows for complex transformations that require joins, aggregations, window functions, or derived columns.
  • Optimize DIU Allocation: Enable auto-DIU for Copy Activities and monitor actual DIU utilization through pipeline monitoring. Over-allocated DIUs waste money; under-allocated DIUs slow data movement. Right-size based on observed throughput.
  • Use TTL on Data Flow Clusters: Configure Time-to-Live (TTL) for Data Flow clusters to avoid repeated 5-minute startup costs for frequently triggered pipelines. The cluster stays warm for the TTL duration, reducing cold-start overhead; note that warm idle time is itself billed, so tune the TTL to match your trigger frequency.
  • Batch Activity Runs: Reduce orchestration costs by batching operations. Instead of running one pipeline per file or record, use ForEach loops within a single pipeline execution to process batches.
  • Schedule Off-Peak: Run batch ETL pipelines during off-peak hours when possible. While ADF itself does not offer time-of-day pricing, downstream resources (Synapse, SQL Database) may benefit from reduced contention.
  • Monitor with ADF Analytics: Enable ADF Analytics in Azure Monitor to track pipeline costs by pipeline, activity, and integration runtime. Identify expensive pipelines and optimize the most impactful ones first.
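The TTL recommendation deserves a caveat: a warm cluster is billed while idle, so a TTL saves money only when runs are frequent enough that the avoided cold starts outweigh the paid idle time. A simplified break-even sketch (hypothetical workload, assuming a 5-minute cold start and runs spaced closely enough that a TTL keeps the cluster warm for the whole hour):

```python
# Sketch: when does a Data Flow TTL pay for itself? Simplified model:
# without TTL, every run pays a ~5-minute cold start plus its job time;
# with TTL, one cold start is paid and the cluster is then billed
# continuously because it stays warm between closely spaced runs.

COLD_START_MIN = 5

def billed_minutes_no_ttl(runs_per_hour, job_min):
    # Every run pays the cold-start penalty plus its job time.
    return runs_per_hour * (COLD_START_MIN + job_min)

def billed_minutes_with_ttl():
    # One cold start, then warm (and billed) for the rest of the hour.
    return COLD_START_MIN + 60

# Compare a 4-minute transformation at different trigger frequencies.
for runs in (10, 3):
    print(runs, billed_minutes_no_ttl(runs, 4), billed_minutes_with_ttl())
```

With a 4-minute job, the TTL is cheaper at ten runs per hour (65 vs. 90 billed minutes) but more expensive at three (65 vs. 27), so the right TTL setting depends on trigger frequency, not just latency goals.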

Core Features for Enterprise Data Integration

ADF provides a comprehensive feature set for enterprise data engineering:

  • 100+ Connectors: Native connectors to Azure services (SQL, Synapse, Blob, Data Lake, Cosmos DB), on-premises databases (SQL Server, Oracle, SAP, Teradata), cloud platforms (AWS S3, Redshift, Snowflake, Google BigQuery), SaaS applications (Salesforce, Dynamics 365, SAP, ServiceNow), and file formats (Parquet, Avro, JSON, CSV, XML, ORC).
  • Mapping Data Flows: Visual, code-free data transformation designer with 50+ transformation types including joins, aggregations, pivot/unpivot, conditional split, derived columns, window functions, and slowly changing dimension (SCD) patterns.
  • Self-Hosted Integration Runtime (SHIR): Install on-premises to securely access databases and file systems behind your firewall. SHIR supports high availability (multi-node clusters) and automatic updates.
  • Managed VNet: Run pipelines in a managed virtual network with private endpoints for secure connectivity to Azure PaaS services. No data leaves the private network during processing.
  • CI/CD Integration: Native Git integration (Azure DevOps, GitHub) for version control and ARM template-based deployment across dev/test/prod environments.
  • Monitoring and Alerting: Pipeline run history, activity-level metrics, and integration with Azure Monitor for alerts on pipeline failures, long-running executions, and cost thresholds.
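Behind the visual designer, every pipeline is persisted as a JSON document, and that JSON is what Git integration versions and what the generated ARM templates deploy. A minimal parameterized Copy pipeline has roughly the following shape, sketched here as a Python dict; all names and datasets are hypothetical placeholders, and the authoritative schema is defined by the ADF REST API:

```python
import json

# Approximate shape of an ADF pipeline definition as stored in Git.
# Dataset, parameter, and activity names are hypothetical placeholders.
pipeline = {
    "name": "CopyDailySales",
    "properties": {
        "parameters": {
            "runDate": {"type": "string"}
        },
        "activities": [
            {
                "name": "CopySalesToLake",
                "type": "Copy",
                "inputs": [{"referenceName": "SqlSalesDataset",
                            "type": "DatasetReference"}],
                "outputs": [{"referenceName": "LakeSalesDataset",
                             "type": "DatasetReference"}],
                "policy": {"retry": 2, "timeout": "0.01:00:00"},
                "typeProperties": {
                    "source": {"type": "AzureSqlSource"},
                    "sink": {"type": "ParquetSink"}
                }
            }
        ]
    }
}

print(json.dumps(pipeline, indent=2)[:60])
```

Because the definition is plain JSON, code review, diffing, and environment-specific parameter overrides work the same way they do for application code.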

ADF vs. Alternative ETL Solutions

Understanding how ADF compares to alternatives helps organizations make informed platform decisions:

  • ADF vs. Synapse Pipelines: Synapse Pipelines is built on the same ADF engine but is embedded within Azure Synapse Analytics. If your data engineering is centered on Synapse, use Synapse Pipelines to avoid managing a separate ADF instance. If you integrate data across multiple destinations (not just Synapse), standalone ADF provides the same capabilities with broader flexibility.
  • ADF vs. Databricks: ADF is a pipeline orchestration and data movement service. Databricks is a data engineering and data science platform. They are complementary: ADF orchestrates data movement and triggers Databricks notebooks for complex transformations. Use ADF for ingestion/orchestration and Databricks for heavy compute transformations.
  • ADF vs. SSIS: ADF is the cloud-native successor to SQL Server Integration Services. For existing SSIS packages, ADF provides an SSIS Integration Runtime that runs SSIS packages in Azure without rewriting them. New development should use native ADF pipelines and Data Flows for cloud-optimized performance and management.

Why EPC Group for Azure Data Factory

Building enterprise data pipelines requires more than connecting sources to destinations. EPC Group provides:

  • Data Platform Architecture: We design the overall data platform architecture (lakehouse, data warehouse, data mesh) and determine where ADF fits alongside Synapse, Databricks, and other components.
  • Pipeline Development: Our team builds production ADF pipelines with proper error handling, retry logic, logging, parameterization, and CI/CD integration. We follow software engineering best practices for data pipeline code.
  • SSIS Migration: We migrate existing SSIS packages to ADF, either through the SSIS Integration Runtime (lift-and-shift) or by rewriting packages as native ADF pipelines (modernization). Migration assessments identify the optimal approach for each package.
  • Cost Management: We implement monitoring dashboards, cost allocation tags, and optimization recommendations that keep ADF costs predictable and aligned with budget targets.
  • Performance Tuning: We optimize Copy Activity throughput, Data Flow performance, and SHIR capacity to meet SLAs for data freshness and pipeline completion times.

Build Enterprise Data Pipelines

Contact EPC Group to design and implement Azure Data Factory pipelines that connect your data sources, automate transformations, and deliver reliable data to your analytics platforms. We optimize for both performance and cost from day one.

Schedule a Consultation | Call (888) 381-9725

Frequently Asked Questions

How much does Azure Data Factory cost per month?

Monthly costs vary widely based on pipeline complexity, data volumes, and transformation requirements. A basic ETL workload moving data between a few sources daily might cost $50-200/month. A complex enterprise data platform with dozens of pipelines, Data Flows, and high-frequency triggers can cost $2,000-10,000+/month. The primary cost drivers are Data Flow cluster hours and Copy Activity DIU-hours. EPC Group provides detailed cost estimates based on your specific data volumes, pipeline frequency, and transformation complexity before implementation begins.

Can ADF connect to on-premises databases securely?

Yes. The Self-Hosted Integration Runtime (SHIR) installs as a Windows service in your datacenter and establishes an outbound encrypted connection to ADF. No inbound firewall rules are required. SHIR supports SQL Server, Oracle, SAP HANA, MySQL, PostgreSQL, file systems, HDFS, and many other on-premises sources. For high availability, deploy a multi-node SHIR cluster. SHIR auto-updates and is monitored through the ADF portal. EPC Group configures SHIR with secure credential management using Azure Key Vault integration.

Should I use Data Flows or stored procedures for transformations?

Use stored procedures when transformations are SQL-native (running on SQL Server, Synapse, or Snowflake) and the database has sufficient compute capacity. This avoids Data Flow cluster costs. Use Data Flows when transformations span multiple sources, require visual debugging, or involve complex logic that benefits from the drag-and-drop designer. For heavy-compute transformations on large datasets, consider Databricks notebooks orchestrated by ADF. EPC Group evaluates each transformation workload and recommends the most cost-effective execution engine.
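Those rules of thumb can be condensed into a tiny decision helper. This is an illustrative encoding of the guidance above, not an official Microsoft decision tree, and the input flags are deliberately coarse:

```python
# Sketch: the transformation-engine rules of thumb above as a helper.
# Categories and labels are illustrative, not an official taxonomy.

def choose_transformation_engine(sql_native: bool,
                                 spans_multiple_sources: bool,
                                 heavy_compute: bool) -> str:
    """Suggest an execution engine for a single transformation workload."""
    if heavy_compute:
        return "databricks-notebook"   # large datasets, heavy compute
    if spans_multiple_sources or not sql_native:
        return "mapping-data-flow"     # cross-source joins, visual debugging
    return "stored-procedure"          # SQL-native, reuses existing DB compute
```

For example, a SQL-native nightly aggregation on Synapse maps to a stored procedure, while a join across Salesforce and SQL Server maps to a Data Flow.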

How do I implement CI/CD for ADF pipelines?

ADF natively integrates with Azure DevOps Git and GitHub. Development happens in a feature branch, pipelines are validated and tested in a dev ADF instance, then published to the collaboration branch. ARM templates are auto-generated from the publish action and deployed to test and production ADF instances through Azure DevOps release pipelines or GitHub Actions. Environment-specific parameters (connection strings, storage accounts, database names) are externalized into pipeline parameters and overridden during deployment. EPC Group sets up the full CI/CD pipeline as part of every ADF implementation.

What is the maximum data throughput for ADF?

A single Copy Activity can achieve up to 5 GBps throughput with 256 DIUs for cloud-to-cloud data movement (e.g., Blob to Data Lake). On-premises to cloud throughput depends on SHIR capacity and network bandwidth. Multiple Copy Activities can run in parallel within a pipeline for higher aggregate throughput. Data Flows scale based on the Spark cluster size (up to 256+ vCores). EPC Group conducts throughput benchmarking during implementation to validate that pipelines meet your data freshness SLAs within the allocated budget.
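For planning purposes, the throughput figures above can be turned into a back-of-the-envelope duration and cost estimate. This assumes throughput scales linearly with DIUs up to the quoted 5 GBps ceiling and that copy duration is billed per DIU-hour rounded up to the minute; both are simplifying assumptions, and real throughput varies widely by source, sink, and file layout:

```python
import math

# Planning sketch only: assumes linear DIU scaling up to the quoted
# ceiling and per-minute rounded-up billing (simplifying assumptions).
MAX_DIUS = 256
MAX_THROUGHPUT_GBPS = 5.0   # gigabytes/second at 256 DIUs (cloud-to-cloud)
DIU_HOUR_RATE = 0.25        # approximate $/DIU-hour

def estimated_copy_minutes(data_gb: float, dius: int) -> float:
    throughput = MAX_THROUGHPUT_GBPS * min(dius, MAX_DIUS) / MAX_DIUS
    return data_gb / throughput / 60

def estimated_copy_cost(minutes: float, dius: int) -> float:
    return math.ceil(minutes) / 60 * dius * DIU_HOUR_RATE

# 1 TB cloud-to-cloud at full scale vs. a quarter of the DIUs:
print(estimated_copy_minutes(1000, 256))   # roughly 3.3 minutes
print(estimated_copy_minutes(1000, 64))    # roughly 13.3 minutes
```

Note that under linear scaling the DIU-minutes (and hence the cost) of a single copy are roughly constant regardless of DIU count, so right-sizing DIUs is mainly about meeting data-freshness SLAs rather than saving money on any one copy.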