Azure Data Factory: The Enterprise Guide to ETL/ELT Pipelines, Data Integration, and CI/CD
Azure Data Factory is the backbone of enterprise data integration on the Microsoft platform, orchestrating data movement and transformation across hundreds of source and destination systems. This guide covers enterprise ADF architecture, pipeline design patterns, mapping data flows, CI/CD implementation with Azure DevOps, monitoring and alerting, SSIS migration strategies, and cost optimization — based on 150+ ADF implementations delivered by EPC Group.
Azure Data Factory Enterprise Guide 2026
Azure Data Factory (ADF) is Microsoft's fully managed, serverless data integration service. It orchestrates data movement and transformation across 100+ built-in connectors. EPC Group has implemented ADF for 150+ enterprise organizations — from mid-market companies integrating a dozen sources to Fortune 500 enterprises running thousands of daily pipelines across multiple Azure regions.
Key facts
- ADF supports 100+ built-in connectors, including Azure SQL, Synapse, Blob Storage, Amazon S3, Snowflake, Salesforce, SAP, Oracle, and on-premises SQL Server.
- ADF pricing — four components: pipeline orchestration at $1.00/1,000 activity runs; data movement at $0.25/DIU-hour (minimum 4 DIUs per copy activity); mapping data flows at $0.274/vCore-hour; SSIS Integration Runtime at $0.274/vCore-hour.
- Typical enterprise spend: $800–$2,000/month for 500 pipelines with 5,000 daily activity runs and 100 GB daily data movement.
- EPC Group is a Microsoft Solutions Partner with 150+ ADF implementations across healthcare, financial services, manufacturing, and government.
- EPC Group recommendation: use ELT (not ETL) for most enterprise scenarios — load raw data into OneLake or ADLS Gen2, then transform using Synapse, Databricks, or Microsoft Fabric.
Why Azure Data Factory for enterprise data integration
ADF is Microsoft's strategic replacement for on-premises SSIS. It is the recommended data integration platform for organizations running workloads on Azure.
Key enterprise advantages
- Serverless and fully managed — no infrastructure to provision, patch, or scale. ADF allocates compute resources per pipeline run and charges only for consumption.
- 100+ native connectors — Azure services, cloud platforms (AWS, GCP), SaaS applications (Salesforce, Dynamics 365, SAP, ServiceNow), and on-premises databases via Self-Hosted Integration Runtime.
- Visual pipeline designer — a drag-and-drop pipeline builder that accelerates development and makes pipelines accessible to analysts with minimal coding experience.
- Enterprise reliability — built-in retry policies, fault tolerance, dependency management, and 99.9% SLA-backed uptime. Execution is not idempotent by default, so design pipelines (watermarks, upserts, overwrite-by-partition) so that a rerun does not duplicate data.
- Native Azure integration — Azure Key Vault, Azure Monitor, Microsoft Entra ID, Azure DevOps, and Microsoft Purview (data lineage).
Enterprise ADF architecture patterns
A well-designed ADF architecture separates concerns into four layers: ingestion, transformation, orchestration, and monitoring. This lets each layer scale independently and gives each one a clear owner.
Integration Runtime configuration
- Azure Integration Runtime — managed by Microsoft, auto-scaling. Used for cloud-to-cloud data movement and mapping data flows. Use managed virtual network for private endpoint connectivity.
- Self-Hosted Integration Runtime — installed on-premises or in an Azure VM to access data sources behind firewalls. Deploy at least two nodes for high availability. Use a dedicated service account with least-privilege access.
- Azure-SSIS Integration Runtime — managed SSIS environment in Azure for running existing SSIS packages unchanged. Use Standard_D8_v3 (8 vCPU, 32 GB RAM) as the baseline node size for most enterprise workloads.
ETL vs. ELT: choosing the right pattern
ETL (Extract-Transform-Load) transforms data inside ADF using mapping data flows before loading to the destination. Transformation runs on ADF-managed Spark clusters. Use this when you need to cleanse or enrich data before it reaches the data warehouse.
ELT (Extract-Load-Transform) uses ADF to extract and load raw data into the destination first, then transforms using the destination's compute engine. EPC Group recommends ELT for most enterprise scenarios. Load raw data into OneLake or ADLS Gen2, then transform using Synapse SQL pools, Databricks, or Microsoft Fabric.
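A minimal sketch of the ELT order of operations, assuming an in-memory SQLite database as a stand-in for the destination engine (Synapse, Databricks, or Fabric in practice). The table names `bronze_customer` and `silver_customer` and the sample rows are hypothetical. Raw rows land unchanged, and the cleansing happens afterwards as SQL inside the destination.

```python
import sqlite3

# Hypothetical extracted records, standing in for what a Copy activity would land.
SOURCE_ROWS = [
    (1, "  Contoso ", "us"),
    (2, "Fabrikam", "DE"),
    (3, None, "us"),
]

def run_elt(conn: sqlite3.Connection, rows) -> None:
    """Load raw rows unchanged, then transform inside the destination engine."""
    # Extract + Load: land the data as-is in a raw (bronze) table.
    conn.execute("CREATE TABLE bronze_customer (id INTEGER, name TEXT, country TEXT)")
    conn.executemany("INSERT INTO bronze_customer VALUES (?, ?, ?)", rows)
    # Transform: the destination's SQL engine does the cleansing, not the copy step.
    conn.execute(
        """
        CREATE TABLE silver_customer AS
        SELECT id, TRIM(name) AS name, UPPER(country) AS country
        FROM bronze_customer
        WHERE name IS NOT NULL
        """
    )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_elt(conn, SOURCE_ROWS)
    print(conn.execute("SELECT * FROM silver_customer ORDER BY id").fetchall())
```

Because the raw table is kept, the transformation can be rewritten and rerun later without re-extracting from the source system.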
Pipeline design best practices
Naming conventions
- Pipelines: PL_Ingest_Salesforce_Accounts, PL_Transform_Silver_Customer
- Datasets: DS_ADLS_Bronze_Salesforce_Accounts_Parquet, DS_AzSQL_Gold_DimCustomer
- Linked Services: LS_AzSQL_DW_Production, LS_ADLS_DataLake, LS_KeyVault_Production
- Triggers: TR_Schedule_Daily_0600UTC, TR_Event_BlobCreated_Raw
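The convention can be enforced automatically, for example as a check in the pull-request build. This is a sketch only: the `PREFIXES` map comes from the examples above, while the rule that a name needs at least two underscore-separated alphanumeric segments after the prefix is an assumption inferred from those examples.

```python
import re

# Resource type -> required prefix, taken from the naming examples above.
PREFIXES = {"pipeline": "PL", "dataset": "DS", "linkedService": "LS", "trigger": "TR"}

def is_valid_name(resource_type: str, name: str) -> bool:
    """True if name is PREFIX followed by two or more _Segment parts (assumed rule)."""
    prefix = PREFIXES[resource_type]
    pattern = rf"{prefix}(_[A-Za-z0-9]+){{2,}}"
    return re.fullmatch(pattern, name) is not None

def find_violations(resources):
    """Return the (resource_type, name) pairs that break the convention."""
    return [(kind, name) for kind, name in resources if not is_valid_name(kind, name)]

if __name__ == "__main__":
    print(find_violations([
        ("pipeline", "PL_Ingest_Salesforce_Accounts"),
        ("dataset", "DS_AzSQL_Gold_DimCustomer"),
        ("trigger", "Daily6am"),
    ]))
```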
Pipeline composition patterns
- Master-child pattern — master orchestration pipeline calls child pipelines via Execute Pipeline activity. Child pipelines perform single responsibilities. This enables independent testing, reuse, and parallel execution.
- Metadata-driven ingestion — a single parameterized pipeline reads metadata from a control table (source system, schema, table name, watermark column, destination path). A ForEach activity iterates over the metadata. This can collapse hundreds of near-identical per-table pipelines into one.
- Incremental load with watermarks — for large tables, use high-watermark patterns to load only changed data. Store the last successful watermark in a control table. Update it only after successful load to maintain idempotency.
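The metadata-driven and watermark patterns combine naturally. The sketch below mimics what a Lookup activity feeding a ForEach would do, using SQLite as a hypothetical control table and source, and a Python dict as the destination. The control-table columns, table names, and timestamps are illustrative assumptions. Two details carry the idempotency argument: the new watermark is captured before the copy so each run reads a bounded slice, and the stored watermark advances only after the slice has been written. The sink upserts by key, so a retry after a failed watermark update rewrites the same rows rather than duplicating them.

```python
import sqlite3

def setup_demo(conn: sqlite3.Connection) -> None:
    """Create a hypothetical control table and two source tables."""
    conn.executescript(
        """
        CREATE TABLE control (
            source_table TEXT PRIMARY KEY, watermark_column TEXT,
            last_watermark TEXT, destination_path TEXT);
        CREATE TABLE src_orders (id INTEGER PRIMARY KEY, modified_at TEXT);
        CREATE TABLE src_customers (id INTEGER PRIMARY KEY, modified_at TEXT);
        INSERT INTO control VALUES
            ('src_orders', 'modified_at', '2024-01-01T00:00:00', 'bronze/orders'),
            ('src_customers', 'modified_at', '2024-01-01T00:00:00', 'bronze/customers');
        INSERT INTO src_orders VALUES
            (1, '2023-12-31T10:00:00'), (2, '2024-01-02T09:00:00'), (3, '2024-01-03T12:00:00');
        INSERT INTO src_customers VALUES (10, '2023-12-01T00:00:00');
        """
    )

def load_table(conn, source_table, wm_col, last_wm, dest, sink) -> int:
    """Copy rows changed since last_wm; advance the watermark only after success."""
    # Identifiers come from the trusted control table, not from user input.
    new_wm = conn.execute(f"SELECT MAX({wm_col}) FROM {source_table}").fetchone()[0]
    if new_wm is None or new_wm <= last_wm:
        return 0  # nothing changed since the last successful load
    changed = conn.execute(
        f"SELECT * FROM {source_table} WHERE {wm_col} > ? AND {wm_col} <= ?",
        (last_wm, new_wm),
    ).fetchall()
    # Stand-in for the Copy activity: upsert by primary key so reruns are safe.
    sink.setdefault(dest, {}).update({row[0]: row for row in changed})
    # Only now record the new high watermark.
    conn.execute(
        "UPDATE control SET last_watermark = ? WHERE source_table = ?",
        (new_wm, source_table),
    )
    conn.commit()
    return len(changed)

def run_ingestion(conn, sink) -> dict:
    """Lookup + ForEach equivalent: one generic loader driven by control rows."""
    rows = conn.execute(
        "SELECT source_table, watermark_column, last_watermark, destination_path FROM control"
    ).fetchall()
    return {row[0]: load_table(conn, *row, sink) for row in rows}

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    setup_demo(conn)
    sink = {}
    print(run_ingestion(conn, sink))  # first run picks up orders 2 and 3
    print(run_ingestion(conn, sink))  # second run finds nothing new
```

Adding a table to the ingestion then means inserting one control-table row rather than authoring another pipeline.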
CI/CD for ADF pipelines
Production ADF environments must use CI/CD pipelines for all changes. Manual publishing in production bypasses code review, has no rollback capability, and creates inconsistencies between environments.
Recommended CI/CD workflow using Azure DevOps or GitHub Actions:
- Connect ADF to a Git repository. Each developer works in feature branches.
- Develop and test pipelines in the ADF Studio UI connected to a development ADF instance.
- Merge changes to the collaboration branch (main) via pull requests.
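- ADF generates ARM templates from the collaboration branch when changes are published: the Publish button in ADF Studio writes them to the publish branch (adf_publish), or a build pipeline can generate them with the @microsoft/azure-data-factory-utilities npm package.
- An Azure DevOps or GitHub Actions pipeline deploys the ARM templates to staging and production with environment-specific parameters.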
The golden rule: every value that differs between environments must be a parameter. Never hard-code environment-specific values in pipeline definitions.
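One way to enforce the golden rule is to keep every environment's values in one place and refuse to deploy if any environment is missing a parameter. The sketch below renders standard ARM deployment parameter files; the parameter names, factory names, and URLs are hypothetical (ADF's real parameter names come from the `ARMTemplateParametersForFactory.json` file it generates for your factory).

```python
import json

SCHEMA = "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#"

# Hypothetical values; every environment must define the same parameter names.
ENVIRONMENTS = {
    "staging": {
        "factoryName": "adf-contoso-stg",
        "LS_KeyVault_properties_typeProperties_baseUrl": "https://kv-contoso-stg.vault.azure.net/",
        "LS_ADLS_DataLake_properties_typeProperties_url": "https://stcontosostg.dfs.core.windows.net/",
    },
    "production": {
        "factoryName": "adf-contoso-prd",
        "LS_KeyVault_properties_typeProperties_baseUrl": "https://kv-contoso-prd.vault.azure.net/",
        "LS_ADLS_DataLake_properties_typeProperties_url": "https://stcontosoprd.dfs.core.windows.net/",
    },
}

def missing_parameters(envs: dict) -> dict:
    """Return {environment: [missing names]} for environments lacking any parameter."""
    all_names = set().union(*(set(values) for values in envs.values()))
    return {
        env: sorted(all_names - set(values))
        for env, values in envs.items()
        if all_names - set(values)
    }

def build_parameter_file(values: dict) -> dict:
    """Render one ARM deployment parameters document."""
    return {
        "$schema": SCHEMA,
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in sorted(values.items())},
    }

if __name__ == "__main__":
    gaps = missing_parameters(ENVIRONMENTS)
    if gaps:
        raise SystemExit(f"Refusing to deploy, missing parameters: {gaps}")
    for env, values in ENVIRONMENTS.items():
        print(f"--- parameters.{env}.json")
        print(json.dumps(build_parameter_file(values), indent=2))
```

Each rendered file would then be handed to the ARM deployment step for its environment, so the template itself never changes between stages.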
Security and compliance
Network security
- Enable Managed Virtual Network on Azure Integration Runtime — all data flows execute within a Microsoft-managed VNet with no public internet exposure.
- Use private endpoints for all data stores (Azure SQL, ADLS Gen2, Key Vault, Synapse).
- Deploy Self-Hosted Integration Runtime in a dedicated subnet with NSG rules restricting outbound to required ADF service endpoints only.
Secrets management
- Store all connection strings, passwords, and API keys in Azure Key Vault — never in linked service definitions.
- Use managed identity authentication for Azure-to-Azure connections (no credentials to manage).
- Rotate credentials automatically using Key Vault rotation policies.
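A pull-request check can catch secrets that slipped into linked service JSON. This sketch walks a linked service definition and flags secret-like properties that are not Azure Key Vault references, plus connection strings with an embedded password or account key. The set of secret property names and the regex are assumptions to extend for your connectors, and the sample definitions are hypothetical; the Key Vault reference shape (`AzureKeyVaultSecret` with a `LinkedServiceReference` store) is the one ADF linked services use.

```python
import re

# Property names treated as secrets (an assumption; extend for your connectors).
SECRET_KEYS = {"password", "accountKey", "servicePrincipalKey", "sasToken"}
# Connection-string fragments that indicate an embedded credential.
EMBEDDED_CREDENTIAL = re.compile(r"(password|pwd|accountkey)\s*=", re.IGNORECASE)

def is_key_vault_ref(value) -> bool:
    return isinstance(value, dict) and value.get("type") == "AzureKeyVaultSecret"

def find_inline_secrets(node, path: str = "properties") -> list:
    """Return JSON paths of secrets stored inline instead of in Key Vault."""
    findings = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in SECRET_KEYS and not is_key_vault_ref(value):
                findings.append(child)
                continue  # no need to descend into the secret itself
            if key == "connectionString" and isinstance(value, str) and EMBEDDED_CREDENTIAL.search(value):
                findings.append(child)
            findings.extend(find_inline_secrets(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            findings.extend(find_inline_secrets(item, f"{path}[{index}]"))
    return findings

# Hypothetical definitions: one compliant, one with inline secrets.
GOOD = {"name": "LS_AzSQL_DW_Production", "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
        "connectionString": "Server=tcp:contoso.database.windows.net,1433;Database=dw;",
        "password": {
            "type": "AzureKeyVaultSecret",
            "store": {"referenceName": "LS_KeyVault_Production", "type": "LinkedServiceReference"},
            "secretName": "sql-dw-password",
        },
    },
}}
BAD = {"name": "LS_AzSQL_Legacy", "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
        "connectionString": "Server=tcp:contoso.database.windows.net,1433;Database=dw;User ID=etl;Password=P@ss;",
        "servicePrincipalKey": {"type": "SecureString", "value": "not-a-real-key"},
    },
}}

if __name__ == "__main__":
    for ls in (GOOD, BAD):
        print(ls["name"], find_inline_secrets(ls["properties"]))
```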
Data protection
- Enable encryption at rest with customer-managed keys (CMK) for HIPAA and FedRAMP compliance.
- Encryption in transit (TLS 1.2+) — ADF enforces this by default, so no additional configuration is required.
- Use Microsoft Purview to scan ADF pipelines and provide end-to-end data lineage from source to consumption.
Monitoring and observability
ADF Monitor hub shows pipeline runs, activity runs, trigger runs, and integration runtime status with up to 45 days of history. For enterprise monitoring, integrate ADF with Azure Monitor.
- Tier 1 — ADF Monitor Hub: real-time visibility into pipeline runs. Use for ad-hoc troubleshooting and debugging.
- Tier 2 — Azure Monitor: send diagnostic logs to Log Analytics for custom KQL queries, cross-pipeline analytics, and alerting (pipeline failures, duration anomalies, high DIU consumption, IR errors).
- Tier 3 — Data quality monitoring: implement row count checks (source vs. destination), null value counts, referential integrity validations, and business rule assertions. Write validation results to a quality metrics table.
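The Tier 3 checks can be sketched as plain functions that return result records for the quality metrics table. This is an illustration, not a prescribed schema: the column names (`customer_id`, `amount`), the check names, and the `QualityResult` record are hypothetical, and in ADF the same logic would more likely run as SQL from a Script or Stored Procedure activity after the copy.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class QualityResult:
    """One row for a hypothetical quality metrics table."""
    run_id: str
    check_name: str
    passed: bool
    detail: str
    checked_at: str

def run_quality_checks(run_id, source_count, dest_rows, parent_keys):
    """Row count, null, referential integrity, and business rule checks."""
    now = datetime.now(timezone.utc).isoformat()
    results = []

    def record(name, passed, detail):
        results.append(QualityResult(run_id, name, passed, detail, now))

    # Row count: source vs. destination.
    record("row_count_match", source_count == len(dest_rows),
           f"source={source_count} destination={len(dest_rows)}")
    # Null values in a required column.
    nulls = sum(1 for row in dest_rows if row.get("customer_id") is None)
    record("customer_id_not_null", nulls == 0, f"nulls={nulls}")
    # Referential integrity: every customer_id must exist in the parent dimension.
    keys = {row["customer_id"] for row in dest_rows if row.get("customer_id") is not None}
    orphans = sorted(keys - set(parent_keys))
    record("customer_fk_exists", not orphans, f"orphans={orphans}")
    # Business rule: order amounts are never negative.
    negatives = sum(1 for row in dest_rows if (row.get("amount") or 0) < 0)
    record("amount_non_negative", negatives == 0, f"negatives={negatives}")
    return results

if __name__ == "__main__":
    rows = [
        {"order_id": 1, "customer_id": 10, "amount": 50.0},
        {"order_id": 2, "customer_id": None, "amount": 20.0},
        {"order_id": 3, "customer_id": 99, "amount": -5.0},
    ]
    for result in run_quality_checks("run-001", 3, rows, parent_keys=[10, 11]):
        print(result.check_name, result.passed, result.detail)
```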
Migrating from SSIS to ADF
ADF is Microsoft's strategic replacement for SSIS. EPC Group has run SSIS-to-ADF migrations for 60+ enterprise clients. Two paths exist.
- Path 1: Lift-and-shift with SSIS Integration Runtime — run existing SSIS packages unchanged on an ADF-managed SSIS runtime. Fastest migration path. Does not modernize the packages. Minimum IR cost ~$275/month. Timeline: 2–4 weeks for environment setup, 1–2 weeks for package deployment and testing.
- Path 2: Re-architect to native ADF pipelines — redesign SSIS packages as ADF pipelines with mapping data flows. Full cloud-native benefits: serverless scaling, consumption pricing, built-in monitoring. Requires development effort. Timeline: 8–16 weeks depending on package count and complexity.
EPC Group recommendation: lift-and-shift critical packages first to get off on-premises infrastructure. Then re-architect high-value packages to native ADF pipelines based on ROI analysis.
Cost optimization
- Right-size DIUs — ADF defaults to auto-DIU allocation (up to 256 DIUs per copy). For most enterprise scenarios, 16–32 DIUs provide optimal throughput without overspending.
- Use TTL for data flow clusters — set a 10–15 minute TTL to keep Spark clusters warm during batch windows. This avoids the 3–5 minute cold-start penalty on each subsequent data flow run.
- Batch windows — group pipeline executions into concentrated batch windows rather than spreading through the day. Maximizes TTL effectiveness.
- ELT over ETL — offload transformations to existing Synapse or Fabric compute rather than running ADF Spark clusters. The transformation then adds little or no ADF cost, because it consumes capacity you already pay for.
- Incremental loads — watermark-based incremental loads typically move only 1–5% of the data a full load would (the exact share depends on each table's change rate), reducing copy activity duration, DIU consumption, and data flow processing time proportionally.
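To see how these levers move the bill, here is a rough monthly estimator using the unit prices listed in this guide. The workload figures (100 copies per day, DIU counts, durations) are hypothetical, and the model ignores per-minute rounding, Self-Hosted IR and external activity rates, and any price changes, so treat it as back-of-the-envelope arithmetic.

```python
# Unit prices as listed in this guide; confirm against the current Azure pricing page.
ORCHESTRATION_PER_1000_RUNS = 1.00
DATA_MOVEMENT_PER_DIU_HOUR = 0.25
DATA_FLOW_PER_VCORE_HOUR = 0.274

def copy_diu_hours(copies_per_day: int, dius: int, minutes_per_copy: float) -> float:
    """Daily DIU-hours: DIUs x duration, summed over copy runs (no rounding modeled)."""
    return copies_per_day * dius * minutes_per_copy / 60

def monthly_cost(daily_activity_runs, daily_diu_hours, daily_vcore_hours=0.0, days=30):
    """Split a month's estimated spend into the three consumption meters."""
    orchestration = daily_activity_runs * days / 1000 * ORCHESTRATION_PER_1000_RUNS
    movement = daily_diu_hours * days * DATA_MOVEMENT_PER_DIU_HOUR
    data_flows = daily_vcore_hours * days * DATA_FLOW_PER_VCORE_HOUR
    return {
        "orchestration": round(orchestration, 2),
        "data_movement": round(movement, 2),
        "data_flows": round(data_flows, 2),
        "total": round(orchestration + movement + data_flows, 2),
    }

if __name__ == "__main__":
    # Hypothetical: 100 full-load copies at 8 DIUs for 15 minutes each...
    print("full", monthly_cost(5000, copy_diu_hours(100, 8, 15)))
    # ...versus 100 incremental copies at 4 DIUs for 3 minutes each.
    print("incremental", monthly_cost(5000, copy_diu_hours(100, 4, 3)))
```

With these assumptions the full-load scenario comes to about $1,650/month and the incremental one to about $300/month, with nearly all of the difference coming from data movement.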
Frequently asked questions
What is Azure Data Factory and what is it used for?
ADF is Microsoft's fully managed, serverless data integration service. It orchestrates and automates data movement and transformation across 100+ connectors — including Azure SQL, Synapse, Blob Storage, Amazon S3, Snowflake, Salesforce, SAP, Oracle, and on-premises SQL Server. ADF is the successor to on-premises SSIS and the recommended data integration platform for Azure environments.
How much does Azure Data Factory cost?
ADF pricing has four components: pipeline orchestration at $1.00 per 1,000 activity runs; data movement at $0.25 per DIU-hour (minimum 4 DIUs per copy activity); mapping data flows at $0.274/vCore-hour; SSIS Integration Runtime at $0.274/vCore-hour. For a typical enterprise running 500 pipelines with 5,000 daily activity runs and 100 GB of daily data movement, expect $800–$2,000/month.
What is the difference between ETL and ELT in Azure Data Factory?
ETL transforms data inside ADF using mapping data flows before loading to the destination — ADF manages the Spark clusters. ELT loads raw data into the destination first and transforms it using the destination's compute engine (Synapse, Databricks, or Fabric). EPC Group recommends ELT for most enterprise scenarios. It's more cost-effective and keeps transformation logic close to the consumption layer.
How do I implement CI/CD for Azure Data Factory pipelines?
Connect ADF to Azure DevOps Repos or GitHub. Developers work in feature branches. Changes merge to main via pull requests. ADF generates ARM templates from the collaboration branch, either into the publish branch (adf_publish) when you publish from ADF Studio or through the @microsoft/azure-data-factory-utilities npm package in a build pipeline. An Azure DevOps or GitHub Actions pipeline deploys those templates to staging and production with environment-specific parameters. Never manually publish to production; all changes flow through the CI/CD pipeline.
Can Azure Data Factory replace SSIS?
Yes. ADF is Microsoft's strategic replacement for SSIS. Two migration paths: lift-and-shift with SSIS Integration Runtime (fastest, 2–6 weeks, but not cloud-native) or re-architect to native ADF pipelines (8–16 weeks, full cloud-native benefits). EPC Group recommends a phased approach: lift-and-shift first, then re-architect high-value packages.
Schedule a consultation
EPC Group has completed 10,000+ implementations across Azure, Power BI, Microsoft Fabric, SharePoint, and Copilot. Talk to an Azure architect about your ADF implementation or SSIS migration. Call (888) 381-9725 or request a discovery call.
Frequently Asked Questions
What is Azure Data Factory and what is it used for?
Azure Data Factory (ADF) is Microsoft's fully managed, serverless data integration service in Azure. It orchestrates and automates the movement and transformation of data at scale across 100+ built-in connectors — including Azure SQL Database, Azure Synapse Analytics, Azure Blob Storage, Amazon S3, Snowflake, Salesforce, SAP, Oracle, and on-premises SQL Server. ADF supports both ETL (Extract-Transform-Load) and ELT (Extract-Load-Transform) patterns. Enterprises use ADF to build data pipelines that ingest raw data from source systems, transform it using mapping data flows or external compute (Databricks, HDInsight, Synapse), and load it into data warehouses, data lakes, or analytical stores. ADF is the successor to on-premises SSIS (SQL Server Integration Services) and is the recommended data integration platform for organizations running workloads on Azure.
How much does Azure Data Factory cost?
ADF pricing is consumption-based with four billing components: (1) Pipeline orchestration and execution at $1.00 per 1,000 activity runs, (2) Data movement at $0.25 per DIU-hour (Data Integration Unit), with a minimum of 4 DIUs per copy activity, (3) Mapping data flow execution at $0.274/vCore-hour for general-purpose clusters, and (4) SSIS Integration Runtime at $0.274/vCore-hour. For a typical enterprise running 500 pipelines with 5,000 daily activity runs and 100 GB of daily data movement, expect $800-$2,000/month. Mapping data flows are the most expensive component; optimize by right-sizing cluster configurations and using time-to-live (TTL) settings to keep clusters warm during batch windows rather than cold-starting for each execution.
What is the difference between ETL and ELT in Azure Data Factory?
ETL (Extract-Transform-Load) transforms data within ADF using mapping data flows before loading it into the destination. The transformation runs on ADF-managed Spark clusters. This pattern is ideal when you need to cleanse, reshape, or enrich data before it reaches the data warehouse. ELT (Extract-Load-Transform) uses ADF to extract and load raw data into the destination (typically Azure Synapse, Databricks, or a data lake), then transforms it using the destination's compute engine. ELT leverages the destination's processing power, which is often more cost-effective for large-scale transformations. EPC Group recommends ELT for most enterprise scenarios — load raw data into OneLake or Azure Data Lake Storage Gen2, then transform using Synapse SQL pools, Databricks, or Microsoft Fabric for better performance and lower cost than running transformations on ADF's Spark clusters.
How do I implement CI/CD for Azure Data Factory pipelines?
ADF supports Git integration with Azure DevOps Repos or GitHub. The recommended CI/CD workflow is: (1) Connect ADF to a Git repository (each developer works in feature branches), (2) Develop and test pipelines in the ADF Studio UI connected to a development ADF instance, (3) Merge changes to the collaboration branch (typically main) via pull requests, (4) Generate ARM templates from the collaboration branch, either by publishing from ADF Studio (which writes them to the publish branch, adf_publish) or with the @microsoft/azure-data-factory-utilities npm package in a build pipeline, (5) An Azure DevOps or GitHub Actions pipeline deploys the ARM templates to staging and production ADF instances with environment-specific parameters (linked services, connection strings, key vault references). Use parameterized linked services and global parameters to manage environment differences. Never manually publish changes in production; all changes flow through the CI/CD pipeline.
Can Azure Data Factory replace SSIS (SQL Server Integration Services)?
Yes. ADF is Microsoft's strategic replacement for SSIS. Organizations can migrate SSIS packages to ADF using two approaches: (1) Lift-and-shift with SSIS Integration Runtime — run existing SSIS packages unchanged on an ADF-managed SSIS runtime in Azure. This is the fastest migration path but does not modernize the packages. (2) Re-architect to native ADF pipelines — redesign SSIS packages as ADF pipelines with mapping data flows. This provides full cloud-native benefits (serverless scaling, consumption pricing, built-in monitoring) but requires development effort. EPC Group recommends a phased approach: lift-and-shift critical packages first to migrate off on-premises infrastructure, then gradually re-architect high-value packages to native ADF pipelines based on ROI analysis.
How do I monitor and troubleshoot Azure Data Factory pipelines?
ADF provides built-in monitoring through the ADF Monitor hub, which shows pipeline runs, activity runs, trigger runs, and integration runtime status with up to 45 days of history. For enterprise monitoring, integrate ADF with Azure Monitor by sending diagnostic logs to a Log Analytics workspace — this enables custom KQL queries, alerting on pipeline failures, and long-term log retention (beyond 45 days). Set up Azure Monitor alerts for: pipeline failure (immediate notification), long-running pipelines (exceeding expected duration by 2x), high DIU consumption (cost anomaly), and integration runtime errors. EPC Group configures a centralized monitoring dashboard in Azure Monitor Workbooks that tracks pipeline SLAs, data freshness metrics, and cost trends across all ADF instances in the environment.
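The alert conditions above translate into simple rules. In production they would be Azure Monitor alert rules over the Log Analytics tables (for example `ADFPipelineRun` in resource-specific mode); the Python below only illustrates the rule logic on in-memory records. The `PipelineRun` shape, expected durations, and DIU budgets are hypothetical, and integration runtime errors are left out.

```python
from dataclasses import dataclass

@dataclass
class PipelineRun:
    """Hypothetical, simplified view of one pipeline run record."""
    pipeline_name: str
    status: str  # e.g. "Succeeded", "Failed"
    duration_minutes: float
    diu_hours: float

def evaluate_alerts(runs, expected_minutes: dict, diu_budget: dict):
    """Return (pipeline, alert_type) pairs for failures, 2x overruns, and DIU spikes."""
    alerts = []
    for run in runs:
        if run.status == "Failed":
            alerts.append((run.pipeline_name, "failure"))
        expected = expected_minutes.get(run.pipeline_name)
        if expected is not None and run.duration_minutes > 2 * expected:
            alerts.append((run.pipeline_name, "long_running"))
        budget = diu_budget.get(run.pipeline_name)
        if budget is not None and run.diu_hours > budget:
            alerts.append((run.pipeline_name, "high_diu"))
    return alerts

if __name__ == "__main__":
    runs = [
        PipelineRun("PL_Ingest_Salesforce_Accounts", "Succeeded", 25, 4),
        PipelineRun("PL_Transform_Silver_Customer", "Failed", 5, 0),
        PipelineRun("PL_Ingest_Salesforce_Accounts", "Succeeded", 70, 40),
    ]
    print(evaluate_alerts(runs, {"PL_Ingest_Salesforce_Accounts": 30},
                          {"PL_Ingest_Salesforce_Accounts": 16}))
```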
