February 26, 2026 | 22 min read | Azure Cloud Services

Azure Data Factory: The Enterprise Guide to ETL/ELT Pipelines, Data Integration, and CI/CD

Azure Data Factory is the backbone of enterprise data integration on the Microsoft platform, orchestrating data movement and transformation across hundreds of source and destination systems. This guide covers enterprise ADF architecture, pipeline design patterns, mapping data flows, CI/CD implementation with Azure DevOps, monitoring and alerting, SSIS migration strategies, and cost optimization — based on 150+ ADF implementations delivered by EPC Group.

Table of Contents

  • Why Azure Data Factory for Enterprise Data Integration
  • Enterprise ADF Architecture Patterns
  • Pipeline Design Best Practices
  • Mapping Data Flows for Transformation
  • ETL vs. ELT: Choosing the Right Pattern
  • CI/CD for ADF Pipelines
  • Security and Compliance
  • Monitoring, Alerting, and Observability
  • Migrating from SSIS to ADF
  • Cost Optimization Strategies
  • Partner with EPC Group

Why Azure Data Factory for Enterprise Data Integration

Enterprise data integration has evolved from batch-oriented ETL running on dedicated servers to cloud-native, serverless orchestration spanning hundreds of data sources. Azure Data Factory sits at the center of this transformation, providing a unified platform to ingest, transform, and orchestrate data across on-premises, multi-cloud, and SaaS environments — without managing any infrastructure.

At EPC Group, our Azure cloud consulting practice has implemented ADF for over 150 enterprise organizations — from mid-market companies integrating a dozen data sources to Fortune 500 enterprises orchestrating thousands of daily pipelines across multiple Azure regions. The difference between a well-architected ADF environment and a poorly designed one is the difference between a reliable data platform and a brittle system that breaks with every source change.

Key Enterprise Advantages

  • Serverless and fully managed: No infrastructure to provision, patch, or scale. ADF automatically allocates compute resources for each pipeline run and charges only for consumption. You never manage Spark clusters, virtual machines, or container orchestration.
  • 100+ native connectors: Built-in connectors for Azure services (SQL Database, Synapse, Blob Storage, Cosmos DB, Data Lake Storage), cloud platforms (AWS S3, Redshift, Google BigQuery), SaaS applications (Salesforce, Dynamics 365, SAP, ServiceNow), and on-premises databases (SQL Server, Oracle, DB2, Teradata) via Self-Hosted Integration Runtime.
  • Visual pipeline designer: ADF Studio provides a drag-and-drop pipeline designer that enables data engineers to build complex orchestration visually. This accelerates development and makes pipelines accessible to analysts with minimal coding experience.
  • Enterprise-grade reliability: Built-in retry policies, fault tolerance, dependency management, and SLA-backed uptime (99.9%). Design every pipeline to be idempotent so that rerunning a failed pipeline produces the same result as running it fresh.
  • Native Azure ecosystem integration: ADF integrates natively with Azure Key Vault (secrets management), Azure Monitor (logging and alerting), Microsoft Entra ID (authentication), Azure DevOps (CI/CD), and Microsoft Purview (data lineage and cataloging).

Enterprise ADF Architecture Patterns

A well-designed ADF architecture separates concerns into distinct layers: ingestion, transformation, orchestration, and monitoring. This modular approach enables independent scaling, clear ownership, and simpler troubleshooting. The architecture integrates with your Azure Landing Zone and aligns with the medallion architecture (Bronze/Silver/Gold) that underpins modern data platforms.

Enterprise ADF Architecture (Medallion Pattern)
┌──────────────────────────────────────────────────────────────┐
│ Data Sources                                                  │
│ ├── On-premises: SQL Server, Oracle, SAP, file shares         │
│ ├── Cloud: Azure SQL, Cosmos DB, S3, BigQuery, Snowflake      │
│ └── SaaS: Salesforce, Dynamics 365, ServiceNow, HubSpot       │
└──────────────────┬───────────────────────────────────────────┘
                   │ Self-Hosted IR (on-prem) / Azure IR (cloud)
┌──────────────────▼───────────────────────────────────────────┐
│ Azure Data Factory (Orchestration Layer)                      │
│ ├── Ingestion Pipelines (Copy Activity → Raw/Bronze)          │
│ ├── Transformation Pipelines (Data Flows or External Compute) │
│ ├── Orchestration Pipelines (Master pipeline, dependencies)   │
│ └── Triggers (Schedule, Tumbling Window, Event-based)         │
├──────────────────────────────────────────────────────────────┤
│ Data Lake (Azure Data Lake Storage Gen2 / OneLake)            │
│ ├── Bronze: Raw data (as-is from source, Parquet/Delta)       │
│ ├── Silver: Cleansed, conformed, deduplicated                 │
│ └── Gold: Business-ready aggregates and dimensions            │
├──────────────────────────────────────────────────────────────┤
│ Consumption Layer                                             │
│ ├── Azure Synapse Analytics (SQL pools, serverless)           │
│ ├── Microsoft Fabric (Lakehouse, Warehouse)                   │
│ ├── Power BI (DirectQuery, Direct Lake, Import)               │
│ └── Azure ML / Databricks (ML model training)                 │
└──────────────────────────────────────────────────────────────┘

Integration Runtime Configuration

The Integration Runtime (IR) is the compute infrastructure ADF uses to execute pipeline activities. Choosing the right IR configuration is critical for performance, security, and cost.

  • Azure Integration Runtime: Managed by Microsoft, auto-scaling, used for cloud-to-cloud data movement and mapping data flows. Choose the region closest to your data sources and destinations to minimize latency and egress costs. Enable managed virtual network for private endpoint connectivity to data stores.
  • Self-Hosted Integration Runtime: Installed on-premises or in an Azure VM to access data sources behind firewalls. Deploy at least two nodes for high availability. Use a dedicated service account with least-privilege access. Monitor node health and auto-update settings (see the status-check sketch after this list).
  • Azure-SSIS Integration Runtime: Managed SSIS environment in Azure for running existing SSIS packages unchanged. Choose the node size based on package complexity and data volume. Use Standard_D8_v3 (8 vCPU, 32 GB RAM) as the baseline for most enterprise workloads.
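
To make node-health monitoring concrete, here is a minimal Python sketch that reads Self-Hosted IR status through the azure-mgmt-datafactory SDK. The subscription, resource group, factory, and IR names are placeholders, and wiring the output into your alerting stack is left out.

# pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Hypothetical names -- substitute your own subscription, resource group,
# factory, and integration runtime.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP = "rg-dataplatform-prod"
FACTORY_NAME = "adf-enterprise-prod"
IR_NAME = "shir-onprem"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# get_status returns node-level detail for a Self-Hosted IR.
status = client.integration_runtimes.get_status(RESOURCE_GROUP, FACTORY_NAME, IR_NAME)
props = status.properties

print(f"IR state: {props.state}")
for node in getattr(props, "nodes", None) or []:
    # Flag offline nodes; with fewer than two healthy nodes, HA is lost.
    print(f"  node={node.node_name} status={node.status} version={node.version}")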

Pipeline Design Best Practices

Enterprise ADF environments typically contain hundreds of pipelines. Without consistent design patterns, maintenance becomes impossible. EPC Group enforces the following patterns across all client engagements.

Naming Conventions

Consistent naming enables filtering, searching, and understanding pipeline purpose at a glance. Use a pattern of the form [Type]_[Action]_[Source]_[Entity], where the two-letter prefix identifies the artifact type (PL pipeline, DS dataset, LS linked service, TR trigger).

  • Pipelines: PL_Ingest_Salesforce_Accounts, PL_Transform_Silver_Customer, PL_Orchestrate_DailyLoad
  • Datasets: DS_ADLS_Bronze_Salesforce_Accounts_Parquet, DS_AzSQL_Gold_DimCustomer
  • Linked Services: LS_AzSQL_DW_Production, LS_ADLS_DataLake, LS_KeyVault_Production
  • Triggers: TR_Schedule_Daily_0600UTC, TR_Event_BlobCreated_Raw

Pipeline Composition Patterns

  • Master-child pattern: A master orchestration pipeline calls child pipelines using Execute Pipeline activity. The master handles scheduling, dependency ordering, and error aggregation. Child pipelines perform single responsibilities (ingest one source, transform one entity). This enables independent testing, reuse, and parallel execution.
  • Metadata-driven ingestion: Instead of creating one pipeline per source table, build a single parameterized pipeline that reads metadata from a control table (source system, schema, table name, watermark column, destination path). A ForEach activity iterates over the metadata and executes the parameterized copy for each table. This pattern reduces the number of pipelines from hundreds to one, simplifying maintenance and onboarding new sources.
  • Incremental load with watermarks: For large tables, use high-watermark patterns to load only changed data. Store the last successful watermark (timestamp or sequence ID) in a control table. Each pipeline run queries source data where the watermark column exceeds the stored value. Update the watermark only after successful load to ensure idempotency (see the sketch after this list).
  • Error handling with retry and dead-letter: Configure retry policies on activities (3 retries with exponential backoff). Use If Condition activities to branch on failure. Write failed records to a dead-letter container for manual review rather than failing the entire pipeline. Log all errors to Azure Monitor with pipeline name, activity name, and error message for centralized troubleshooting.
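
Below is a minimal Python sketch of the high-watermark pattern using pyodbc. The etl.Watermark control table, column names, and connection strings are hypothetical; in a production ADF pipeline the same three steps run as Lookup, Copy, and Stored Procedure activities, with the source query parameterized from the control table.

# pip install pyodbc
import pyodbc

# Hypothetical control/source connections and object names -- adjust to
# your environment.
CONTROL_CONN = "DRIVER={ODBC Driver 18 for SQL Server};SERVER=sql-ctrl;DATABASE=etl;Trusted_Connection=yes"
SOURCE_CONN = "DRIVER={ODBC Driver 18 for SQL Server};SERVER=sql-src;DATABASE=sales;Trusted_Connection=yes"
TABLE = "dbo.Orders"

with pyodbc.connect(CONTROL_CONN) as ctrl, pyodbc.connect(SOURCE_CONN) as src:
    # 1. Read the last successful watermark for this table.
    last = ctrl.execute(
        "SELECT LastWatermark FROM etl.Watermark WHERE TableName = ?", TABLE
    ).fetchval()

    # 2. Extract only rows changed since the stored watermark.
    rows = src.execute(
        f"SELECT * FROM {TABLE} WHERE ModifiedDate > ?", last
    ).fetchall()
    new_watermark = max((r.ModifiedDate for r in rows), default=last)

    # ... write `rows` to the Bronze layer here ...

    # 3. Advance the watermark only after the load succeeds, so a failed
    #    run can simply be rerun without losing or duplicating data.
    ctrl.execute(
        "UPDATE etl.Watermark SET LastWatermark = ? WHERE TableName = ?",
        new_watermark, TABLE,
    )
    ctrl.commit()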

Avoid: Monolithic Pipelines

The most common ADF anti-pattern is building massive pipelines with 50+ activities that ingest, transform, and load in a single flow. These pipelines are impossible to test incrementally, painful to debug (isolating a failure at activity 47 means tracing state across dozens of upstream activities), and cannot be parallelized. Always decompose into small, single-purpose child pipelines composed by a master orchestrator.

Mapping Data Flows for Transformation

Mapping data flows are ADF's built-in transformation engine, providing a visual, code-free interface for building data transformation logic that executes on managed Apache Spark clusters. Data flows support joins, aggregations, pivots, unpivots, conditional splits, derived columns, lookups, window functions, and slowly changing dimension (SCD) patterns.

When to Use Mapping Data Flows

  • Use data flows when: You need code-free transformations that data engineers and analysts can maintain, your transformations are standard patterns (joins, aggregations, lookups, type conversions), you want built-in SCD Type 1/2 support, and your data volumes are moderate (under 500 GB per execution).
  • Use external compute when: You need custom Python/Scala logic (use Databricks or Synapse Spark), your data volumes exceed 500 GB per execution (Databricks handles very large datasets more efficiently), you already have a Databricks or Synapse Spark investment, or you need real-time streaming transformations (use Azure Stream Analytics or Spark Structured Streaming).

Data Flow Performance Optimization

  • Cluster configuration: Choose compute-optimized clusters for CPU-intensive transformations (complex joins, aggregations) and memory-optimized clusters for large dataset caching. Start with 8 cores for development and scale to 32-64 cores for production workloads.
  • Time-to-live (TTL): Set TTL to 10-15 minutes to keep clusters warm between consecutive data flow executions. Cold-starting a Spark cluster takes 3-5 minutes — TTL eliminates this overhead for batch windows with multiple sequential data flows.
  • Partitioning: Use hash partitioning on join keys and group-by columns to optimize data distribution across Spark partitions. Avoid round-robin partitioning for large joins as it causes expensive shuffles.
  • Source optimization: Push filter predicates to the source query (predicate pushdown) rather than filtering after loading all data. Use column pruning to select only required columns. For large databases, use partitioned reading with parallel source queries.

ETL vs. ELT: Choosing the Right Pattern

| Characteristic | ETL (Transform in ADF) | ELT (Transform in Destination) |
|---|---|---|
| Transformation engine | ADF Mapping Data Flows (Spark) | Synapse SQL, Databricks, Fabric |
| Data volume sweet spot | Under 500 GB per execution | Any volume (scales with destination) |
| Cost model | ADF Spark cluster hours | Destination compute (often already paid for) |
| Complexity | Visual, no-code/low-code | SQL, Python, or Scala code |
| Data lineage | Full lineage in ADF Monitor | Requires Purview or manual tracking |
| Best for | Simple to moderate transformations | Complex, large-scale transformations |

EPC Group recommends the ELT pattern for most enterprise data platforms. Use ADF as the orchestration and ingestion layer (Copy Activity to move data from source to data lake), then transform data using Synapse SQL pools, Databricks notebooks, or Microsoft Fabric lakehouses. This approach leverages the destination's optimized compute engine, avoids double-paying for Spark (ADF data flows + destination), and keeps transformation logic close to the consumption layer where data engineers and analysts can iterate faster.

CI/CD for ADF Pipelines

Production ADF environments must use CI/CD pipelines to deploy changes. Manual publishing in production is a deployment risk — it bypasses code review, has no rollback capability, and creates inconsistencies between environments. EPC Group implements CI/CD for every ADF engagement using Azure DevOps or GitHub Actions. See our Azure DevOps CI/CD guide for broader DevOps practices.

CI/CD Workflow

  1. Development: Developers work in the ADF Studio UI connected to a development ADF instance. Each developer creates feature branches from the collaboration branch (main). Changes are committed to Git automatically as developers save in the UI.
  2. Code review: Developers submit pull requests to merge feature branches into main. Reviewers validate pipeline logic, naming conventions, parameterization, and security configurations. Automated validation runs ADF's built-in validation API to catch errors before merge.
  3. Build: After merge to main, a CI pipeline validates the ADF artifacts and generates deployment templates. Use the @microsoft/azure-data-factory-utilities npm package to validate and export ARM templates programmatically, replacing the legacy adf_publish branch approach.
  4. Deploy to staging: The CD pipeline deploys ARM/Bicep templates to the staging ADF instance. Environment-specific parameters (linked service connection strings, key vault URIs, trigger schedules) are injected via parameter files. Triggers are stopped before deployment and restarted after.
  5. Integration testing: Automated tests execute key pipelines in staging with test data. Validate row counts, data quality checks, and SLA compliance. EPC Group uses custom Python scripts that invoke ADF pipelines via the REST API and validate output datasets (a minimal example follows these steps).
  6. Deploy to production: After staging validation passes, the same templates deploy to production with production parameter files. Trigger stop/start is automated. Rollback is handled by redeploying the previous ARM template version from the artifact repository.
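
As a sketch of step 5, the snippet below invokes a pipeline in the staging factory through the azure-mgmt-datafactory SDK and polls until it reaches a terminal state. The factory names, pipeline name, and parameter are placeholders, and the row-count and data-quality assertions that would follow are only indicated.

# pip install azure-identity azure-mgmt-datafactory
import time
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Hypothetical staging environment names.
SUB = "00000000-0000-0000-0000-000000000000"
RG, FACTORY = "rg-dataplatform-stg", "adf-enterprise-stg"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUB)

# Kick off the pipeline under test with test-data parameters.
run = client.pipelines.create_run(
    RG, FACTORY, "PL_Ingest_Salesforce_Accounts",
    parameters={"sourcePath": "test/accounts.parquet"},  # hypothetical parameter
)

# Poll until the run reaches a terminal state.
while True:
    status = client.pipeline_runs.get(RG, FACTORY, run.run_id).status
    if status in ("Succeeded", "Failed", "Cancelled"):
        break
    time.sleep(30)

assert status == "Succeeded", f"Staging run failed: {run.run_id}"
# Follow with row-count and data-quality assertions against the output dataset.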

Parameterize Everything

The golden rule of ADF CI/CD: every value that differs between environments must be a parameter. This includes linked service connection strings, Key Vault URIs, storage account names, database server names, trigger schedules, and integration runtime references. Use ADF global parameters for values shared across pipelines and pipeline parameters for pipeline-specific values. Never hard-code environment-specific values in pipeline definitions — this is the number one cause of CI/CD failures in ADF deployments.

Security and Compliance

Enterprise ADF deployments must enforce security at every layer — network, identity, data, and secrets management. For regulated industries, EPC Group implements configurations that map to HIPAA, SOC 2, and FedRAMP controls. Our data governance practice works closely with our Azure team to ensure ADF environments meet compliance requirements.

Network Security

  • Enable Managed Virtual Network on Azure Integration Runtime — all data flows execute within a Microsoft-managed VNet with no public internet exposure
  • Use private endpoints for all data stores (Azure SQL, ADLS Gen2, Key Vault, Synapse)
  • Deploy Self-Hosted Integration Runtime in a dedicated subnet with NSG rules restricting outbound to required ADF service endpoints only
  • Disable public network access on the ADF instance — access the ADF Studio through Azure Private Link

Secrets Management

  • Store all connection strings, passwords, and API keys in Azure Key Vault — never in linked service definitions (see the retrieval sketch after this list)
  • Use managed identity authentication for Azure-to-Azure connections (no credentials to manage)
  • Use service principal with certificate authentication for cross-tenant or third-party connections
  • Rotate credentials automatically using Key Vault rotation policies
  • Audit Key Vault access with Azure Monitor and alert on unauthorized access attempts
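
For illustration, here is a minimal Python sketch of retrieving a secret with a managed identity; the vault URI and secret name are hypothetical. Inside ADF itself you would reference the secret from a Key Vault-backed linked service rather than writing SDK code.

# pip install azure-identity azure-keyvault-secrets
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Hypothetical vault URI -- DefaultAzureCredential picks up the managed
# identity when this runs on Azure, so no secret material is embedded in
# code or config.
client = SecretClient(
    vault_url="https://kv-dataplatform-prod.vault.azure.net",
    credential=DefaultAzureCredential(),
)

# Retrieve a connection string at runtime instead of hard-coding it.
conn_str = client.get_secret("sql-dw-connection-string").value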

Data Protection

  • Enable encryption at rest with customer-managed keys (CMK) for HIPAA and FedRAMP compliance
  • Enable encryption in transit (TLS 1.2+) for all data movement — ADF enforces this by default
  • Implement data masking in mapping data flows for PII/PHI columns before loading to downstream systems
  • Use Microsoft Purview to scan ADF pipelines and provide end-to-end data lineage from source to consumption

Monitoring, Alerting, and Observability

Enterprise ADF environments run thousands of pipeline executions daily. Without proactive monitoring, failures go undetected, data freshness degrades, and costs spike. EPC Group implements a three-tier monitoring strategy for every ADF deployment.

  • Tier 1 — ADF Monitor Hub: Real-time visibility into pipeline runs, activity runs, and trigger runs. Use for ad-hoc troubleshooting and debugging. The ADF Monitor Hub retains 45 days of run history and provides direct links to error details, input/output payloads, and activity durations. Enable "Gantt" view to visualize pipeline execution timelines and identify bottlenecks.
  • Tier 2 — Azure Monitor integration: Send ADF diagnostic logs to a Log Analytics workspace for custom KQL queries, cross-pipeline analytics, and alerting. Configure alerts for: pipeline failures (fire within 5 minutes), pipeline duration anomalies (exceeding 2x normal), high DIU consumption (cost threshold exceeded), and Self-Hosted IR node offline. Use Azure Monitor Workbooks for executive dashboards showing pipeline SLA compliance, data freshness trends, and monthly cost analysis (a sample failure query follows this list).
  • Tier 3 — Data quality monitoring: Beyond pipeline execution monitoring, validate the data itself. Implement row count checks (source vs. destination), null value counts on required columns, referential integrity validations, and business rule assertions. Use ADF's data flow assert transformation or post-load SQL queries to implement these checks. Write validation results to a quality metrics table and alert on threshold violations.
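
As one example of a Tier 2 check, the sketch below runs a KQL query against the Log Analytics workspace that receives ADF diagnostic logs, counting pipeline failures over the last 24 hours. The workspace ID is a placeholder, and the ADFPipelineRun table assumes diagnostic settings use the resource-specific destination mode.

# pip install azure-identity azure-monitor-query
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "00000000-0000-0000-0000-000000000000"  # hypothetical workspace

# ADFPipelineRun is populated when ADF diagnostic logs flow to Log
# Analytics in resource-specific mode.
QUERY = """
ADFPipelineRun
| where TimeGenerated > ago(24h) and Status == 'Failed'
| summarize failures = count() by PipelineName
| order by failures desc
"""

client = LogsQueryClient(DefaultAzureCredential())
result = client.query_workspace(WORKSPACE_ID, QUERY, timespan=timedelta(days=1))

for table in result.tables:
    for row in table.rows:
        print(f"{row[0]}: {row[1]} failures")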

Migrating from SSIS to ADF

Organizations running SSIS packages on-premises face end-of-mainstream-support timelines and increasing infrastructure costs. ADF provides two migration paths, and EPC Group's cloud migration team has executed SSIS-to-ADF migrations for over 60 enterprise clients.

Path 1: Lift-and-Shift (SSIS Integration Runtime)

  • How it works: Provision an Azure-SSIS Integration Runtime in ADF. Deploy existing SSIS packages to the SSISDB catalog hosted on Azure SQL Database or Azure SQL Managed Instance. Execute packages through ADF pipelines using the Execute SSIS Package activity.
  • Timeline: 2-4 weeks for environment setup, 1-2 weeks for package deployment and testing.
  • When to use: Immediate migration needed, hundreds of complex packages, limited development resources, packages use custom components or Script Tasks that cannot be replicated in data flows.
  • Limitations: Still running SSIS technology (not cloud-native), requires SQL Server licensing for SSISDB, no auto-scaling (fixed VM size), and the IR costs ~$275/month minimum (Standard_D2_v3).

Path 2: Re-architect to Native ADF

  • How it works: Redesign SSIS packages as native ADF pipelines using Copy Activity for data movement and mapping data flows for transformation. Complex transformations requiring custom code migrate to Databricks or Synapse Spark notebooks invoked from ADF.
  • Timeline: 8-16 weeks depending on the number and complexity of packages.
  • When to use: Modernization is a priority, packages are relatively straightforward (bulk data movement with standard transformations), team wants to eliminate SSIS dependency entirely.
  • Benefits: Fully serverless (no IR infrastructure to manage), consumption-based pricing, mapping data flows provide visual transformation editing, full CI/CD integration, and native Azure Monitor observability.

Cost Optimization Strategies

ADF costs can escalate quickly without optimization — mapping data flows running oversized Spark clusters and copy activities using excessive DIUs are the two most common cost drivers. EPC Group applies these optimization strategies across all ADF engagements.

  • Right-size DIUs for copy activities: ADF defaults to auto-DIU allocation, which can use up to 256 DIUs for a single copy. For most enterprise scenarios, 16-32 DIUs provide optimal throughput without overspending. Monitor actual DIU utilization in pipeline run details and set explicit DIU limits (see the audit sketch after this list).
  • Use TTL for data flow clusters: Set 10-15 minute TTL to keep Spark clusters warm during batch windows. This eliminates 3-5 minute cold-start penalties between consecutive data flows while automatically releasing clusters after the batch window ends.
  • Schedule batch windows efficiently: Group pipeline executions into concentrated batch windows rather than spreading them throughout the day. This maximizes TTL effectiveness and allows data flow clusters to serve multiple pipelines before spinning down.
  • Use ELT over ETL where possible: Offload transformations to your existing Synapse or Fabric compute rather than running ADF Spark clusters. Since you're already paying for Synapse/Fabric capacity, the incremental transformation cost is effectively zero.
  • Implement incremental loads: Full loads of large tables are the single biggest ADF cost driver. Watermark-based incremental loads move 1-5% of the data compared to full loads, reducing copy activity duration, DIU consumption, and data flow processing time proportionally.
  • Monitor and alert on cost anomalies: Use Azure Cost Management to set monthly budgets for ADF resources. Configure alerts at 50%, 75%, and 90% thresholds. Review the ADF pricing calculator monthly against actual consumption to identify optimization opportunities.
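
To audit DIU consumption for a specific run, here is a short sketch using the activity-runs query API; the run ID and resource names are placeholders. Copy activities report usedDataIntegrationUnits in their output, which is the figure to compare against your explicit DIU limit.

# pip install azure-identity azure-mgmt-datafactory
from datetime import datetime, timedelta, timezone
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

SUB = "00000000-0000-0000-0000-000000000000"
RG, FACTORY = "rg-dataplatform-prod", "adf-enterprise-prod"
RUN_ID = "<pipeline run id>"  # taken from the ADF Monitor hub

client = DataFactoryManagementClient(DefaultAzureCredential(), SUB)
now = datetime.now(timezone.utc)
filters = RunFilterParameters(
    last_updated_after=now - timedelta(days=1), last_updated_before=now
)

# Copy activity output reports the DIUs the run actually consumed.
resp = client.activity_runs.query_by_pipeline_run(RG, FACTORY, RUN_ID, filters)
for act in resp.value:
    if act.activity_type == "Copy" and act.output:
        used = act.output.get("usedDataIntegrationUnits")
        print(f"{act.activity_name}: used {used} DIUs")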

Partner with EPC Group

EPC Group is a Microsoft Solutions Partner with over 150 Azure Data Factory implementations across healthcare, financial services, manufacturing, and government. Our Azure cloud consulting team delivers end-to-end ADF solutions — from architecture design and pipeline development through CI/CD automation, monitoring implementation, and ongoing optimization. Whether you're migrating from SSIS, building a new data platform, or optimizing an existing ADF environment, EPC Group brings the enterprise expertise and proven methodologies to ensure your data integration platform is reliable, scalable, and cost-effective.

Schedule an ADF Assessment

Frequently Asked Questions

What is Azure Data Factory and what is it used for?

Azure Data Factory (ADF) is Microsoft's fully managed, serverless data integration service in Azure. It orchestrates and automates the movement and transformation of data at scale across 100+ built-in connectors — including Azure SQL Database, Azure Synapse Analytics, Azure Blob Storage, Amazon S3, Snowflake, Salesforce, SAP, Oracle, and on-premises SQL Server. ADF supports both ETL (Extract-Transform-Load) and ELT (Extract-Load-Transform) patterns. Enterprises use ADF to build data pipelines that ingest raw data from source systems, transform it using mapping data flows or external compute (Databricks, HDInsight, Synapse), and load it into data warehouses, data lakes, or analytical stores. ADF is the successor to on-premises SSIS (SQL Server Integration Services) and is the recommended data integration platform for organizations running workloads on Azure.

How much does Azure Data Factory cost?

ADF pricing is consumption-based with four billing components: (1) Pipeline orchestration and execution at $1.00 per 1,000 activity runs, (2) Data movement at $0.25 per DIU-hour (Data Integration Unit), with a minimum of 4 DIUs per copy activity, (3) Mapping data flow execution at $0.274/vCore-hour for compute-optimized clusters, and (4) SSIS Integration Runtime at $0.274/vCore-hour. For a typical enterprise running 500 pipelines with 5,000 daily activity runs and 100 GB of daily data movement, expect $800-$2,000/month. Mapping data flows are the most expensive component — optimize by right-sizing cluster configurations and using time-to-live (TTL) settings to keep clusters warm during batch windows rather than cold-starting for each execution.
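
As a rough illustration of how these components combine, here is a back-of-envelope estimate in Python using the unit prices quoted above. The DIU-hours and data flow hours are illustrative assumptions, not benchmarks; real enterprise workloads run more parallel copies and longer data flow windows, which is how the $800-$2,000/month range arises.

# Back-of-envelope ADF cost estimate using the unit prices quoted above.
activity_runs_per_day = 5_000
orchestration = activity_runs_per_day * 30 / 1_000 * 1.00  # $1.00 / 1,000 runs

# Assume 100 GB/day copied at 16 DIUs, roughly 1 hour of copy time per day.
data_movement = 16 * 1 * 30 * 0.25                         # $0.25 / DIU-hour

# Assume one 8-vCore data flow cluster running 2 hours/day.
data_flows = 8 * 2 * 30 * 0.274                            # $0.274 / vCore-hour

total = orchestration + data_movement + data_flows
print(f"~${total:,.0f}/month")  # ~$402/month before SSIS IR or egress costs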

What is the difference between ETL and ELT in Azure Data Factory?

ETL (Extract-Transform-Load) transforms data within ADF using mapping data flows before loading it into the destination. The transformation runs on ADF-managed Spark clusters. This pattern is ideal when you need to cleanse, reshape, or enrich data before it reaches the data warehouse. ELT (Extract-Load-Transform) uses ADF to extract and load raw data into the destination (typically Azure Synapse, Databricks, or a data lake), then transforms it using the destination's compute engine. ELT leverages the destination's processing power, which is often more cost-effective for large-scale transformations. EPC Group recommends ELT for most enterprise scenarios — load raw data into OneLake or Azure Data Lake Storage Gen2, then transform using Synapse SQL pools, Databricks, or Microsoft Fabric for better performance and lower cost than running transformations on ADF's Spark clusters.

How do I implement CI/CD for Azure Data Factory pipelines?

ADF supports Git integration with Azure DevOps Repos or GitHub. The recommended CI/CD workflow is: (1) Connect ADF to a Git repository (each developer works in feature branches), (2) Develop and test pipelines in the ADF Studio UI connected to a development ADF instance, (3) Merge changes to the collaboration branch (typically main) via pull requests, (4) A build pipeline validates the factory and exports ARM templates using the @microsoft/azure-data-factory-utilities npm package (replacing the legacy adf_publish branch), (5) An Azure DevOps or GitHub Actions pipeline deploys the ARM templates to staging and production ADF instances with environment-specific parameters (linked services, connection strings, key vault references). Use parameterized linked services and global parameters to manage environment differences. Never manually publish changes in production — all changes flow through the CI/CD pipeline.

Can Azure Data Factory replace SSIS (SQL Server Integration Services)?

Yes. ADF is Microsoft's strategic replacement for SSIS. Organizations can migrate SSIS packages to ADF using two approaches: (1) Lift-and-shift with SSIS Integration Runtime — run existing SSIS packages unchanged on an ADF-managed SSIS runtime in Azure. This is the fastest migration path but does not modernize the packages. (2) Re-architect to native ADF pipelines — redesign SSIS packages as ADF pipelines with mapping data flows. This provides full cloud-native benefits (serverless scaling, consumption pricing, built-in monitoring) but requires development effort. EPC Group recommends a phased approach: lift-and-shift critical packages first to migrate off on-premises infrastructure, then gradually re-architect high-value packages to native ADF pipelines based on ROI analysis.

How do I monitor and troubleshoot Azure Data Factory pipelines?

ADF provides built-in monitoring through the ADF Monitor hub, which shows pipeline runs, activity runs, trigger runs, and integration runtime status with up to 45 days of history. For enterprise monitoring, integrate ADF with Azure Monitor by sending diagnostic logs to a Log Analytics workspace — this enables custom KQL queries, alerting on pipeline failures, and long-term log retention (beyond 45 days). Set up Azure Monitor alerts for: pipeline failure (immediate notification), long-running pipelines (exceeding expected duration by 2x), high DIU consumption (cost anomaly), and integration runtime errors. EPC Group configures a centralized monitoring dashboard in Azure Monitor Workbooks that tracks pipeline SLAs, data freshness metrics, and cost trends across all ADF instances in the environment.
