Microsoft Graph Data Connect: Copy M365 Data into Azure Data Factory
Microsoft Graph Data Connect (MGDC) lets enterprises copy Microsoft 365 organizational data — email metadata, calendar events, Teams messages, OneDrive files, and user profiles — into Azure Data Lake Storage for large-scale analytics, ML, and business intelligence. This guide covers setup, permissions, pipeline configuration, and enterprise use cases.
Key facts
- Graph Data Connect copies M365 data in bulk to Azure — bypassing the Microsoft Graph API's per-request throttling limits.
- Data types supported: email metadata, calendar events, Teams messages, OneDrive files, user profiles, group memberships, and SharePoint activity.
- Output lands in Azure Data Lake Storage Gen2 as JSON files — ready for Azure Data Factory, Synapse, Databricks, or Fabric pipelines.
- MGDC requires Microsoft Entra ID admin approval for each data extraction scope, a per-dataset privacy control that goes beyond the app-level consent used by the Graph API.
- EPC Group: 29 years of Microsoft consulting, including MGDC implementations for workforce analytics, collaboration pattern analysis, and compliance reporting.
Why Graph Data Connect instead of the Graph API
The Microsoft Graph API is designed for real-time, per-request data access, and it throttles clients that issue many calls in a short period. Graph Data Connect takes a different approach: scheduled bulk extraction.
- MGDC delivers M365 data in bulk — thousands of mailboxes at once, not one call per mailbox.
- MGDC output goes directly to Azure Data Lake Storage — ready for Azure Data Factory, Synapse, Databricks, or Fabric processing.
- MGDC requires tenant-admin approval for each data scope, which adds a per-dataset, auditable consent step on top of the app-level consent the Graph API relies on.
- MGDC supports incremental extraction — only changed data since the last pipeline run is copied, reducing both cost and processing time.
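The incremental extraction idea can be sketched as a small watermark calculation. This is a minimal illustration, not MGDC's own mechanism: the helper name `next_extraction_window`, the stored watermark, and the seven-day cap are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def next_extraction_window(last_watermark, now, max_span=timedelta(days=7)):
    """Return (start, end) for the next incremental run, or None if up to date.

    last_watermark: end time of the last successful extraction (hypothetical,
    stored by the pipeline owner, for example in a control table).
    max_span: cap on one window so a long outage is caught up in several runs.
    """
    if last_watermark >= now:
        return None  # nothing new to extract
    end = min(now, last_watermark + max_span)
    return last_watermark, end


# Usage: the last run ended on 1 January and the pipeline fires on 3 January.
watermark = datetime(2025, 1, 1, tzinfo=timezone.utc)
run_time = datetime(2025, 1, 3, tzinfo=timezone.utc)
window = next_extraction_window(watermark, run_time)
```

Only records inside the returned window need to be requested, and the pipeline would advance the watermark to the window's end once the run succeeds.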
Enterprise use cases
- Workforce collaboration analytics — analyze email and Teams communication patterns to understand collaboration network density, response times, and cross-team connectivity.
- Manager effectiveness measurement: track meeting load, email response time, and 1:1 frequency to support manager development programs.
- Compliance and eDiscovery — extract M365 communication data for regulatory investigations, litigation hold analysis, or insider risk research.
- Security analytics — combine M365 user activity (OneDrive access, email forwarding, Teams file sharing) with Sentinel SIEM data for insider threat detection.
- Microsoft Viva Insights source data — MGDC is the underlying data source for Viva Insights advanced analytics and Organizational Network Analysis (ONA).
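As an illustration of the collaboration analytics use case, the sketch below computes a simple network density from email metadata. The record shape (`sender`, `recipients`) is a simplified assumption for the example; real extracted records use a richer schema and would need to be mapped into this form first.

```python
from itertools import chain


def collaboration_edges(messages):
    """Collect undirected sender-recipient pairs from simplified records."""
    edges = set()
    for msg in messages:
        sender = msg["sender"].lower()
        for recipient in msg["recipients"]:
            recipient = recipient.lower()
            if recipient != sender:  # ignore self-addressed mail
                edges.add(frozenset((sender, recipient)))
    return edges


def network_density(edges):
    """Density = observed pairs / possible pairs among the people seen."""
    people = set(chain.from_iterable(edges))
    n = len(people)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1) / 2)


# Hypothetical sample data, not real extraction output.
sample = [
    {"sender": "ana@contoso.com", "recipients": ["bo@contoso.com", "cy@contoso.com"]},
    {"sender": "bo@contoso.com", "recipients": ["ana@contoso.com"]},
]
edges = collaboration_edges(sample)
density = network_density(edges)
```

In this sample, two of the three possible pairs have exchanged mail, so the density is two thirds. At tenant scale the same calculation would more likely run in Spark or SQL than in plain Python.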
Setup and configuration
Configure Graph Data Connect in five steps.
- Enable MGDC in the Microsoft 365 admin center — requires Global Admin approval. MGDC is off by default for all tenants.
- Create an Azure Data Lake Storage Gen2 account — MGDC writes extracted data to an ADLS Gen2 container as JSON files.
- Configure an Azure Data Factory pipeline: use a Copy activity with the Microsoft 365 (Office 365) connector in ADF to select datasets, set date ranges, and schedule extraction runs.
- Request data access approval — each extraction scope (email metadata, calendar events, Teams messages) requires a separate Entra ID admin approval before data flows.
- Set up extraction schedules — configure full initial extraction, then incremental daily or weekly runs for ongoing analytics.
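The pipeline step above can be sketched as a Copy activity definition built in Python and serialized to JSON. Treat this as a hedged outline rather than a deployable template: the property names (`Office365Source`, `dateFilterColumn`, `startTime`, `endTime`) follow the ADF Microsoft 365 connector as best recalled and should be checked against current documentation, while the dataset names and the default sink type are placeholders assumed for the example.

```python
import json
from datetime import datetime, timezone


def build_copy_activity(source_dataset, sink_dataset, start, end,
                        date_column="CreatedDateTime", sink_type="BinarySink"):
    """Build a Copy activity dict for one extraction window.

    Dataset names and sink_type are placeholders (assumptions); the
    source property names should be verified against the connector docs.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if start >= end:
        raise ValueError("start must be earlier than end")

    def to_utc(dt):
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "name": "CopyM365ToDataLake",
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {
                "type": "Office365Source",
                "dateFilterColumn": date_column,
                "startTime": to_utc(start),
                "endTime": to_utc(end),
            },
            "sink": {"type": sink_type},
        },
    }


# Usage with hypothetical dataset names and a one-week window.
activity = build_copy_activity(
    "M365MessagesDataset",
    "AdlsRawMessagesDataset",
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    datetime(2025, 1, 8, tzinfo=timezone.utc),
)
print(json.dumps(activity, indent=2))
```

Generating the definition from code makes it straightforward to stamp out one activity per approved dataset and to feed in a new date window on each scheduled run.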
Security for Graph Data Connect pipelines
MGDC pipelines access sensitive organizational communication data. Apply these controls before enabling data extraction.
- Restrict ADLS Gen2 access to only the service principals and users who need the extracted data — not the broader analytics team.
- Apply Purview sensitivity labels to ADLS containers receiving M365 data to enforce downstream DLP policies.
- Use private endpoints on ADLS Gen2 to keep MGDC-extracted data off the public internet.
- Configure Azure Monitor diagnostic logs on ADLS Gen2 to audit all access to extracted M365 data.
- Scope MGDC approvals narrowly — only approve the specific data types needed for each analytics use case.
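These controls lend themselves to an automated pre-flight check. The sketch below validates a plain dictionary describing the storage account; the dictionary keys and the function name are hypothetical and would need to be populated from your own inventory or infrastructure-as-code state, not from any specific Azure API.

```python
def audit_storage_config(config, allowed_principals):
    """Return a list of findings for a (hypothetical) storage config dict."""
    findings = []
    # Missing keys are treated as the unsafe value so gaps surface as findings.
    if config.get("public_network_access", True):
        findings.append("public network access is enabled; use private endpoints")
    if not config.get("diagnostic_logging", False):
        findings.append("diagnostic logging is not enabled")
    if not config.get("sensitivity_label"):
        findings.append("container has no sensitivity label")
    extra = set(config.get("principals_with_access", [])) - set(allowed_principals)
    if extra:
        findings.append("unexpected principals with access: " + ", ".join(sorted(extra)))
    return findings


# Usage with made-up values.
config = {
    "public_network_access": False,
    "diagnostic_logging": True,
    "sensitivity_label": "Confidential",
    "principals_with_access": ["sp-mgdc-pipeline", "analytics-team"],
}
findings = audit_storage_config(config, allowed_principals=["sp-mgdc-pipeline"])
```

Here the only finding is the broad `analytics-team` grant, which is the kind of over-permissioning the first control is meant to catch.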
Frequently asked questions
What is Microsoft Graph Data Connect?
Microsoft Graph Data Connect (MGDC) is a service that copies Microsoft 365 organizational data in bulk to Azure Data Lake Storage Gen2.
It gives enterprises access to email metadata, calendar events, Teams messages, OneDrive activity, and user profiles at scale — without the per-request throttling limits of the Microsoft Graph API. Each extraction requires explicit tenant admin approval.
What is the difference between Graph API and Graph Data Connect?
The Graph API provides real-time, per-user data access — it throttles heavily at scale and returns one user's data per API call.
Graph Data Connect is a bulk extraction pipeline that copies entire tenant-level datasets to Azure storage in one run. MGDC is designed for analytics, ML, and compliance use cases where you need data from thousands of users at once.
What data can Graph Data Connect copy?
MGDC supports email metadata (not message bodies by default), calendar events, Teams messages and channels, OneDrive files and activity, user profiles and properties, group memberships, SharePoint activity, and manager and organizational hierarchy.
The specific datasets available depend on your Microsoft 365 license tier. Enterprise plans (E3/E5) provide the broadest dataset access.
How long does a Graph Data Connect implementation take?
Setup and initial configuration take 2–4 weeks. This phase includes:
- MGDC tenant enablement
- ADLS Gen2 provisioning
- Azure Data Factory pipeline build
- Admin approval workflow
- Initial extraction validation
Building downstream analytics on the extracted data, such as Synapse/Fabric pipelines and Power BI dashboards, adds another 4–8 weeks, for roughly 6–12 weeks in total. The exact time depends on the complexity of the use case.
Is Graph Data Connect HIPAA compliant?
MGDC is covered by the Microsoft Azure HIPAA Business Associate Agreement. The extracted data lands in Azure Data Lake Storage Gen2, which is also BAA-covered.
EPC Group configures HIPAA-compliant MGDC pipelines: private endpoints on ADLS, Purview sensitivity labels on extracted data containers, and audit logging for all data access. PHI content in M365 communications requires additional DLP controls on extraction scopes.
Build your Graph Data Connect pipeline
EPC Group designs and implements MGDC pipelines for workforce analytics, compliance, and security use cases. Call (888) 381-9725 or schedule a discovery call.
Why Organizations Choose EPC Group
EPC Group is a Microsoft consulting firm based in Houston. We have 29 years of experience in enterprise implementation and over 10,000 successful deployments. Our expertise includes:
- Power BI
- Microsoft Fabric
- SharePoint
- Azure
- Microsoft 365
- Copilot
We serve Fortune 500 companies, federal agencies, and global enterprises across healthcare, financial services, government, manufacturing, energy, education, retail, and technology.
EPC Group stands out due to our governance-first approach. Each engagement starts with a security and compliance assessment.
Our team of senior architects has practical experience in:
- HIPAA
- SOC 2
- FedRAMP
- CMMC environments
We focus on delivering results, not just hours worked.
- Fixed-fee accelerators with predictable pricing and defined deliverables
- Senior architect engagement on every project, not rotating juniors
- Compliance-native delivery for regulated industries
- End-to-end coverage from strategy through 24/7 managed services
- 11,000+ enterprise engagements refined into repeatable, risk-controlled patterns
Call (888) 381-9725 or email contact@epcgroup.net for a free assessment.
Azure architecture: 2026 considerations for Microsoft Graph Data Connect
Azure confidential computing protects data while it is in use, which complements encryption for data at rest and in transit. Confidential VMs rely on hardware-based trusted execution environments: AMD SEV-SNP in the DCasv5/ECasv5 series (including the DCadsv5 variants with local disk) and Intel TDX in the DCesv5/ECesv5 series.
This technology allows regulated workloads to operate on shared Azure infrastructure. Examples include:
- Clinical analytics with PHI
- Financial services M&A modeling
- Federal IL5 workloads
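VM memory is encrypted with hardware-managed keys, so the host operator cannot inspect the data, and cryptographic attestation lets the workload owner verify the environment before releasing secrets to it.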
Azure ExpressRoute pricing in 2026 uses a hybrid model. It includes:
- ExpressRoute Local: $0/month metered + bandwidth for in-region Azure egress.
- ExpressRoute Standard: $300/month for 1Gbps + bandwidth for cross-region access.
- ExpressRoute Premium: +$300/month for global connectivity to all Azure regions and Microsoft 365 services.
Depending on the tier and bandwidth chosen, this decision can cost a typical enterprise between $20K and $200K per year.
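The annual impact of these tiers is simple arithmetic. The sketch below uses the monthly figures quoted in this section as inputs; the bandwidth charge is a made-up placeholder, since actual bandwidth costs depend on circuit size and data volume.

```python
def annual_expressroute_cost(monthly_port_fee, monthly_premium_addon=0.0,
                             monthly_bandwidth_charge=0.0):
    """Annualize a monthly port fee plus optional add-on and bandwidth charges."""
    parts = (monthly_port_fee, monthly_premium_addon, monthly_bandwidth_charge)
    if min(parts) < 0:
        raise ValueError("charges must be non-negative")
    return 12 * sum(parts)


# Figures from the tiers listed above: $300/month Standard, +$300/month Premium.
standard_only = annual_expressroute_cost(300)          # 12 * 300
standard_premium = annual_expressroute_cost(300, 300)  # 12 * 600
# Hypothetical $2,000/month bandwidth charge, for illustration only.
with_bandwidth = annual_expressroute_cost(300, 300, 2_000)
```

On these inputs the fixed fees alone come to $3,600 or $7,200 per year, so bandwidth charges are what push a deployment into the five- or six-figure range.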
Decision factors EPC Group evaluates
- Microsoft Defender for Cloud benchmark alignment
- Reservation + Savings Plan portfolio for predictable workloads
- Azure Policy initiative assignment for Azure Government readiness
- Confidential Computing enclave evaluation for regulated workloads
- Enterprise-scale landing zone bootstrap via Bicep/Terraform
See related EPC Group services at /services or schedule a discovery call at /contact.