The Data Governance Problem in 2026
Enterprise data sprawl has reached a critical inflection point. The average Fortune 500 organization stores sensitive data across 15+ distinct platforms: Microsoft 365, Azure SQL, AWS S3, Snowflake, on-premises file shares, legacy databases, and SaaS applications. Without centralized governance, organizations face regulatory fines (GDPR penalties exceeded $4.2 billion through 2025), data breaches from misconfigured permissions, inconsistent classification that makes legal discovery impossible, and shadow data proliferation that no one can audit.
Microsoft Purview addresses this by providing a single control plane for data governance and compliance across the entire data estate. It is not just a Microsoft 365 tool — it scans and governs data wherever it lives, including competing cloud platforms.
Microsoft Purview Architecture Overview
Microsoft Purview operates across two complementary planes that serve different but overlapping audiences:
Data Governance Plane (formerly Azure Purview) — Serves data engineers, data stewards, and chief data officers. This plane includes the Data Map (automated scanning and classification of data sources), Data Catalog (business-friendly data discovery with glossary terms), Data Estate Insights (health dashboards across the estate), and Data Lineage (tracing how data flows through ETL pipelines from source to destination). It connects to 100+ data sources through built-in and custom connectors.
Compliance Plane (formerly Microsoft 365 Compliance Center) — Serves compliance officers, legal teams, information security analysts, and HR. This plane includes Data Loss Prevention (DLP), Information Protection (sensitivity labels and encryption), Insider Risk Management, eDiscovery and Audit, Compliance Manager (regulatory assessment tracking), Communication Compliance, and Records Management. It operates primarily within the Microsoft 365 ecosystem but extends to endpoints and third-party cloud apps through Microsoft Defender for Cloud Apps.
Both planes share a unified classification engine that uses 300+ built-in sensitive information types (SSNs, credit card numbers, medical record numbers, passport numbers) plus custom trainable classifiers that organizations build from their own data samples.
Data Map and Automated Classification
The Data Map is the foundation of Purview's governance capabilities. It provides automated scanning that discovers, classifies, and catalogs data assets across your entire estate.
Supported Data Sources
| Category | Sources | Scan Method |
|---|---|---|
| Azure Native | Azure SQL, Synapse, Data Lake, Cosmos DB, Blob Storage | Managed (no agent) |
| AWS | S3, RDS, Glue, Redshift | Self-hosted runtime |
| GCP | BigQuery, Cloud Storage | Self-hosted runtime |
| On-Premises | SQL Server, Oracle, SAP HANA, file shares | Self-hosted runtime |
| SaaS | Snowflake, Teradata, Databricks, Power BI | Managed connector |
Each scan performs three operations: asset discovery (identifying tables, files, columns), classification (applying sensitive information types based on pattern matching and machine learning), and lineage extraction (mapping data flow through Data Factory, Synapse pipelines, and other ETL tools). Scans run on configurable schedules — weekly full scans with daily incremental scans is the typical enterprise pattern.
Classification Engine
Purview's classification engine combines four techniques: exact data matching (EDM), which validates candidates against actual datasets such as employee SSN tables; pattern-based matching using regex and checksum validation; keyword proximity detection (finding "patient" near a number pattern to identify medical record numbers); and trainable classifiers built from positive samples of your organization's sensitive document types (Microsoft's guidance calls for 50-500 positive samples per classifier). Custom trainable classifiers are particularly valuable for industry-specific data like insurance claim numbers, internal project codes, or proprietary identifiers that no built-in classifier covers.
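As a rough illustration of how pattern matching, checksum validation, and keyword proximity combine, the following Python sketch detects credit-card-like numbers. It is not Purview's implementation: the regex, the keyword list, the 40-character proximity window, and the confidence names are all assumptions chosen for the example.

```python
import re

# Assumed pattern: 13-16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
# Assumed supporting keywords that raise confidence when found near a match.
KEYWORDS = ("credit card", "card number", "visa", "mastercard")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_cards(text: str, window: int = 40) -> list[dict]:
    """Find card-like numbers and grade each hit by keyword proximity."""
    findings = []
    lowered = text.lower()
    for m in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if not luhn_valid(digits):
            continue  # pattern matched but checksum failed, so discard
        nearby = lowered[max(0, m.start() - window): m.end() + window]
        has_keyword = any(k in nearby for k in KEYWORDS)
        findings.append(
            {"match": digits, "confidence": "high" if has_keyword else "medium"}
        )
    return findings
```

The checksum step is what keeps arbitrary 16-digit strings (order numbers, tracking IDs) from being flagged, while the keyword window separates "medium" from "high" confidence hits.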
Data Loss Prevention (DLP)
DLP policies prevent sensitive data from leaving controlled environments through email, SharePoint sharing, Teams messages, OneDrive sync, USB devices, clipboard actions, and printing. Enterprise DLP implementation requires a strategic approach, not just enabling policies.
DLP Policy Architecture
Structure DLP policies in three tiers to balance protection with user productivity:
- Tier 1: Block with Override — High-volume, moderate-sensitivity detections. Users see a policy tip and can override with a business justification. Examples: sharing documents containing 1-9 credit card numbers, emailing files with PII to external recipients.
- Tier 2: Block Without Override — High-sensitivity detections. No user override is possible, and a compliance alert is generated. Examples: bulk SSN exfiltration (10+ records), sharing documents marked Highly Confidential externally.
- Tier 3: Endpoint DLP — Device-level controls preventing copy-to-USB, print, upload to unsanctioned cloud apps, and clipboard copy of sensitive content. Requires Microsoft Defender for Endpoint onboarding.
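The three tiers can be read as a small decision function. The sketch below is only a model of the tiering logic, not a Purview policy definition: the `Detection` fields, the channel names, and the choice to apply the 10+ bulk threshold to every information type (the text only gives it for SSNs) are assumptions.

```python
from dataclasses import dataclass

# Assumed names for device-level channels covered by endpoint DLP (Tier 3).
ENDPOINT_CHANNELS = {"usb", "print", "clipboard", "cloud_upload"}
BULK_THRESHOLD = 10  # assumed to apply to all info types, not just SSNs


@dataclass
class Detection:
    info_type: str           # e.g. "credit_card", "ssn"
    count: int               # matching instances found in the item
    label: str = "General"   # sensitivity label on the item
    external: bool = False   # shared or sent outside the organization
    channel: str = "email"   # "email", "sharepoint", "usb", "print", ...


def dlp_action(d: Detection) -> str:
    """Map a detection to the action of the tier it falls into."""
    if d.count == 0 and d.label != "Highly Confidential":
        return "allow"
    if d.channel in ENDPOINT_CHANNELS:
        return "endpoint_block"          # Tier 3: device-level control
    if d.count >= BULK_THRESHOLD:
        return "block"                   # Tier 2: bulk exfiltration
    if d.label == "Highly Confidential" and d.external:
        return "block"                   # Tier 2: labeled content leaving
    if d.count >= 1:
        return "block_with_override"     # Tier 1: policy tip plus justification
    return "allow"
```

For example, `dlp_action(Detection("credit_card", 3))` falls into Tier 1, while the same content copied to a USB device is handled at Tier 3.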
Always deploy DLP policies in "Test with policy tips" mode for 2-4 weeks before enforcement. Analyze false positive rates during testing. Anything above 5% false positives indicates the policy needs tuning — either narrowing the sensitive information type, adding exceptions for specific user groups, or increasing the confidence threshold from "medium" to "high."
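The 5% tuning threshold is simple to check once a sample of test-mode alerts has been reviewed by hand. A minimal sketch, assuming each reviewed alert is recorded as a boolean (the function names and input shape are illustrative, not a Purview export format):

```python
def false_positive_rate(reviewed: list[bool]) -> float:
    """reviewed[i] is True when the i-th reviewed alert was a false positive."""
    if not reviewed:
        raise ValueError("no reviewed alerts to evaluate")
    return sum(reviewed) / len(reviewed)


def needs_tuning(reviewed: list[bool], threshold: float = 0.05) -> bool:
    """Return True when the false positive rate exceeds the threshold."""
    return false_positive_rate(reviewed) > threshold
```

With 7 false positives in 100 reviewed alerts the rate is 7%, so the policy should be tuned before it moves to enforcement.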
Information Protection and Sensitivity Labels
Sensitivity labels are the enforcement mechanism for data classification. They travel with the document, controlling access regardless of where the file ends up. A document labeled "Confidential — Internal Only" remains encrypted and restricted even if accidentally shared externally.
Label Taxonomy Design
Design your label hierarchy to match your organization's data classification policy. A proven enterprise taxonomy includes:
- Public — No restrictions. Marketing materials, press releases, public website content.
- General — Internal use. No encryption. Visual marking (header/footer). Default label for most business documents.
- Confidential — Encrypted. Sublabels for "All Employees" (organization-wide access), "Specific People" (named recipients), and "Recipients Only" (forwarding disabled).
- Highly Confidential — Encrypted with restricted access. Sublabels for "Board/Executive" (board members only), "Legal Privileged" (legal team only), and "Regulatory" (compliance team only).
Enable auto-labeling policies to automatically apply sensitivity labels when documents match specific conditions — for example, automatically labeling any document containing 5+ patient health records as "Highly Confidential — Regulatory." Auto-labeling reduces the burden on end users and ensures consistent protection even when users forget to label manually.
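The auto-labeling condition in that example can be sketched as a rule that only ever raises a document's label. This is an illustrative model, not Purview's policy engine: the priority list mirrors the taxonomy above, and the never-downgrade behavior and the `(parent, sublabel)` tuple shape are assumptions.

```python
from typing import Optional

# Parent labels in ascending order of sensitivity, taken from the taxonomy above.
PRIORITY = ["Public", "General", "Confidential", "Highly Confidential"]

Label = tuple[str, Optional[str]]  # (parent label, sublabel or None)


def auto_label(current: Optional[Label], phi_record_count: int,
               threshold: int = 5) -> Optional[Label]:
    """Propose the Regulatory sublabel when enough patient records are found."""
    proposed: Label = ("Highly Confidential", "Regulatory")
    if phi_record_count < threshold:
        return current  # condition not met, leave the label untouched
    if current is not None and \
            PRIORITY.index(current[0]) >= PRIORITY.index(proposed[0]):
        return current  # never replace an equal or stricter existing label
    return proposed
```

A document already labeled under "Highly Confidential" with the "Legal Privileged" sublabel keeps that label, which avoids auto-labeling overriding a deliberate manual choice.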
Insider Risk Management
Insider Risk Management uses machine learning to correlate signals across user activities and identify potentially risky behavior patterns. It integrates signals from Microsoft 365 (file access, email, Teams), Microsoft Defender for Endpoint (device activity), and HR connectors (resignation dates, performance improvement plans).
Key policy templates include data theft by departing users (triggered by HR resignation signal combined with increased file downloads), data leaks (detecting anomalous sharing or external email volumes), security policy violations (visiting risky websites, disabling security tools), and patient data access violations (healthcare-specific, monitoring EHR access patterns against job responsibilities).
Privacy is built into the design. Analyst views are pseudonymized by default — investigators see "User-7291" rather than the employee's name until an escalation decision is approved by a compliance manager with appropriate RBAC permissions. This prevents casual browsing of employee activities while enabling legitimate investigations.
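To make the idea of pseudonymized analyst views concrete, here is one common way to derive a stable alias such as "User-7291" from an account name. This is a generic keyed-hash sketch, not a description of how Purview generates its pseudonyms; the secret key handling and the four-digit format are assumptions.

```python
import hashlib
import hmac


def pseudonym(user_principal_name: str, secret: bytes) -> str:
    """Derive a stable alias from a user name with a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(
        secret,
        user_principal_name.lower().encode("utf-8"),
        hashlib.sha256,
    ).digest()
    # Four digits keep the alias readable but allow collisions in large tenants;
    # a real system would use a longer identifier or a lookup table.
    number = int.from_bytes(digest[:4], "big") % 10_000
    return f"User-{number:04d}"
```

Because the hash is keyed, an analyst who sees only the alias cannot recompute it from a list of employee names without the secret, yet the same user always maps to the same alias across alerts.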
eDiscovery Premium
eDiscovery Premium handles legal hold, content search, review, and export workflows for litigation and regulatory investigations. The platform processes email, documents, Teams chat, Viva Engage (formerly Yammer), and third-party data through import connectors.
Key capabilities include custodian management (placing legal holds on specific people's mailboxes and OneDrive sites), advanced indexing (deep processing of images using OCR, extracting text from PDFs and scanned documents), review sets with machine learning (predictive coding to prioritize relevant documents, near-duplicate detection, email threading to reduce review volume by 40-60%), and analytics dashboards showing themes, key terms, and communication patterns across custodian data.
For organizations in regulated industries, eDiscovery Premium pairs with Purview Audit Premium, which retains audit logs for 1 year by default and up to 10 years with the retention add-on (compared to 180 days for Audit Standard) and provides high-bandwidth access to the audit log search API for SIEM integration. This extended retention is critical for SEC, FINRA, and HIPAA investigations that may look back years.
Compliance Manager
Compliance Manager provides continuous assessment tracking against 350+ regulatory templates including GDPR, HIPAA, SOC 2, ISO 27001, NIST 800-53, FedRAMP, PCI DSS, and industry-specific frameworks. Each assessment scores your organization's compliance posture and provides prioritized improvement actions.
The compliance score is calculated from completed improvement actions. Microsoft-managed actions (infrastructure controls that Microsoft maintains) contribute approximately 50% of the score, and customer-managed actions (configurations and policies your organization controls) account for the remaining 50%. Each action is weighted by its control importance: encryption controls carry more weight than documentation controls.
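The weighting can be sketched as points achieved divided by points available. The point values and action names below are hypothetical and chosen only to show the arithmetic; they are not taken from an actual Compliance Manager assessment.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    points: int   # weight reflecting the control's importance
    owner: str    # "microsoft" or "customer"
    done: bool


def compliance_score(actions: list[Action]) -> float:
    """Return the percentage of available points earned by completed actions."""
    total = sum(a.points for a in actions)
    if total == 0:
        return 0.0
    achieved = sum(a.points for a in actions if a.done)
    return 100.0 * achieved / total


# Hypothetical assessment: Microsoft-managed controls make up half the points.
example = [
    Action("Microsoft-managed infrastructure controls", 50, "microsoft", True),
    Action("Enable encryption on sensitive labels", 27, "customer", True),
    Action("Deploy DLP policy for PHI", 20, "customer", False),
    Action("Document data handling procedure", 3, "customer", False),
]
```

In this example the score is 77%: completing the heavily weighted encryption action moves the score far more than the documentation action would.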
Create a compliance dashboard that maps your regulatory obligations to Purview controls. For a healthcare organization, this means mapping HIPAA Administrative Safeguards to information protection policies, Technical Safeguards to DLP and encryption, and Physical Safeguards to endpoint DLP and device management. This mapping provides evidence for auditors that controls are not only in place but actively monitored.
Records Management
Records Management enforces retention and disposition across the content lifecycle. Retention labels define how long content must be kept and what happens when that period expires: delete automatically, trigger a disposition review, or mark as a regulatory record that cannot be deleted by anyone.
Key implementation considerations for enterprise records management include file plan management for importing existing retention schedules from Excel, event-based retention (starting the retention clock when a contract expires or an employee departs rather than from creation date), regulatory records that prevent any modification or deletion even by administrators, and multi-stage retention for documents that move through active, semi-active, and archive phases with different storage and access policies.
Auto-apply retention labels using trainable classifiers, keyword queries, or sensitive information types to eliminate manual labeling burden. For example, automatically apply a 7-year retention label to any document classified as containing financial records, or a 10-year label to documents containing patient health information.
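The difference between creation-based and event-based retention comes down to which date starts the clock. A minimal sketch of that calculation, with function names and the leap-day handling as assumptions:

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Add whole years, mapping Feb 29 to Feb 28 when the target year is not a leap year."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposition_date(created: date, retention_years: int,
                     event_date: Optional[date] = None) -> date:
    """Return the date the retention period ends.

    With event-based retention the clock starts at event_date (for example,
    contract expiry); otherwise it starts at the creation date.
    """
    start = event_date if event_date is not None else created
    return add_years(start, retention_years)
```

A contract created on 2020-03-01 with a 7-year label would be due for disposition on 2027-03-01 under creation-based retention, but on 2031-06-30 if the clock starts when the contract expires on 2024-06-30.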
Implementation Roadmap
A successful Purview implementation follows a phased approach that minimizes user disruption while building governance capabilities incrementally:
Phase 1: Discovery and Classification (Weeks 1-4)
- Deploy the Data Map and connect all primary data sources
- Run initial full scans across Azure, Microsoft 365, and on-premises
- Review classification results and tune custom classifiers
- Build the data glossary with business-friendly terms
- Assign data stewards for each major data domain
Phase 2: Information Protection (Weeks 5-10)
- Design and publish sensitivity label taxonomy
- Configure DLP policies in test/monitor mode
- Deploy endpoint DLP to managed devices
- Train end users on labeling expectations
- Analyze DLP alerts and tune false positive rates
Phase 3: Compliance Workflows (Weeks 11-16)
- Configure eDiscovery cases and legal hold templates
- Deploy insider risk management policies
- Implement records management retention labels
- Set up Compliance Manager assessments for target regulations
- Configure communication compliance for regulated channels
Phase 4: Optimization (Weeks 17-20)
- Enforce DLP policies (move from test to block)
- Enable auto-labeling based on Phase 2 analysis
- Build executive compliance dashboards in Power BI
- Conduct compliance officer training and tabletop exercises
- Document runbooks for common compliance workflows
Industry-Specific Compliance Mapping
| Regulation | Purview Components | Key Controls |
|---|---|---|
| HIPAA | DLP, Information Protection, eDiscovery, Audit Premium | PHI detection, encryption at rest/transit, access logging, 6-year retention |
| GDPR | Data Map, DLP, Information Protection, Records Management | Data subject rights, consent tracking, cross-border transfer controls, right to erasure |
| SOC 2 | Compliance Manager, Insider Risk, DLP, Audit | Access controls, change management, incident response, continuous monitoring |
| FedRAMP | Information Protection, DLP, Records Management, Audit Premium | FIPS 140-2 encryption, CUI handling, 10-year audit retention, incident reporting |
| PCI DSS | DLP, Information Protection, Data Map | Cardholder data detection, network segmentation validation, access monitoring |
Common Pitfalls and How to Avoid Them
Enterprise Purview implementations fail most often due to these preventable mistakes:
- Deploying DLP in enforcement mode immediately — Always run in test mode for 2-4 weeks. Aggressive initial enforcement generates helpdesk tickets, user resentment, and executive pushback that can derail the entire program.
- Creating too many sensitivity labels — Users ignore complex taxonomies. Limit to 4-5 parent labels with 2-3 sublabels each. More than 15 total labels guarantees inconsistent adoption.
- Ignoring the data steward model — Data governance fails without business ownership. Assign stewards for each data domain (HR data, financial data, customer data) who review classifications, manage glossary terms, and approve access requests.
- Skipping change management — Purview changes daily workflows. Invest in training, quick reference guides, and executive sponsorship communications before go-live.
- Not integrating with SIEM — Purview generates alerts that must flow into your security operations center. Configure the Management Activity API or use the Microsoft Sentinel connector to ensure compliance alerts are triaged alongside security events.
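The label-count guidance in the list above is easy to check mechanically before a taxonomy is published. The sketch below assumes a taxonomy expressed as a plain dictionary of parent labels to sublabels and counts only labels a user can actually select (a sublabel, or a parent with no sublabels); that counting rule is an assumption about how the 15-label limit is meant.

```python
def taxonomy_warnings(taxonomy: dict[str, list[str]],
                      max_parents: int = 5,
                      max_sublabels: int = 3,
                      max_total: int = 15) -> list[str]:
    """Return human-readable warnings for an overly complex label taxonomy."""
    warnings = []
    if len(taxonomy) > max_parents:
        warnings.append(f"{len(taxonomy)} parent labels (limit {max_parents})")
    for parent, subs in taxonomy.items():
        if len(subs) > max_sublabels:
            warnings.append(
                f"'{parent}' has {len(subs)} sublabels (limit {max_sublabels})"
            )
    # A parent without sublabels counts as one selectable label.
    total = sum(max(len(subs), 1) for subs in taxonomy.values())
    if total > max_total:
        warnings.append(f"{total} selectable labels (limit {max_total})")
    return warnings


# The four-label taxonomy described earlier in this article.
article_taxonomy = {
    "Public": [],
    "General": [],
    "Confidential": ["All Employees", "Specific People", "Recipients Only"],
    "Highly Confidential": ["Board/Executive", "Legal Privileged", "Regulatory"],
}
```

The taxonomy from this article yields 8 selectable labels and produces no warnings.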
Licensing and Cost Planning
Microsoft Purview licensing is complex. Here is a practical breakdown:
- Microsoft 365 E3 ($36/user/month) — Basic DLP (Exchange and SharePoint only), basic audit (180-day retention), manual sensitivity labels, basic records management.
- Microsoft 365 E5 ($57/user/month) — Full DLP (endpoint, Teams, third-party apps), auto-labeling, insider risk management, eDiscovery Premium, advanced audit (1-year retention), communication compliance.
- E5 Compliance add-on ($12/user/month on top of E3) — All E5 compliance features without the E5 security and voice features. Best option if you only need compliance capabilities.
- Purview Data Governance (Azure consumption) — Capacity units starting at approximately $1,000/month, scaling with number of sources and assets scanned.
For budget planning, a 5,000-user enterprise typically spends $200K-$400K annually on Purview licensing, plus $75K-$150K for implementation consulting, and $50K-$100K annually for ongoing managed governance services. The ROI justification is straightforward: a single GDPR fine averages $4.2 million, and a single data breach averages $4.88 million (IBM 2024 Cost of a Data Breach Report).
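For reference, the per-seat prices listed above translate into annual list-price totals as follows. The sketch uses only the E5 and add-on prices quoted in this section; totals that come out lower in practice usually reflect negotiated discounts or licensing only part of the workforce, which this calculation does not model.

```python
E5_PER_USER_MONTH = 57               # Microsoft 365 E5, as quoted above
E5_COMPLIANCE_ADDON_PER_USER_MONTH = 12  # add-on on top of E3, as quoted above


def annual_license_cost(users: int, per_user_month: float) -> float:
    """Annual list-price cost for a per-user, per-month license."""
    return users * per_user_month * 12


addon_total = annual_license_cost(5_000, E5_COMPLIANCE_ADDON_PER_USER_MONTH)
e5_total = annual_license_cost(5_000, E5_PER_USER_MONTH)
print(f"E5 Compliance add-on: ${addon_total:,.0f} per year")
print(f"Microsoft 365 E5:     ${e5_total:,.0f} per year")
```

At list price, 5,000 users come to $720,000 per year for the add-on and $3,420,000 per year for full E5.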
Implementation with EPC Group
EPC Group's data governance practice specializes in Microsoft Purview implementations for regulated industries. Our methodology starts with a Data Governance Readiness Assessment that maps your current data estate, identifies compliance gaps, and prioritizes quick wins. We then design and deploy Purview in phases aligned with your compliance calendar, ensuring you have evidence for upcoming audits while building toward comprehensive governance. Our clients in healthcare, financial services, and government consistently achieve 90%+ Compliance Manager scores within six months of implementation.
Frequently Asked Questions
What is Microsoft Purview and how does it differ from Azure Purview?
Microsoft Purview is the unified data governance and compliance platform that merged the former Azure Purview (data catalog, data map, data estate scanning) with Microsoft 365 compliance capabilities (DLP, information protection, insider risk, eDiscovery, records management). The rebrand happened in April 2022. Today, Microsoft Purview provides a single pane of glass for governing data across Azure, Microsoft 365, on-premises SQL Server, AWS S3, and other multi-cloud sources. Licensing splits into two tracks: Purview Data Governance (Azure-side scanning and cataloging) and Purview Compliance (Microsoft 365 E5 or E5 Compliance add-on).
What are the core components of Microsoft Purview?
Microsoft Purview includes eight core components: (1) Data Map for automated scanning and classification of data sources across multi-cloud and on-premises environments, (2) Data Catalog for business-friendly data discovery with glossary terms and lineage, (3) Data Loss Prevention (DLP) for preventing sensitive data leakage across Exchange, SharePoint, Teams, and endpoints, (4) Information Protection for sensitivity labels and encryption, (5) Insider Risk Management for detecting risky user behavior, (6) eDiscovery for legal hold and content search, (7) Compliance Manager for assessment tracking against 350+ regulatory templates, and (8) Records Management for retention labels and disposition review.
How much does Microsoft Purview cost for an enterprise?
Microsoft Purview compliance features require Microsoft 365 E5 ($57/user/month) or the E5 Compliance add-on ($12/user/month added to E3). For a 5,000-user enterprise, list-price compliance licensing works out to $60K per month for the add-on (about $720K annually) and $285K per month for full E5 (about $3.42M annually). Purview Data Governance (Azure-side catalog and scanning) uses capacity-based pricing starting at roughly $1,000/month for the standard tier, scaling with the number of data sources scanned and the volume of assets cataloged. Actual spend is usually lower than list price because of negotiated discounts and partial-population licensing; most enterprises spend $150K-$500K annually across both tracks.
How long does a Microsoft Purview implementation take?
A phased Purview implementation typically takes 12-20 weeks. Phase 1 (weeks 1-4) covers data discovery and classification — deploying the data map, connecting sources, and running initial scans. Phase 2 (weeks 5-10) addresses information protection — designing sensitivity labels, configuring DLP policies in monitor mode, and training users. Phase 3 (weeks 11-16) tackles compliance workflows — eDiscovery configuration, insider risk policies, and records management. Phase 4 (weeks 17-20) handles optimization — tuning false positives, configuring alerts, and training compliance officers. Organizations with heavy regulatory requirements (HIPAA, GDPR, SOX) should plan for 24+ weeks.
Can Microsoft Purview scan non-Microsoft data sources?
Yes. Microsoft Purview Data Map supports 100+ data source connectors including AWS S3, Amazon RDS, Google BigQuery, Snowflake, Oracle, SAP HANA, Teradata, Cassandra, on-premises SQL Server, and file shares. Scanning uses self-hosted integration runtimes for on-premises and private network sources. Each scan discovers assets, classifies sensitive data using 300+ built-in classifiers (PII, financial data, healthcare identifiers), and maps data lineage across ETL pipelines. This multi-cloud capability makes Purview the governance layer for hybrid data estates, not just Microsoft workloads.
Ready to Implement Microsoft Purview?
EPC Group helps enterprise organizations design and deploy Microsoft Purview for data governance, compliance, and information protection across multi-cloud environments.
Errin O'Connor
CEO & Chief AI Architect at EPC Group | 28+ years Microsoft consulting