The Data Governance Problem in 2026
Enterprise data sprawl has reached a critical point. The average Fortune 500 organization stores sensitive data across more than 15 platforms. These include:
- Microsoft 365
- Azure SQL
- AWS S3
- Snowflake
- On-premises file shares
- Legacy databases
- SaaS applications
Without centralized governance, organizations face significant risks. These risks include:
- Regulatory fines, with cumulative GDPR penalties projected to exceed $4.2 billion through 2025.
- Data breaches caused by misconfigured permissions.
- Inconsistent classification that complicates legal discovery.
- Shadow data proliferation that is hard to audit.
Microsoft Purview offers a unified control plane for data governance and compliance across your entire data estate.
It is more than just a Microsoft 365 tool. It scans and manages data no matter where it resides, including on competing cloud platforms.
Microsoft Purview Architecture Overview
Microsoft Purview operates across two complementary planes that serve different but overlapping audiences:
Data Governance Plane (formerly Azure Purview) serves data engineers, data stewards, and chief data officers. It includes several key features:
- Data Map: Automated scanning and classification of data sources.
- Data Catalog: Business-friendly data discovery with glossary terms.
- Data Estate Insights: Health dashboards across the estate.
- Data Lineage: Tracing how data flows through ETL pipelines from source to destination.
This plane connects to over 100 data sources using built-in and custom connectors.
Compliance Plane (formerly Microsoft 365 Compliance Center) serves compliance officers, legal teams, information security analysts, and HR. It includes several key features:
- Data Loss Prevention (DLP)
- Information Protection (sensitivity labels and encryption)
- Insider Risk Management
- eDiscovery and Audit
- Compliance Manager (regulatory assessment tracking)
- Communication Compliance
- Records Management
This plane mainly operates within the Microsoft 365 ecosystem. It also extends to endpoints and third-party cloud apps through Microsoft Defender for Cloud Apps.
Both planes use a shared classification engine. This engine includes over 300 built-in sensitive information types, such as:
- SSNs
- Credit card numbers
- Medical record numbers
- Passport numbers
Additionally, organizations can create custom trainable classifiers using their own data samples.
Data Map and Automated Classification
The Data Map is the foundation of Purview's governance capabilities. It provides automated scanning that discovers, classifies, and catalogs data assets across your entire estate.
Supported Data Sources
| Category | Sources | Scan Method |
|---|---|---|
| Azure Native | Azure SQL, Synapse, Data Lake, Cosmos DB, Blob Storage | Managed (no agent) |
| AWS | S3, RDS, Glue, Redshift | Self-hosted runtime |
| GCP | BigQuery, Cloud Storage | Self-hosted runtime |
| On-Premises | SQL Server, Oracle, SAP HANA, file shares | Self-hosted runtime |
| SaaS | Snowflake, Teradata, Databricks, Power BI | Managed connector |
Each scan carries out three main tasks:
- Asset discovery: This identifies tables, files, and columns.
- Classification: This applies sensitive information types using pattern matching and machine learning.
- Lineage extraction: This maps data flow through Data Factory, Synapse pipelines, and other ETL tools.
Scans operate on configurable schedules. Typically, enterprises run weekly full scans along with daily incremental scans.
Classification Engine
Purview's classification engine uses two main techniques. First, it employs exact data matching (EDM) to validate against real datasets, such as employee SSN tables. Second, it utilizes pattern-based matching, which includes:
- Regex and checksum validation
- Keyword proximity detection, such as finding "patient" near a number pattern to identify medical record numbers
- Trainable classifiers created from positive samples of your organization's sensitive document types (Microsoft's guidance calls for at least 50 positive samples)
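To make the pattern-based techniques concrete, here is a minimal Python sketch of regex matching with checksum validation and of keyword proximity detection. It illustrates the general technique and is not Purview's internal implementation. The pattern, the function names, and the 30-character window are all assumptions chosen for the example.

```python
import re

# Loose pattern for 13-16 digit card numbers with optional spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Double every second digit, counting from the rightmost digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) >= 13 and total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return regex matches that also pass checksum validation."""
    return [m.group(0) for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group(0))]


def keyword_near(text: str, keyword: str, pattern: re.Pattern, window: int = 30) -> bool:
    """Return True if `keyword` occurs within `window` characters of a pattern match."""
    lowered = text.lower()
    for match in pattern.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        if keyword.lower() in lowered[start:end]:
            return True
    return False
```

The checksum step is what separates a plausible card number from an arbitrary 16-digit string, and the proximity check plays the same role for identifiers that have no checksum, such as medical record numbers.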
Custom trainable classifiers are especially useful for industry-specific data. This includes items like insurance claim numbers, internal project codes, or proprietary identifiers that standard classifiers do not cover.
Data Loss Prevention (DLP)
DLP policies stop sensitive data from leaving secure environments through channels such as:
- SharePoint sharing
- Teams messages
- OneDrive sync
- USB devices
- Clipboard actions
- Printing
Implementing enterprise DLP takes a deliberate strategy. It is not just a matter of switching policies on.
DLP Policy Architecture
Structure DLP policies in three tiers to balance protection with user productivity:
- Tier 1: Block with Override — High-volume, moderate-sensitivity detections. Users see a policy tip and can override with a business justification. Examples: sharing documents containing 1-9 credit card numbers, emailing files with PII to external recipients.
- Tier 2: Block Without Override — High-sensitivity detections. No user override is possible, and a compliance alert is generated. Examples: bulk SSN exfiltration (10+ records), sharing documents marked Highly Confidential externally.
- Tier 3: Endpoint DLP — Device-level controls preventing copy-to-USB, print, upload to unsanctioned cloud apps, and clipboard copy of sensitive content. Requires Microsoft Defender for Endpoint onboarding.
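The first two tiers can be read as a decision rule. The sketch below is a hypothetical Python model of that rule, not a Purview policy definition. The `Detection` fields and the action names are invented for illustration, and Tier 3 endpoint controls are left out because they are enforced on the device.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical summary of what was detected in one sharing event."""
    ssn_count: int = 0
    credit_card_count: int = 0
    label: str = "General"
    external_recipient: bool = False


def dlp_action(d: Detection) -> str:
    """Map a detection to an action following the tier examples above."""
    # Tier 2: high-sensitivity detections, no user override.
    if d.ssn_count >= 10:
        return "block"
    if d.label == "Highly Confidential" and d.external_recipient:
        return "block"
    # Tier 1: moderate sensitivity, user may override with a justification.
    # The text gives 1-9 card numbers as the example range; treating higher
    # counts the same way is an assumption of this sketch.
    if d.credit_card_count >= 1:
        return "block_with_override"
    if d.ssn_count >= 1 and d.external_recipient:
        return "block_with_override"
    return "allow"
```

Checking the no-override conditions first means a document that matches both tiers always receives the stricter action.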
Always deploy DLP policies in "Test with policy tips" mode for 2-4 weeks before enforcement. Analyze false positive rates during testing.
If the false positive rate exceeds 5%, the policy needs adjustment. Consider the following options:
- Narrow the sensitive information type.
- Add exceptions for specific user groups.
- Increase the confidence threshold from "medium" to "high."
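The 5% check is simple arithmetic once test-mode matches have been reviewed. The sketch below assumes the false positive rate is defined as the share of policy matches that reviewers judged not to be sensitive. That definition and the function names are assumptions for this example.

```python
def false_positive_rate(total_matches: int, false_positives: int) -> float:
    """Share of policy matches that reviewers marked as not sensitive."""
    if total_matches == 0:
        return 0.0
    return false_positives / total_matches


def needs_tuning(total_matches: int, false_positives: int, threshold: float = 0.05) -> bool:
    """Return True when the rate exceeds the threshold (5% by default)."""
    return false_positive_rate(total_matches, false_positives) > threshold
```

For example, 28 false positives out of 400 reviewed matches is a 7% rate, so that policy would be adjusted before enforcement.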
Information Protection and Sensitivity Labels
Sensitivity labels are crucial for classifying data. They stay with the document and control access, no matter where the file goes. For example, a document labeled "Confidential — Internal Only" remains encrypted and restricted, even if it is accidentally shared outside the organization.
Label Taxonomy Design
Design your label hierarchy to match your organization's data classification policy. A proven enterprise taxonomy includes:
- Public — No restrictions. Marketing materials, press releases, public website content.
- General — Internal use. No encryption. Visual marking (header/footer). Default label for most business documents.
- Confidential — Encrypted. Sublabels for "All Employees" (organization-wide access), "Specific People" (named recipients), and "Recipients Only" (forwarding disabled).
- Highly Confidential — Encrypted with restricted access. Sublabels for "Board/Executive" (board members only), "Legal Privileged" (legal team only), and "Regulatory" (compliance team only).
Enable auto-labeling policies to automatically apply sensitivity labels based on specific conditions. For instance, any document containing 5 or more patient health records can be labeled as Highly Confidential — Regulatory.
Auto-labeling helps reduce the burden on end users. It also ensures consistent protection, even if users forget to label documents manually.
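As an illustration, the taxonomy and the auto-labeling condition above can be modeled as plain data plus one rule. This is a hypothetical Python sketch and not the format Purview uses to store labels or policies. The structure and names are assumptions.

```python
from typing import Optional, Tuple

# Parent label -> sublabels, mirroring the taxonomy described above.
LABEL_TAXONOMY = {
    "Public": [],
    "General": [],
    "Confidential": ["All Employees", "Specific People", "Recipients Only"],
    "Highly Confidential": ["Board/Executive", "Legal Privileged", "Regulatory"],
}

Label = Tuple[str, Optional[str]]  # (parent label, optional sublabel)


def is_valid_label(label: Label) -> bool:
    """Check that a (parent, sublabel) pair exists in the taxonomy."""
    parent, sublabel = label
    if parent not in LABEL_TAXONOMY:
        return False
    return sublabel is None or sublabel in LABEL_TAXONOMY[parent]


def auto_label(patient_record_count: int, current: Optional[Label] = None) -> Optional[Label]:
    """Apply the example rule: 5 or more patient health records."""
    if patient_record_count >= 5:
        return ("Highly Confidential", "Regulatory")
    return current
```

Keeping the taxonomy as data makes it easy to validate that every auto-labeling rule points at a label that has actually been published.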
Insider Risk Management
Insider Risk Management uses machine learning to analyze user activities and spot risky behavior patterns. It combines signals from:
- Microsoft 365 (file access, email, Teams)
- Microsoft Defender for Endpoint (device activity)
- HR connectors (resignation dates, performance improvement plans)
Key policy templates focus on several important areas:
- Data theft by departing users: This is triggered by an HR resignation signal combined with increased file downloads.
- Data leaks: These involve detecting unusual sharing or high volumes of external emails.
- Security policy violations: This includes visiting risky websites and disabling security tools.
- Patient data access violations: This is healthcare-specific and monitors EHR access patterns against job responsibilities.
Privacy is built into the design. By default, analyst views are pseudonymized: investigators see "User-7291" instead of the employee's name. Only a compliance manager with the right RBAC permissions can approve an escalation that reveals the user's identity.
This approach helps to:
- Prevent casual browsing of employee activities
- Allow for legitimate investigations
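One common way to produce stable pseudonyms like this is a keyed hash. The sketch below shows that general technique in Python. It is not a description of how Purview derives its pseudonyms, and the function name, key handling, and four-digit format are assumptions.

```python
import hashlib
import hmac


def pseudonym(user_principal_name: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym such as 'User-0421' from a user name."""
    digest = hmac.new(
        secret_key,
        user_principal_name.lower().encode("utf-8"),
        hashlib.sha256,
    ).digest()
    # Four digits keep the example readable; a real system would need a
    # larger space or a lookup table to avoid collisions between users.
    number = int.from_bytes(digest[:4], "big") % 10000
    return f"User-{number:04d}"
```

Because the mapping depends on a secret key, an analyst cannot reverse a pseudonym by hashing a list of known user names, while the same user always appears under the same pseudonym across alerts.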
eDiscovery Premium
eDiscovery Premium manages workflows for legal hold, content search, review, and export in litigation and regulatory investigations. The platform processes various data types, including:
- Documents
- Teams chat
- Viva Engage (formerly Yammer)
- Third-party data via import connectors
Key capabilities include:
- Custodian management: Placing legal holds on specific people's mailboxes and OneDrive sites.
- Advanced indexing: Deep processing of images using OCR and extracting text from PDFs and scanned documents.
- Review sets with machine learning: Predictive coding to prioritize relevant documents, near-duplicate detection, and email threading to reduce review volume by 40-60%.
- Analytics dashboards: Displaying themes, key terms, and communication patterns across custodian data.
For organizations in regulated industries, eDiscovery Premium works with Purview Audit Premium. This integration offers:
- Audit log retention of up to 10 years (compared to 180 days for Audit Standard)
- High-bandwidth access to the audit log search API for SIEM integration
This extended retention is essential for SEC, FINRA, and HIPAA investigations that may review data from years past.
Compliance Manager
Compliance Manager offers ongoing tracking against over 350 regulatory templates. These include GDPR, HIPAA, SOC 2, ISO 27001, NIST 800-53, FedRAMP, PCI DSS, and various industry-specific frameworks.
Each assessment evaluates your organization’s compliance status and suggests prioritized actions for improvement.
The compliance score reflects the improvement actions that have been completed. Microsoft-managed actions account for about 50% of the score. These actions are infrastructure controls maintained by Microsoft.
The remaining 50% comes from customer-managed actions. These include configurations and policies that your organization controls.
Each action is weighted according to its importance. For example, encryption controls carry more weight than documentation controls.
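A weighted score of this kind is the ratio of points earned to points available. The Python sketch below shows the arithmetic with made-up point values. The `ImprovementAction` type and the specific weights are assumptions, not Compliance Manager's actual scoring table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImprovementAction:
    name: str
    points: int       # hypothetical weight; heavier controls score more
    completed: bool


def compliance_score(actions: List[ImprovementAction]) -> float:
    """Return completed points as a percentage of all available points."""
    total = sum(a.points for a in actions)
    if total == 0:
        return 0.0
    achieved = sum(a.points for a in actions if a.completed)
    return 100 * achieved / total
```

With one completed encryption action worth 27 points and one open documentation action worth 3, the score is 90%, which shows why closing a heavily weighted control moves the score far more than closing a lightly weighted one.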
Create a compliance dashboard that connects your regulatory obligations to Purview controls. For healthcare organizations, this involves:
- Mapping HIPAA Administrative Safeguards to information protection policies
- Linking Technical Safeguards to DLP and encryption
- Aligning Physical Safeguards with endpoint DLP and device management
This mapping shows auditors that controls are not only established but also actively monitored.
Records Management
Records Management ensures proper retention and disposal of content throughout its lifecycle. Retention labels specify:
- How long content must be kept
- What occurs when that period ends: automatic deletion, a disposition review, or marking as a regulatory record that cannot be deleted
Key implementation considerations for enterprise records management include:
- File plan management for importing existing retention schedules from Excel.
- Event-based retention, which starts the retention clock when a contract expires or an employee departs, rather than from the creation date.
- Regulatory records that prevent any modification or deletion, even by administrators.
- Multi-stage retention for documents that move through active, semi-active, and archive phases, each with different storage and access policies.
You can auto-apply retention labels using trainable classifiers, keyword queries, or sensitive information types. This helps reduce the need for manual labeling.
For instance:
- Automatically apply a 7-year retention label to any document classified as containing financial records.
- Automatically apply a 10-year label to documents containing patient health information.
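The difference between creation-based and event-based retention comes down to which date starts the clock. The Python sketch below computes a disposition date from a chosen start date. The function names are illustrative, and falling back to February 28 for leap-day starts is an assumption.

```python
from datetime import date


def add_years(start: date, years: int) -> date:
    """Add whole years to a date, falling back to Feb 28 for leap days."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # Feb 29 does not exist in the target year.
        return start.replace(year=start.year + years, day=28)


def disposition_date(clock_start: date, retention_years: int) -> date:
    """Date on which retention ends, counted from the clock start."""
    return add_years(clock_start, retention_years)
```

For a contract created on 2020-01-15 that expires on 2024-06-30, a 7-year event-based label ends retention on 2031-06-30, whereas counting from the creation date would end it on 2027-01-15.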
Implementation Roadmap
A successful Purview implementation follows a phased approach that minimizes user disruption while building governance capabilities incrementally:
Phase 1: Discovery and Classification (Weeks 1-4)
- Deploy the Data Map and connect all primary data sources
- Run initial full scans across Azure, Microsoft 365, and on-premises
- Review classification results and tune custom classifiers
- Build the data glossary with business-friendly terms
- Assign data stewards for each major data domain
Phase 2: Information Protection (Weeks 5-10)
- Design and publish sensitivity label taxonomy
- Configure DLP policies in test/monitor mode
- Deploy endpoint DLP to managed devices
- Train end users on labeling expectations
- Analyze DLP alerts and tune false positive rates
Phase 3: Compliance Workflows (Weeks 11-16)
- Configure eDiscovery cases and legal hold templates
- Deploy insider risk management policies
- Implement records management retention labels
- Set up Compliance Manager assessments for target regulations
- Configure communication compliance for regulated channels
Phase 4: Optimization (Weeks 17-20)
- Enforce DLP policies (move from test to block)
- Enable auto-labeling based on Phase 2 analysis
- Build executive compliance dashboards in Power BI
- Conduct compliance officer training and tabletop exercises
- Document runbooks for common compliance workflows
Industry-Specific Compliance Mapping
| Regulation | Purview Components | Key Controls |
|---|---|---|
| HIPAA | DLP, Information Protection, eDiscovery, Audit Premium | PHI detection, encryption at rest/transit, access logging, 6-year retention |
| GDPR | Data Map, DLP, Information Protection, Records Management | Data subject rights, consent tracking, cross-border transfer controls, right to erasure |
| SOC 2 | Compliance Manager, Insider Risk, DLP, Audit | Access controls, change management, incident response, continuous monitoring |
| FedRAMP | Information Protection, DLP, Records Management, Audit Premium | FIPS 140-2 encryption, CUI handling, 10-year audit retention, incident reporting |
| PCI DSS | DLP, Information Protection, Data Map | Cardholder data detection, network segmentation validation, access monitoring |
Common Pitfalls and How to Avoid Them
Enterprise Purview implementations fail most often due to these preventable mistakes:
- Deploying DLP in enforcement mode immediately — Always run in test mode for 2-4 weeks. Aggressive initial enforcement generates helpdesk tickets, user resentment, and executive pushback that can derail the entire program.
- Creating too many sensitivity labels — Users ignore complex taxonomies. Limit to 4-5 parent labels with 2-3 sublabels each. More than 15 total labels guarantees inconsistent adoption.
- Ignoring the data steward model — Data governance fails without business ownership. Assign stewards for each data domain (HR data, financial data, customer data) who review classifications, manage glossary terms, and approve access requests.
- Skipping change management — Purview changes daily workflows. Invest in training, quick reference guides, and executive sponsorship communications before go-live.
- Not integrating with SIEM — Purview generates alerts that must flow into your security operations center. Configure the Management Activity API or use the Microsoft Sentinel connector to ensure compliance alerts are triaged alongside security events.
Licensing and Cost Planning
Microsoft Purview licensing is complex. Here is a practical breakdown:
- Microsoft 365 E3 ($36/user/month) — Basic DLP (Exchange and SharePoint only), basic audit (180-day retention), manual sensitivity labels, basic records management.
- Microsoft 365 E5 ($57/user/month) — Full DLP (endpoint, Teams, third-party apps), auto-labeling, insider risk management, eDiscovery Premium, advanced audit (1-year retention), communication compliance.
- E5 Compliance add-on ($12/user/month on top of E3) — All E5 compliance features without the E5 security and voice features. Best option if you only need compliance capabilities.
- Purview Data Governance (Azure consumption) — Capacity units starting at approximately $1,000/month, scaling with number of sources and assets scanned.
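Per-user prices are easier to compare once they are annualized. The Python sketch below does that arithmetic using the list prices quoted above. The 5,000-user head count in the note that follows is only an example, and the function name is an assumption.

```python
# Per-user monthly list prices as quoted in the breakdown above (USD).
LIST_PRICE_PER_USER_MONTH = {
    "E3": 36,
    "E5": 57,
    "E5 Compliance add-on": 12,
}


def annual_license_cost(users: int, price_per_user_month: float) -> float:
    """Annual cost of licensing every user at the given monthly price."""
    return users * price_per_user_month * 12
```

At these list prices, licensing 5,000 users with the E5 Compliance add-on comes to $720,000 per year, and full E5 comes to $3,420,000 per year. Actual spend depends on how many users are licensed and on negotiated pricing.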
For budget planning, a 5,000-user enterprise usually spends:
- $200K-$400K annually on Purview licensing
- $75K-$150K for implementation consulting
- $50K-$100K each year for ongoing managed governance services
The ROI justification is clear. A single GDPR fine averages $4.2 million. Additionally, a single data breach costs an average of $4.88 million, according to the IBM 2024 Cost of a Data Breach Report.
Implementation with EPC Group
EPC Group's data governance practice specializes in Microsoft Purview implementations for regulated industries. Our methodology starts with a Data Governance Readiness Assessment that maps your current data estate, identifies compliance gaps, and prioritizes quick wins. We then design and deploy Purview in phases aligned with your compliance calendar, ensuring you have evidence for upcoming audits while building toward comprehensive governance. Our clients in healthcare, financial services, and government consistently achieve 90%+ Compliance Manager scores within six months of implementation.
Frequently Asked Questions
What is Microsoft Purview and how does it differ from Azure Purview?
Microsoft Purview is the unified data governance and compliance platform that merged the former Azure Purview (data catalog, data map, data estate scanning) with Microsoft 365 compliance capabilities (DLP, information protection, insider risk, eDiscovery, records management). The rebrand happened in April 2022. Today, Microsoft Purview provides a single pane of glass for governing data across Azure, Microsoft 365, on-premises SQL Server, AWS S3, and other multi-cloud sources. Licensing splits into two tracks: Purview Data Governance (Azure-side scanning and cataloging) and Purview Compliance (Microsoft 365 E5 or E5 Compliance add-on).
What are the core components of Microsoft Purview?
Microsoft Purview includes eight core components: (1) Data Map for automated scanning and classification of data sources across multi-cloud and on-premises environments, (2) Data Catalog for business-friendly data discovery with glossary terms and lineage, (3) Data Loss Prevention (DLP) for preventing sensitive data leakage across Exchange, SharePoint, Teams, and endpoints, (4) Information Protection for sensitivity labels and encryption, (5) Insider Risk Management for detecting risky user behavior, (6) eDiscovery for legal hold and content search, (7) Compliance Manager for assessment tracking against 350+ regulatory templates, and (8) Records Management for retention labels and disposition review.
How much does Microsoft Purview cost for an enterprise?
Microsoft Purview compliance features require Microsoft 365 E5 ($57/user/month) or the E5 Compliance add-on ($12/user/month added to E3). For a 5,000-user enterprise, list-price compliance licensing ranges from $60K to $285K per month ($720K to $3.42M annually) depending on the license tier. Purview Data Governance (Azure-side catalog and scanning) uses capacity-based pricing starting at roughly $1,000/month for the standard tier, scaling with the number of data sources scanned and the volume of assets cataloged. Most enterprises spend $150K-$500K annually across both tracks.
How long does a Microsoft Purview implementation take?
A phased Purview implementation typically takes 12-20 weeks. Phase 1 (weeks 1-4) covers data discovery and classification — deploying the data map, connecting sources, and running initial scans. Phase 2 (weeks 5-10) addresses information protection — designing sensitivity labels, configuring DLP policies in monitor mode, and training users. Phase 3 (weeks 11-16) tackles compliance workflows — eDiscovery configuration, insider risk policies, and records management. Phase 4 (weeks 17-20) handles optimization — tuning false positives, configuring alerts, and training compliance officers. Organizations with heavy regulatory requirements (HIPAA, GDPR, SOX) should plan for 24+ weeks.
Can Microsoft Purview scan non-Microsoft data sources?
Yes. Microsoft Purview Data Map supports 100+ data source connectors including AWS S3, Amazon RDS, Google BigQuery, Snowflake, Oracle, SAP HANA, Teradata, Cassandra, on-premises SQL Server, and file shares. Scanning uses self-hosted integration runtimes for on-premises and private network sources. Each scan discovers assets, classifies sensitive data using 300+ built-in classifiers (PII, financial data, healthcare identifiers), and maps data lineage across ETL pipelines. This multi-cloud capability makes Purview the governance layer for hybrid data estates, not just Microsoft workloads.
Ready to Implement Microsoft Purview?
EPC Group helps enterprise organizations design and deploy Microsoft Purview for data governance, compliance, and information protection across multi-cloud environments.
Schedule a Data Governance Assessment
Errin O'Connor
CEO & Chief AI Architect at EPC Group | 29 years Microsoft consulting
