The six Microsoft Purview Data Catalog components — what each does and who owns it
Microsoft Purview Data Catalog is one product surface spanning six tightly integrated components: the Data Map scan plane, the business glossary, the Data Domain federation primitive, Data Products, the steward and owner accountability fabric, and AI-driven classification including Copilot in Purview. Each component has its own owner archetype, and deployment success depends on naming a human accountable for each one before go-live.
Purview Data Map — automated scan across 70+ source types
What it does: Data Map is the scan-and-catalog plane. Connectors cover Microsoft Fabric OneLake, Azure SQL, Azure Synapse, Azure Data Lake Storage Gen2, Azure Cosmos DB, Azure Database for PostgreSQL and MySQL, Power BI workspaces, Dataverse, on-premises SQL Server, Oracle, Teradata, SAP S/4HANA, SAP HANA, SAP ECC, Snowflake, Databricks Unity Catalog, Amazon S3, Amazon RDS, Amazon Redshift, Google BigQuery, Google Cloud Storage, MongoDB Atlas, Salesforce, ServiceNow, Hive Metastore, Looker, Tableau, erwin, and 40+ other sources. Scans run on a schedule, classify against 200+ system-defined patterns plus customer-authored custom classifiers, and emit technical metadata plus end-to-end lineage to the unified catalog.
- Capacity-based provisioning sized to source-system footprint — start at 1 capacity unit, scale on consumption
- Self-hosted integration runtime (SHIR) for on-premises and VNet-isolated source scanning
- Scheduled scans: full scan weekly, incremental (delta) scan daily, deep classification monthly
- 200+ system-defined sensitive information types plus custom regex and dictionary classifiers
- Multi-cloud connectors for Snowflake, Databricks Unity Catalog, BigQuery, Redshift, S3, and Cloud Storage
Owner archetype: Data platform owner, catalog admin, regulatory compliance officer
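The custom regex and dictionary classifiers listed above can be pictured with a small sketch. This is not Purview's classification engine or API: the `CustomClassifier` class, the `SUP-` supplier ID format, and the 60% match threshold are all hypothetical, chosen only to show how a pattern rule and a dictionary rule might be evaluated against a sample of column values.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class CustomClassifier:
    """Hypothetical stand-in for a customer-authored classifier (not a Purview type)."""
    name: str
    pattern: Optional[str] = None              # regex rule, must match the whole value
    dictionary: FrozenSet[str] = frozenset()   # dictionary rule, compared in lowercase
    min_match_ratio: float = 0.6               # assumed threshold, tune per data set

    def matches(self, value: str) -> bool:
        if self.pattern and re.fullmatch(self.pattern, value):
            return True
        return value.lower() in self.dictionary


def classify_column(samples: List[str], classifiers: List[CustomClassifier]) -> List[str]:
    """Return the names of classifiers whose match ratio over non-empty samples meets the threshold."""
    values = [s.strip() for s in samples if s and s.strip()]
    if not values:
        return []
    hits = []
    for classifier in classifiers:
        ratio = sum(classifier.matches(v) for v in values) / len(values)
        if ratio >= classifier.min_match_ratio:
            hits.append(classifier.name)
    return hits


# Illustrative rules: a made-up supplier ID format and a tiny currency dictionary.
supplier_id = CustomClassifier(name="SUPPLIER_ID", pattern=r"SUP-\d{6}")
currency = CustomClassifier(name="CURRENCY_CODE", dictionary=frozenset({"usd", "eur", "gbp"}))

labels = classify_column(["SUP-004211", "SUP-118730", "n/a", "SUP-000042"], [supplier_id, currency])
```

Here three of the four sampled values match the supplier pattern, so `labels` contains only `SUPPLIER_ID`. A ratio threshold rather than an any-match rule is one way to keep a few stray values from labeling a whole column.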
Business glossary — semantic layer above the catalog
What it does: The business glossary is the semantic layer above the technical catalog. Glossary terms — Customer, Patient, MNPI, PHI, Net Revenue, Loss-Given-Default — carry definitions, parent-child hierarchies, acronyms, related terms, stewards, owners, expert reviewers, and approval workflow status. Terms bind to physical assets (tables, columns, files, Power BI semantic-model fields) and bind upward to data domains. This is the layer a business stakeholder browses; the technical catalog is the layer an engineer browses.
- Bulk import via CSV from existing glossaries — ASUG, DAMA, customer-authored taxonomies
- Term-to-asset binding for tables, columns, files, Power BI semantic-model fields, Fabric items
- Multi-stage approval workflow — author → steward → expert → owner → published
- Synonyms, acronyms, related terms, parent-child hierarchy, and contextual definitions
- Glossary adoption metrics — terms defined, terms bound, terms in steward backlog
Owner archetype: Data steward, business analyst, regulatory subject-matter expert
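The multi-stage approval workflow above (author, steward, expert, owner, published) behaves like a simple linear state machine. The sketch below models it that way. The `Stage` enum and both functions are illustrative names, not Purview objects, and the rule that a rejection sends the term back to the author is an assumption.

```python
from enum import Enum


class Stage(Enum):
    AUTHOR = "author"
    STEWARD = "steward"
    EXPERT = "expert"
    OWNER = "owner"
    PUBLISHED = "published"


# Enum members iterate in definition order, which is the approval order.
ORDER = list(Stage)


def approve(stage: Stage) -> Stage:
    """Advance a glossary term to the next stage of review."""
    if stage is Stage.PUBLISHED:
        raise ValueError("term is already published")
    return ORDER[ORDER.index(stage) + 1]


def reject(stage: Stage) -> Stage:
    """Assumption: any rejection returns the term to its author for revision."""
    if stage is Stage.PUBLISHED:
        raise ValueError("published terms are changed through a new revision")
    return Stage.AUTHOR


# Walk a term through every approval in sequence.
stage = Stage.AUTHOR
history = [stage]
while stage is not Stage.PUBLISHED:
    stage = approve(stage)
    history.append(stage)
```

After the loop, `history` holds the five stages in order, ending at `Stage.PUBLISHED`.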
Data Domain — federated catalog ownership by business unit
What it does: Data Domain is the federated organizing unit inside Purview Unified Catalog. Each domain (Finance, Sales, Clinical, R&D, Supply Chain) has its own owner, stewards, data products, glossary terms, and policies. Domains are the data-mesh primitive — they let one Purview tenant host multiple business-unit catalogs without forcing a single global glossary or a single global ownership model. This is the answer for an enterprise that has tried and failed at a top-down monolithic catalog.
- Domain-scoped glossary, data products, policies, and stewardship
- Domain owner approval workflows for cross-domain term publication
- Per-domain role-based access — domain admin, data product owner, steward, reader
- Federated discovery — searches surface results across all domains the user has read access to
- Foundation for the Pattern 3 data-mesh deployment described below
Owner archetype: Chief Data Officer, business-unit data leader, data mesh architect
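Federated discovery, as described in the list above, amounts to filtering one search over every domain by the domains a user can read. A minimal sketch under that assumption follows. The `Asset` record, the substring match, and the sample data are hypothetical and do not reflect how Purview search is implemented.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Asset:
    """Hypothetical catalog entry; real catalog assets carry far more metadata."""
    name: str
    domain: str


def federated_search(query: str, assets: Iterable[Asset], readable_domains: Set[str]) -> List[Asset]:
    """Return assets whose name contains the query, limited to domains the user can read."""
    needle = query.lower()
    return [
        asset
        for asset in assets
        if asset.domain in readable_domains and needle in asset.name.lower()
    ]


catalog = [
    Asset("Customer Master", "Sales"),
    Asset("Customer Credit Exposure", "Finance"),
    Asset("Patient Encounters", "Clinical"),
]

# A user with read access to Sales and Clinical never sees the Finance asset.
results = federated_search("customer", catalog, {"Sales", "Clinical"})
```

With this sample data, `results` contains only the Sales asset, even though a Finance asset also matches the query text.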
Data Products — curated, discoverable, business-grade data assets
What it does: A Data Product is a curated bundle of assets (tables, columns, Power BI semantic models, dataflows, files, KQL queries) packaged as a single discoverable unit with documentation, SLAs, an owner, quality metrics, and an access-request workflow. Where the Data Map answers "what data exists?", Data Products answer "what data can I rely on?" Discovery, access requests, and consumption all flow through the Data Product, not the raw asset.
- Documentation, sample queries, schema contracts, SLAs, freshness commitments — all attached to the product
- Owner-approved access requests routed through Purview policy and provisioned in source systems
- Quality scorecards from Microsoft Purview Data Quality engine bound to the product
- Product version history and deprecation lifecycle with reader-impact notification
- Foundation for self-service analytics: analysts shop the catalog, request the product, and consume it
Owner archetype: Data product owner, analytics consumer, citizen data scientist
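One way to picture a Data Product is as a record that carries its own owner, freshness commitment, and access-request queue. The sketch below assumes that shape. The `DataProduct` class, its fields, and the owner-only approval rule are hypothetical simplifications, not the Purview data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Set


@dataclass
class DataProduct:
    """Hypothetical record for a curated bundle of assets with an owner and an SLA."""
    name: str
    owner: str
    assets: List[str]
    freshness_sla: timedelta
    readers: Set[str] = field(default_factory=set)
    pending: Set[str] = field(default_factory=set)

    def request_access(self, user: str) -> None:
        """Queue an access request unless the user can already read the product."""
        if user not in self.readers:
            self.pending.add(user)

    def decide(self, approver: str, user: str, approved: bool) -> None:
        """Assumption: only the product owner may approve or reject a request."""
        if approver != self.owner:
            raise PermissionError(f"{approver} does not own {self.name}")
        if user not in self.pending:
            raise KeyError(f"no pending request from {user}")
        self.pending.discard(user)
        if approved:
            self.readers.add(user)

    def meets_freshness(self, last_refreshed: datetime, now: datetime) -> bool:
        """True when the last refresh falls within the committed freshness window."""
        return now - last_refreshed <= self.freshness_sla


product = DataProduct(
    name="Net Revenue",
    owner="dana",
    assets=["finance.revenue_daily"],
    freshness_sla=timedelta(hours=24),
)
product.request_access("sam")
product.decide(approver="dana", user="sam", approved=True)
```

Keeping the request queue on the product rather than on each underlying asset mirrors the point above: consumers ask for the product, and the owner decides once.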
Steward + Owner workflows — accountability fabric
What it does: Steward and Owner are the accountability roles that turn the catalog from a metadata dump into a managed asset. Owners hold business accountability for a data product, domain, or glossary term; stewards hold operational accountability for quality, classification accuracy, lineage completeness, and access decisions. Purview routes approval requests, quality alerts, classification disputes, and access requests through these roles via in-product workflow and Microsoft Teams notifications. This is the layer that fixes the "catalog rots after launch" failure pattern.
- Per-asset steward and owner assignment with optional delegate chain for vacation coverage
- Approval workflows for glossary publication, classification change, lineage edit, access grant
- Teams notifications routed to the steward channel with one-click approve / reject / escalate
- Steward dashboard surfacing backlog, SLA breaches, quality alerts, and pending access requests
- Steward performance metrics rolled into the Data Estate Insights executive view
Owner archetype: Data steward, data owner, data office program manager
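The delegate chain for vacation coverage can be read as an ordered fallback list: route to the steward, then to each delegate in turn, and escalate if nobody is available. The following sketch assumes exactly that rule. The function name and the use of `None` to signal escalation are illustrative choices.

```python
from typing import Optional, Sequence, Set


def resolve_approver(chain: Sequence[str], unavailable: Set[str]) -> Optional[str]:
    """Return the first available person in the chain, or None to signal escalation.

    `chain` lists the steward first, followed by delegates in priority order.
    """
    for person in chain:
        if person not in unavailable:
            return person
    return None


chain = ["steward.ana", "delegate.raj", "delegate.mei"]

# The steward is on vacation, so the request goes to the first delegate.
approver = resolve_approver(chain, unavailable={"steward.ana"})
```

When every person in the chain is unavailable the function returns `None`, which a caller could treat as the trigger for an escalation to the owner.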
AI-driven classification + Copilot in Purview
What it does: AI-driven classification combines 200+ system-defined sensitive-information types, machine-learning trainable classifiers (PHI clinical notes, MNPI deal memos, attorney-client privileged content), customer-authored regex and dictionary rules, and natural-language Copilot suggestions for glossary term binding, steward assignment, and classification rule authoring. Copilot in Purview is the natural-language search layer — "find regulated customer-PII columns in the Snowflake estate that lack a steward" — that turns the catalog into a queryable system rather than a navigation tree.
- System-defined classifiers for SSN, NI number, credit card, IBAN, passport, driver's license, ICD-10, NPI, MRN
- Trainable classifiers for domain-specific content — PHI clinical notes, MNPI deal memos, CUI markings
- Copilot natural-language search across the entire catalog with citation back to source assets
- Copilot-suggested classifications and term bindings, reviewed by steward, approved or rejected with one click
- Copilot-authored regex starter rules for customer-specific patterns — supplier IDs, member IDs, internal codes
Owner archetype: Catalog admin, classification analyst, data office automation lead
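The example query above ("find regulated customer-PII columns in the Snowflake estate that lack a steward") corresponds to a structured filter over catalog metadata. The sketch below shows what such a filter might look like once the natural-language request has been interpreted. It does not call Copilot or any Purview API, and the `Column` record, its field names, and the classification labels are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Column:
    """Hypothetical catalog record for one scanned column."""
    qualified_name: str
    source: str
    classifications: FrozenSet[str]
    steward: Optional[str] = None


def unstewarded_pii(columns: Iterable[Column], source: str, pii_labels: FrozenSet[str]) -> List[str]:
    """List columns from one source that carry a PII classification and have no steward."""
    return [
        col.qualified_name
        for col in columns
        if col.source == source
        and col.classifications & pii_labels   # at least one PII label applies
        and col.steward is None
    ]


PII = frozenset({"SSN", "CREDIT_CARD", "PASSPORT"})

columns = [
    Column("snowflake.crm.customers.ssn", "Snowflake", frozenset({"SSN"})),
    Column("snowflake.crm.customers.card", "Snowflake", frozenset({"CREDIT_CARD"}), steward="ana"),
    Column("snowflake.crm.orders.total", "Snowflake", frozenset()),
    Column("azuresql.hr.staff.passport", "Azure SQL", frozenset({"PASSPORT"})),
]

gaps = unstewarded_pii(columns, source="Snowflake", pii_labels=PII)
```

With this sample data only the `ssn` column qualifies: the `card` column already has a steward, the `total` column has no PII label, and the passport column lives in a different source.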