Fabric Data Engineering: Lakehouse Guide 2026
Expert Insight from Errin O'Connor
28+ years Microsoft consulting | 4x Microsoft Press bestselling author | Former NASA Lead Architect | 50+ enterprise Microsoft Fabric implementations with lakehouse architectures processing PB-scale data
Quick Answer
Microsoft Fabric unifies data engineering, data science, data warehousing, and business intelligence on a single platform built on OneLake, a shared data lake storing everything in open Delta Lake format. The enterprise data engineering architecture uses the medallion pattern: Bronze lakehouses ingest raw data via data pipelines and Dataflows Gen2, Silver lakehouses apply business rules through Spark notebooks, and Gold lakehouses provide business-ready dimensional models consumed by Power BI Direct Lake datasets for zero-copy analytics. Organizations migrating from Azure Synapse to Fabric achieve 30-40% TCO reduction while eliminating data silos, and OneLake shortcuts enable federation with existing Azure, AWS, and GCP data lakes without data movement.
Microsoft Fabric Platform Overview
Microsoft Fabric represents the most significant architectural shift in the Microsoft data platform since the introduction of Azure Synapse Analytics. While Synapse attempted to unify data services by bundling separate products (SQL Pools, Spark Pools, Pipelines, Data Explorer) under a single management plane, each component maintained its own storage, security model, and billing. Fabric takes a fundamentally different approach: all workloads share a single data foundation (OneLake), a single security model (Fabric workspace security backed by Entra ID), and a single billing model (Fabric Capacity Units).
For enterprise data engineering teams, Fabric eliminates the three biggest productivity killers in modern data platforms: data movement (copying data between storage accounts to make it accessible to different compute engines), infrastructure management (provisioning, scaling, and managing compute clusters), and security fragmentation (maintaining different access controls for different storage and compute layers). In Fabric, a data engineer writes data to a lakehouse table; a Power BI analyst can then query it immediately through Direct Lake mode, a data scientist can access it from a Spark notebook, and a SQL developer can query it via the SQL analytics endpoint, all without data movement, configuration, or additional security setup. This is a fundamental paradigm change that our Microsoft Fabric consulting practice helps enterprises adopt.
Strategic Advisory: Fabric Is Microsoft's Future Data Platform
Microsoft has made clear through engineering investments and roadmap communications that Fabric is the strategic data analytics platform going forward. Azure Synapse Analytics workspaces are in maintenance mode, with new features being developed exclusively for Fabric. Organizations currently on Synapse should begin Fabric migration planning now. Organizations starting new data platform initiatives should build on Fabric from day one. Delaying Fabric adoption increases technical debt and reduces access to the latest capabilities (Direct Lake, OneLake shortcuts, Copilot in Fabric).
OneLake: The Unified Data Foundation
OneLake is the single, tenant-wide data lake that stores all Fabric data. Every lakehouse, every warehouse, and every KQL database in Fabric stores its data in OneLake. There is no option to store data elsewhere, and there is no configuration needed. OneLake is provisioned automatically when Fabric is activated for a tenant.
OneLake Architecture
- One copy of data: Data stored in OneLake is accessible to all Fabric workloads without copying. A Delta table in a lakehouse is simultaneously queryable via Spark (data engineering), T-SQL (SQL analytics endpoint), and Power BI (Direct Lake mode). This eliminates the data movement tax that plagues traditional architectures
- Open Delta Lake format: All managed tables in OneLake use the Delta Lake format (Parquet files with a JSON transaction log). Delta provides ACID transactions, time travel (query historical versions of data), schema evolution, and efficient upserts. The open format means data is not locked into a proprietary system
- OneLake explorer: OneLake is accessible via Windows Explorer (OneLake file explorer), Azure Storage APIs, and the Fabric portal. Data engineers can browse OneLake like a file system, copy files from local machines, and integrate with any tool that supports Azure Blob Storage or ADLS Gen2 APIs
- Automatic optimization: OneLake applies V-Order optimization to Parquet files, enabling faster read performance across all compute engines. V-Order is a special sorting and encoding technique developed by Microsoft that provides 10-50% faster query performance compared to standard Parquet files
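Delta's time travel capability described above can be exercised from any Fabric Spark notebook. The sketch below builds the Spark SQL statements involved; the table name, version number, and timestamp are illustrative placeholders:

```python
# Sketch: Delta time travel queries for a Fabric Spark notebook.
# "dim_customer", version 12, and the timestamp are placeholders.
def time_travel_sql(table, version=None, timestamp=None):
    """Build a Spark SQL statement that reads a historical table version."""
    if version is not None:
        return f"SELECT * FROM {table} VERSION AS OF {version}"
    if timestamp is not None:
        return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"
    return f"SELECT * FROM {table}"

# In a notebook:  df = spark.sql(time_travel_sql("dim_customer", version=12))
# To list available versions:  spark.sql("DESCRIBE HISTORY dim_customer")
```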
OneLake Shortcuts
Shortcuts are virtual pointers that make external data appear as if it is part of a Fabric lakehouse. EPC Group uses shortcuts extensively for enterprise implementations where organizations need to access data across organizational boundaries or from existing data lakes without migration.
- Cross-workspace shortcuts: Reference data in another Fabric workspace. Enables departmental data sharing without data duplication. The finance team's lakehouse can shortcut the sales team's customer dimension table
- ADLS Gen2 shortcuts: Reference data in existing Azure Data Lake Storage Gen2 accounts. Enables Fabric analytics on data that remains in its original Azure storage. No data movement, no storage duplication
- Amazon S3 shortcuts: Reference data stored in AWS S3 buckets. Enables cross-cloud analytics where Fabric queries data in AWS without copying it to Azure. EPC Group uses S3 shortcuts for clients with multi-cloud data estates
- Google Cloud Storage shortcuts: Reference data in GCP storage. Completes the multi-cloud federation story, enabling Fabric as the unified analytics layer across Azure, AWS, and GCP
Fabric Lakehouse Architecture
The Fabric lakehouse is the primary data engineering artifact. It provides a dual interface: a file-based interface for landing raw files (Parquet, CSV, JSON, and other formats) and a table-based interface for managed Delta tables with a SQL analytics endpoint.
Lakehouse Components
- Files section: Unmanaged storage for landing raw files in any format (CSV, JSON, Parquet, images, PDFs). Data in the Files section is not automatically registered as tables and requires Spark processing to transform into managed tables. Use for: raw data landing zone, external data drops, unstructured data storage
- Tables section: Managed Delta tables with full SQL analytics endpoint support. Tables are automatically registered in the lakehouse metastore and are queryable via T-SQL, Spark SQL, and Power BI Direct Lake. Use for: all structured, curated data that will be consumed by analytics workloads
- SQL analytics endpoint: Automatically provisioned read-only T-SQL endpoint for every lakehouse. Business analysts and SQL developers can query lakehouse tables using familiar T-SQL syntax, create views and functions, and connect from any SQL client (SSMS, Azure Data Studio, Python, etc.). The endpoint does not support DML operations (INSERT, UPDATE, DELETE); data modification is done through Spark
- Default semantic model: Every lakehouse automatically generates a Power BI semantic model (dataset) that reflects the lakehouse tables. This enables immediate Direct Lake reporting without explicit dataset creation
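Both the Files and Tables sections are addressable through OneLake's ADLS Gen2-compatible URI scheme, which is how external tools reach the same data. A minimal helper, assuming illustrative workspace and lakehouse names:

```python
# Sketch of OneLake's ADLS Gen2-compatible path scheme. Workspace and
# lakehouse names below are illustrative placeholders.
def onelake_uri(workspace, lakehouse, section, item):
    """Build an abfss URI for a lakehouse Files or Tables item."""
    if section not in ("Files", "Tables"):
        raise ValueError("section must be 'Files' or 'Tables'")
    return (f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
            f"{lakehouse}.Lakehouse/{section}/{item}")

# e.g. onelake_uri("Sales", "lh_silver_sales", "Tables", "dim_customer")
```

Any tool that speaks the ADLS Gen2 API (Azure Storage Explorer, azcopy, Spark outside Fabric) can read this URI directly.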
Spark Notebooks for Data Engineering
Fabric Spark notebooks are the primary tool for data transformation, data quality processing, and complex data engineering logic. They support PySpark (Python), Spark SQL, Scala, and R, with PySpark being the most common enterprise choice due to its extensive library ecosystem and accessibility to data engineers and data scientists alike.
Enterprise Notebook Best Practices
- Parameterized notebooks: Use notebook parameters for all configuration values (source lakehouse name, target table name, processing date, filter criteria). Parameterized notebooks are reusable across environments (Dev, Test, Production) and can be invoked by data pipelines with different parameter values for each run
- Modular notebook design: Structure notebooks as focused, single-responsibility units. One notebook for Bronze ingestion, one for Silver transformation per domain, one for Gold aggregation. Avoid monolithic notebooks with 200+ cells that are impossible to test and maintain
- Error handling and logging: Implement try/except blocks around every significant operation. Log processing metrics (rows read, rows written, rows rejected, processing duration) to a dedicated audit table. Configure notebook exit values to communicate status to orchestrating pipelines
- Delta Lake best practices: Use MERGE operations (upserts) instead of full table overwrites for incremental processing. Enable optimizeWrite for automatic file sizing during writes. Schedule OPTIMIZE and VACUUM operations weekly to maintain query performance. Use liquid clustering (replacing partitioning) for optimal query performance on modern Fabric workloads
- Lakehouse references: Reference lakehouses by name through the Fabric notebook lakehouse picker, which resolves paths automatically; when explicit paths are required, construct OneLake abfss URIs rather than hardcoding storage account names or environment-specific paths
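The MERGE-based incremental pattern recommended above can be sketched as a reusable statement builder; the table and column names passed in are hypothetical examples:

```python
# Sketch: build the Delta MERGE (upsert) statement described in the
# best practices above. Table and column names are examples.
def merge_upsert_sql(target, source, key_cols, update_cols):
    on = " AND ".join(f"t.{k} = s.{k}" for k in key_cols)
    sets = ", ".join(f"t.{c} = s.{c}" for c in update_cols)
    all_cols = list(key_cols) + list(update_cols)
    return (
        f"MERGE INTO {target} AS t USING {source} AS s ON {on} "
        f"WHEN MATCHED THEN UPDATE SET {sets} "
        f"WHEN NOT MATCHED THEN INSERT ({', '.join(all_cols)}) "
        f"VALUES ({', '.join('s.' + c for c in all_cols)})"
    )

# In a notebook:
# spark.sql(merge_upsert_sql("dim_customer", "updates_view",
#           ["customer_id"], ["name", "segment", "_ingested_at"]))
```

Because MERGE only rewrites files containing matched rows, it is far cheaper than a full-table overwrite for incremental loads.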
Data Pipelines and Orchestration
Fabric data pipelines orchestrate the execution of notebooks, Dataflows Gen2, stored procedures, and copy activities into automated, scheduled workflows. They are the orchestration layer that makes the medallion architecture operational.
- Copy activity: Moves data from 150+ supported sources into OneLake lakehouse tables or files. Supports full and incremental copy patterns, parallel copy for large datasets, and fault tolerance with automatic retry. Use for: ingesting data from SQL databases, REST APIs, file systems, and cloud storage into the Bronze layer
- Notebook activity: Executes a Spark notebook with specified parameters. The notebook runs on Fabric Spark compute within your capacity. Use for: Silver and Gold layer transformations, data quality checks, and complex business logic
- Dataflows Gen2 activity: Executes a Dataflows Gen2 transformation within the pipeline. Use for: Power Query-based transformations that business analysts maintain, simple ETL operations that do not require Spark
- ForEach activity: Iterates over a collection and executes inner activities for each item. Use for: processing multiple source tables with the same pattern, iterating over a list of dates for backfill operations
- Dependency management: Activities within a pipeline can have dependencies (success, failure, completion, skipped). Configure dependencies to ensure Bronze activities complete before Silver activities begin, and Silver completes before Gold. Use On Failure paths to trigger alerting and cleanup activities
Enterprise Pipeline Patterns
EPC Group implements standardized pipeline patterns for enterprise clients. The master orchestration pipeline runs on schedule (typically daily at 2 AM), triggers Bronze ingestion pipelines in parallel for all source systems, waits for all Bronze pipelines to complete, triggers Silver transformation pipelines, waits for Silver completion, triggers Gold aggregation pipelines, and finally triggers Power BI semantic model refresh. Each layer has error handling: if any pipeline fails, the orchestrator sends alerts via email and Teams, logs the failure to an audit table, and does not proceed to the next layer to prevent corrupted data from propagating.
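The audit-table logging step in this orchestration pattern can be sketched as follows; the record schema and status values are illustrative assumptions, not a Fabric API:

```python
import datetime

# Sketch of the audit row the orchestrator writes after each run.
# The schema and status vocabulary here are illustrative assumptions.
def build_audit_record(pipeline, layer, status, rows_written=0, error=""):
    if status not in ("Succeeded", "Failed"):
        raise ValueError("status must be 'Succeeded' or 'Failed'")
    return {
        "pipeline": pipeline,
        "layer": layer,            # Bronze / Silver / Gold
        "status": status,
        "rows_written": rows_written,
        "error": error,
        "logged_at": datetime.datetime.utcnow().isoformat(),
    }

# On failure, log the record, alert, and stop before the next layer:
# rec = build_audit_record("pl_silver_sales", "Silver", "Failed", error=str(exc))
# spark.createDataFrame([rec]).write.mode("append").saveAsTable("audit_runs")
```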
Medallion Architecture Implementation
The medallion architecture is the enterprise data engineering standard in Fabric, and EPC Group implements it for every production deployment. The three layers map directly to Fabric lakehouses and provide a clear data quality progression from raw to business-ready.
Bronze Layer: Raw Data Ingestion
- Purpose: Faithful copy of source system data with minimal transformation. Preserves the source of truth for auditability and reprocessing
- Implementation: One lakehouse per source system domain (e.g., lh_bronze_erp, lh_bronze_crm, lh_bronze_hris). Tables mirror source system table names with added ingestion metadata columns (_ingested_at, _source_system, _batch_id)
- Ingestion patterns: Full load for small reference tables (under 100K rows), incremental load using watermark columns (ModifiedDate, ChangeTrackingVersion) for transactional tables, Change Data Capture for real-time or near-real-time requirements
- Data retention: Bronze tables retain all historical data with Delta time travel enabling point-in-time recovery. VACUUM is configured with a 30-day retention to support reprocessing scenarios
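The watermark-based incremental load described above reduces to two small pieces of logic: select only rows past the last watermark, then advance the watermark after a successful load. A sketch with placeholder table and column names:

```python
# Sketch of the watermark incremental pattern; "dbo.Orders" and
# "ModifiedDate" are placeholders for real source-system objects.
def incremental_source_query(table, watermark_col, last_watermark):
    """Select only rows modified since the last successful load,
    stamping the ingestion metadata column used in the Bronze layer."""
    return (f"SELECT *, CURRENT_TIMESTAMP AS _ingested_at "
            f"FROM {table} WHERE {watermark_col} > '{last_watermark}'")

def next_watermark(rows, watermark_col):
    """New high-water mark = max watermark value seen in this batch."""
    return max(r[watermark_col] for r in rows) if rows else None

# After a successful load, persist next_watermark(...) to a control
# table so the next pipeline run resumes from it.
```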
Silver Layer: Business Rules and Quality
- Purpose: Cleaned, deduplicated, standardized, and business-rule-enriched data ready for domain-specific analytics
- Implementation: One lakehouse per business domain (e.g., lh_silver_sales, lh_silver_finance, lh_silver_hr). Tables follow consistent naming: dim_customer, dim_product, fact_sales, fact_transactions
- Transformations: Data type standardization (consistent date formats, numeric precision), deduplication using MERGE with deterministic matching keys, business rule application (revenue recognition logic, customer segmentation rules, status code mappings), cross-source joins (enriching CRM contacts with ERP billing data), and data quality checks (null checks, range validation, referential integrity)
- Quality framework: EPC Group implements automated data quality checks between Bronze and Silver: row count variance checks (alert if row count changes by more than 10%), null percentage checks (alert if critical columns exceed null thresholds), referential integrity checks (alert if foreign keys do not resolve), and freshness checks (alert if source data is older than expected)
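A minimal version of this quality gate can be expressed as a single check function; the thresholds and alert names below are illustrative defaults, not a built-in Fabric feature:

```python
# Sketch of the Bronze-to-Silver quality gate described above.
# Thresholds and alert names are illustrative defaults.
def quality_alerts(prev_row_count, curr_row_count, null_pct_by_col,
                   unresolved_fks, max_variance=0.10, max_null_pct=0.05):
    alerts = []
    if prev_row_count > 0:
        variance = abs(curr_row_count - prev_row_count) / prev_row_count
        if variance > max_variance:
            alerts.append(f"row_count_variance:{variance:.0%}")
    for col, pct in null_pct_by_col.items():
        if pct > max_null_pct:
            alerts.append(f"null_threshold:{col}")
    if unresolved_fks > 0:
        alerts.append(f"referential_integrity:{unresolved_fks}")
    return alerts

# An empty result means the batch may promote to Silver; any alert
# blocks promotion and triggers notification.
```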
Gold Layer: Business-Ready Analytics
- Purpose: Aggregated, pre-computed, dimensionally-modeled data optimized for business intelligence consumption
- Implementation: One lakehouse for enterprise-wide Gold data (lh_gold_analytics) or domain-specific Gold lakehouses for decentralized organizations. Star schema tables optimized for Power BI: fact tables with numeric measures and foreign keys, dimension tables with descriptive attributes, pre-computed aggregation tables for common reporting scenarios
- Direct Lake consumption: Gold lakehouse tables are consumed by Power BI semantic models using Direct Lake mode. Direct Lake reads data directly from Delta files in OneLake without importing into the Power BI engine, providing import-mode performance with DirectQuery-mode freshness. This is the optimal consumption pattern for Fabric and is central to our Power BI consulting services
Fabric Data Warehouse vs. Lakehouse
Fabric offers both lakehouse and data warehouse engines, and understanding when to use each is critical for enterprise architecture decisions.
- Lakehouse strengths: Spark-based data engineering, schema-on-read flexibility, support for unstructured data alongside structured tables, Python/PySpark processing, data science workloads, and cost-effective storage for large data volumes. Best for: data engineering teams building ETL pipelines, data science teams building ML models, organizations with diverse data types
- Warehouse strengths: Full T-SQL DML support (INSERT, UPDATE, DELETE, MERGE via T-SQL), stored procedures, multi-table transactions, and performance-optimized columnstore storage. Best for: SQL-centric teams migrating from SQL Server or Synapse SQL pools, applications requiring T-SQL stored procedures, scenarios needing multi-table ACID transactions via T-SQL
- Hybrid pattern (recommended): Use the lakehouse for data engineering (Bronze and Silver layers processed with Spark) and either lakehouse or warehouse for the Gold layer depending on team skills. SQL-centric organizations can use a Fabric warehouse for the Gold layer, consuming Silver lakehouse data through cross-database queries. This provides the best of both engines
Real-Time Analytics
For organizations requiring sub-second analytics on streaming data, Fabric Real-Time Analytics provides a dedicated engine optimized for high-velocity time-series data. This is not a replacement for the lakehouse but a complement for specific real-time use cases.
- Eventstreams: Managed event ingestion from Azure Event Hubs, IoT Hub, Kafka, custom applications, and database Change Data Capture. Eventstreams can route events to multiple destinations simultaneously: KQL databases for real-time querying, lakehouses for historical storage, and custom endpoints for event-driven processing
- KQL databases: Kusto Query Language databases optimized for time-series analysis. Ingest millions of events per second with sub-second query latency. KQL provides powerful time-series functions: time-bucket aggregation, anomaly detection, trend analysis, and pattern matching across billions of events
- Real-time dashboards: Purpose-built dashboards for streaming data visualization. Auto-refresh at configurable intervals (down to 10 seconds). Tile-based layout with KQL-powered visualizations for operational monitoring
- Lakehouse integration: Use OneLake shortcuts to make historical KQL database data available in lakehouses for batch analytics. Alternatively, configure Eventstreams to write both to KQL (real-time) and lakehouse (historical), creating a lambda architecture within Fabric
Implementation Roadmap: 12-Week Enterprise Deployment
- Week 1-2: Discovery and Architecture Design. Audit existing data platform (Synapse, Databricks, SQL Server, legacy data warehouse). Map data sources, data volumes, refresh frequencies, and consumer workloads. Design Fabric workspace structure and capacity sizing. Define medallion architecture with lakehouse and warehouse assignments. Plan security model with Entra ID groups and workspace roles
- Week 3-4: Foundation Setup. Provision Fabric capacity (recommend starting with F64 for enterprise). Create workspace hierarchy: Dev, Test, Production with deployment pipelines. Create Bronze, Silver, and Gold lakehouses. Configure OneLake shortcuts for existing data lake integration. Set up Git integration for notebook and pipeline version control
- Week 5-6: Bronze Layer Implementation. Build data pipelines for primary source system ingestion. Configure incremental load patterns with watermark tracking. Implement error handling, logging, and monitoring. Validate data completeness against source systems. Configure pipeline scheduling for daily/hourly refresh
- Week 7-8: Silver Layer Implementation. Develop Spark notebooks for data cleaning and transformation. Implement business rules, deduplication, and cross-source joins. Build automated data quality framework with row count, null, and referential integrity checks. Create Silver lakehouse tables with optimized Delta configurations (liquid clustering, Z-ordering)
- Week 9-10: Gold Layer and Power BI Integration. Build Gold layer with star schema dimensional models. Create Power BI Direct Lake semantic models on Gold lakehouse tables. Migrate existing Power BI reports from import mode to Direct Lake. Validate report accuracy and performance against baselines. This phase integrates closely with our Power BI consulting expertise
- Week 11-12: Governance, Optimization, and Handoff. Implement Microsoft Purview integration for data governance and lineage. Configure capacity auto-scaling and cost monitoring. Conduct performance testing under expected production load. Document operational procedures: monitoring, troubleshooting, scaling. Train data engineering and analytics teams on Fabric operations. Transition to managed services for ongoing optimization and support
Conclusion: Fabric Is the Future of Enterprise Data Engineering
Microsoft Fabric represents a generational shift in how enterprise data platforms are built and operated. The unified architecture of OneLake, the elimination of data movement between workloads, the simplicity of capacity-based billing, and the integration of data engineering, data science, and business intelligence on a single platform make Fabric the most compelling enterprise data platform available today. Organizations that adopt Fabric early gain a significant competitive advantage: faster time to insight, lower operational overhead, and a platform that improves with every Microsoft release.
EPC Group brings 28+ years of Microsoft ecosystem expertise, credentials as a 4x Microsoft Press bestselling author, and proven Fabric architectures refined across 50+ enterprise implementations. Our clients achieve 30-40% reduction in data platform TCO, 50% improvement in data team productivity, and analytics architectures that scale from departmental dashboards to petabyte-scale enterprise data lakes. Schedule a complimentary Fabric Assessment or call us at 1-888-381-9725 to discover how Microsoft Fabric can transform your enterprise data engineering.
Frequently Asked Questions
What is Microsoft Fabric and how does it differ from Azure Synapse Analytics?
Microsoft Fabric is a unified analytics platform that brings together data engineering, data science, data warehousing, real-time analytics, and business intelligence into a single SaaS product built on a shared data foundation called OneLake. While Azure Synapse Analytics was an integration of separate Azure services (Synapse SQL Pools, Spark Pools, Pipelines) each with independent storage and billing, Fabric provides a truly unified experience where all workloads share the same data lake (OneLake), the same security model, the same governance layer, and the same billing meter (Fabric Capacity Units). This means a table created by a data engineer in a lakehouse is immediately queryable by a Power BI analyst through Direct Lake mode without copying or moving data. Fabric also eliminates the infrastructure management complexity of Synapse: no provisioning SQL pools, no managing Spark cluster sizes, no configuring storage accounts. Everything is managed by the Fabric platform and scales automatically within your capacity. EPC Group has migrated 50+ organizations from Azure Synapse to Microsoft Fabric, typically reducing total cost of ownership by 30-40% while improving data platform team productivity by 50%.
What is the Fabric lakehouse and how does it compare to a traditional data warehouse?
The Fabric lakehouse combines the flexibility of a data lake (storing raw files in any format) with the structure and query performance of a data warehouse (SQL-based analysis with ACID transactions). Data is stored in OneLake in open Delta Lake format (Parquet files with a transaction log), which supports both Spark-based processing (Python, Scala, SQL) and T-SQL querying through the SQL analytics endpoint. Compared to a traditional data warehouse, the lakehouse offers: schema-on-read flexibility (store data first, define schema later), support for unstructured data (images, PDFs, logs) alongside structured tables, lower storage costs (OneLake uses Azure Data Lake pricing at ~$0.023/GB/month vs. $5-23/TB/month for dedicated SQL pools), and open format portability (Delta Lake is open-source, not proprietary). The tradeoff is that the lakehouse SQL analytics endpoint is optimized for analytical queries but does not support the full T-SQL surface area (no stored procedures, no triggers). For organizations needing full T-SQL compatibility, Fabric also provides a separate data warehouse engine. EPC Group recommends a hybrid approach for most enterprises: lakehouse for data engineering and staging, warehouse for governed semantic layers and complex T-SQL workloads.
How does OneLake work and what are shortcuts?
OneLake is the unified data lake that underpins all Microsoft Fabric workloads. Think of it as the "OneDrive for data": every Fabric tenant has a single OneLake, and every workspace, lakehouse, and warehouse stores data within it. OneLake is built on Azure Data Lake Storage Gen2 and uses the Delta Lake format for structured tables. All Fabric workloads (Spark, SQL, Power BI, Real-Time Analytics) read from and write to OneLake, eliminating data silos and copy operations. Shortcuts are OneLake virtual pointers to data stored elsewhere, either in another OneLake location, in external Azure Data Lake Storage Gen2, in Amazon S3, or in Google Cloud Storage. Shortcuts appear as tables or folders within a lakehouse but do not copy or move the data. When a Spark notebook or SQL query accesses a shortcut, it reads from the original location in real time. This enables organizations to: (1) Federate data across organizational boundaries without data movement, (2) Access existing data lakes (Azure, AWS, GCP) from Fabric without migration, (3) Share data between Fabric workspaces without duplication. EPC Group uses shortcuts extensively for hybrid architectures where organizations are migrating to Fabric incrementally, enabling Fabric analytics on existing data lake investments without upfront migration.
What is the medallion architecture and how do you implement it in Fabric?
The medallion architecture (Bronze, Silver, Gold) is a data engineering pattern that organizes data by quality and readiness for consumption. Bronze layer stores raw data as ingested from source systems with minimal transformation (preserving the source of truth). Silver layer applies business rules, data quality checks, deduplication, and standardized schemas. Gold layer provides business-ready aggregations, metrics, and dimensional models optimized for reporting and analytics. In Fabric, the medallion architecture is implemented using lakehouses: create separate Bronze, Silver, and Gold lakehouses (or layers within a single lakehouse using Delta table naming conventions). Bronze ingestion uses Fabric data pipelines or Dataflows Gen2 to land raw data as Delta tables. Silver transformation uses Spark notebooks to apply cleaning, deduplication, joins, and business rules, writing results to Silver Delta tables. Gold aggregation uses Spark notebooks or SQL views to create star schemas, pre-computed metrics, and business-ready tables consumed by Power BI Direct Lake datasets. EPC Group implements the medallion architecture for every enterprise Fabric deployment, with standardized notebook templates, automated quality checks between layers, and lineage tracking through Microsoft Purview.
How do Fabric data pipelines compare to Azure Data Factory?
Fabric data pipelines are the evolution of Azure Data Factory (ADF) within the Fabric platform. They share the same visual pipeline designer, the same activity types (Copy, Dataflow, ForEach, If Condition, Web, etc.), and the same integration runtime architecture. If you know ADF, you already know Fabric pipelines. The key differences are: (1) Native Fabric integration: Fabric pipelines natively orchestrate Fabric activities including Spark notebooks, Dataflows Gen2, SQL scripts, and semantic model refreshes without requiring linked services or external connections. (2) OneLake as default storage: Copy activities write directly to OneLake lakehouse tables without configuring storage linked services. (3) Simplified monitoring: Pipeline runs are visible in the Fabric monitoring hub alongside all other workload executions. (4) Capacity-based billing: Pipeline execution consumes Fabric Capacity Units rather than separate ADF billing meters, simplifying cost management. (5) No self-hosted integration runtime management for cloud sources: Fabric handles connectivity to Azure, Microsoft 365, and 150+ cloud data sources natively. For existing ADF users, migration to Fabric pipelines is straightforward: EPC Group has migrated 100+ ADF pipelines to Fabric for enterprise clients, typically completing migration in 2-4 weeks with zero downtime.
What Fabric capacity size does an enterprise need?
Fabric capacity is measured in Capacity Units (CUs), which provide compute power for all Fabric workloads. The minimum capacity is F2 (2 CUs, approximately $260/month), suitable for proof-of-concept and small team development. For enterprise production workloads, EPC Group recommends starting with F64 (64 CUs, approximately $8,300/month), which supports: 20-30 concurrent data pipeline executions, 10-15 concurrent Spark notebook sessions, 50-100 concurrent Power BI Direct Lake queries, and daily processing of 500GB-1TB of incremental data. For large enterprises with heavy Spark workloads, F128 (approximately $16,600/month) or F256 (approximately $33,200/month) provides additional concurrency and faster processing. The key capacity sizing factors are: concurrent workloads (how many pipelines, notebooks, and queries run simultaneously), data volume (GB/TB processed daily), query complexity (simple aggregations vs. complex joins and ML workloads), and user count (number of concurrent Power BI viewers hitting Direct Lake datasets). Fabric supports capacity auto-scaling and bursting, allowing temporary capacity increases for peak processing windows. EPC Group conducts capacity sizing assessments for enterprise clients, right-sizing capacity to workload patterns and configuring auto-scale policies to minimize cost while ensuring performance SLAs.
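Since pay-as-you-go pricing scales roughly linearly with CUs, the SKU figures above can be sanity-checked with simple arithmetic, anchored to the ~$260/month F2 figure (actual rates vary by region):

```python
# Back-of-envelope estimate: monthly cost scales linearly with CUs,
# anchored to ~$260/month for F2. Region-dependent; illustrative only.
def monthly_cost_estimate(capacity_units, f2_monthly_usd=260.0):
    return capacity_units / 2 * f2_monthly_usd

# monthly_cost_estimate(64)  -> 8320.0   (~$8,300 for F64, as cited)
# monthly_cost_estimate(128) -> 16640.0  (~$16,600 for F128)
```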
How does real-time analytics work in Microsoft Fabric?
Fabric Real-Time Analytics (formerly Azure Data Explorer/Kusto in Fabric) provides sub-second query performance on streaming and time-series data. It uses KQL (Kusto Query Language) databases optimized for high-velocity data ingestion (millions of events per second) and real-time querying. The architecture includes: Eventstreams for ingesting real-time data from Azure Event Hubs, Azure IoT Hub, Kafka, custom applications, and Change Data Capture from databases. KQL databases for storing and querying time-series data with sub-second latency. Real-Time dashboards for visualizing streaming data with automatic refresh. Integration with lakehouses through OneLake shortcuts, enabling historical analysis of real-time data alongside batch-processed data. Enterprise use cases include: IoT telemetry monitoring (manufacturing, energy, healthcare devices), application performance monitoring (web app logs, API latency, error rates), security event analysis (SIEM data, network logs, authentication events), and financial market data analysis (trade execution, price feeds, risk metrics). EPC Group implements real-time analytics for enterprises needing sub-second insights from high-velocity data, with event processing pipelines handling 1M+ events per second and query response times under 500 milliseconds.
About Errin O'Connor
CEO & Chief AI Architect, EPC Group
Errin O'Connor is the founder and Chief AI Architect of EPC Group, bringing 28+ years of Microsoft ecosystem expertise. As a 4x Microsoft Press bestselling author and former NASA Lead Architect, Errin has led 50+ enterprise Microsoft Fabric implementations with lakehouse architectures processing petabyte-scale data across healthcare, finance, and government sectors.