A Microsoft Fabric-based framework that accelerates enterprise data platform adoption with standardized processes and reusable patterns. This scalable solution accelerates project onboarding by 4x while enabling consistent governance and implementation across teams, providing a blueprint for successful Microsoft Fabric deployments.
Key Challenges
- Unstructured ZIP archives and mixed file types (DOCX, PDF, TXT) required consistent parsing.
- Multiple projects needed a single, reusable orchestration pattern.
- Environment-aware runs (dev/uat/prod) and secrets/paths had to be centralized.
- End-to-end traceability: what ran, what loaded, what failed, and where.
- Governed releases and predictable promotion through UAT to Prod.
Summary
We developed a reusable Microsoft Fabric framework for document ingestion and analytics that scales efficiently across projects and environments. The solution implements a parent-child pipeline architecture with dynamic parameters and environment variables, applying Medallion principles to process unstructured content through Bronze (raw), Silver (processed), and Gold (analytics-ready) layers. With integrated operational telemetry, Git-based deployments, and Power BI semantic refresh capabilities, the framework provides reliable project delivery while maintaining consistent standards and governance across the organization.
Project Highlights
- Team-wide framework for Fabric projects (repo structure, naming, dev practices, logging, release strategy)
- Parent orchestrator + modular child pipelines with dynamic params and library variables
- Medallion processing of unstructured ZIPs and structured datasets
- Teams notifications and Power BI semantic refresh
- Git-based deployments with protected branches (UAT → Prod)
- Metrics & error logs tables for end-to-end observability
The Challenge
Business Limitations
- Diverse data sources and file formats required a consistent, reusable ingestion approach.
- Stakeholders needed reliable refreshes and timely updates to curated datasets/reports.
- Onboarding new projects had to be quick, discoverable, and well-documented.
Technical Hurdles
- Orchestrating unzipping, parsing, and normalization for unstructured content at scale.
- Running the same pipelines across dev/uat/prod with minimal changes.
- Enforcing naming conventions, repo structure, and release governance.
- Capturing ingestion metrics and errors for auditability and RCA.
Solution Architecture

Core Components
Medallion Data Flow
- Bronze: Raw ZIPs/datasets persisted with minimal transformation.
- Silver: Unzip, metadata extraction, parsing (DOCX/PDF/TXT), cleaning/dedup.
- Gold: Curated datasets, KPIs, and semantic artifacts for reporting.
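As a code-level illustration of the Bronze-to-Silver step, the sketch below unzips an archive from the Bronze Files area and extracts plain text from DOCX, PDF, and TXT members before persisting a Silver table. The paths, project/source identifiers, and the `extract_text` helper are placeholders, not the framework's actual utilities; the Silver table name follows the lakehouse structure described later.

```python
# Minimal sketch of the Silver-layer unzip-and-parse step (paths and helper are illustrative).
# Assumes a Fabric notebook where `spark` is predefined and the lakehouse Files area
# is mounted under /lakehouse/default/Files.
import zipfile
from pathlib import Path

from docx import Document          # python-docx, for DOCX text
from pypdf import PdfReader        # pypdf, for PDF text

BRONZE_ZIP_DIR = Path("/lakehouse/default/Files/ext/data/project_01/source_01")   # placeholder ids
EXTRACT_DIR = Path("/lakehouse/default/Files/ext/data/project_01/processed")

def extract_text(path: Path) -> str:
    """Return plain text for the supported file types (DOCX, PDF, TXT)."""
    suffix = path.suffix.lower()
    if suffix == ".docx":
        return "\n".join(p.text for p in Document(str(path)).paragraphs)
    if suffix == ".pdf":
        return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    if suffix == ".txt":
        return path.read_text(errors="ignore")
    return ""  # unsupported types are skipped downstream

rows = []
for zip_path in BRONZE_ZIP_DIR.glob("*.zip"):
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(EXTRACT_DIR / zip_path.stem)           # land extracted files in Silver storage
    for member in (EXTRACT_DIR / zip_path.stem).rglob("*"):
        if member.is_file():
            rows.append((zip_path.name, member.name, extract_text(member)))

# Persist parsed content as a Silver table (columns are simplified examples).
spark.createDataFrame(rows, ["source_zip", "file_name", "content"]) \
     .write.mode("append").saveAsTable("project.processed_documents")
```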
Analytics & Refresh
- Power BI semantic model refresh after Silver/Gold completion to keep reports up to date.
- Semantic layer configuration ensures consistent business metrics across reports.
- Incremental refresh patterns to optimize data loading and report performance.
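A minimal sketch of the post-pipeline refresh call using the Power BI REST API is shown below. The workspace/dataset IDs and the service principal credentials are placeholders; in the actual solution the refresh is triggered from the pipeline and Functions layer described in the implementation phases.

```python
# Hedged sketch: trigger a Power BI semantic model refresh via the REST API.
# GROUP_ID, DATASET_ID, and the credential values are placeholders.
import requests
from azure.identity import ClientSecretCredential  # assumes a service principal with dataset permissions

TENANT_ID, CLIENT_ID, CLIENT_SECRET = "<tenant>", "<client>", "<secret>"
GROUP_ID, DATASET_ID = "<workspace-guid>", "<dataset-guid>"

credential = ClientSecretCredential(TENANT_ID, CLIENT_ID, CLIENT_SECRET)
token = credential.get_token("https://analysis.windows.net/powerbi/api/.default").token

resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{GROUP_ID}/datasets/{DATASET_ID}/refreshes",
    headers={"Authorization": f"Bearer {token}"},
    json={"notifyOption": "NoNotification"},   # incremental-refresh partitions honour their own policy
    timeout=30,
)
resp.raise_for_status()   # 202 Accepted means the refresh was queued
```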
AI Integration
The AI Crawler integration is particularly valuable as it:
- Indexes processed document chunks with vector embeddings
- Provides natural language search across all standards documentation
- Creates semantic connections between related standards
- Enables knowledge discovery through shortcuts to related content
- Supports Q&A interactions using the standards knowledge base
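The crawler's internals are not shown here; as an illustration of the chunk-and-embed pattern it builds on, the sketch below splits parsed Silver documents into overlapping chunks and attaches vector embeddings before indexing them. The embedding deployment, chunk sizes, and column names are assumptions; the Silver table names follow the lakehouse structure described later.

```python
# Hedged sketch of the chunk-and-embed pattern behind the AI Crawler integration.
# The Azure OpenAI endpoint/deployment, chunk size, and columns are assumptions.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-openai>.openai.azure.com",
    api_key="<key>",
    api_version="2024-02-01",
)

def chunk(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split document text into overlapping chunks for retrieval."""
    return [text[i : i + size] for i in range(0, len(text), size - overlap)]

docs = spark.table("project.processed_documents").select("file_name", "content").collect()

rows = []
for doc in docs:
    for i, piece in enumerate(chunk(doc["content"] or "")):
        emb = client.embeddings.create(model="text-embedding-3-small", input=piece).data[0].embedding
        rows.append((doc["file_name"], i, piece, emb))

spark.createDataFrame(rows, ["file_name", "chunk_id", "chunk_text", "embedding"]) \
     .write.mode("overwrite").saveAsTable("project.document_chunks")
```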
Implementation Process
Phase 1: Foundations
- Establish naming standards (tables, notebooks, schemas, pipelines), repo layout, and project README template.
- Define common utility functions and notebook templates for team reuse.
- Create version-controlled variable libraries for Dev/UAT/Prod environments.
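A minimal sketch of the environment-aware configuration pattern follows. In Fabric the values live in the version-controlled variable libraries (vl_*); the dictionary below is only a stand-in for that lookup, and the lakehouse, Key Vault, and webhook names are placeholders.

```python
# Hedged sketch of environment-aware configuration (Dev/UAT/Prod).
# In the real framework these values come from Fabric variable libraries (vl_*);
# this dictionary is a stand-in, and all names are placeholders.
ENV = "dev"   # typically passed in as a pipeline/notebook parameter

VARIABLES = {
    "dev":  {"lakehouse": "lh_project_bronze_dev", "keyvault": "kv-project-dev", "teams_webhook": "<dev-webhook-url>"},
    "uat":  {"lakehouse": "lh_project_bronze_uat", "keyvault": "kv-project-uat", "teams_webhook": "<uat-webhook-url>"},
    "prod": {"lakehouse": "lh_project_bronze",     "keyvault": "kv-project",     "teams_webhook": "<prod-webhook-url>"},
}

config = VARIABLES[ENV]
print(f"Running against {config['lakehouse']} using secrets from {config['keyvault']}")
```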
Naming Conventions
Our standardized naming conventions ensure clarity, discoverability, and proper governance:
Comprehensive Naming Convention Guide
This section outlines the standard naming conventions used in our Microsoft Fabric workspace. These conventions promote consistency, readability, and clarity across teams and projects.
General patterns
| Type | Pattern Summary |
|---|---|
| Folders | Use Pascal Case |
| Fabric objects | Use lower case with underscores (_) and meaningful prefixes |
| Constants/Parameters | Use UPPER CASE with underscores (_) and meaningful prefixes |
Detailed naming conventions
| Category | Type | Description | Prefix | Examples |
|---|---|---|---|---|
| Data structures | Table | Physical data tables | t_ | t_sales_orders |
| | View | Logical representations of data | v_ | v_customer_summary |
| | Schema | Logical grouping of objects per project & layer | schema_, domain_ | schema_finance, domain_marketing |
| Code artifacts | Notebook | Used for ETL, layer logic, or utility logic | nb_ | nb_br_finance, nb_sl_marketing, nb_utils_sql, nb_dq_sales |
| | SQL script | SQL transformation or analysis logic | sql_ | sql_dq_customers |
| | Stored procedure | Scripted data transformation logic | sp_ | sp_load_sales_data |
| | User-defined function | Reusable logic as a function | udf_ | udf_calculate_discount |
| Execution | Data pipeline | Orchestrates execution flow, movement, and dependencies | dp_ | dp_br_customers, dp_init_integration |
| | Dataflow | Performs in-pipeline data transformations visually | df_ | df_customer_data_cleaning |
| | ML model | Machine learning models per project/function | ml_ | ml_customer_churn_prediction |
| Storage | Lakehouse | Structured/unstructured storage by team & layer | lh_ | lh_finance_bronze, lh_finance_silver |
| | Warehouse | SQL-based structured data store | wh_ | wh_sales_bronze, wh_sales_silver |
| | Eventhouse | Event or streaming data store | eh_ | eh_events_bronze, eh_events_silver |
| Support components | Environment | Environment-specific libraries | env_ | env_common, env_dev |
| | Variable library | Central variable definitions | vl_ | vl_project_config, vl_common |
| Reporting | Power BI report | Business intelligence visualizations | pbi_ | pbi_sales_dashboard |
| DevOps | Feature branch | Used for development. Merges into uat. | features/<feature_id>_<project>_<functionality> | features/1234_sales_report_export |
| | Hotfix branch | Used for quick fixes. Merges into uat. | hotfix/<bug_id>_<project>_<fix> | hotfix/5678_customer_data_connection |
Repository & Folder Structure

Phase 2: Orchestration & Notebooks
- Build parent orchestrator and child pipelines by layer.
- Implement modular notebooks with clear sections (params, lakehouse link, variables, imports, utils).
- Design parameterized notebook templates with environment-aware configuration.
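To make the modular notebook layout concrete, the skeleton below walks through the standard sections (parameters, lakehouse link, variables, imports, utils, layer logic). The item names, columns, and log message are illustrative examples, not the framework's exact code.

```python
# Illustrative skeleton of a modular child notebook (e.g., nb_sl_project).
# Section order follows the team template; names and columns are examples only.

# --- Parameters (parameter cell; overridden by the parent pipeline at run time) ---
project_id = "project_01"
environment = "dev"
run_id = ""

# --- Lakehouse link ---
# The notebook is attached to the layer's lakehouse (e.g., lh_project_silver),
# so table names and Files/ paths resolve against it by default.

# --- Variables ---
from datetime import datetime, timezone
load_datetime = datetime.now(timezone.utc)

# --- Imports ---
from pyspark.sql import functions as F

# --- Utils ---
# Shared helpers live in nb_utils_project_functions / nb_utils_project_sql and are
# pulled in with %run so every notebook reuses the same parsing and logging code.
# %run nb_utils_project_functions

# --- Layer logic (illustrative) ---
documents = spark.table("project.raw_documents").where(F.col("project_id") == project_id)
print(f"[{environment}] {documents.count()} documents pending for run {run_id} at {load_datetime}")
```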

The diagram illustrates our hierarchical pipeline architecture with distinct layers and components:
| Layer | Component | Description |
|---|---|---|
| Top-level orchestration | dp_project_main | Central entry point that coordinates the entire data processing workflow |
| Processing layer | dp_project_processor | Manages all data transformation tasks |
| | dp_project_notifications | Handles alerting and monitoring |
| Execution notebooks | nb_init_project | Sets up required tables, configurations, and environment validation |
| | nb_br_project | Processes data at the Bronze layer (raw ingestion) |
| | nb_sl_project | Transforms data at the Silver layer (structured data) |
| | pbi_project | Refreshes analytical models and Power BI datasets |
| Utility layer | nb_utils_project_sql | Contains common SQL operations and queries |
| | nb_utils_project_functions | Houses reusable Python functions for processing |
This architecture enables clean separation of concerns while maintaining centralized orchestration. Pipeline runs can be monitored holistically through the main pipeline while allowing targeted troubleshooting of specific data processing stages.
Phase 3: Medallion & Logging
- Land raw data/ZIPs in Bronze, unzip & parse to Silver, curate to Gold.
- Emit ingestion metrics and error logs to warehouse tables for run telemetry.
- Implement semantic model refresh triggers via Functions.
Medallion Architecture

Our implementation uses a structured approach to data organization across the three medallion layers:
Detailed Lakehouse Structure
Bronze Lakehouse
The Bronze layer stores raw, unmodified data as it's ingested from source systems.
| Table/Storage | Description |
|---|---|
project.raw_documents | Raw document metadata before processing. |
project.source_metadata | Source system metadata about content origin. |
/ext/data/<project_id>/<source_id>/ | File storage location for ZIP archives and raw data. |
Silver Lakehouse
The Silver layer contains parsed, cleaned, and standardized data ready for analysis.
| Table/Storage | Description |
|---|---|
project.processed_documents | Document text and metadata after extraction and parsing. |
project.document_chunks | Document content split into processable chunks. |
project.document_entities | Named entities extracted from document content. |
/ext/data/<project_id>/processed/ | Extracted and parsed documents from ZIPs. |
Gold Lakehouse
The Gold layer provides business-ready, curated datasets optimized for reporting.
| Table/Storage | Description |
|---|---|
project.document_analytics | Curated document metrics and KPIs for reporting. |
project.project_metrics | Project-level aggregated metrics and trends. |
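As an illustration of the Silver-to-Gold curation step, the Spark SQL below derives the document analytics table from the Silver tables. The measures and grain are deliberately simplified, and the Silver column names follow the illustrative sketches earlier in this section rather than the framework's exact schema.

```python
# Hedged sketch of Gold-layer curation: aggregate Silver tables into
# project.document_analytics. Columns, measures, and grain are simplified examples.
spark.sql("""
    CREATE OR REPLACE TABLE project.document_analytics AS
    SELECT
        d.source_zip,
        COUNT(DISTINCT d.file_name)  AS document_count,
        SUM(LENGTH(d.content))       AS total_characters,
        COUNT(c.chunk_id)            AS chunk_count,
        CURRENT_TIMESTAMP()          AS refreshed_at
    FROM project.processed_documents d
    LEFT JOIN project.document_chunks c
        ON c.file_name = d.file_name
    GROUP BY d.source_zip
""")
```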
Phase 4: Releases & Monitoring
- Adopt Git-based deployments: feature → UAT → Prod.
- Wire Teams notifications on success/failure with contextual run info.
- Trigger Power BI refresh post-pipeline.
- Deploy operational dashboards for monitoring run status and metrics.
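A minimal sketch of the success/failure notification step is shown below. The webhook URL would come from the environment's variable library or Key Vault, and the plain-text message is a simple example rather than the exact card the framework posts.

```python
# Hedged sketch: post a pipeline run summary to a Teams incoming webhook.
# The webhook URL is stored per environment; the message shape is a plain-text example.
import requests

def notify_teams(webhook_url: str, pipeline: str, status: str, run_id: str, details: str = "") -> None:
    message = {
        "text": f"**{pipeline}** finished with status **{status}** (run {run_id}). {details}"
    }
    resp = requests.post(webhook_url, json=message, timeout=30)
    resp.raise_for_status()

# Example usage after a run (values are placeholders)
notify_teams("<webhook-url>", "dp_project_main", "Succeeded", "<run-guid>", "<contextual run info>")
```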
Logging & Observability
We maintain two central warehouse tables for comprehensive operational monitoring:
| Table Name | Purpose | Key Fields | Benefits |
|---|---|---|---|
metrics.ingestion_metrics | Track successful ingestion events | source_system, source_item, source_modified_date, target_item, load_count, load_datetime | Historical trends, volume monitoring |
metrics.error_logs | Capture pipeline failures | source_system, resource_name, operation, error_message, error_datetime | Failure patterns, RCA |
These centralized metrics enable run analytics, SLA tracking, and rapid root-cause analysis across all pipelines.
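A minimal sketch of how a run might append to these tables follows. The column names mirror the key fields above; the helper functions, example values, and the Spark `saveAsTable` write path are assumptions for illustration, and the framework's actual warehouse write mechanism may differ.

```python
# Illustrative logging helpers writing to the central telemetry tables.
# Field names follow metrics.ingestion_metrics / metrics.error_logs above;
# the helpers, example values, and write mechanism are assumptions.
from datetime import datetime, timezone

def log_ingestion(source_system, source_item, source_modified_date, target_item, load_count):
    row = [(source_system, source_item, source_modified_date, target_item,
            load_count, datetime.now(timezone.utc))]
    cols = ["source_system", "source_item", "source_modified_date",
            "target_item", "load_count", "load_datetime"]
    spark.createDataFrame(row, cols).write.mode("append").saveAsTable("metrics.ingestion_metrics")

def log_error(source_system, resource_name, operation, error_message):
    row = [(source_system, resource_name, operation, error_message, datetime.now(timezone.utc))]
    cols = ["source_system", "resource_name", "operation", "error_message", "error_datetime"]
    spark.createDataFrame(row, cols).write.mode("append").saveAsTable("metrics.error_logs")

# Usage inside a layer notebook (placeholder values)
try:
    loaded = spark.table("project.processed_documents").count()
    log_ingestion("<source_system>", "<archive.zip>", datetime(2024, 1, 1), "project.processed_documents", loaded)
except Exception as exc:
    log_error("<source_system>", "nb_sl_project", "silver_parse", str(exc))
    raise
```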
Metrics Collection & Visualization
Our approach to observability combines standard Fabric metrics with the powerful Fabric Capacity Metrics App to gain deep insights into performance and resource utilization:
- Pipeline-level metrics: Duration, success rate, failure points (via metrics.ingestion_metrics).
- Asset-level metrics: Document counts, parsing success rates, file size distributions.
- Capacity utilization monitoring: Tracking compute usage by artifact type, operation, and time period.
- Throttling and bottleneck identification: Detecting and resolving performance constraints.

The Fabric Capacity Metrics App offers invaluable insights into our capacity utilization, allowing us to:
- Identify which artifact types and specific items are consuming the most Capacity Units (CUs)
- Monitor capacity utilization trends over time with the "CU over time" chart
- Analyze interactive vs. background process distribution
- Pinpoint specific operations driving usage through the Timepoint Details page
- Track OneLake storage consumption across workspaces
This level of visibility enables proactive capacity management and accurate resource allocation, and ensures optimal performance across our Fabric environment. For major processing jobs, we conduct pre- and post-run analysis to fine-tune resource utilization and prevent throttling during peak periods.
DevOps Process
To maintain consistency and quality across our multi-environment setup, we implemented a rigorous yet agile DevOps approach based on Microsoft Fabric's Git-based deployments (Option 1).
Microsoft Fabric Git-based Deployment Strategy
Our implementation follows Microsoft's recommended Git-based deployment approach, where:
- All deployments originate directly from the Git repository
- Each stage in our release pipeline has a dedicated primary branch (dev, uat, main)
- Each branch feeds the appropriate workspace in Fabric
- Changes flow through environments using Pull Requests with appropriate approvals

Why We Use Git-based Deployments (Option 1)
Our implementation follows Microsoft Fabric's Option 1 (Git-based deployments) for several key advantages:
- Single Source of Truth: Git serves as the definitive source of all deployments, ensuring complete version control and history
- Gitflow Compatibility: Our team follows a Gitflow branching strategy with multiple primary branches (dev, uat, main), which aligns perfectly with this approach
- Simplified Deployments: Direct uploads from the repo to the workspace streamline the deployment process
- Clear Branch-to-Environment Mapping: Each environment corresponds to a specific Git branch, making it easy to track what code is deployed where
- Automated Workspace Sync: Changes to protected branches automatically trigger workspace updates through Fabric Git APIs
For more details, see the official Microsoft Fabric CI/CD documentation: Manage deployment with CI/CD in Microsoft Fabric
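As a hedged illustration of the automated workspace sync, the sketch below calls the Fabric REST Git API to update a workspace from its connected branch after a protected-branch merge. The workspace ID, credentials, and conflict-resolution options are placeholders, and the exact request shapes should be verified against the linked documentation.

```python
# Hedged sketch: sync a Fabric workspace from its connected Git branch.
# Workspace ID, auth, and options are placeholders; verify request shapes
# against the official Fabric Git API documentation before use.
import requests
from azure.identity import ClientSecretCredential

WORKSPACE_ID = "<workspace-guid>"
credential = ClientSecretCredential("<tenant>", "<client>", "<secret>")
token = credential.get_token("https://api.fabric.microsoft.com/.default").token
headers = {"Authorization": f"Bearer {token}"}
base = f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/git"

# 1) Ask Fabric what the remote branch head currently is.
status = requests.get(f"{base}/status", headers=headers, timeout=30).json()

# 2) Update the workspace items from that branch head.
resp = requests.post(
    f"{base}/updateFromGit",
    headers=headers,
    json={
        "remoteCommitHash": status.get("remoteCommitHash"),
        "conflictResolution": {
            "conflictResolutionType": "Workspace",
            "conflictResolutionPolicy": "PreferRemote",
        },
    },
    timeout=60,
)
resp.raise_for_status()   # long-running operation; poll the returned Location header for completion
```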
Branch Protection
- Feature branches: All development begins in feature branches (e.g., feature/add-new-standard).
- PR reviews: Required code reviews from 2+ team members with automated quality checks.
- Controlled promotion: Feature → UAT → Production with automated validation at each step.
- Protected branches: Direct commits to uat and main branches are prohibited.
Git Branching Strategy

This branching strategy ensures that all code changes follow a consistent path from development through testing and finally to production. Feature branches provide isolation for development work, while protected branches maintain the stability of our UAT and Production environments. The automated workspace sync ensures that our Fabric workspaces always reflect the current state of their corresponding branches.
