
The Engineering Playbook for Multi-Model AI: MCP, AI Gateways, and How to Architect for the Day Your Vendor Pivots
Multi-model AI engineering playbook 2026: MCP protocol, AI gateways (LiteLLM/Portkey/Foundry), vendor-portable architecture patterns, migration from Claude Code to GitHub Copilot CLI.

If your enterprise's AI strategy is built around a specific tool — Claude Code, Cursor, Codeium, ChatGPT for Enterprise, anything — you are one vendor decision away from a six-to-twelve-month tooling disruption. Microsoft just demonstrated that publicly with the Claude Code cancellation.
If your enterprise's AI strategy is built around a specific model — Anthropic Claude Sonnet 4.6 as the only LLM behind every agent — you are one model deprecation, one price increase, or one safety-policy update away from a different kind of disruption.
The engineering answer is vendor-portable architecture: write the application code against a stable abstraction layer, route requests through a gateway that can swap models, and use the Model Context Protocol for tool/data integration so the agent surface stays consistent across vendors.
This guide details how to actually build that. We cover the gateway pattern, the MCP protocol, the specific GitHub Copilot CLI vs Claude Code engineering comparison, the migration playbook, code-level abstraction patterns, regulated-industry overlays, and the EPC Group implementation framework refined across enterprise AI engineering engagements.
The first practical question for any engineering team affected by Microsoft's Claude Code cancellation (or for any team weighing the two): are these tools actually equivalent? They are not. They overlap, but they are optimized for different workflows.
| Dimension | GitHub Copilot CLI | Claude Code |
|---|---|---|
| Primary workflow | Augment developer flow | Delegate autonomous work |
| Inline completions | Yes (native IDE integration) | No |
| Maximum context | 32k–128k tokens (model-dependent) | Up to 1M tokens (Sonnet/Opus) |
| Repository awareness | File-level + recent context | Full repository, multi-file, call-chain reasoning |
| Multi-file refactoring | Limited (agent mode developing) | Native, primary use case |
| Model selection | Anthropic Claude (Opus 4.6, Sonnet 4.6, Haiku 4.5), OpenAI GPT family, others via --model | Anthropic Claude family |
| Agentic depth | Inline + specialized agents + background delegation | Deep autonomous; plans multi-step, returns diffs |
| SWE-bench Verified score | Varies by underlying model | 87.6% on Opus 4.7 GA (April 16, 2026) — highest published |
| Pricing — Free tier | $0 (2,000 completions + 50 premium requests) | Limited Pro at $20/mo |
| Pricing — Pro tier | $10/mo | $100/mo (Max 5x) |
| GitHub integration | Native | Via MCP |
| Microsoft toolchain integration | Native (Azure DevOps, GitHub, Teams) | Via MCP |
| Terminal-first UX | Available in Copilot CLI | Primary mode |
For most professional engineering teams the answer is both, not either.
For Microsoft-cancelled-Claude-Code teams specifically, the transition is not "lose Claude capability." Claude Opus 4.6 and 4.7 remain in Copilot CLI via the --model flag. The transition is "lose the Claude Code interface and adopt Copilot CLI's interface." The model intelligence stays; the cockpit changes.
The Model Context Protocol, originally introduced by Anthropic in late 2024, has become the closest thing to a vendor-neutral standard for connecting AI agents to tools, data, and services. By March 2026, all major providers had adopted it; Anthropic reported over 10,000 active public MCP servers and 97 million monthly SDK downloads across Python and TypeScript.
Before MCP, every AI agent had to integrate with every tool through that tool's specific API, that vendor's specific function-calling format, and that agent runtime's specific orchestration pattern. A GitHub integration written for Claude Code did not transfer to Cursor; a Slack integration written for ChatGPT did not transfer to Copilot. The result was N×M integration surface area for N agents × M tools.
MCP collapses this to N+M. Each tool exposes itself as an MCP server using a standard JSON-RPC 2.0 protocol. Each agent acts as an MCP client. Any compliant client can connect to any compliant server.
The MCP architecture has three components: the host (the AI application, such as Claude Desktop or Copilot CLI), the client (the connection the host maintains to each server), and the server (the tool or data source being exposed).
A host creates multiple isolated client sessions, each maintaining its own JSON-RPC channel with its own MCP server. Tool calls, resource access, and prompts flow over those channels using a standard message format.
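To make that concrete, here is a minimal sketch of an MCP server using the official Python SDK's FastMCP helper. The server name and the stubbed tool are illustrative assumptions; the decorator pattern and stdio transport are the SDK's standard way of exposing a tool to any compliant client.

```python
from mcp.server.fastmcp import FastMCP

# Illustrative server: exposes one tool over the standard MCP JSON-RPC channel.
mcp = FastMCP("ticket-lookup")

@mcp.tool()
def get_ticket_status(ticket_id: str) -> str:
    """Return the status of a support ticket (stubbed for illustration)."""
    return f"Ticket {ticket_id}: open"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```

Any MCP-compatible host (Claude Desktop, Copilot CLI, or a custom agent) can discover and call get_ticket_status without agent-specific glue code.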
If your agent application is built against MCP rather than against a specific vendor's tool-calling format, the agent can be swapped from Claude Code to Copilot CLI to a custom OpenAI-backed agent with zero rework on the tool integration layer. The agent's intelligence model changes; the tools it can call do not.
This is the architectural pattern that makes Microsoft's Claude Code cancellation a non-event from a tools perspective for teams that built on MCP. The tool integrations move with the team.
The MCP roadmap published in early 2026 introduces several changes that matter for enterprise adoption.
For enterprise architects, this means MCP is moving from "a nice protocol" to "the substrate for AI tool integration." Building against it now produces architecture that survives the next several vendor pivots.
If MCP is the integration substrate, the AI gateway is the routing substrate. A gateway sits between your application code and the underlying model providers, providing a unified API surface, request routing, failover, observability, and (in the enterprise tier) governance and compliance controls.
LiteLLM — open source and self-hosted. Best for engineering teams with resources to operate their own infrastructure and who need auditable routing logic with no external dependencies. Supports 100+ models across providers. Used heavily by teams that need to keep all routing logic on their own infrastructure for compliance reasons.
Portkey — enterprise-grade managed gateway with strong governance posture. PII filtering, content policies, guardrails, and observability at the gateway layer. Best for regulated industries where compliance controls must be enforced before requests reach providers. Used heavily in healthcare, financial services, and federal contexts.
OpenRouter — marketplace-style proxy with unified API access to 300+ models from 60+ providers. Best for prototyping, rapid experimentation, and teams that want the broadest model catalog without operating their own infrastructure. Less suitable for enterprise compliance scenarios because the underlying routing happens through a third party.
Microsoft Foundry — Microsoft's own AI gateway, with the broadest enterprise model catalog (OpenAI, Anthropic, Cohere, DeepSeek, Mistral, Meta, Microsoft's own models). Best for Microsoft-centric enterprises because it integrates with the rest of Microsoft Foundry's enterprise controls (Azure AD identity, Microsoft Purview labels, Microsoft Sentinel routing, Azure-native networking).
For most EPC Group clients, the decision is:
| Workload | Recommended gateway |
|---|---|
| Microsoft-centric enterprise, broad model catalog, integrated governance | Microsoft Foundry |
| Regulated industry with content-policy enforcement at gateway | Portkey |
| Engineering teams self-hosting all routing logic (data sovereignty) | LiteLLM |
| Rapid prototyping or non-production exploration | OpenRouter |
| Multi-cloud / multi-vendor with no preference | LiteLLM (open) or Portkey (managed) |
Two-gateway architectures are common at enterprise scale: Microsoft Foundry as the primary production gateway (because the model catalog is broadest and the Microsoft integrations are deepest), with LiteLLM or Portkey as a secondary surface for use cases that require capabilities the primary gateway does not provide.
Single API surface for application code. The application calls gateway.invoke(use_case='legal_review', prompt=...) and the gateway decides which model handles the request. The application does not know whether the underlying call went to Claude Sonnet on Foundry, GPT-4 on Azure OpenAI, or a self-hosted Llama instance.
Per-use-case model routing. Different use cases route to different models based on the routing rules. Legal review goes to Claude Sonnet 4.6 for the long context. Customer-service summarization goes to Claude Haiku 4.5 for cost. Code generation goes to Claude Opus 4.7 via Copilot CLI for the SWE-bench performance.
Failover. If the primary model is unavailable (rate-limited, deprecated, contractually unavailable), the gateway routes to the configured fallback. Enterprises with tested failover resolve outages four times faster than those without.
Cost observability. Every request is logged with its cost. A single dashboard answers "how much did we spend on AI last month, by use case, by model, by team."
Compliance posture. PII filtering, content policy enforcement, audit logging, and sensitivity-label propagation happen at the gateway layer. Compliance evidence is produced once for all model usage rather than per-provider.
A/B routing. New models can be tested in production traffic by routing a percentage of requests to them and comparing outcomes. The application code does not change.
The implementation pattern that makes multi-model AI engineering survive vendor pivots:
Application code invokes the gateway by use case name, not by model name.
```python
import anthropic

# Wrong — vendor-locked: the provider SDK and model name are hard-coded
# into application code.
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": query}],
    max_tokens=4000,
)

# Right — use-case-named, vendor-agnostic: the gateway owns model selection.
response = ai_gateway.invoke(
    use_case="legal_contract_review",
    messages=[{"role": "user", "content": query}],
    max_tokens=4000,
)
```
The gateway holds the configuration that says "legal_contract_review routes to Claude Sonnet 4.6 with these parameters." When the model decision changes (a new release, a vendor pivot, a contract renegotiation), only the gateway configuration changes. Application code does not.
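A minimal sketch of what that gateway-side configuration can look like, assuming a hand-rolled gateway; the routing-table schema, the invoke helper, and call_provider are illustrative, not any specific gateway product's API:

```python
# Illustrative routing table: use cases map to models and parameters here,
# never in application code.
ROUTING = {
    "legal_contract_review": {"model": "claude-sonnet-4-6", "max_tokens": 4000},
    "support_summarization": {"model": "claude-haiku-4-5", "max_tokens": 1000},
    "code_generation": {"model": "claude-opus-4-7", "max_tokens": 8000},
}

def call_provider(model: str, messages: list[dict], **params):
    """Stub: dispatch to whichever provider SDK serves this model."""
    raise NotImplementedError

def invoke(use_case: str, messages: list[dict], **overrides):
    config = {**ROUTING[use_case], **overrides}
    model = config.pop("model")  # the model decision is resolved here
    return call_provider(model, messages, **config)
```

Re-pointing legal_contract_review at a different model is then a one-line configuration change, invisible to every caller.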
Every tool and data source the agent needs is exposed through an MCP server. The agent's tool surface is defined in MCP, not in vendor-specific function-call schemas.
This means: when the underlying model changes (e.g., Claude Sonnet 4.6 → GPT-5), the tool definitions, the tool authentication, the tool authorization, and the tool audit trail all stay the same. The model swaps; the agent stays connected to the same SharePoint, Pipedrive, internal knowledge base, and operational systems.
Every request to the gateway carries the sensitivity classification of its input data. Microsoft Purview sensitivity labels propagate from the data layer through to the model invocation. The gateway enforces label-based policy: Highly Confidential content cannot be processed by external models; Confidential content is logged with full audit detail; Public content can use the cost-optimized model.
This is the layer that makes regulated-industry deployments survive a vendor change. The compliance posture is in the gateway, not in the vendor-specific call.
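A minimal sketch of label-based enforcement at the gateway, using the Purview label names above; the policy table and function are illustrative assumptions:

```python
# Illustrative label policy: checked before any request leaves the gateway.
POLICY = {
    "Highly Confidential": {"allow_external_models": False, "audit": "full"},
    "Confidential": {"allow_external_models": True, "audit": "full"},
    "Public": {"allow_external_models": True, "audit": "summary"},
}

def enforce_label_policy(label: str, model_is_external: bool) -> str:
    """Raise if the request violates label policy; return the audit level."""
    rule = POLICY.get(label)
    if rule is None:
        raise PermissionError(f"Unknown or missing label {label!r}: blocked by default")
    if model_is_external and not rule["allow_external_models"]:
        raise PermissionError(f"{label} content cannot be sent to external models")
    return rule["audit"]
```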
Every gateway request produces a structured audit and cost event.
These events flow to Microsoft Sentinel (or the customer's SIEM) for security operations and to the cost-management surface for FinOps. Single source of truth across all models.
For engineering teams in Microsoft's affected groups — and for any team facing an equivalent forced tooling migration — the playbook EPC Group recommends:
Set --model to default to claude-sonnet-4-6 (or claude-opus-4-7 for autonomous agent workflows). Anthropic models remain available via Copilot CLI; the interface changes, not the model.
The cancellation is a tool decision, not a model decision. Engineers losing Claude Code today still have access to Claude Opus 4.6, Sonnet 4.6, and Haiku 4.5 (and presumably Opus 4.7 and Sonnet 4.7 as they release) through Copilot CLI's --model flag. The intelligence stays. The workflow pattern is what shifts.
For healthcare, financial services, federal, and defense-contractor environments, the multi-model engineering pattern adds compliance overlays: BAA verification per model endpoint, a model risk inventory that includes gateway routing rules, audit-log retention aligned to the applicable framework, and NIST 800-53 control mapping.
For enterprises building or refactoring AI applications for multi-model resilience, the EPC Group standard pattern covers gateway infrastructure, use-case-named invocation in application code, an MCP server inventory for tool integrations, validated per-use-case failover, sensitivity-label propagation, audit routing, and an operating runbook.
The 26-week timeline is for substantial Fortune 500 implementations. Smaller scope engagements run shorter. Greenfield AI applications can adopt the pattern from day one with materially less retrofit work.
Across the multi-model engineering engagements EPC Group has led, the same anti-patterns recur:
Building against a vendor's function-calling format instead of MCP. Locks tool integrations to a specific agent runtime. Refactor unavoidable when the runtime changes.
Treating the gateway as optional. Direct-to-provider API calls scattered through the codebase mean every vendor change requires application-level refactoring.
Configuring failover but never testing it. Failover that has never run in production fails when it is needed. Schedule quarterly failover drills.
Mixing gateway and direct calls. Some teams adopt a gateway for new code but leave legacy direct calls. The audit log fragments. The compliance evidence breaks. Migrate everything or commit to a dated cutover.
Skipping the sensitivity-label propagation work. Without it, the gateway cannot enforce label-based policy. Compliance posture has a hole the team will discover during the next audit.
Choosing the wrong gateway for the use case. OpenRouter is wonderful for prototyping; it is the wrong choice for healthcare PHI workloads. Portkey is excellent for content-policy enforcement; it is unnecessary overhead for engineering-only workloads.
Forgetting cost observability until the bill arrives. Gateway-level cost dashboards should exist from day one. The first month of production usage is where the cost surprises happen.
MCP is an open protocol introduced by Anthropic in late 2024 that defines how AI agents connect to tools, data sources, and services. It uses JSON-RPC 2.0 over Streamable HTTP transport (as of November 2025 spec). By March 2026, all major AI providers had adopted MCP; over 10,000 active public MCP servers and 97 million monthly SDK downloads were reported. MCP is the closest the industry has to a vendor-neutral standard for AI tool integration.
MCP collapses the N×M integration problem (N agents × M tools) into N+M. Every tool exposes itself once as an MCP server; any MCP-compatible agent (Claude Desktop, Copilot CLI, custom applications) can consume the tool. Tool integrations become vendor-portable. The agent's intelligence model can change without the tool integration layer changing.
An AI gateway is the infrastructure layer that sits between application code and AI model providers. It provides a unified API surface, request routing, failover, observability, cost management, and (in enterprise gateways) governance and compliance controls. Production options include LiteLLM (open source self-hosted), Portkey (managed enterprise), OpenRouter (marketplace-style proxy), and Microsoft Foundry (Microsoft's own managed gateway).
For Microsoft-centric enterprises with regulated-industry workloads, Microsoft Foundry as the primary plus optionally Portkey or LiteLLM for use cases needing additional governance. For non-Microsoft enterprises, the decision depends on compliance posture, self-hosting preferences, and model catalog breadth. Two-gateway architectures (primary + secondary) are common at enterprise scale.
Copilot CLI is optimized for moment-to-moment developer flow with inline completions, 32k-128k context, and native integration into Microsoft toolchain (GitHub, Azure DevOps, Teams). Claude Code is optimized for delegated autonomous work with up to 1M tokens of context, multi-file refactoring, and deep repository reasoning. They overlap but are not interchangeable. Most enterprises benefit from both.
No. The Claude Code tool was cancelled. Anthropic's Claude models (Opus 4.6, 4.7, Sonnet 4.6, Haiku 4.5) remain available through GitHub Copilot CLI via the --model flag and through Microsoft Foundry. The intelligence stays accessible; only the interface and the agent workflow pattern change.
Gateway overhead is typically modest (sub-cent per request at most managed gateways). The real cost lever is per-use-case model selection — routing simpler use cases to cheaper models (Haiku, GPT-4 family) and reserving the high-capability models (Opus, GPT-5 family) for cases that need them. EPC Group clients typically see 30-60% cost reductions from intentional model routing versus single-provider strategies.
Failover requires three components: (1) the secondary model configured per use case, (2) the failover trigger logic (rate limit hit, timeout exceeded, error code returned, contractual unavailability), and (3) the validation pipeline that confirms the secondary model produces acceptable outputs for the use case. Component 3 is the one that gets skipped most often and the one that determines whether failover works when needed.
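A minimal sketch of how the three components fit together; the chain, trigger conditions, and acceptance check are hypothetical, not a specific gateway's API:

```python
# Illustrative failover loop: try the configured model chain in order.
FAILOVER_CHAIN = {
    "legal_contract_review": ["claude-sonnet-4-6", "gpt-5"],  # component 1
}

def invoke_with_failover(use_case, messages, call_model, is_acceptable):
    last_error = None
    for model in FAILOVER_CHAIN[use_case]:
        try:
            response = call_model(model, messages)
        except (TimeoutError, ConnectionError) as exc:  # component 2: triggers
            last_error = exc
            continue
        if is_acceptable(use_case, response):  # component 3: validation
            return response
    raise RuntimeError(f"No acceptable output for {use_case}") from last_error
```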
Microsoft Foundry is itself an AI gateway with the broadest enterprise model catalog. For most Microsoft-centric enterprises, Foundry is the right primary gateway. Multi-model architecture on Foundry means using Foundry's catalog deliberately (different model families for different use cases) plus maintaining a documented exit path in case Foundry's commercial terms change materially. The exit path keeps negotiating leverage where it belongs.
For a Fortune 500 engineering organization moving from a single-vendor AI deployment to multi-model architecture, the typical deliverables: gateway infrastructure (production-grade), application code refactored to use-case-named invocation, MCP server inventory for tool integrations, failover routing rules per use case validated against representative tasks, sensitivity-label propagation, audit log routing to Microsoft Sentinel, compliance evidence packaging, and a 12-month operating runbook. Standard engagement is 26 weeks.
For healthcare (HIPAA), financial services (SR 11-7 + SOC 2 + SOX), federal (FedRAMP), and defense (CMMC), the multi-model engineering pattern adds explicit compliance overlays: BAA verification per endpoint, model risk inventory including gateway routing rules, audit log retention aligned to framework requirements, NIST 800-53 control mapping. EPC Group's framework integrates these overlays into the gateway-layer policy rather than retrofitting them at the application layer.
For a Fortune 500 enterprise implementation, the engineering team includes: 1 senior architect (named in SOW), 2-4 platform engineers for gateway infrastructure, 1-2 application engineers per affected application for the refactor, 1 compliance liaison for regulated-industry overlays, and 1 program manager. EPC Group's typical engagement provides the senior architect and platform engineers; the customer typically provides the application engineers (who know the application best) and the compliance liaison.
Yes. Open-weights models (Meta Llama, Mistral, DeepSeek, Cohere Command R) can be self-hosted or accessed through gateways. The gateway abstraction handles them the same way as managed-provider models. Use cases requiring sovereign data handling (on-premises model invocation, air-gapped deployments) benefit particularly from the gateway pattern because the model location is opaque to application code.
OpenAI function calling and Anthropic tool use are vendor-specific formats for telling a model what tools exist and how to invoke them. MCP is a vendor-neutral protocol for the same concept. Application code written against MCP works with any MCP-compatible client/host (Claude Desktop, Copilot CLI, custom agents). Application code written against function calling or tool use is locked to that specific provider.
For a single application: 12-16 weeks from architecture decision to multi-model production traffic. For a Fortune 500 enterprise's broader AI portfolio: 26 weeks for the first wave, with subsequent applications following the established pattern in 6-8 weeks each. The 26-week first-wave timeline includes the gateway infrastructure, the MCP server inventory, the compliance overlay, and the operational handover that subsequent applications inherit.
EPC Group is a 29-year Microsoft consulting firm serving Fortune 500 companies, federal agencies, healthcare systems, financial institutions, government, manufacturing, energy, education, retail, technology, and global enterprises. The firm has delivered more than 11,000 Microsoft implementations including 6,500-plus SharePoint deployments, 1,500-plus Power BI implementations, and 500-plus Microsoft Fabric engagements.
EPC Group is a Microsoft Solutions Partner with the core designations across the Microsoft AI Cloud Partner Program. The firm was historically the oldest continuous Microsoft Gold Partner in North America from 2016 until the program's retirement, and is a five-time G2 Leader in Business Intelligence Consulting with a perfect 100 Net Promoter Score (Spring 2026).
Founder Errin O'Connor is a four-time Microsoft Press best-selling author, former NASA Lead Architect, and a member of the Microsoft SharePoint Project Tahoe and Microsoft Power BI Project Crescent beta teams.
If your engineering organization is rebuilding around multi-model AI architecture — whether prompted by the Microsoft Claude Code cancellation, by a vendor pricing change, or by a strategic decision to avoid future vendor disruption:
To discuss multi-model AI engineering with EPC Group's senior architects, contact us or call (888) 381-9725.
Errin O'Connor, CEO & Chief AI Architect, is a Microsoft Press bestselling author with 29 years of enterprise consulting experience.
Our team of experts can help you implement enterprise-grade AI strategy solutions tailored to your organization's needs.