
The Engineering Playbook for Multi-Model AI: MCP, AI Gateways, and How to Architect for the Day Your Vendor Pivots
Multi-model AI engineering playbook 2026: MCP protocol, AI gateways (LiteLLM/Portkey/Foundry), vendor-portable architecture patterns, migration from Claude Code to GitHub Copilot CLI.

If your enterprise's AI strategy is built around a specific tool — Claude Code, Cursor, Codeium, ChatGPT for Enterprise, anything — you are one vendor decision away from a six-to-twelve-month tooling disruption. Microsoft just demonstrated that publicly with the Claude Code cancellation.
If your enterprise's AI strategy is built around a specific model — Anthropic Claude Sonnet 4.6 as the only LLM behind every agent — you are one model deprecation, one price increase, or one safety-policy update away from a different kind of disruption.
The engineering answer is vendor-portable architecture: write the application code against a stable abstraction layer, route requests through a gateway that can swap models, and use the Model Context Protocol for tool/data integration so the agent surface stays consistent across vendors.
This guide details how to actually build that. We cover the gateway pattern, the MCP protocol, the specific GitHub Copilot CLI vs Claude Code engineering comparison, the migration playbook, code-level abstraction patterns, regulated-industry overlays, and the EPC Group implementation framework refined across enterprise AI engineering engagements.
The first practical question for any engineering team affected by Microsoft's Claude Code cancellation (or for any team weighing the two): are these tools actually equivalent? They are not. They overlap, but they are optimized for different workflows.
| Dimension | GitHub Copilot CLI | Claude Code |
|---|---|---|
| Primary workflow | Augment developer flow | Delegate autonomous work |
| Inline completions | Yes (native IDE integration) | No |
| Maximum context | 32k–128k tokens (model-dependent) | Up to 1M tokens (Sonnet/Opus) |
| Repository awareness | File-level + recent context | Full repository, multi-file, call-chain reasoning |
| Multi-file refactoring | Limited (agent mode developing) | Native, primary use case |
| Model selection | Anthropic Claude (Opus 4.6, Sonnet 4.6, Haiku 4.5), OpenAI GPT family, others via --model | Anthropic Claude family |
| Agentic depth | Inline + specialized agents + background delegation | Deep autonomous; plans multi-step, returns diffs |
| SWE-bench Verified score | Varies by underlying model | 87.6% on Opus 4.7 GA (April 16, 2026) — highest published |
| Pricing — Free tier | $0 (2,000 completions + 50 premium requests) | Limited Pro at $20/mo |
| Pricing — Pro tier | $10/mo | $100/mo (Max 5x) |
| GitHub integration | Native | Via MCP |
| Microsoft toolchain integration | Native (Azure DevOps, GitHub, Teams) | Via MCP |
| Terminal-first UX | Available in Copilot CLI | Primary mode |
For most professional engineering teams the answer is both, not either.
For Microsoft-cancelled-Claude-Code teams specifically, the transition is not "lose Claude capability." Claude Opus 4.6 and 4.7 remain in Copilot CLI via the --model flag. The transition is "lose the Claude Code interface and adopt Copilot CLI's interface." The model intelligence stays; the cockpit changes.
The Model Context Protocol, originally introduced by Anthropic in late 2024, has become the closest thing to a vendor-neutral standard for connecting AI agents to tools, data, and services. By March 2026, all major providers had adopted it; Anthropic reported over 10,000 active public MCP servers and 97 million monthly SDK downloads across Python and TypeScript.
Before MCP, every AI agent had to integrate with every tool through that tool's specific API, that vendor's specific function-calling format, and that agent runtime's specific orchestration pattern. A GitHub integration written for Claude Code did not transfer to Cursor; a Slack integration written for ChatGPT did not transfer to Copilot. The result was N×M integration surface area for N agents × M tools.
MCP collapses this to N+M. Each tool exposes itself as an MCP server using a standard JSON-RPC 2.0 protocol. Each agent acts as an MCP client. Any compliant client can connect to any compliant server.
The MCP architecture has three components: the host (the AI application, such as Claude Desktop or Copilot CLI), the client (the connection the host maintains to each server), and the server (the tool or data source being exposed).
A host creates multiple isolated client sessions, each maintaining its own JSON-RPC channel with its own MCP server. Tool calls, resource access, and prompts flow over those channels using a standard message format.
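To make that concrete, here is a minimal sketch of an MCP server using the official Python SDK's FastMCP helper. The server name and the stubbed tool are illustrative assumptions; the decorator pattern and stdio transport are the SDK's standard way of exposing a tool to any compliant client.

```python
from mcp.server.fastmcp import FastMCP

# Illustrative server: exposes one tool over the standard MCP JSON-RPC channel.
mcp = FastMCP("ticket-lookup")

@mcp.tool()
def get_ticket_status(ticket_id: str) -> str:
    """Return the status of a support ticket (stubbed for illustration)."""
    return f"Ticket {ticket_id}: open"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```

Any MCP-compatible host (Claude Desktop, Copilot CLI, or a custom agent) can discover and call get_ticket_status without agent-specific glue code.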
If your agent application is built against MCP rather than against a specific vendor's tool-calling format, the agent can be swapped from Claude Code to Copilot CLI to a custom OpenAI-backed agent with zero rework on the tool integration layer. The agent's intelligence model changes; the tools it can call do not.
This is the architectural pattern that makes Microsoft's Claude Code cancellation a non-event from a tools perspective for teams that built on MCP. The tool integrations move with the team.
The MCP roadmap published in early 2026 introduces several changes that matter for enterprise adoption.
For enterprise architects, this means MCP is moving from "a nice protocol" to "the substrate for AI tool integration." Building against it now produces architecture that survives the next several vendor pivots.
If MCP is the integration substrate, the AI gateway is the routing substrate. A gateway sits between your application code and the underlying model providers, providing a unified API surface, request routing, failover, observability, and (in the enterprise tier) governance and compliance controls.
LiteLLM — open source and self-hosted. Best for engineering teams with resources to operate their own infrastructure and who need auditable routing logic with no external dependencies. Supports 100+ models across providers. Used heavily by teams that need to keep all routing logic on their own infrastructure for compliance reasons.
Portkey — enterprise-grade managed gateway with strong governance posture. PII filtering, content policies, guardrails, and observability at the gateway layer. Best for regulated industries where compliance controls must be enforced before requests reach providers. Used heavily in healthcare, financial services, and federal contexts.
OpenRouter — marketplace-style proxy with unified API access to 300+ models from 60+ providers. Best for prototyping, rapid experimentation, and teams that want the broadest model catalog without operating their own infrastructure. Less suitable for enterprise compliance scenarios because the underlying routing happens through a third party.
Microsoft Foundry — Microsoft's own AI gateway, with the broadest enterprise model catalog (OpenAI, Anthropic, Cohere, DeepSeek, Mistral, Meta, Microsoft's own models). Best for Microsoft-centric enterprises because it integrates with the rest of Microsoft Foundry's enterprise controls (Azure AD identity, Microsoft Purview labels, Microsoft Sentinel routing, Azure-native networking).
For most EPC Group clients, the decision is:
| Workload | Recommended gateway |
|---|---|
| Microsoft-centric enterprise, broad model catalog, integrated governance | Microsoft Foundry |
| Regulated industry with content-policy enforcement at gateway | Portkey |
| Engineering teams self-hosting all routing logic (data sovereignty) | LiteLLM |
| Rapid prototyping or non-production exploration | OpenRouter |
| Multi-cloud / multi-vendor with no preference | LiteLLM (open) or Portkey (managed) |
Two-gateway architectures are common at enterprise scale: Microsoft Foundry as the primary production gateway (because the model catalog is broadest and the Microsoft integrations are deepest), with LiteLLM or Portkey as a secondary surface for use cases that require capabilities the primary gateway does not provide.
Single API surface for application code. The application calls gateway.invoke(use_case='legal_review', prompt=...) and the gateway decides which model handles the request. The application does not know whether the underlying call went to Claude Sonnet on Foundry, GPT-4 on Azure OpenAI, or a self-hosted Llama instance.
Per-use-case model routing. Different use cases route to different models based on the routing rules. Legal review goes to Claude Sonnet 4.6 for the long context. Customer-service summarization goes to Claude Haiku 4.5 for cost. Code generation goes to Claude Opus 4.7 via Copilot CLI for the SWE-bench performance.
Failover. If the primary model is unavailable (rate-limited, deprecated, contractually unavailable), the gateway routes to the configured fallback. Enterprises with tested failover resolve outages four times faster than those without.
Cost observability. Every request is logged with its cost. A single dashboard answers "how much did we spend on AI last month, by use case, by model, by team."
Compliance posture. PII filtering, content policy enforcement, audit logging, and sensitivity-label propagation happen at the gateway layer. Compliance evidence is produced once for all model usage rather than per-provider.
A/B routing. New models can be tested in production traffic by routing a percentage of requests to them and comparing outcomes. The application code does not change.
The implementation pattern that makes multi-model AI engineering survive vendor pivots:
Application code invokes the gateway by use case name, not by model name.
```python
import anthropic

# Wrong — vendor-locked: the provider SDK and model name are hard-coded
# into application code.
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": query}],
    max_tokens=4000,
)

# Right — use-case-named, vendor-agnostic: the gateway owns model selection.
response = ai_gateway.invoke(
    use_case="legal_contract_review",
    messages=[{"role": "user", "content": query}],
    max_tokens=4000,
)
```
The gateway holds the configuration that says "legal_contract_review routes to Claude Sonnet 4.6 with these parameters." When the model decision changes (a new release, a vendor pivot, a contract renegotiation), only the gateway configuration changes. Application code does not.
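A minimal sketch of what that gateway-side configuration can look like, assuming a hand-rolled gateway; the routing-table schema, the invoke helper, and call_provider are illustrative, not any specific gateway product's API:

```python
# Illustrative routing table: use cases map to models and parameters here,
# never in application code.
ROUTING = {
    "legal_contract_review": {"model": "claude-sonnet-4-6", "max_tokens": 4000},
    "support_summarization": {"model": "claude-haiku-4-5", "max_tokens": 1000},
    "code_generation": {"model": "claude-opus-4-7", "max_tokens": 8000},
}

def call_provider(model: str, messages: list[dict], **params):
    """Stub: dispatch to whichever provider SDK serves this model."""
    raise NotImplementedError

def invoke(use_case: str, messages: list[dict], **overrides):
    config = {**ROUTING[use_case], **overrides}
    model = config.pop("model")  # the model decision is resolved here
    return call_provider(model, messages, **config)
```

Re-pointing legal_contract_review at a different model is then a one-line configuration change, invisible to every caller.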
Every tool and data source the agent needs is exposed through an MCP server. The agent's tool surface is defined in MCP, not in vendor-specific function-call schemas.
This means: when the underlying model changes (e.g., Claude Sonnet 4.6 → GPT-5), the tool definitions, the tool authentication, the tool authorization, and the tool audit trail all stay the same. The model swaps; the agent stays connected to the same SharePoint, Pipedrive, internal knowledge base, and operational systems.
Every request to the gateway carries the sensitivity classification of its input data. Microsoft Purview sensitivity labels propagate from the data layer through to the model invocation. The gateway enforces label-based policy: Highly Confidential content cannot be processed by external models; Confidential content is logged with full audit detail; Public content can use the cost-optimized model.
This is the layer that makes regulated-industry deployments survive a vendor change. The compliance posture is in the gateway, not in the vendor-specific call.
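A minimal sketch of label-based enforcement at the gateway, using the Purview label names above; the policy table and function are illustrative assumptions:

```python
# Illustrative label policy: checked before any request leaves the gateway.
POLICY = {
    "Highly Confidential": {"allow_external_models": False, "audit": "full"},
    "Confidential": {"allow_external_models": True, "audit": "full"},
    "Public": {"allow_external_models": True, "audit": "summary"},
}

def enforce_label_policy(label: str, model_is_external: bool) -> str:
    """Raise if the request violates label policy; return the audit level."""
    rule = POLICY.get(label)
    if rule is None:
        raise PermissionError(f"Unknown or missing label {label!r}: blocked by default")
    if model_is_external and not rule["allow_external_models"]:
        raise PermissionError(f"{label} content cannot be sent to external models")
    return rule["audit"]
```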
Every gateway request produces a structured audit and cost event.
These events flow to Microsoft Sentinel (or the customer's SIEM) for security operations and to the cost-management surface for FinOps. Single source of truth across all models.
For engineering teams in Microsoft's affected groups — and for any team facing an equivalent forced tooling migration — the playbook EPC Group recommends:
Set --model to default to claude-sonnet-4-6 (or claude-opus-4-7 for autonomous agent workflows). Anthropic models remain available via Copilot CLI; the interface changes, not the model.
The cancellation is a tool decision, not a model decision. Engineers losing Claude Code today still have access to Claude Opus 4.6, Sonnet 4.6, and Haiku 4.5 (and presumably Opus 4.7 and Sonnet 4.7 as they release) through Copilot CLI's --model flag. The intelligence stays. The workflow pattern is what shifts.
For healthcare, financial services, federal, and defense-contractor environments, the multi-model engineering pattern adds compliance overlays: BAA verification per model endpoint, a model risk inventory that includes gateway routing rules, audit-log retention aligned to the applicable framework, and NIST 800-53 control mapping.
For enterprises building or refactoring AI applications for multi-model resilience, the EPC Group standard pattern covers gateway infrastructure, use-case-named invocation in application code, an MCP server inventory for tool integrations, validated per-use-case failover, sensitivity-label propagation, audit routing, and an operating runbook.
The 26-week timeline is for substantial Fortune 500 implementations. Smaller scope engagements run shorter. Greenfield AI applications can adopt the pattern from day one with materially less retrofit work.
Across the multi-model engineering engagements EPC Group has led, the same anti-patterns recur:
Building against a vendor's function-calling format instead of MCP. Locks tool integrations to a specific agent runtime. Refactor unavoidable when the runtime changes.
Treating the gateway as optional. Direct-to-provider API calls scattered through the codebase mean every vendor change requires application-level refactoring.
Configuring failover but never testing it. Failover that has never run in production fails when it is needed. Schedule quarterly failover drills.
Mixing gateway and direct calls. Some teams adopt a gateway for new code but leave legacy direct calls. The audit log fragments. The compliance evidence breaks. Migrate everything or commit to a dated cutover.
Skipping the sensitivity-label propagation work. Without it, the gateway cannot enforce label-based policy. Compliance posture has a hole the team will discover during the next audit.
Choosing the wrong gateway for the use case. OpenRouter is wonderful for prototyping; it is the wrong choice for healthcare PHI workloads. Portkey is excellent for content-policy enforcement; it is unnecessary overhead for engineering-only workloads.
Forgetting cost observability until the bill arrives. Gateway-level cost dashboards should exist from day one. The first month of production usage is where the cost surprises happen.
MCP is an open protocol introduced by Anthropic in late 2024 that defines how AI agents connect to tools, data sources, and services. It uses JSON-RPC 2.0 over Streamable HTTP transport (as of November 2025 spec). By March 2026, all major AI providers had adopted MCP; over 10,000 active public MCP servers and 97 million monthly SDK downloads were reported. MCP is the closest the industry has to a vendor-neutral standard for AI tool integration.
MCP collapses the N×M integration problem (N agents × M tools) into N+M. Every tool exposes itself once as an MCP server; any MCP-compatible agent (Claude Desktop, Copilot CLI, custom applications) can consume the tool. Tool integrations become vendor-portable. The agent's intelligence model can change without the tool integration layer changing.
An AI gateway is the infrastructure layer that sits between application code and AI model providers. It provides a unified API surface, request routing, failover, observability, cost management, and (in enterprise gateways) governance and compliance controls. Production options include LiteLLM (open source self-hosted), Portkey (managed enterprise), OpenRouter (marketplace-style proxy), and Microsoft Foundry (Microsoft's own managed gateway).
For Microsoft-centric enterprises with regulated-industry workloads, Microsoft Foundry as the primary plus optionally Portkey or LiteLLM for use cases needing additional governance. For non-Microsoft enterprises, the decision depends on compliance posture, self-hosting preferences, and model catalog breadth. Two-gateway architectures (primary + secondary) are common at enterprise scale.
Copilot CLI is optimized for moment-to-moment developer flow with inline completions, 32k-128k context, and native integration into Microsoft toolchain (GitHub, Azure DevOps, Teams). Claude Code is optimized for delegated autonomous work with up to 1M tokens of context, multi-file refactoring, and deep repository reasoning. They overlap but are not interchangeable. Most enterprises benefit from both.
No. The Claude Code tool was cancelled. Anthropic's Claude models (Opus 4.6, 4.7, Sonnet 4.6, Haiku 4.5) remain available through GitHub Copilot CLI via the --model flag and through Microsoft Foundry. The intelligence stays accessible; only the interface and the agent workflow pattern change.
Gateway overhead is typically modest (sub-cent per request at most managed gateways). The real cost lever is per-use-case model selection — routing simpler use cases to cheaper models (Haiku, GPT-4 family) and reserving the high-capability models (Opus, GPT-5 family) for cases that need them. EPC Group clients typically see 30-60% cost reductions from intentional model routing versus single-provider strategies.
Failover requires three components: (1) the secondary model configured per use case, (2) the failover trigger logic (rate limit hit, timeout exceeded, error code returned, contractual unavailability), and (3) the validation pipeline that confirms the secondary model produces acceptable outputs for the use case. Component 3 is the one that gets skipped most often and the one that determines whether failover works when needed.
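A minimal sketch of how the three components fit together; the chain, trigger conditions, and acceptance check are hypothetical, not a specific gateway's API:

```python
# Illustrative failover loop: try the configured model chain in order.
FAILOVER_CHAIN = {
    "legal_contract_review": ["claude-sonnet-4-6", "gpt-5"],  # component 1
}

def invoke_with_failover(use_case, messages, call_model, is_acceptable):
    last_error = None
    for model in FAILOVER_CHAIN[use_case]:
        try:
            response = call_model(model, messages)
        except (TimeoutError, ConnectionError) as exc:  # component 2: triggers
            last_error = exc
            continue
        if is_acceptable(use_case, response):  # component 3: validation
            return response
    raise RuntimeError(f"No acceptable output for {use_case}") from last_error
```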
Microsoft Foundry is itself an AI gateway with the broadest enterprise model catalog. For most Microsoft-centric enterprises, Foundry is the right primary gateway. Multi-model architecture on Foundry means using Foundry's catalog deliberately (different model families for different use cases) plus maintaining a documented exit path in case Foundry's commercial terms change materially. The exit path keeps negotiating leverage where it belongs.
For a Fortune 500 engineering organization moving from a single-vendor AI deployment to multi-model architecture, the typical deliverables: gateway infrastructure (production-grade), application code refactored to use-case-named invocation, MCP server inventory for tool integrations, failover routing rules per use case validated against representative tasks, sensitivity-label propagation, audit log routing to Microsoft Sentinel, compliance evidence packaging, and a 12-month operating runbook. Standard engagement is 26 weeks.
For healthcare (HIPAA), financial services (SR 11-7 + SOC 2 + SOX), federal (FedRAMP), and defense (CMMC), the multi-model engineering pattern adds explicit compliance overlays: BAA verification per endpoint, model risk inventory including gateway routing rules, audit log retention aligned to framework requirements, NIST 800-53 control mapping. EPC Group's framework integrates these overlays into the gateway-layer policy rather than retrofitting them at the application layer.
For a Fortune 500 enterprise implementation, the engineering team includes: 1 senior architect (named in SOW), 2-4 platform engineers for gateway infrastructure, 1-2 application engineers per affected application for the refactor, 1 compliance liaison for regulated-industry overlays, and 1 program manager. EPC Group's typical engagement provides the senior architect and platform engineers; the customer typically provides the application engineers (who know the application best) and the compliance liaison.
Yes. Open-weights models (Meta Llama, Mistral, DeepSeek, Cohere Command R) can be self-hosted or accessed through gateways. The gateway abstraction handles them the same way as managed-provider models. Use cases requiring sovereign data handling (on-premises model invocation, air-gapped deployments) benefit particularly from the gateway pattern because the model location is opaque to application code.
OpenAI function calling and Anthropic tool use are vendor-specific formats for telling a model what tools exist and how to invoke them. MCP is a vendor-neutral protocol for the same concept. Application code written against MCP works with any MCP-compatible client/host (Claude Desktop, Copilot CLI, custom agents). Application code written against function calling or tool use is locked to that specific provider.
For a single application: 12-16 weeks from architecture decision to multi-model production traffic. For a Fortune 500 enterprise's broader AI portfolio: 26 weeks for the first wave, with subsequent applications following the established pattern in 6-8 weeks each. The 26-week first-wave timeline includes the gateway infrastructure, the MCP server inventory, the compliance overlay, and the operational handover that subsequent applications inherit.
EPC Group is a 29-year Microsoft consulting firm serving Fortune 500 companies, federal agencies, healthcare systems, financial institutions, government, manufacturing, energy, education, retail, technology, and global enterprises. The firm has delivered more than 11,000 Microsoft implementations including 6,500-plus SharePoint deployments, 1,500-plus Power BI implementations, and 500-plus Microsoft Fabric engagements.
EPC Group is a Microsoft Solutions Partner with the core designations across the Microsoft AI Cloud Partner Program. The firm was historically the oldest continuous Microsoft Gold Partner in North America from 2016 until the program's retirement, and is a five-time G2 Leader in Business Intelligence Consulting with a perfect 100 Net Promoter Score (Spring 2026).
Founder Errin O'Connor is a four-time Microsoft Press best-selling author, former NASA Lead Architect, and a member of the Microsoft SharePoint Project Tahoe and Microsoft Power BI Project Crescent beta teams.
If your engineering organization is rebuilding around multi-model AI architecture — whether prompted by the Microsoft Claude Code cancellation, by a vendor pricing change, or by a strategic decision to avoid future vendor disruption:
To discuss multi-model AI engineering with EPC Group's senior architects, contact us or call (888) 381-9725.
Errin O'Connor, CEO & Chief AI Architect, is a Microsoft Press bestselling author with 29 years of enterprise consulting experience.
Our team of experts can help you implement enterprise-grade AI strategy solutions tailored to your organization's needs.