
AI-assisted coding gains are real. So is the bill. EPC Group's multi-AI governance playbook for 2026.
AI-assisted coding is real and the productivity gains are not fake. But the gains are concentrated in a single bright moment — the typing — while the costs are smeared across everything downstream: review load, security exposure, production incidents, cloud spend, leaked customer data, and years of maintenance. The same mistake is now being made at three altitudes: shipping unreviewed AI code, rebuilding entire software stacks because "building got cheap," and wiring models into operations with no runtime oversight. All three confuse the cost of creation with the cost of ownership — and confidence with correctness. The fix at every altitude is the same unglamorous word: governance.
There is a particular kind of meeting I keep finding myself in this year. A CIO or a VP of Engineering pulls me aside after the formal agenda is done, lowers their voice, and says some version of the same sentence: “We're shipping faster than we ever have. So why does everything feel like it's held together with tape?”
That feeling finally has a name. A pair of well-known engineers who helped popularize the agentic coding wave went public this spring with a blunt warning: the same AI tools being sold as a replacement for expensive software developers are quietly flooding companies with code that is buggy, hard to maintain, and in some cases genuinely unsafe to run. They call it “vibe slop” — a collision of “vibe coding,” where you describe software in plain English and let a model write it, and “AI slop,” the endless low-value machine-generated content choking the internet.
The people sounding the alarm are not outsiders throwing rocks. They are Mario Zechner and Armin Ronacher, the engineers behind core pieces of OpenClaw — one of the most popular autonomous agents on the market. When the people who built the engine tell you the output is a mess under the hood, that is not a hot take. It is a field report from inside the machine.
I have spent twenty-nine years inside enterprise Microsoft environments — migrations, governance, security hardening, the unglamorous plumbing that keeps large organizations running. So when two people this close to the agentic wave raise their hands and say the floor is rotting, I do not treat it as noise. It lines up almost exactly with what I am seeing inside real client tenants right now.
This post takes that warning seriously without falling into either of the two reflexes everyone defaults to: dismissing it as hype or declaring an apocalypse. The gains are real, the bill is also real, and most teams are living in the gap between the two, with no line item on the budget for the gap. Three things are happening at once and almost nobody is connecting them: developers are shipping fragile AI-generated code, companies are tearing up their software budgets to “just build it with AI,” and the consumer-facing tools that touch your tax return and your retirement account are increasingly being assembled the same fast, unverified way. Same root cause, three blast radii. By the end you will see why they are one story, and why the answer is the least glamorous word in technology: governance.
Strip away the catchy label and the claim is simple. When you generate code from a casual prompt and ship it without serious scrutiny, you get software that looks finished but is structurally weak. It runs in the demo. It passes the happy path. Then it meets a real-world edge case, a regulatory requirement, or a malicious input, and it folds.
The engineers behind the warning point to two human failure modes that are the real story. The first is automation bias — the very old, very human tendency to trust an answer more because a machine produced it. The second is review fatigue — what happens when developers are buried under a daily avalanche of AI-generated pull requests and simply stop reading them line by line. Neither is a technology problem. They are organizational problems wearing a technology costume.
The warning also flags a downstream consequence that never makes it into the AI-productivity sales deck: cost. Sloppy code does not just break more often. It runs less efficiently. It burns more compute, more memory, more bandwidth. In a cloud-billed world, inefficiency is a line item that grows every month. And there is a sneakier effect: if a meaningful share of public code is machine-generated slop, the signals organizations use to screen engineering hires, vet a vendor's codebase, or assess an acquisition target now run on corrupted inputs. The polish stops correlating with the substance.
Most coverage of vibe slop treats security as a footnote. I will not, because security is the part that keeps me up at night and the part that should keep every CISO up too.
A 2025 assessment that ran more than a hundred large language models across eighty real-world coding tasks found that roughly 45% of AI-generated code contained vulnerabilities from the OWASP Top 10. Java fared worst, failing security checks ~72% of the time. Across models, the generated code failed to defend against cross-site scripting ~86% of the time. And here is the part people skip: newer, more capable models did not fix this. They produced syntactically cleaner code that still carried the same security holes. The models got better at looking correct without getting safer. That is precisely the trap.
Independent scanning work tells the same story from another angle. Researchers who examined ~5,600 vibe-coded applications turned up 2,000+ vulnerabilities and 400+ exposed secrets — hard-coded keys and credentials sitting in code that someone shipped. The failure patterns are remarkably consistent: models skip input validation unless explicitly demanded; they scaffold applications with no authentication at all; they hard-code secrets in a meaningful share of cases; they pull in sprawling dependency trees for trivial requests; they reach for outdated cryptographic algorithms.
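Of those failure patterns, hard-coded secrets are among the easiest to catch mechanically. What follows is a minimal sketch, not a production scanner: the two regex rules and the `find_hardcoded_secrets` helper are illustrative assumptions, and real secret scanners ship far larger rule sets.

```python
import re

# Illustrative rules only (assumptions for this sketch, not a complete rule set).
SECRET_PATTERNS = {
    # Common shape of an AWS access key ID: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A secret-sounding name assigned directly to a quoted literal of 8+ characters.
    "generic_assignment": re.compile(
        r"""(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*["']([^"']{8,})["']"""
    ),
}


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings


# Hypothetical front-end snippet: one value read from the environment, one hard-coded.
sample = '''
const stripeKey = process.env.STRIPE_KEY;
const api_key = "EXAMPLE-not-a-real-key-123";
'''
print(find_hardcoded_secrets(sample))
```

A check like this belongs in the pipeline as a blocking step. It only flags the obvious cases, so treat a clean result as a floor, not a verdict.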
This is the same category of problem I have spent the last two years cleaning up on the Microsoft side. In tenant after tenant, Copilot gets switched on, surfaces sensitive files nobody knew were overshared, and exposes risk that had been sitting dormant for years. AI does not invent your security debt. It finds it, accelerates it, and broadcasts it. The remediation pattern is the same one EPC Group ships in our Copilot Readiness Assessment and the broader Defender XDR + Purview deployment.
If you think this is a developer's inside-baseball problem, I need to move it about a foot closer to home. Financial advisors are increasingly leaning on AI to build the very tools that manage client money — IRA withdrawal optimizers, Social Security benefit calculators, tax-loss harvesting models. Some of those tools are now being vibe-coded. Client tax data and account balances rest on a foundation of code nobody competent ever reviewed.
One late-2025 report found AI-written code produces roughly 1.7× more issues — logic errors and security vulnerabilities — than human-written code, in part because the model uses up to 10× more lines to do the same job. More code is not more value. Often it is just more places to hide a defect.
A founder reported losing thousands in payment-processing fees after trusting AI-generated code for his startup's payment system: the model placed the security key in the front-end, where a hacker found it and ran tens of thousands of dollars in fraudulent charges. This is the “exposed secrets” failure mode described above, now with a dollar figure and a victim list. And a security firm focused on generative-AI data leaks reported that ~40% of vibe-coded web applications were actively leaking sensitive information, including financial data, straight to the open internet because the popular build-by-prompt platforms ship applications that are public by default.
It gets worse on accountability. The IRS does not accept “the AI made a mistake” as a defense. The SEC has started pursuing advisors for “AI-washing.” A large share of the organizations EPC Group serves live in exactly this blast radius — financial services, healthcare, government, and other regulated industries where a leaked record or a hallucinated number is a reportable event.
EPC Group practice
At EPC Group we did not wait for the industry to discover that one AI model is a liability and a committee of them is an asset. We built our entire multi-AI business intelligence practice on that premise. We do not bet a client's decisions on a single model's confident answer. We run problems across multiple engines — different vendors, different architectures, different blind spots — and treat their disagreement as a feature, not a bug.
When several models converge, you have signal. When they diverge, you have just found the exact place a single-model shop would have shipped a confident, plausible, wrong answer straight into a board deck. Multiple models. One truth.
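The converge-or-diverge idea can be sketched in a few lines. This is an illustration under stated assumptions, not EPC Group's actual tooling: `adjudicate`, the `quorum` parameter, and the stand-in engines are hypothetical, and comparing normalized strings is a deliberate simplification of what real answer comparison would need.

```python
from collections import Counter
from typing import Callable


def adjudicate(question: str, models: dict[str, Callable[[str], str]], quorum: float = 1.0) -> dict:
    """Ask every model, then report either the consensus or the disagreement."""
    # Normalize lightly so "Yes " and "yes" count as the same answer.
    answers = {name: ask(question).strip().lower() for name, ask in models.items()}
    top_answer, votes = Counter(answers.values()).most_common(1)[0]
    agreement = votes / len(answers)
    return {
        "answer": top_answer if agreement >= quorum else None,
        "agreement": agreement,
        "needs_human_review": agreement < quorum,
        "answers": answers,  # kept so a reviewer can see exactly who disagreed
    }


# Stand-in callables; a real deployment would wrap vendor SDK calls here.
models = {
    "engine_a": lambda q: "42",
    "engine_b": lambda q: "42",
    "engine_c": lambda q: "41",
}
result = adjudicate("What was Q3 churn, in percent?", models)
print(result["needs_human_review"], result["answers"])
```

With the default quorum of full agreement, one dissenting engine is enough to withhold the answer and route the question to a person.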
And here is what most consultants are too polite to admit: AI has lied to me. Directly. Repeatedly. In my own workflows. These systems will fabricate a citation, invent a function that does not exist, and swear a migration step is safe when it absolutely is not, and they will do all of it with the serene confidence of a tenured professor. That is what makes vibe slop genuinely dangerous and not merely annoying — the confidence is decoupled from the accuracy, and humans are wired to read confidence as competence.
Our engagements include explicit AI governance layers: verification gates that force a model to show its work, cross-model adjudication that pits engines against each other, source-traceability requirements so no claim ships without a verifiable origin, and human sign-off thresholds that scale with the blast radius of the decision. The posture: assume the machine is confidently wrong until it proves otherwise. Frameworks like the NIST AI Risk Management Framework and the OWASP Top 10 for LLM Applications exist precisely because the industry is realizing governance has to move down to runtime behavior, not live in a binder.
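Two of those layers, source traceability and sign-off thresholds that scale with blast radius, reduce to a small gate. The sketch below assumes a hypothetical three-tier `BlastRadius` scale and a made-up approval table; the tiers and counts are placeholders for whatever policy an organization actually sets.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class BlastRadius(IntEnum):
    INTERNAL_DRAFT = 1
    CUSTOMER_FACING = 2
    REGULATED_DECISION = 3


# Hypothetical policy: distinct human approvers required at each tier.
REQUIRED_APPROVERS = {
    BlastRadius.INTERNAL_DRAFT: 0,
    BlastRadius.CUSTOMER_FACING: 1,
    BlastRadius.REGULATED_DECISION: 2,
}


@dataclass
class Claim:
    text: str
    radius: BlastRadius
    sources: list[str] = field(default_factory=list)
    approvers: list[str] = field(default_factory=list)


def may_ship(claim: Claim) -> tuple[bool, str]:
    """Block anything without a traceable source or enough distinct human sign-offs."""
    if not claim.sources:
        return False, "blocked: no traceable source"
    needed = REQUIRED_APPROVERS[claim.radius]
    have = len(set(claim.approvers))  # the same person approving twice counts once
    if have < needed:
        return False, f"blocked: {have} of {needed} required approvals"
    return True, "ok"


claim = Claim("Q3 churn was 4.2%", BlastRadius.REGULATED_DECISION,
              sources=["warehouse query (hypothetical)"], approvers=["ana"])
print(may_ship(claim))
```

The design choice worth copying is that the gate is deterministic code. The model never gets a vote on whether its own output ships.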
If I stopped here, I would be writing exactly the kind of one-sided scare piece I dislike. Let me argue the other side as fairly as I can.
Some of the most credible measurement comes straight from provider APIs. Research pulling cohort data from Cursor, GitHub Copilot, and Claude Code found a strong correlation between heavier AI use and greater developer output — during weeks when AI use peaked, heaviest-engagement developers reportedly authored several times more durable work than developers who used no AI at all. Large organizations report meaningful, measurable wins — one telecom reportedly shipped engineering code about 30% faster while saving hundreds of thousands of hours. The same 30% figure keeps surfacing from unrelated directions; practitioners describe Copilot-style tools cutting time-to-deploy and accelerating release cycles by roughly a third.
For validating an idea, vibe coding can replace a $50-100K engineer-built prototype with a few hundred dollars in AI credits. The output is not production software and was never meant to be — it is a validated specification. The danger is never the prototype. The danger is the prototype that quietly graduates to production because it “worked” and nobody made the call to throw it away.
The honest reading is a productivity paradox: AI-assisted coding raises throughput and raises the cost of fixing what it ships, at the same time, in the same organizations. Output goes up. The reliability tax goes up with it. Telemetry across tens of thousands of developers found that AI-heavy teams ship more, but their PRs grew ~51% larger, bugs per PR rose ~28%, median review time stretched to several times its baseline, and incidents per PR roughly tripled. One MIT researcher described AI coding tools as a brand-new credit card that lets us rack up technical debt in ways we never could before. The gains are concentrated at the moment of typing, and the costs are smeared across everything afterward.
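The per-PR figures compound with volume, which is easy to miss. A back-of-the-envelope sketch: the ~28% rise in bugs per PR comes from the telemetry cited above, while the 1.5x throughput multiplier is a hypothetical chosen only to make the arithmetic concrete.

```python
def relative_bug_volume(pr_throughput_multiplier: float, bugs_per_pr_increase: float = 0.28) -> float:
    """Total bugs relative to baseline = (PRs merged) x (bugs per PR)."""
    return pr_throughput_multiplier * (1 + bugs_per_pr_increase)


# Hypothetical team that merges 1.5x as many PRs after adopting AI tooling.
print(round(relative_bug_volume(1.5), 2))  # 1.92
```

Under that assumption the team ships half again as many PRs and inherits nearly twice the bugs, before counting the larger diffs and longer reviews.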
In early 2026, roughly a trillion dollars in software market value evaporated in a matter of weeks. The market named it the “SaaSpocalypse.” The thesis: if AI agents can do what enterprise software does, enterprise software is finished — two engineers can replicate a platform in a sprint, per-seat pricing collapses, build everything and buy nothing. Every CFO is now having the same conversation: tap the budget owner, ask whether AI could just build this in-house.
The panic is a real signal. The conclusion is wrong. AI dropped the cost of building to the floor. It did not drop the cost of maintaining anywhere near as far. The build decision gets made once. The maintenance decision gets made every single day afterward — usually by people who were not in the room for the original build. The build is the deposit. The maintenance is the mortgage. And the mortgage is where companies go broke.
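The deposit-versus-mortgage point is easier to feel with numbers. Every figure below is hypothetical, and holding maintenance flat is a simplification of "did not drop anywhere near as far."

```python
def ownership_cost(build_cost: float, annual_maintenance: float, years: int) -> dict:
    """Total cost of ownership and the share of it that is maintenance."""
    total = build_cost + annual_maintenance * years
    return {"total": total, "maintenance_share": (total - build_cost) / total}


# Hypothetical figures: AI cuts the build cost by 90%, maintenance stays flat.
before_ai = ownership_cost(build_cost=500_000, annual_maintenance=100_000, years=10)
with_ai = ownership_cost(build_cost=50_000, annual_maintenance=100_000, years=10)

print(before_ai["total"], round(before_ai["maintenance_share"], 2))  # 1500000 0.67
print(with_ai["total"], round(with_ai["maintenance_share"], 2))      # 1050000 0.95
```

In this made-up case a 90% cheaper build lowers the ten-year bill by only 30%, and maintenance grows from two-thirds of the total to roughly 95% of it.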
The question stopped being “can we build this?” because in 2026 the answer is almost always yes. The question that still narrows the choice is “do we want to own this for the next ten years?” Leaders getting this right ask three things before they build: does this touch what customers actually pay us for, what is the cost of being wrong, and what is the long-term operational cost of owning it.
And notice what did not happen during the SaaSpocalypse: the death of enterprise software. Enterprise software spend in 2026 is accelerating at one of the highest rates ever recorded, projected to clear well over a trillion dollars and growing double digits year over year. Interface-only moats died. Tools whose only edge was a slightly nicer screen got repriced fast. But systems of record, compliance infrastructure, and accumulated domain logic did not evaporate. The moat moved: to compliance and trust infrastructure (SOC 2, HIPAA, FedRAMP, SOX, ISO 27001), to accumulated proprietary domain data, and to accountability (agent systems must be reviewable, correctable, auditable, and explainable, or they fall apart at scale).
An agent without a real system of record to read from and write to is a goldfish with an excellent vocabulary. It can talk beautifully about a contract or a closing entry, but strip away the CRM, the ticketing system, or the ERP, and it has nothing to act on. The build-vs-buy answer EPC Group converges on with clients every week: build to learn, buy to scale. Use AI to build the smallest thing that teaches you what the real problem is — then make the ownership decision deliberately. See our published engagement model for how we structure the decision tree.
There is a development happening in parallel that almost nobody is connecting to any of this — and it deserves the headline. In mid-2026 Perplexity published research on something it calls Search as Code. Instead of treating search as a black box you query and consume, they broke their search stack into atomic composable primitives — retrieval, ranking, filtering, fan-outs, rendering — and exposed them as an SDK. A model writes actual code in a secure sandbox to assemble those primitives into a custom pipeline tailored to the task. For a simple question, a couple of calls. For a hard one, it orchestrates thousands of search operations in a single turn, with conditionals, parallelism, deduplication, and verification baked in.
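To make "atomic composable primitives" concrete, here is a toy version of the idea. It is not Perplexity's SDK: `retrieve`, `filter_by_domain`, `dedupe`, `pipeline`, and the in-memory corpus are all hypothetical stand-ins, used only to show how small primitives compose into a task-specific pipeline with a parallel fan-out.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy in-memory corpus standing in for a real search index (hypothetical data).
CORPUS = [
    {"url": "https://vendor.example/advisory/1", "text": "advisory for widget 1.0 fixed in 1.1"},
    {"url": "https://aggregator.example/post/9", "text": "roundup mentioning widget advisory"},
    {"url": "https://vendor.example/advisory/1", "text": "advisory for widget 1.0 fixed in 1.1"},
]


def retrieve(query: str) -> list[dict]:
    """Primitive 1: return documents containing every query term."""
    terms = query.lower().split()
    return [doc for doc in CORPUS if all(term in doc["text"] for term in terms)]


def filter_by_domain(docs: list[dict], allowed_domain: str) -> list[dict]:
    """Primitive 2: keep only documents hosted on the allowed domain."""
    return [doc for doc in docs if doc["url"].startswith(f"https://{allowed_domain}/")]


def dedupe(docs: list[dict]) -> list[dict]:
    """Primitive 3: drop repeated URLs, preserving first-seen order."""
    seen, unique = set(), []
    for doc in docs:
        if doc["url"] not in seen:
            seen.add(doc["url"])
            unique.append(doc)
    return unique


def pipeline(queries: list[str], allowed_domain: str) -> list[dict]:
    """Fan out the queries in parallel, then filter and deduplicate the results."""
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(retrieve, queries))
    docs = [doc for batch in batches for doc in batch]
    return dedupe(filter_by_domain(docs, allowed_domain))


results = pipeline(["widget advisory", "fixed in 1.1"], "vendor.example")
print([doc["url"] for doc in results])
```

The point is the shape: whatever writes the pipeline, model or human, can only assemble vetted pieces, and each piece can be tested on its own.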
Vibe slop is what you get when a model writes code from a vague prompt and a tired human ships it unread. Search as Code is what you get when a model writes code against a carefully engineered set of primitives, runs it in a sandboxed runtime, and is held to an explicit verifiable contract. One is improvisation pretending to be engineering. The other is engineering that happens to use a model as the orchestrator.
Their case study pointed the system at a brutal real-world task: identify and characterize more than 200 high-severity security vulnerabilities across three years, citing the affected vendor's own advisory for each, naming the product and the fixed version, and proving the fix is actually tied to that vulnerability. Their approach scored 100% on accuracy while using ~85% fewer tokens than the baseline. Competing systems came in under 25%.
Every move in their architecture is a governance move dressed up as an engineering one. First, the rules were encoded into the query plan: only vendor-owned advisory pages counted, and aggregators were ruled out at the source rather than filtered later (untrusted by default). Second, the model was used as a planning subroutine, not the final authority: the code summarized where coverage was thin, asked for targeted refinements, and validated each proposed query before running it (the model proposes; the code disposes). Third, an explicit verifier enforced a real schema and a confidence threshold: a page only survived if it bound one vulnerability to one product and one fix version in the vendor's own text. Weak evidence got discarded.
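A verifier of that kind is mostly a schema plus a few hard checks. The sketch below is an assumption-laden illustration, not the published system: the `Evidence` fields, the `verify` function, the 0.9 threshold, and the example domains are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Evidence:
    vulnerability_id: str
    product: str
    fixed_version: str
    source_url: str
    confidence: float  # 0.0 to 1.0, assumed to come from an upstream extraction step


def verify(evidence: Evidence, vendor_domains: set[str], min_confidence: float = 0.9) -> bool:
    """Accept only complete, vendor-hosted, high-confidence evidence."""
    host = urlparse(evidence.source_url).hostname or ""
    # Vendor-owned pages only: exact domain or a subdomain of it.
    on_vendor_site = any(host == d or host.endswith("." + d) for d in vendor_domains)
    # One vulnerability bound to one product and one fix version, none missing.
    complete = all([evidence.vulnerability_id, evidence.product, evidence.fixed_version])
    return on_vendor_site and complete and evidence.confidence >= min_confidence


vendor_domains = {"vendor.example"}
good = Evidence("VULN-0001", "widget", "1.1", "https://security.vendor.example/adv/1", 0.97)
aggregator = Evidence("VULN-0001", "widget", "1.1", "https://aggregator.example/post/9", 0.99)
print(verify(good, vendor_domains), verify(aggregator, vendor_domains))
```

Note that the aggregator page is rejected even at higher confidence. Provenance is a hard rule, and confidence only matters once provenance is satisfied.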
Strip away the search specifics and that is a verification gate, a cross-checking step, a source-traceability requirement, and a human-defined confidence threshold — the exact governance layers EPC Group builds into every multi-AI BI engagement. Three different parties, three different starting points, one conclusion. The villain was never code generation. The villain is code generation without the second half of the system.
Diagnosis without a plan is just complaining with footnotes. The fix is architectural and organizational, not a matter of using the tool a little less.
Treat all AI-generated code as untrusted by default. It is the same posture you take with a brand-new hire on day one: every line gets the review, scanning, and peer scrutiny you would demand of human work. "Usually right" is precisely the problem, because it lulls you.
Make governance operational, not theatrical. That means policy, procedure, implementation, testing, and training: all five dimensions, live. Policy without implementation visibility manufactures false confidence. Govern with our AI Governance framework.
Separate the layers where AI shines from the layers where it cannot run unsupervised. AI is excellent at the interface layer, glue code, integrations, and prototyping. It is dangerous as an unsupervised author of core business logic: the rules that decide transactions, exceptions, and compliance. Speed where speed is cheap; governance where mistakes are expensive.
Build security into the pipeline, not into good intentions: static analysis during development, dependency scanning of import trees, dynamic testing under real conditions, and monitoring of AI-generated API code. Prompting helps, but prompting is a nudge, not a guarantee.
Make releases reversible with feature flags and progressive rollouts. They do not stop AI from writing wrong code; nothing fully does. They make wrong code cheap to discover and cheap to undo. ~20% of heavy-AI deployments end in a rollback, hotfix, or incident.
Decide ownership deliberately. Before "just build it with AI" rewrites your software budget, separate the build cost from the ownership cost. Build the smallest thing that teaches you the real problem, then make the buy-vs-own call on purpose. Favor vendors where compliance, audit posture, and accumulated domain data are the moat.
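Of the six moves, reversibility is the one that fits in a dozen lines. This is a minimal sketch of a percentage rollout, assuming a hypothetical `in_rollout` helper and flag name; hosted feature-flag services add targeting, audit logs, and kill switches on top of the same idea.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Deterministically place a user in a bucket from 0 to 99 and compare it to the rollout percentage."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return bucket < percent


# Hypothetical flag guarding an AI-written code path.
flag = "new-pricing-engine"
users = [f"user-{n}" for n in range(1000)]
exposed_at_10 = [u for u in users if in_rollout(flag, u, 10)]
print(f"{len(exposed_at_10)} of {len(users)} users see the new path at 10%")
```

Because each user's bucket is fixed, widening the rollout only adds users and setting the percentage to 0 removes everyone at once. That property is what makes wrong code cheap to undo.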
I do not think we are headed for an apocalypse, and I do not think the warning from those OpenClaw engineers is hype. They are describing the early symptoms of a real, predictable disease, and like most diseases it is far cheaper to prevent than to treat. The gains from AI-assisted development are real. But the gains are concentrated in a single bright moment, and the costs are scattered across everything downstream: review load, security exposure, production incidents, cloud bills, leaked customer data, and a maintenance burden that compounds for years. The score is closer than the demo makes it look, and the second half of the game is where it gets decided.
The slop story, the build-everything story, and the governance story are one story. Shipping unreviewed AI code, rebuilding your whole software stack because building got cheap, and wiring a model into a workflow with no runtime oversight are the same mistake at three altitudes — confusing the cost of creation with the cost of ownership, and confusing confidence with correctness. The fix is the same at every altitude too: control, verification, traceability, reversibility, and a deterministic layer doing the work the model should never be trusted to do alone.
This is exactly why we built EPC Group's multi-AI practice the way we did — multiple engines checking each other, governance baked into the foundation, and a team trained to assume the machine is confidently wrong until it proves otherwise. Multiple models. One truth.
Ship fast if you want. Just make sure you can see what you shipped, undo it when it's wrong, verify it before it lands, prove who owns it, and trust the person who signed off on it. Everything else is vibes.
Vibe slop is a term coined by engineers behind OpenClaw (Mario Zechner and Armin Ronacher) to describe the buggy, hard-to-maintain, and sometimes unsafe code that ships when teams generate software from casual prompts and deploy it without serious review. The phrase collides "vibe coding" (describing software in plain English and letting a model write it) with "AI slop" (low-value machine-generated content). The pattern is automation bias plus review fatigue: humans trust model output more than they should, and teams buried under AI-generated PR volume stop reading them line by line.
The productivity gains are real — telemetry from Cursor, GitHub Copilot, and Claude Code shows heaviest-engagement developers shipping several times more durable work than non-AI peers, and enterprises have reported ~30% faster engineering velocity with agentic workflows. But the same telemetry shows downstream costs rising: pull requests ~51% larger, bugs/PR up ~28%, review time stretched to multiples of baseline, incidents/PR roughly tripled. The honest read is a productivity paradox — throughput and the reliability tax rise together. Teams measuring only the typing think they are winning. Teams measuring the whole lifecycle know the real score.
EPC Group built a multi-AI business intelligence practice on the premise that one model is a liability and a committee of them is an asset. Engagements include explicit governance layers: verification gates that force a model to show its work, cross-model adjudication that pits engines against each other so disagreement is treated as signal not noise, source-traceability requirements so no claim ships without a verifiable origin, and human sign-off thresholds that scale with the blast radius of the decision. The posture is: assume the machine is confidently wrong until it proves otherwise — the same way a good auditor assumes the books are off until they reconcile.
Synthetic confidence is the failure mode where a model is simultaneously extremely capable and confidently wrong — not obviously wrong, but plausibly wrong. Elegant code referencing functions that do not exist. Logic that looks sound while quietly introducing security risk. Compliance mappings that read correctly while subtly misstating what is actually required. The danger is not the error — humans are wrong constantly. The danger is the tone: a model delivers a wrong answer in the same calm, polished register it uses when it is correct. There is no tremor in its voice, no hedge, no tell. Human beings are wired to read confidence as competence, which makes vibe slop genuinely dangerous and not merely annoying.
AI dropped the cost of building to the floor. It did not drop the cost of maintaining anywhere near as far. The build decision is made once. The maintenance decision is made every single day afterward — usually by people who were not in the room for the original build and have less context than the team that shipped it. The build is the deposit. The maintenance is the mortgage. So "can we build this?" almost always returns yes in 2026 and stops narrowing the choice. The question that still narrows it is "do we want to own this for the next ten years?" — and most things you should not want to own. The build-everything crowd is about to inherit at portfolio scale exactly what the vibe-slop crowd inherited at the code level: something they did not differentiate with, did not document, and now cannot stop running.
Six concrete moves: (1) Treat all AI-generated code as untrusted by default — every line gets the review, scanning, and peer scrutiny you would demand of human work. (2) Make governance operational, not theatrical — across five live dimensions: policy, procedure, implementation, testing, training. (3) Separate layers where AI shines (interface, glue code, integrations, prototyping) from layers where it cannot run unsupervised (core business logic, regulatory rules, financial calculations). (4) Build security into the pipeline — static analysis, dependency scanning, dynamic testing, monitoring of AI-generated API code — not into good intentions. (5) Make releases reversible — feature flags and progressive rollouts make wrong code cheap to discover and cheap to undo. (6) Decide ownership deliberately — build to learn, buy to scale, favoring vendors for anything where compliance, audit posture, and accumulated domain data are the moat.
EPC Group designs governed, multi-AI Microsoft solutions built to be auditable, reversible, and defensible. 29 years Microsoft consulting. Microsoft Solutions Partner. Senior architect on every Statement of Work.
contact@epcgroup.net · (888) 381-9725 · HIPAA · SOC 2 · FedRAMP · CMMC compliant delivery