Written by: Aishwarya Ashok
Editorial Contributions: Benjamin Paul Rode, Garima Gujral, Joaquin Melara, Stratos Kontopoulos, and Sruthi Radhakrishnan
Abstract
Vibe coding has grown immensely popular by lowering the barrier to entry and accelerating prototyping. However, despite this advancement, product teams are experiencing a distinct wave of disappointment as their launched AI features quickly degrade into "AI slop", creating outputs that feel generic and detached from actual user workflows. This phenomenon is not a consequence of model limitations; rather, it is a context problem driven by isolating the vibe coding tool from subject matter expertise and domain-specific realities. When teams withhold domain realities, user constraints, and quality standards, the vibe coding tool defaults to the remarkably average results stemming from its training data. This issue manifests as superficial chat interfaces that fail to meet the functional and system-level needs of the team.
To move past this prototyping phase and build compounding value, the bottleneck of product development must shift from writing code to defining what "good" looks like. Teams must transition from prompting (a temporary request) to onboarding (a reusable system), treating AI agents like capable new colleagues by systematically codifying the team's context, constraints, and taste. Ultimately, the role of the modern product builder is to orient and align agent productivity by creating the structure agents need to reason within the shared expectations of the team, product, and organization.

The lure of accessibility and ease
Every product team I've been chatting with over the last six months seems to carry some version of the same quiet complaint: “We shipped the AI feature, the demo was great, and within weeks of launch, the output started feeling generic. The agent did not get worse. The context did not get better.”
It is a strange shape of disappointment. The cost of building has dropped so far that almost any team can now spin up a draft interface, generate code for a workflow, summarize a research file, or stitch an internal copilot together over a weekend.
The immediate temptation is to ask: What can we build now that building is cheaper?
But, as product people, we may be asking the easier question.
The more useful one is: What judgment must be preserved when building becomes cheap?
Cheap building increases the volume of output. It does not, on its own, increase the quality of the product underneath. The bottleneck has moved from typing the syntax to deciding what should exist, what should not, and what 'good' looks like inside a particular team and a particular domain. The core objective of software engineering has always been to optimize for context while adhering to best practices: accurate and timely documentation, factoring to ensure compositionality, controlling for computational cost, and respecting IP and other legal constraints, among other quality-assurance considerations.
These higher-level decisions are not what agents were made for. These decisions are uniquely what a product leader is responsible for. The difference between creating AI slop and generating compounding value with AI is whether you have provided agents with the structure to conform to such requirements while managing the scaling complexities that come with codebase growth. That is the entire piece in one line.

Vibe coding is a phase, not the destination
Vibe coding deserves more credit than people usually give it. I find myself defending the phase often, even as I argue it is not the destination. Vibe coding got more teams to a working prototype than any product methodology of the last decade. It moved operators, founders, and PMs past the blank-page fear. It taught a lot of builders to type something into a vibe coding tool like Cursor or Claude Code and see a thing appear by lunch.
That matters a lot!
At the same time, vibe coding has also created a false signal. Because something can be generated quickly, it can feel closer to being a product than it actually is. While agents and vibe coding tools can get a team to a working prototype much faster than humans could manage alone, what is produced up to that point is a reasonable but unreliable architecture. Humans must vet and refine the result, which means that they also need to understand the code that the machine has produced.
“A draft is not a deliverable. A demo is not a workflow. The plausible thing is not a useful thing.”
Most of what gets called 'AI slop' shows up in the gap between a thing that can be generated and a thing that can be used:
A prototype that looks complete but ignores the user's actual workflow.
A dashboard that visualizes metrics no decision-maker uses.
An automation that saves five minutes upstream and adds review risk downstream.
A chatbot bolted onto a product without changing the underlying service experience.
Each of these is a vibe-coded answer to a question that needed product thinking instead. And, in enterprise settings, the gap is sharper still. In enterprise AI, the agent is rarely failing because it cannot write. It is failing because it has not been taught the operating reality — the workflow, the role permissions, the data limits, the compliance edge, and the tradeoffs that the team has made and unmade over years.
So, the shift the moment is asking for is not a tradeoff between tools. The shift is really an elevation from vibe coding to product building. From asking an agent to make something plausible, to providing the right structure for it to produce useful, usable, and contextually correct outputs within the requirements of specific use environments.
The surface gets better before the judgment does. The product builder's job is not to produce more surface area. It is to preserve purpose.

Slop is what happens when expertise stays tacit
It is tempting to read 'slop' as a model problem: newer model, better output, problem solved. And most of the conversations I've been part of do exactly that. But the pattern is older than that, and more stubborn. Slop happens when an agent is asked to perform without the context a competent teammate would receive.
A new product manager joining a team would not be asked to write a launch strategy on day one with only the product name. A designer would not be expected to redesign a workflow without seeing customer behavior, support tickets, constraints, and examples of what the team considers good. An analyst would not be trusted with a business recommendation without knowing the metric definitions, decision history, and failure modes of the data.
Yet with agents, we often do exactly that. We ask for output while withholding the audience, the standard, the constraints, the examples, the rejection criteria, and the tradeoffs already considered.
Slop is what happens when expertise stays tacit. Asking for output while withholding all of this is not collaboration. It is repeated orientation. If the agent does not know how you think, it will default to how the average of the internet thinks. That is a recoverable problem, but only if the team treats it as a context problem rather than a model problem. Better prompting will not solve it. A bigger model will not solve it. What will solve it is writing the team's expertise down in a form the agent can actually execute, while continuously benchmarking team performance against the tools you adopt.
Consider the following three cases.
Case 1: The single-prompt operator.
Someone copies a prompt from a thread, runs it once, gets generic output, and concludes the model is not useful. The model is fine. The agent was simply never given the domain, the user, the workflow, or the decision context. It produced a generic answer because the question was generic.
Case 2: The 'add AI' product team.
A team bolts a chat interface onto an existing workflow and calls it progress. Usage spikes at launch, then flattens, because the agent was placed next to the job-to-be-done rather than embedded inside it. The team rebuilt the surface and left the system alone.
Case 3: The over-contextualized but under-systematized team.
This one is quieter and more common than people think. A team writes long, thoughtful prompts every morning, explaining the same product, the same user, the same constraints, and the same examples to a different session each time. They are working hard. They are not compounding. Every conversation starts from scratch.
Three different teams. Three different situations. The same diagnosis.

Onboard and orient, do not summon
Here is the analogy I keep coming back to in conversations with builders. When a capable new colleague joins a product team, no one asks them for the strategy on day one. We onboard them. We explain the customer, the business model, the product surface, the constraints, the quality bar, the politics, the edge cases, and what's already been tried. We send them docs. We pair them on a few real tasks. We tell them where the landmines are.
After a few weeks, they start to make decisions that look like the decisions the rest of the team would have made — and gradually, decisions the rest of the team would not have thought of, because the new colleague brings their own judgment laid on top of the operating context.
I've been on both sides of this: onboarding new hires well, and watching teams skip it. The teams that skip it always notice the same thing: the new person produces output that feels right at first glance and wrong on the second.
Plausible, not contextual. A familiar problem.
Agents need the same operating environment, for the same reasons:
Context: What environment is this work happening in?
Constraints: What cannot be violated?
Taste: How is 'good' defined, based on the experience and judgment of the team?
Every team has these three. Most teams have never written them down. With a human teammate, the gap is patched by conversations, reviews, and time. With an agent, the gap shows up as slop every time.
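To make "writing them down" concrete, here is a minimal sketch of what a reusable brief could look like. It is an illustration under my own assumptions, not a prescribed format: the class name OnboardingBrief, its methods, and the example entries are all hypothetical, and a real team would fill in its own content.

```python
from dataclasses import dataclass, field

# Hypothetical section names, taken from the three questions above.
SECTIONS = ("context", "constraints", "taste")


@dataclass
class OnboardingBrief:
    """Illustrative container for what a team would tell a new colleague."""
    context: list[str] = field(default_factory=list)      # environment the work happens in
    constraints: list[str] = field(default_factory=list)  # what cannot be violated
    taste: list[str] = field(default_factory=list)        # what the team considers 'good'

    def missing(self) -> list[str]:
        """Return the names of sections nobody has written down yet."""
        return [name for name in SECTIONS if not getattr(self, name)]

    def render(self) -> str:
        """Render the brief as plain text that can be reused in every session."""
        blocks = []
        for name in SECTIONS:
            items = getattr(self, name)
            lines = [f"- {item}" for item in items] or ["- (not yet written down)"]
            blocks.append("\n".join([name.upper() + ":"] + lines))
        return "\n\n".join(blocks)


# Example entries are invented purely for illustration.
brief = OnboardingBrief(
    context=["Internal support tool used by a small operations team"],
    constraints=["Never expose customer data outside role permissions"],
)
print(brief.missing())
print(brief.render())
```

The point of the sketch is the shape, not the code: the brief lives in one place, is reused rather than retyped each morning, and makes the unwritten sections visible instead of leaving them tacit.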
So, why are we asking less of an agent than we ask of a teammate?
The shift the moment is asking for is not from one prompt style to another. It is from prompting to onboarding. Prompting is a request. Onboarding is a system. And, ‘Skills’ are how systems become reusable. But we’ll get more into that in the next article, “The Skill Stack: Turning Judgment into Reusable AI Infrastructure.”




