
Two engineers on the same client project. Both using Claude. Both technically competent. Wildly different output quality.
One was getting responses that fit the codebase: the right patterns, the right constraints respected, output that could go almost directly into a pull request. The other was getting responses that looked plausible but missed the mark on almost every project-specific detail. He was spending more time correcting Claude’s work than it would have taken to write it himself.
We dug into the difference.
Same model. Same general prompting approach. Similar tasks. The gap wasn’t intelligence or skill. The gap was context.
One engineer had developed a habit of priming Claude with project-specific background before asking for anything. The other was prompting cold: task in, output out, no setup.
I’ve watched this pattern repeat across enough client teams now to be confident it’s not accidental. The teams getting consistent, leveraged output from AI aren’t just better at writing prompts. They’re maintaining context as an operational practice. The ones getting “random brilliance” (great output sometimes, frustrating output other times) usually aren’t doing that work at all.
That’s the gap CRAFT’s Context Graph is designed to close.
When I started mapping how our teams were using Claude across projects, the pattern Skip described was everywhere. Most prompts were cold: a task description, maybe some file contents pasted in, and whatever Claude could infer from that snapshot. No project history. No architectural constraints. No record of what decisions had already been made and why.
The output was often locally plausible: syntactically correct, logically reasonable, consistent with common patterns. But “common patterns” aren’t the same as “this project’s patterns.”
A cold prompt produces the median answer. What you need is the answer that fits your specific context.
The cost shows up in a few predictable ways. The AI suggests approaches that were already evaluated and rejected. It generates code that conflicts with a constraint documented in a Decision Record nobody pasted in. It creates something that works in isolation but breaks an integration that exists in a file it never saw. Each of these requires a human to catch the problem, understand why it’s wrong, and correct it. The correction loop is where the leverage dies.
When I calculated time-on-correction versus time-on-generation across a few active projects, the ratio was worse than anyone expected. Teams assumed they were saving sixty to seventy percent of writing time. Net of corrections, they were often saving thirty to forty. Context work closes a significant portion of that gap.
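To make the gross-versus-net distinction concrete, here is a minimal sketch of the arithmetic. The function name and the hour figures are illustrative assumptions chosen to land inside the ranges above; they are not measurements from any project.

```python
def net_time_saved(baseline_hours: float, ai_hours: float, correction_hours: float) -> float:
    """Fraction of the baseline writing time saved once correction time is counted."""
    return (baseline_hours - ai_hours - correction_hours) / baseline_hours


# Hypothetical task: 10 hours by hand, 3.5 hours with AI, plus 3 hours of corrections.
gross = net_time_saved(10.0, 3.5, 0.0)  # ignores the correction loop
net = net_time_saved(10.0, 3.5, 3.0)    # counts the correction loop
print(f"gross: {gross:.0%}, net: {net:.0%}")
```

The only point of the sketch is that correction time comes straight out of the saving, so any reduction in corrections shows up one-for-one in the net figure.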
The Context Graph is a living document maintained per project and loaded into Claude at the start of any serious work session.
It’s not a prompt template. It’s not a README. It’s a structured set of project facts organized specifically for AI consumption, kept current as the project evolves.
Seven categories. The structure is fixed. The content evolves with the project.
1. Project identity. The Intent Contract summary: what we’re building, for whom, what success looks like. One paragraph. This is the anchor. Without it, every AI session starts from scratch on the most basic question of what the work is actually for.
2. Architecture and stack. The technical environment in enough detail to constrain suggestions. Framework versions, key libraries, infrastructure choices, database schema at a summary level. Not a full spec, just enough to prevent the AI from recommending patterns that aren’t compatible with what exists.
3. Decision Record index. A list of key decisions with their one-sentence summaries and, where relevant, the reasoning. Not the full DRs: the summaries. If a decision constrains what can be built or how, it goes in the index. This is what prevents the AI from suggesting an approach that was evaluated and rejected three weeks ago.
4. Current work context. What we’re working on right now, at the task level. The scope of the current sprint or work item. What’s in flight. This narrows the AI’s frame from “the whole project” to “the thing we’re actually doing this week,” which dramatically improves specificity.
5. Boundaries and risk levels. An explicit map of where AI autonomy is calibrated for this project: what it can do without review, what requires a human in the loop, what should not be touched without explicit approval. This is the CRAFT Automate principle made concrete. The AI doesn’t guess at its own risk profile. We tell it.
6. Known failure modes. What has broken before. What the team has already learned to watch for. Recurring bugs. Integration edge cases. Performance constraints that show up under specific conditions. This is the institutional knowledge that lives in engineers’ heads and disappears when people move on. Putting it in the Context Graph makes it available to both new team members and Claude.
7. Open questions. Decisions that are still unresolved. Things the team is actively debating. This signals to Claude where it should flag uncertainty rather than fill in the gap with a plausible answer.
An AI that doesn’t know what’s unresolved will resolve open questions on its own. Usually badly.
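For teams that keep the graph next to the code, the seven categories could be sketched as a small data structure. This is one possible shape, not a CRAFT-defined schema: the class name, field names, and the `missing_sections` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ContextGraph:
    """Hypothetical container mirroring the seven categories above."""
    project_identity: str = ""
    architecture_and_stack: str = ""
    decision_record_index: list[str] = field(default_factory=list)
    current_work_context: str = ""
    boundaries_and_risk_levels: str = ""
    known_failure_modes: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of categories that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder content, invented for the example.
graph = ContextGraph(
    project_identity="Invoicing tool for small agencies.",
    open_questions=["Do we support multi-currency in v1?"],
)
print(graph.missing_sections())
```

An empty section is not always a problem (a project can genuinely have no open questions), so a check like this is better treated as a prompt for a human to confirm than as a hard failure.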
The consistency shift is what founders and technical leads notice first, but they usually describe it in a different way. The typical comment is something like: “It feels like Claude actually knows our product now.” Or: “I’m not cleaning up every response anymore.”
That’s not Claude knowing the product. That’s the Context Graph doing its job.
The second thing clients notice is onboarding speed. When a new engineer joins a team that’s maintaining a Context Graph, the document becomes part of their onboarding. Not just for understanding the project: for understanding how to work with AI on it. The behavioral norms around context maintenance transfer explicitly rather than getting re-learned by each person.
The third thing, and this one takes longer to see, is that the graph becomes a forcing function for clarity. A team that can’t fill in the “boundaries and risk levels” section probably hasn’t actually decided what those boundaries are. A team that can’t write a current-work-context entry that’s specific enough to be useful probably has scope that’s too fuzzy. The process of maintaining the Context Graph surfaces the same kinds of gaps the Intent Contract surfaces, just at the ongoing work layer rather than the project-start layer.
I want to be straight about the part of this that doesn’t work cleanly yet.
The Context Graph is useful in direct proportion to how current it is.
A stale graph is worse than no graph, because it gives Claude confident wrong context instead of uncertain right context.
Keeping it current requires a maintenance discipline that most teams find harder than they expected.
We’ve tried a few models. Assigning ownership to one person per project produces the most consistent results, but creates a single point of failure when that person is swamped. Shared ownership produces more resilient coverage but lower average currency. We’re currently experimenting with a lightweight end-of-sprint review trigger: the last task of every sprint is a fifteen-minute Context Graph update, with specific prompts for each category.
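A review trigger like this can be backed by a simple staleness check on the graph's last-updated date. The sketch below assumes a two-week sprint and a hypothetical `is_stale` helper; neither comes from CRAFT itself.

```python
from datetime import date, timedelta


def is_stale(last_updated: date, today: date, sprint_length_days: int = 14) -> bool:
    """True if more than one sprint length has passed since the last update."""
    return today - last_updated > timedelta(days=sprint_length_days)


# Hypothetical dates: the graph was last touched 19 days before the session.
if is_stale(date(2026, 3, 1), today=date(2026, 3, 20)):
    print("Context Graph is older than one sprint; review before loading it.")
```

A check like this only catches age, not accuracy. A graph updated yesterday can still be wrong, so it complements the end-of-sprint review rather than replacing it.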
I’m not going to tell you we have this fully solved. What I can say is that the teams maintaining it consistently are seeing the output quality difference Skip described, and the teams that let it drift are not. The delta is significant enough that we keep investing in making maintenance easier.
Next week, Skip covers the Fortify Gate: the go/no-go checklist that runs before anything ships. If you read our piece on the $50k Prototype Trap from Series 1, the Fortify Gate is the direct answer to the question that article raised.
Maintain one of these per active project. Load it into Claude at the start of any work session where the AI will be contributing to the codebase, architecture, or product decisions. Update it at the end of each sprint or when any of the seven categories materially change.
Project: [Name]
Last updated: [Date]
Owner: [Person responsible for keeping this current]
1. Project Identity
What we’re building, for whom, and what success looks like. Pull from the Intent Contract. Two to four sentences. This is the anchor — every AI session starts here.
2. Architecture and Stack
Framework, key libraries, infrastructure, database. Specific versions where relevant. Enough to constrain suggestions and prevent incompatible recommendations. Not a full spec.
3. Decision Record Index
Key decisions that constrain what can be built or how. Format: Decision (one sentence) / Reasoning summary (one sentence) / Date.
Example entry:
• JWT over session cookies for API auth / Stateless containers scaling horizontally; server-side session storage not viable / 2026-03-14
4. Current Work Context
What the team is working on right now. Current sprint goal. Active work items. Scope boundaries for this period. Narrow this to what’s relevant for the current session if the sprint scope is broad.
5. Boundaries and Risk Levels
Where AI autonomy is calibrated for this project. Three tiers:
• High autonomy: Tasks where AI output can go directly to review without human pre-check.
• Human in the loop: Tasks where AI drafts but a human reviews before any action.
• Do not touch without explicit approval: Sensitive areas, production paths, auth/security layers, etc.
6. Known Failure Modes
What has broken before. Recurring issues. Integration edge cases. Performance constraints. Things the team has learned to watch for. This is institutional knowledge made explicit.
7. Open Questions
Decisions that are still unresolved. Debates in progress. Areas where the AI should flag uncertainty rather than fill in a gap. Update this when questions resolve.
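Because the Decision Record Index in section 3 uses a fixed "Decision / Reasoning / Date" format, entries can be checked mechanically. The sketch below is one way to do that; `DecisionEntry` and `parse_entry` are hypothetical names, and it assumes the separator is a slash with a space on each side and the date is ISO formatted, as in the example entry.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DecisionEntry:
    decision: str
    reasoning: str
    decided_on: date


def parse_entry(line: str) -> DecisionEntry:
    """Parse one index line of the form 'Decision / Reasoning / Date'."""
    text = line.strip().lstrip("•").strip()  # drop an optional leading bullet
    parts = [p.strip() for p in text.split(" / ")]
    if len(parts) != 3:
        raise ValueError(f"expected 'Decision / Reasoning / Date', got: {line!r}")
    decision, reasoning, raw_date = parts
    return DecisionEntry(decision, reasoning, date.fromisoformat(raw_date))


entry = parse_entry(
    "• JWT over session cookies for API auth / Stateless containers scaling "
    "horizontally; server-side session storage not viable / 2026-03-14"
)
print(entry.decision, entry.decided_on)
```

Splitting on the spaced separator rather than a bare slash keeps terms like "auth/security" intact inside a decision or its reasoning.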
Paste this before any substantive work session:
Before we begin, here is the current project context. Please treat this as ground truth for all responses in this session. Do not suggest approaches that conflict with the Decision Record Index, the architecture and stack constraints, or the boundary levels defined below. For any open question listed in section 7, flag uncertainty rather than resolving it on your own.
[Paste full Context Graph here]
Ready to begin. Here is the task:
[Task description]
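If the graph lives in a file, the preamble above can be assembled programmatically so nobody has to retype it. This is a minimal sketch: `build_session_prompt` is a hypothetical helper, and the graph and task strings in the usage lines are placeholders. How the resulting text reaches Claude (chat window, API, editor plugin) is left open.

```python
PREAMBLE = (
    "Before we begin, here is the current project context. Please treat this as "
    "ground truth for all responses in this session. Do not suggest approaches "
    "that conflict with the Decision Record Index, the architecture and stack "
    "constraints, or the boundary levels defined below. For any open question "
    "listed in section 7, flag uncertainty rather than resolving it on your own."
)


def build_session_prompt(context_graph: str, task: str) -> str:
    """Join the preamble, the full Context Graph text, and the task description."""
    return "\n\n".join(
        [
            PREAMBLE,
            context_graph.strip(),
            "Ready to begin. Here is the task:",
            task.strip(),
        ]
    )


# Placeholder inputs for illustration only.
prompt = build_session_prompt(
    context_graph="Project: Example\nLast updated: 2026-03-20\nOwner: Example Owner",
    task="Add pagination to the invoices endpoint.",
)
print(prompt)
```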
Written by Skip Marshall