
Methodology

Measuring Success in AI Projects

Your AI project shipped on time, under budget, with all features delivered. The team celebrated. Leadership thanked everyone. Six weeks later, nobody can say whether it actually worked.

This is the state of most software and AI implementations today. Teams measure activity: velocity, story points, tickets closed, lines of code written. What they don't measure is whether the project achieved its intended business outcome.


How to use this

Methodology pages explain the standards we use to reduce ambiguity before and during the build.

  • Align stakeholders around intent
  • Expose tradeoffs before implementation
  • Measure whether the system improved the work

The gap between activity and outcome is where most AI investments go to die.

The Problem: We Measure the Wrong Things

The standard framework for project success is broken. It was built for infrastructure projects where activity and outcome are tightly coupled. Build a bridge, cars cross it. Pour concrete, it hardens. These projects have a direct cause-and-effect relationship between work completed and real-world result.

Software, especially AI, doesn't work that way.

You can ship a perfectly executed machine learning model, hit every sprint goal, and still fail to move the needle on what actually matters to the business. A recommendation engine can be beautifully engineered and never used. An automation can work flawlessly while delivering zero cost savings because the process it automated was already optimized. A generative AI system can produce grammatically perfect outputs while missing the insight humans need.

The mismatch between shipping and impact is more severe in AI projects because the failure modes are invisible. A traditional software bug crashes the system and gets caught immediately. An AI failure mode is subtler: low adoption because users don't quite trust the model, marginal ROI because the metric was never defined before the build started, or realized value at 60% of the projected benefit because the implementation lacked the feedback loops to iterate.

Most organizations feel this. According to McKinsey's 2025 State of AI report, 78% of organizations now use AI in at least one business function, yet 80% report no clear bottom-line effect. This gap between adoption and value isn't a technology problem. It's a measurement problem. You can't improve what you don't measure.

What Success Actually Means: Outcomes, Not Activity

Success in a software or AI project means one thing: the business outcome you set out to achieve is measurable and real.

Not hoped for. Not projected. Measured.

This is different from how most teams think about success. Consider how a typical project defines its scope:

"We'll build an AI-powered customer support chatbot. It will handle initial inquiry triage, reduce first-response time, and scale to handle 500 conversations daily without human intervention."

Those sound like outcomes, but they're not. They're features. They're what the system will do, not what the business will experience as a result.

Now consider the same project defined through outcome metrics:

"We'll reduce support ticket volume that requires human review by 25%. This means moving from 80 human-reviewed tickets daily to 60. We'll measure this starting from the first day of launch, tracked in a Telemetry Ledger, and review results at 30 days post-launch."

Notice the differences. The second version specifies what matters: a percentage reduction in work that needs human hands. It's time-bound. It's measurable from the start. It has a specific review cadence. It leaves zero room for ambiguity about what success looks like.
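To make the arithmetic concrete, here is a minimal sketch in Python (the `reduction_target` helper is hypothetical, purely for illustration) of how a baseline and a percentage-reduction goal produce the target number the second version commits to:

```python
# Hypothetical helper, purely illustrative: derive the concrete target
# implied by a baseline value and a percentage-reduction goal.
def reduction_target(baseline: float, reduction_pct: float) -> float:
    """Value of the metric after cutting `baseline` by `reduction_pct` percent."""
    return baseline * (1 - reduction_pct / 100)

# The example above: cutting 80 human-reviewed tickets/day by 25% gives 60.
assert reduction_target(80, 25) == 60.0
```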

This is the foundation of how InTech thinks about success.

How InTech Defines Success: The Intent Contract

Before a single line of code is written, InTech drafts an Intent Contract with the client. This is the document that defines business intent in measurable terms.

The Intent Contract contains five core elements:

  • The business problem being solved (the context for why this project exists at all)
  • The primary outcome metric (the specific, measurable signal that will demonstrate success)
  • The target value for that metric (the concrete number)
  • The measurement window (when success will be assessed)
  • The tolerance for deviation (how much variance from the target is acceptable, and what that reveals)
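As a sketch only, those five elements map naturally onto a simple record. The field names and values below are our illustration, not InTech's tooling:

```python
from dataclasses import dataclass

@dataclass
class IntentContract:
    """Illustrative shape of the five core elements (field names are ours)."""
    business_problem: str          # why the project exists at all
    primary_outcome_metric: str    # the single observable signal of success
    target_value: float            # the concrete number to hit
    measurement_window_days: int   # when success will be assessed
    tolerance: float               # acceptable variance from the target

# Hypothetical example, echoing the support-ticket scenario above.
contract = IntentContract(
    business_problem="Support team can't keep up with inbound ticket volume",
    primary_outcome_metric="human-reviewed support tickets per day",
    target_value=60,               # down from a baseline of 80 (a 25% cut)
    measurement_window_days=30,    # assessed at the 30-day post-launch review
    tolerance=0.05,                # e.g., 5% variance is acceptable
)
```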

The Primary Outcome Metric is the most important element. It is a single, observable signal that the business has agreed represents success.

Examples of strong primary outcome metrics:

"Time from customer inquiry to first human-reviewed response drops from 6 hours to under 2 hours." "Manual data entry in the accounts payable process drops from 40 hours weekly to 12 hours weekly." "Support customers requiring escalation to senior staff drops from 35% to below 15%." "Customer onboarding completion rate for self-service flows increases from 62% to 78%."

All of these are specific enough to measure, tied to actual business operations, and binary in a practical sense: either the outcome is achieved or it isn't.

Many projects skip this step entirely. They jump to building. InTech doesn't. The Intent Contract forces clarity before development begins. It ensures the client and the engineering team agree on what success looks like before money is spent and time is invested. This single document eliminates months of misalignment later.

How InTech Tracks Outcomes: Telemetry and the Telemetry Ledger

Telemetry is the "T" in CRAFT, InTech's delivery methodology. It is the feedback loop after shipping that measures whether the intended outcome was achieved.

Many teams treat telemetry as an afterthought. Shipping is Phase 1. Measurement is Phase 2, starting weeks or months after launch. By then, adoption patterns are set, usage is either happening or not, and the opportunity to iterate based on real data is already missed.

InTech does the opposite. Telemetry is baked in during the Fortify phase, which means the measurement system is live in production from the first hour of launch. This serves two purposes: it captures real data from the moment customers interact with the new system, and it enables rapid iteration if the early signals don't match expectations.

The Telemetry Ledger is the document that captures this data over time. Each entry in the ledger corresponds to a measurement window, typically daily or weekly, and records four pieces of information:

  • The actual signal observed (the measured value of the primary outcome metric)
  • The target value (what was projected)
  • The deviation from target (the difference between expected and actual, and whether it's positive or negative)
  • The action taken in response (if the metric is trending away from target, what the team did to course-correct)

A sample Telemetry Ledger entry might look like this:

"Week 1 post-launch, Primary Outcome Metric: manual check-in calls per day. Target: 40% reduction (from 80 daily to 48 daily). Actual: 52 calls per day (35% reduction). Deviation: 4 calls above target. Action: chatbot confidence threshold is still too conservative. Lowering threshold by 5% to increase automation attempts. Expect this to close the gap in Week 2."

This entry is valuable because it's honest about reality. The metric moved 35% of the way to target, which is progress but not quite there yet. Rather than declaring the project a success or a failure, it frames the result as actionable data and specifies what the team will do to improve.
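As a minimal sketch, one ledger entry can be represented as a record with the deviation derived from the other fields. The structure below is our illustration, not InTech's tooling:

```python
from dataclasses import dataclass

@dataclass
class LedgerEntry:
    """One measurement window in a Telemetry Ledger (field names are ours)."""
    window: str      # e.g., "Week 1 post-launch"
    metric: str      # the primary outcome metric being tracked
    observed: float  # measured value during this window
    target: float    # the projected value
    action: str      # course-correction taken, if any

    @property
    def deviation(self) -> float:
        # Signed gap between actual and target; positive means above target,
        # which is bad when the goal is to reduce the metric.
        return self.observed - self.target

# The Week 1 entry from the example above.
week1 = LedgerEntry(
    window="Week 1 post-launch",
    metric="manual check-in calls per day",
    observed=52,  # a 35% reduction from the 80-call baseline
    target=48,    # the 40% reduction goal
    action="Lower chatbot confidence threshold by 5% to attempt more automations",
)
assert week1.deviation == 4  # 4 calls above target
```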

This is how outcomes are measured continuously, not retrospectively.
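The course-correction in that Week 1 entry turns on a single knob: the confidence threshold that decides whether the chatbot handles a conversation or hands it to a human. A minimal sketch of that gating logic, where the function name and the 0.80 starting value are assumptions for illustration:

```python
def route_conversation(model_confidence: float, threshold: float = 0.80) -> str:
    """Let the chatbot handle the conversation only above the threshold.

    Lowering `threshold` (e.g., 0.80 -> 0.75, as the Week 1 entry proposes)
    makes the system attempt more automations, trading a higher automation
    rate against a higher risk of low-quality automated answers.
    """
    return "automate" if model_confidence >= threshold else "human_review"
```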

The Post-Launch Review

Thirty days after launch, InTech schedules a post-launch review with the client. This meeting is the final checkpoint for the engagement.

The post-launch review is not a celebration meeting. It is a data review meeting. The Telemetry Ledger is presented in full. The primary outcome metric is examined in its entirety. The team and client discuss together: did we achieve the intended outcome? If yes, what does the next iteration look like? If no, why did we miss, and what can we learn?

This is where the Intent Contract framework shows its value. Because success was defined clearly at the start, the 30-day review is not a debate. It's a conversation grounded in observable facts. Either the metric moved or it didn't. Either we hit our target or we missed. The discussion that follows is about why and what's next.

Crucially, achieving the target is not the only successful outcome. Missing the target but learning why, and having a clear path forward, is also success. Many projects fail because they generate data but nobody acts on it. The post-launch review ensures that action is committed before the engagement closes.

Why This Matters for AI Projects Specifically

AI projects amplify the measurement problem because they collapse the usual guardrails between intention and outcome.

A traditional software project has built-in checkpoints. Users encounter a bug, and the system breaks. The failure is immediate and undeniable. An AI system, by contrast, can be wrong in ways that are subtle and spread out over time. A language model can sound coherent while missing the insight the user actually needed. A classifier can achieve 92% accuracy on a test set while systematically failing on the live data distribution. An automation can work perfectly while delivering no actual value because the process it automated was never the real bottleneck.

These failure modes aren't visible unless you're measuring. And most AI teams aren't measuring in a systematic way because AI implementations typically lack the rigor that traditional software projects bring to requirements.

The Intent Contract and Telemetry framework close this gap. By forcing the AI team to specify the exact measurable outcome before building, and then tracking that outcome systematically after launch, the invisible failures become visible. The team has data about whether the AI implementation is actually solving the problem it was meant to solve.

For organizations trying to close the AI adoption-to-value gap, this discipline is the difference between an AI project that ships and an AI project that works.

Key Takeaways

  • Success in software and AI projects is measured by outcomes, not activity.
  • Define your primary outcome metric in the Intent Contract before the build begins.
  • Make telemetry live from the first day of launch, not an afterthought.
  • Review results with your team and client 30 days after launch using concrete data, not gut feel.
  • Iterate based on what the telemetry shows, not what you hoped would happen.

The difference between an executed project and a successful project is measurement discipline. Most teams have the first. Few have the second.

Frequently Asked Questions

Q: What if we don't hit the target in the 30-day window?

A: That's not a failure. It's data. The post-launch review examines why the metric moved the way it did, whether external factors affected the outcome, and what iteration might move the needle closer to target. Many projects require adjustment after launch as you learn how customers actually use the system versus how you expected them to. The goal is not to hit the target on Day 30; the goal is to have enough data by Day 30 to know what to build next.

Q: How do you define a primary outcome metric if the project is exploratory?

A: Even exploratory projects need a metric. Instead of "we'll reduce cost by X%," the metric might be "we'll validate that this automation approach is viable by measuring confidence scores on the model's predictions" or "we'll prove that users will adopt this feature by reaching 40% daily active user rate among the pilot group." The metric is still observable and specific, even if it's measuring viability rather than production impact.

Q: What if the primary outcome metric doesn't move after launch, but the system is working correctly?

A: This is a critical discovery. It usually means one of three things: the metric was defined wrong, the system is working but solving the wrong problem, or there's a gap between technical success and business adoption. The post-launch review gives you data to distinguish between these cases. That clarity is the whole point. Better to discover misalignment at 30 days than at 90 days.

Q: How detailed does the Telemetry Ledger need to be?

A: As detailed as it needs to be to understand why the metric moved the way it did. At minimum: the observed value, the target, the deviation, and the action taken. If you need more granularity (e.g., daily instead of weekly measurements, or sub-metrics that feed into the primary outcome metric), add it. The ledger is a working document meant to inform decisions, not a compliance checkbox.

Q: Can we have multiple primary outcome metrics?

A: Not in the Intent Contract. The Intent Contract specifies one primary outcome metric because success is binary: either you achieved the outcome you committed to or you didn't. Teams often want to track secondary metrics too (adoption rate, time saved, cost per transaction), and you should. But the primary metric is singular, unambiguous, and non-negotiable.

Q: How does Telemetry work with AI models that are constantly evolving?

A: The same way it works with any system. The primary outcome metric measures what the end user experiences, not the model's internal performance. If you're measuring "support tickets requiring human escalation," the metric doesn't care whether the AI is using v1 or v2 of your model. It only cares about the business result. This is why defining metrics at the business level, not the technical level, is so important.

Related Methodology

  • What Is the CRAFT Methodology? Learn how the CRAFT methodology governs AI-assisted product development with clarity before code: a delivery system that prevents faster waste.
  • Product Engineering for the AI Era. How unified product and engineering teams build faster without technical debt, and why the traditional product/engineering split no longer works.
  • PRD vs. Intent Contract: A Practical Comparison. Compare [Product Requirements Documents](https://www.agilealliance.org/glossary/requirements/) and Intent Contracts: the structural differences, when each works best, and how they coexist in modern product development.