How to Evaluate a Dev Shop in 2026 (A Non-Technical Founder's Guide)

Skip Marshall · March 3, 2026 · 9 min read

I'm going to tell you something your dev shop won't: most of them are selling you a process that was designed for a world that doesn't exist anymore.

That's not necessarily malicious. A lot of agencies and consultancies are staffed by good engineers who genuinely care about the work. But the industry is in the middle of a structural shift, and the difference between a shop that's adapted and one that hasn't is the difference between getting your product to market in three months and getting an invoice for twelve months of "progress" that looks impressive in a slide deck and doesn't work in production.

I write this as someone who runs a company that builds software products for clients. I have skin in this game. But I've also spent twenty years watching non-technical founders get burned by relationships that didn't need to go wrong — and in most cases, the founder had no way of knowing what to look for. The information asymmetry between a technical team and a non-technical buyer is enormous, and it almost always favors the seller.

This piece is my attempt to tilt that balance. These are the questions I'd want a friend to ask before signing a dev contract. None of them require a technical background. All of them will tell you more about a shop's capability than their portfolio page ever will.


Before we start: the landscape shifted

If you're evaluating dev shops the way you would have in 2023, you're going to make bad decisions. Here's what changed:

AI made code generation fast and cheap. A good engineer with modern AI tools produces more working software in a day than a team of three produced in a week just a few years ago. This means the cost of building something dropped significantly, but the cost of building the right thing didn't change at all. The planning, the decision-making, the architecture choices, the coordination, the quality assurance, the infrastructure: all of that is still human work, and it still takes the same time and expertise it always did.

What this means for you: any shop that's still pricing primarily on "how many developers for how many months" is selling you the old model.

Code production is no longer the scarce resource. Judgment is.

You want to pay for judgment — good decisions about what to build, how to build it, and how to know if it's working. If a proposal reads like a staffing plan instead of a strategy, that's a red flag.

The five questions

1. "Can you tell me what success looks like for this product (not what you'll build, but how we'll know it worked)?"

This is the most important question you can ask, and the answer will tell you everything.

A shop that's oriented around outcomes will have a crisp answer: "Success means your users can complete the onboarding flow in under three minutes with less than a 10% drop-off rate" or "Success means your operations team processes claims 40% faster than the current workflow." They'll push back on your feature list and ask why you want each feature. They'll want to understand the business goal behind the product, not just the spec.

A shop that's oriented around output will say something like: "We'll deliver the features in the scope document on time and on budget." That sounds reassuring. It isn't. Because delivering features isn't the same as delivering value. I've seen plenty of products that were built exactly to spec (every feature, every screen, every integration) and failed completely because the spec was wrong. Nobody asked whether the thing they were building would actually solve the problem.

The best shops define success before writing a line of code. They'll spend the first week or two helping you articulate what you're trying to achieve, for whom, and how you'll measure it. If a shop wants to start coding in week one, ask yourself: how do they know what to build if they haven't defined what success looks like?
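If it helps to picture what a written definition of success can look like, here is a minimal sketch in Python. It is only an illustration, not a tool any particular shop uses. The class name, the metric names, and the measured values are all hypothetical; the two targets are borrowed from the onboarding example above.

```python
from dataclasses import dataclass


@dataclass
class SuccessCriterion:
    """One measurable definition of success, agreed before any code is written."""
    name: str
    target: float
    lower_is_better: bool = True

    def met(self, measured: float) -> bool:
        # A criterion is met when the measured value lands on the right side of the target.
        if self.lower_is_better:
            return measured < self.target
        return measured > self.target


# Hypothetical targets: onboarding in under 3 minutes, drop-off below 10%.
criteria = [
    SuccessCriterion("onboarding_minutes", target=3.0),
    SuccessCriterion("onboarding_drop_off_rate", target=0.10),
]

# Hypothetical measurements; in practice these would come from product analytics.
measured = {"onboarding_minutes": 2.5, "onboarding_drop_off_rate": 0.14}

results = {c.name: c.met(measured[c.name]) for c in criteria}
print(results)
```

The point of the sketch is that each criterion is a number someone can check after launch. "Deliver the features in the scope document" cannot be written this way, which is exactly the problem with it.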

2. "When you make a major technical decision, how do I find out about it, and how do I find out why?"

Technical decisions are being made on your product constantly. Which database to use. How to structure the data model. Whether to build a feature from scratch or use a third-party service. How to handle authentication. What tradeoffs to make between speed and flexibility.

You don't need to understand these decisions technically. But you need to know they're being made deliberately, and you need access to the reasoning.

A good shop will have a visible trail of decisions: what they chose, what they considered, and why they went the direction they did. Not a hundred-page architecture document — a simple running log where you can see the reasoning behind technical choices. You should be able to look at it anytime and understand the shape of the decisions being made on your product, even if you don't understand the technical details.

A shop that can't show you this is making decisions by vibes. That works when everything goes well. It falls apart the moment something doesn't, because when a problem surfaces three months later, nobody can remember why things were built a certain way, which means they can't fix it without risking breaking something else.

Ask to see a decision record from a past engagement — with the client details removed, obviously. If they have one, you'll see clear thinking. If they don't have one, that tells you something too.
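For a sense of how small one entry in such a log can be, here is a rough sketch in Python. The field names, the date, and the example entry are my own assumptions for illustration; real shops use many formats, and plenty keep this in a shared document instead of code. What matters is that every entry captures the three things described above: what was chosen, what was considered, and why.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DecisionRecord:
    """One entry in a running decision log (hypothetical structure)."""
    decided_on: date
    decision: str                   # what was chosen
    options_considered: list[str]   # what else was on the table
    reasoning: str                  # why the team went this direction

    def summary(self) -> str:
        # A one-line view a non-technical reader can scan.
        return (
            f"{self.decided_on.isoformat()}: {self.decision} "
            f"(considered {len(self.options_considered)} options)"
        )


# Hypothetical entry about how to handle authentication.
log = [
    DecisionRecord(
        decided_on=date(2026, 1, 15),
        decision="Use a third-party service for authentication",
        options_considered=["Build login from scratch", "Use a third-party service"],
        reasoning="Login is not what makes this product different; buying it saves weeks.",
    )
]

for record in log:
    print(record.summary())
    print("  Why:", record.reasoning)
```

If a shop's log looks nothing like this but still answers those three questions for each major choice, that is fine. The format is not the signal; the habit is.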

3. "What's your rollback plan if something goes wrong after launch?"

This question makes bad shops uncomfortable and good shops light up.

Every release carries risk. A feature might break in an environment the team didn't test against. A change might interact badly with something else in the system. A third-party integration might behave differently in production than it did in staging. These aren't hypotheticals — they're the reality of shipping software.

A good shop will tell you, before each release, what they're shipping, what could go wrong, and how they'll undo it if it does. They'll have defined "safe to ship" criteria that match the risk of the change: a minor copy update doesn't need the same process as a new payment flow. They'll have monitoring in place so they know something broke before your users tell you.

A shop that says "we test thoroughly, so rollbacks aren't really an issue" is telling you they don't have a rollback plan. Testing reduces risk. It doesn't eliminate it.

The question isn't whether something will ever go wrong. It's whether the team has a plan for when it does.
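To make "safe to ship criteria that match the risk" concrete, here is a small sketch in Python. The two risk tiers and the check names are hypothetical, picked only to mirror the copy-update and payment-flow contrast above; a real team would define its own tiers and checks.

```python
# Hypothetical risk tiers and check names, chosen only for illustration.
REQUIRED_CHECKS = {
    "low": {"automated_tests"},
    "high": {"automated_tests", "rollback_rehearsed", "monitoring_alerts"},
}


def missing_checks(risk: str, completed: set[str]) -> set[str]:
    """Return the checks still outstanding for a change at this risk level."""
    return REQUIRED_CHECKS[risk] - completed


def safe_to_ship(risk: str, completed: set[str]) -> bool:
    """A change is safe to ship only when nothing required is outstanding."""
    return not missing_checks(risk, completed)


# A minor copy update needs very little.
copy_update_ok = safe_to_ship("low", {"automated_tests"})

# A new payment flow with no rehearsed rollback is not ready.
payment_done = {"automated_tests", "monitoring_alerts"}
payment_flow_ok = safe_to_ship("high", payment_done)

print(copy_update_ok, payment_flow_ok)
print(missing_checks("high", payment_done))
```

Notice that the riskier change is blocked by the one thing this question is about: nobody has rehearsed how to undo it.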

4. "After you ship something, how do you know whether it actually worked?"

This is where you separate the builders from the builders-who-think.

Most shops ship a feature and move on to the next ticket. The definition of "done" is deployment. But deployment isn't the finish line; it's the starting line. A feature that's live but unused, or live but confusing, or live but driving the wrong behavior, isn't done. It's a guess that hasn't been validated.

The question you're asking is: does this team measure outcomes, or just output?

A good shop will have a plan, before they build a feature, for how they'll know it worked. Maybe it's user analytics. Maybe it's a specific business metric. Maybe it's a conversation with five users after launch. The method matters less than the intent: they build something, they ship it, and then they check. And what they learn from checking shapes what they build next.

If a shop's reporting to you is limited to "we completed 47 story points this sprint" or "we shipped 12 features this quarter," you're measuring the wrong thing. Story points and feature counts tell you the team was busy. They don't tell you the product got better.

Ask: "What did you learn from the last thing you shipped on another engagement?" A shop that can answer that question thoughtfully is a shop that actually closes the loop.

5. "Can I talk to someone on the team who will disagree with me?"

This one sounds strange. It's the most important cultural signal you can test for.

A healthy dev shop will have people who push back on your ideas, respectfully and constructively, but firmly. "I know you want a dashboard with twenty charts, but our users aren't going to look at twenty charts. Let's start with the three that answer their actual questions." That pushback is worth thousands of dollars. It's the thing that prevents you from building a product that does everything and helps nobody.

A shop where everyone says yes to everything is a shop that will build exactly what you ask for, whether or not it's the right thing. That might feel good in the planning phase. It feels terrible in the launch phase, when you discover that the feature you insisted on is the one your users ignore.

The best client-vendor relationships I've been part of had real creative tension. The founder brought business context and customer insight. The dev team brought technical judgment and product experience. They argued — productively — about the right path forward. And the product was better for it.

If everyone in the room is nodding, nobody is thinking.

Red flags to watch for

I've seen these enough times that they deserve their own section:

Proposals that lead with team size and hourly rates. This tells you the shop is selling labor, not outcomes.

In 2026, with AI-assisted development, the number of developers on your product matters far less than the quality of decisions those developers make.

A two-person team with excellent judgment will outship a six-person team without it, every time.

"We follow Agile/Scrum." This used to be a good sign. Now it's a yellow flag. Not because Agile principles are wrong (they're not), but because "we follow Scrum" has become something people say instead of thinking. It tells you the shop has a process. It doesn't tell you whether that process has been adapted for a world where AI writes most of the code. Ask follow-up questions: how have your sprints changed now that AI handles code generation? If the answer is "they haven't," that's your signal.

No mention of what happens after launch. A shop that's only talking about building and not about measuring, learning, and iterating is a shop that finishes when the code ships. You need a partner that stays engaged through the part that matters: finding out whether the thing they built actually works.

They can't explain their approach without jargon. If a shop can't tell you, in plain language, how they work, what they prioritize, and why, they either don't have a clear approach or they're hiding behind complexity. Either way, you'll spend the engagement confused about what's happening and why. You don't need to understand the code, but you need to understand the thinking.

They don't ask you hard questions. If the sales process feels easy and agreeable, be suspicious. A good shop is evaluating you as a client just as much as you're evaluating them as a vendor. They should be asking about your decision-making process, your timeline flexibility, your willingness to participate actively, and your appetite for honest feedback. If they take the engagement without understanding how you operate, they'll build something in a vacuum, and vacuums produce products that don't fit.

What good looks like

I'll keep this simple because good isn't complicated. It's just rare.

A good dev shop in 2026 will spend the first engagement period helping you define what you're trying to achieve before they write any code. They'll document their decisions so you can follow the reasoning even if you can't follow the code. They'll define what "safe to ship" means before each release. They'll measure whether shipped features actually worked. And they'll tell you things you don't want to hear, kindly but clearly, because they'd rather build the right product than the easy one.

If that sounds like a high bar, it is. But it's your money, your product, and your users. You deserve a team that treats those things with the seriousness they warrant.

We've spent time diagnosing what's breaking in software delivery — the one-person myth, collapsing timelines, Scrum under pressure, orchestration as the new bottleneck, team structure, and vendor evaluation. Starting next week, we're shifting from diagnosis to patterns. What do the teams that are actually winning despite all of this have in common? It's structural, it's counterintuitive, and we've been hinting at it all along.

If this resonated, subscribe. We're writing about what's actually changing in software delivery — no hype, no hand-waving, just what we're seeing on real products with real teams.

