For the 7th post in the Intelligent Quality Leadership series, I've dug into where having a human still involved in the loop is crucial. These are my thoughts. I'd love to hear others' on this…


AI Writes the Test. But Who Decides What to Test?

A CTO asked me a question recently that’s stayed with me.

“Would you be comfortable with non-automation experts writing test automation through AI?”

He was testing whether I’d get defensive about QE territory, or whether I’d actually thought it through.

My answer was yes. The yes comes with a clear position though.


AI can write the test code. Deciding what gets tested, and why, is still a human call.

The business context feeds into every good testing decision: knowing which feature change carries the most customer risk, knowing that the checkout flow matters more than the account settings page. AI doesn't have that. You have to give it to the AI. And "giving context to AI" isn't the same as owning the strategy. Someone still has to understand the product well enough to know what to point the AI at.


The Coverage Trap

There’s a failure mode I’ve seen play out in real organisations.

Coverage goes up 25%. Incidents double. The metric looks healthy. The product doesn’t feel it.

Nobody stopped to ask what the new tests were covering, or whether they connected to anything a customer would actually notice. Coverage without that understanding is a vanity metric — a number that creates the feeling of safety without the substance.

Wait. Let me be more precise: it’s not a feeling of safety. It’s the appearance of safety, in a dashboard, to people who haven’t looked closely enough.

Guardrails help. But zero human oversight, even with stringent guardrails, is a risk I’m not comfortable taking right now. The governance frameworks aren’t mature enough. The judgment of what matters, what risk is acceptable, what coverage actually means for this product, can’t be fully codified into a prompt yet.


The Conflict of Interest

There’s another problem that doesn’t get enough airtime.

If you’re using an LLM to help build a feature, and the same LLM to test it, you have a conflict of interest. The blind spots in the output are exactly where the blind spots in the testing will be. It’ll miss what it missed the first time.

It’s the same principle as a developer being the sole tester and reviewer of their own code. You’d never accept that as a quality practice. The same logic applies here.

For AI-powered features it gets more serious. When the output is non-deterministic, someone has to set the thresholds. What range of responses is acceptable? What constitutes a regression when the answer changes? That someone needs to be independent of the system that produced the output.

Human judgment. At the validation step. Independent of the AI that generated the thing being tested.
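To make the threshold idea concrete, here's a minimal sketch of what that independent validation step could look like. Everything in it is illustrative: the function names, the token-overlap similarity metric, and the 0.6 threshold are assumptions for the example, not a recommendation. The point is that a human, independent of the generating model, picks the metric, approves the reference answer, and sets the threshold.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Token-overlap similarity between two responses, in [0, 1].
    (Illustrative metric; a real team would choose their own.)"""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def within_acceptable_range(response: str, reference: str,
                            threshold: float = 0.6) -> bool:
    """Flag a regression when a non-deterministic response drifts
    too far from the human-approved reference answer."""
    return jaccard_similarity(response, reference) >= threshold


# A human approved this reference; the threshold encodes their judgment.
reference = "Your order has shipped and will arrive within 3 days"

ok_variant = "Your order shipped and should arrive within 3 days"
regression = "We cannot find any order on your account"

print(within_acceptable_range(ok_variant, reference))   # True: acceptable variation
print(within_acceptable_range(regression, reference))   # False: regression
```

The judgment calls, which metric, which reference, which threshold, are exactly the parts that can't come from the system that produced the output.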


The Shift

Here’s where I land on the other side of this, and it’s actually good news for the profession.

A QE engineer who couldn’t write automation before can now point an AI at a user journey and get working test code. That’s progress.

The skill was always knowing what to test and whether what came out is worth keeping. AI removed the barrier between having that skill and being able to act on it. Less experienced engineers can now do work that used to need years of automation behind it.

The education piece shifts with that. Teaching people how to write a loop matters less than teaching them how to think about what the loop should test. How to set up the right approach. How to judge whether the output is actually useful. That’s a better conversation to be having with your team.


So Where Does the Human Stay?

In the judgment calls. In the decisions about what matters to the customer. In the independence layer between generation and validation. In the strategy that decides what good coverage actually means for this product.

The loop still needs a human. What changes is where they sit in it.

From execution to judgment. From writing tests to owning why they exist.


Where’s the human staying in your team’s process right now, and what made you draw the line there?

Leave a comment