Using AI to help teams shift testing left
More and more teams are being asked to leverage Artificial Intelligence, and may even have AI usage quotas to meet. By using the right prompts, we can review user stories and shift testing left, supporting testability and increasing team velocity.
I won’t be deep diving into what shifting left is at a low level in this blog. For more details, check out Video: Push testing left by testing User Stories and Using Triforce to define Acceptance criteria.
What makes this a good candidate for AI?
When we’re looking to add AI into our workflow, we need to consider what makes this a good opportunity to investigate and adopt.
It’s easy
Working with user stories and reviewing them has a low barrier to entry; it doesn’t need much in the way of code, an IDE or deep software development knowledge. Basically, we’re getting the tool to read something human readable (a user story) and answer some simple questions (is this testable and understandable?). This makes it a good starting point for people to get into using AI and learn how to construct a prompt, regardless of their coding skill level.
It’s safe
We’ll just be reviewing something and commenting on it; we won’t be using AI to write code, directly change the product or change the user story. Our use of AI will not directly hurt our product or impact its quality, meaning we keep a human in the loop and retain control over what we’re building. This makes it a good starting point for people to build trust in AI and learn how to craft prompts before doing anything more dangerous or scary.
It’s needed work
Shifting left is useful to teams, improving both the quality and velocity of engineering. In an environment where people may not have done much shift-left work, don’t have a coach or don’t have a full-time quality expert, we can use an AI prompt to fill the knowledge gap and still shift left.
We can measure its impact
The outcomes of shifting left are measurable: has user story writing improved? Has using this prompt changed our user stories? These are things we can track to show the benefit and value of using AI in our organisation (something that managers, C-suites and company boards love to see).
It’s selling quality engineering
Testing and quality can sometimes feel left out of discussions about modern engineering practices. Other engineering disciplines may view testing as untechnical or not modern enough to be worth thinking about. Using AI to help solve a team’s testing problems demonstrates technical skill and shows that quality engineering keeps up with modern trends and technologies.
Crafting a prompt
We want to create a prompt that wears the quality hat in a Three Amigos / Triforce / Story Shaping session. It needs to:
- Review the user story against what makes a user story’s Acceptance Criteria testable.
- Provide a view of how testable the story is.
- Make recommendations and ask questions leading to improvements.
The output should then be shared with the team in as standard a format as we can get (ideally one we can drop into our task management tool, like Jira).
My prompt
<user story>
INSERT USER STORY HERE
</user story>
You are a senior quality engineer at an engineering start-up with strong requirements analysis skills. Review the testability of the user story, including acceptance criteria, based on the testability criteria.
<testability criteria>
- User stories should include acceptance criteria.
- Each acceptance criterion should be independent, asking for one thing only, so it should not include lists or joined sentences.
- Each acceptance criterion should be specific and measurable, defining a clear behaviour with an outcome.
- Acceptance criteria should cover negative cases, which should include negative terms like not, error, failure, limited, fail, cannot.
- Acceptance criteria should cover edge cases, referencing boundaries, alternative user flows and data variations that will result in behavioural changes.
- Acceptance criteria should cover non-functional requirements, specifically performance, security, usability and accessibility.
- Each acceptance criterion should be clearly written in plain and concise language, such as following the Monzo tone of voice or meeting a score of 8 on the Flesch-Kincaid readability scale.
</testability criteria>
Based on this review, provide a testability score for the user story based on the score criteria.
<score criteria>
- Fail score – the user story has no content including no defined acceptance criteria.
- Bronze score – the user story has any acceptance criteria, meeting testability criteria 1.
- Silver score – the user story has acceptance criteria that include negative and edge cases, meeting testability criteria 1, 2, 3, 4 and 5.
- Gold score – the user story meets all 7 of the testability criteria.
</score criteria>
Based on this review, provide a report covering the testability score for the user story, how the user story meets the testability criteria, and suggestions for improvements.
This report should match the format of the report example and be refined to be suitable to share as a comment on a Jira user story.
Include clear sections for: testability score, details of how specific acceptance criteria do not meet testability criteria and clarifying questions. The section for how specific acceptance criteria do not meet testability criteria should be written at the AC level to make it clear how each one has failed specifically.
Do not write the report in full paragraphs, instead use short bullet points to make the content easier to read quickly by separating out the different points.
Where acceptance criteria exist, treat each bullet-style line under an “Acceptance Criteria” header as an individual acceptance criterion (e.g., AC1 = “Users can create a new transaction…”, AC2 = “Transactions should display…”, etc.)
<report example>
Testability Score: Bronze
- Acceptance criteria are present, but several do not meet core testability standards.
- Criteria lack specificity, measurability, negative cases, edge cases, and non-functional requirements.
Acceptance Criteria That Do Not Meet Testability Standards
- AC2: Combines multiple requirements. Language is subjective and not measurable.
- AC3: “Quick” is subjective.
- All ACs: No negative scenarios (failure to save, missing data, invalid inputs, data retrieval errors).
- All ACs: No edge cases (size limits, unsupported data types, device variations).
- All ACs: No non-functional requirements (performance targets, accessibility, security requirements).
Clarifying Questions
- What is the minimum and maximum number of transactions?
- What specifically defines “quick” (e.g., how do we measure that)?
- What happens when saving a transaction fails or if data is missing?
- What customisation options must be supported (colours, layout, text, button behaviour)?
- Are there required accessibility or performance standards (e.g., WCAG, page load times)?
- Who can access a transaction, and should access be secured via tokenised or authenticated links?
</report example>
Disagree with the user story if you believe that it is not written in a testable way. Ask clarifying questions where needed to ensure the criteria are met.
Create the report without waiting for answers to the clarifying questions; instead, ask those questions as part of the report.
I treated crafting this prompt as if I were onboarding a new quality expert into my team:
- Set out an agent persona that gives it the context of a quality engineer (gave them pre-training).
- Provided clear guidelines for what makes Acceptance Criteria testable.
- Created and shared guidance for criteria scoring in my team.
- Provided an example report template to follow.
- Told it that it’s okay if it disagrees with me.
The aim of my prompt is to provide as repeatable an outcome as possible, so that we can use it meaningfully, which is why I’ve provided guidelines and examples for it to follow. You might also notice that it’s a pretty long prompt; that’s necessary to get the behaviour I wanted. I actually ended up crafting the prompt in a separate Google Doc rather than in the AI engine itself, then running it to test it out. Something I’ve learned about using AI is that you have to be really clear and verbose about how you ask for things (even asking for the same thing multiple times in multiple ways).
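If you’d rather run the prompt programmatically than paste it into a chat window, here’s a minimal sketch of what that could look like. It assumes the OpenAI Python SDK and a model name of gpt-4o purely for illustration; swap in whichever AI engine your organisation has approved.

```python
# Minimal sketch: run the testability review prompt against a user story.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY environment variable;
# adapt the client and model for whichever AI engine your organisation uses.
from openai import OpenAI

REVIEW_PROMPT = """You are a senior quality engineer at an engineering start-up...
(paste the full prompt from above here, including the <testability criteria>,
<score criteria> and <report example> sections)"""

def review_story(user_story: str) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: use whatever model you have access to
        messages=[{
            "role": "user",
            "content": f"<user story>\n{user_story}\n</user story>\n\n{REVIEW_PROMPT}",
        }],
    )
    # The reply is the Jira-ready testability report described in the prompt.
    return response.choices[0].message.content

if __name__ == "__main__":
    story = "As a user, I can create a new transaction so that I can track my spending."
    print(review_story(story))
```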
Measuring the impact
I’m trying to be outcomes-driven when using this AI prompt, to ensure that we’re using AI in a meaningful and helpful way. Doing this means answering some key questions:
Is the prompt being used?
Adding the prompt to a Jira workflow allows people to pull it on demand, which means we can pull usage numbers across the business to track adoption. Alternatively, we could ask people and teams to self-report the number of times the prompt is used, which is a more manual way of tracking adoption.
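As a rough sketch of what that tracking could look like (the label name, Jira URL and use of the v2 search endpoint are all assumptions to adapt to your own workflow), we could count the stories that have been run through the prompt:

```python
# Sketch: count how many Jira issues have been through the prompt, assuming
# teams add a (hypothetical) "testability-review" label when they use it.
import os
import requests

JIRA_BASE = "https://your-company.atlassian.net"  # assumption: your Jira URL
AUTH = (os.environ["JIRA_USER"], os.environ["JIRA_API_TOKEN"])

def count_reviewed_stories(project_key: str) -> int:
    jql = f'project = {project_key} AND labels = "testability-review"'
    response = requests.get(
        f"{JIRA_BASE}/rest/api/2/search",
        params={"jql": jql, "maxResults": 0},  # maxResults=0 returns only the total
        auth=AUTH,
    )
    response.raise_for_status()
    return response.json()["total"]

if __name__ == "__main__":
    print("Stories reviewed with the prompt:", count_reviewed_stories("SHOP"))
```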
Has the prompt resulted in any changes?
This means tracking updates to Acceptance Criteria and User Stories to see whether using the prompt leads to changes. That could be a Jira workflow that records the prompt being used to report on a user story’s testability, followed by changes being made, or checking whether a story’s testability score moves from Bronze > Silver > Gold. Another metric we can cover is whether using the prompt reduces the amount of time spent in something like a 3 Amigos session: do those meetings get shorter, or are more tickets covered in them as a result of using the prompt (probably both manual things to track)?
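As a sketch of that second idea, assuming the report is posted as a Jira comment in the format shown earlier, we could pull the “Testability Score” out of each comment and check whether a story has moved up the Fail > Bronze > Silver > Gold ladder between reviews:

```python
# Sketch: check whether a story's testability score improved between the first
# and most recent prompt-generated report, assuming reports are Jira comments
# in the example format above ("Testability Score: Bronze" etc.).
import re

SCORE_ORDER = {"Fail": 0, "Bronze": 1, "Silver": 2, "Gold": 3}

def extract_scores(comments: list[str]) -> list[str]:
    """Pull 'Testability Score: X' values out of report comments, in posting order."""
    scores = []
    for body in comments:
        match = re.search(r"Testability Score:\s*(Fail|Bronze|Silver|Gold)", body)
        if match:
            scores.append(match.group(1))
    return scores

def has_improved(comments: list[str]) -> bool:
    scores = extract_scores(comments)
    if len(scores) < 2:
        return False  # need at least a before and an after
    return SCORE_ORDER[scores[-1]] > SCORE_ORDER[scores[0]]

# Example: the story went from Bronze to Silver after the ACs were reworked.
print(has_improved(["Testability Score: Bronze\n- ...", "Testability Score: Silver\n- ..."]))
```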
Has using the AI improved our culture of quality?
This would be overall organisational impact, rather than anything at a team level. We can measure the overall testability score for all stories created in the organisation and how this improves over time, or we can look at team confidence scores for being ready to pick up work. There could be a knock-on effect on team velocity (better-defined User Stories leading to less rework and fewer escaped defects and bugs), but this is hard to attribute unless adding AI tooling is the only thing we’re changing in our processes.
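As a simple sketch of that organisation-level view, assuming we can export a month and a score for every reviewed story (for example from the Jira comments above), we could chart the average testability score per month and watch the trend:

```python
# Sketch: organisation-wide testability trend, assuming we can export a
# (month, score) pair for every reviewed story.
from collections import defaultdict
from statistics import mean

SCORE_VALUE = {"Fail": 0, "Bronze": 1, "Silver": 2, "Gold": 3}

def monthly_average(records: list[tuple[str, str]]) -> dict[str, float]:
    """records look like [("2024-05", "Bronze"), ("2024-05", "Silver"), ("2024-06", "Gold")]."""
    by_month = defaultdict(list)
    for month, score in records:
        by_month[month].append(SCORE_VALUE[score])
    return {month: round(mean(values), 2) for month, values in sorted(by_month.items())}

print(monthly_average([("2024-05", "Bronze"), ("2024-05", "Silver"), ("2024-06", "Gold")]))
```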
It’s important to track outcomes and impact for any new AI tooling so we can see if it’s worth using in our context. There’s a lot of hype around what is a very new technology, and it can be very exciting to jump on the train and try to use it wherever and whenever we can. Cutting through the hype and measuring actual impact and use matters, because many organisations might find that this tooling simply doesn’t make an impact for them.
When would it be good to try this?
In my opinion, the best contexts in which to try using AI to support shift-left testing include:
You are already open to shift left
You, your team and your organisation need to see value in shifting your testing left through something like 3 Amigos. Maybe you’ve tried doing something like that in ticket / story refinement but didn’t know where to get started, or your teams have expressed an interest in this way of working. You could also trial it in an organisation that hasn’t seen shift left before, as a way of asking questions that show the art of the possible and the value shifting left can bring.
You don’t have a dedicated quality specialist
Organisations with a quality coach who doesn’t have the time to attend every ceremony can spread that coach’s influence and impact by using an AI prompt like this. We can enable teams to run 3 Amigos sessions and shift left with this tool, and start coaching other engineers to ask these kinds of questions themselves (with the aim of ultimately being able to replace the prompt).
You want to accelerate communication
Does your team find that there are lots of unanswered questions and uncertainty about what to build? Something like 3 Amigos (and a shift-left testing AI tool) can help with that. By asking questions and trying to build clarity about what’s needed, you can improve communication in your team and gain shared understanding, something that leads to improved quality and velocity!
You’ve been set an AI quota or target to hit
There’s a lot of hype around AI, and the reality is that a lot of us are being pushed to adopt it. A tool like this can be a quick and easy way to start adopting AI and show whether or not it’s useful in your context. This can please your manager, C-suite and board, or (if it doesn’t work for you) help start a conversation with leadership about whether the hype is worth it for your organisation.
Hopefully this prompt can be useful to you and your teams. I’d love to hear from you if you’ve picked it up and used it, so reach out and let’s discuss!
Thanks for taking the time to read! If you found this helpful and would like to learn more, be sure to check out my other posts on the blog. You can also connect with me on LinkedIn for additional content, updates and discussions; I’d love to hear your thoughts and continue the conversation.
