Can Generative AI Write Good Exploratory Tests?

In my last post I looked at whether Generative AI (such as ChatGPT) can help us to write a good test approach. I found that it’s good at formatting and can write something so long as you feed it the right ideas, but it’s limited by what the majority of content online says about test approaches (garbage in, garbage out).

But what about as a tool for writing exploratory tests? Could generative AI take a component or a concept and generate meaningful sessions for exploratory testing, or help us with risk analysis? I’m going to run some experiments to have a look.

Note: My phrasing of queries for ChatGPT is a bit rudimentary and I may not be getting the best out of the tool at this time.

Fig 1. Holly – The AI for the JMC ship Red Dwarf.

Let’s look at some risks

Experiment 1: Ask ChatGPT to identify the risks an identified component poses to a system, using a basic request.

Vid 1. ChatGPT documenting risks for an API

I used a basic command to ask ChatGPT to “Identify the risks inherent in an API” and it returned 10 risks:

1.  Security vulnerabilities.
2.  Unauthorised access and misuse.
3.  Denial of service (DoS) attacks.
4.  Lack of data validation.
5.  Inadequate error handling and logging.
6.  Compatibility and versioning issues.
7.  Dependencies / integrations.
8.  Lack of documentation.
9.  Regulatory / compliance challenges.
10. Vendor lock-in.

This isn’t a bad start: it’s given us some non-functional concerns as well as non-happy-path functional areas to think about. We’d probably treat this list as a first pass to build upon, then review it for meaning (and add our own risks).

Things it’s missing: Performance, Scalability (requests and request load), Data issues, Load balancing, Observability, Resilience, Accuracy.

Experiment 2: Ask ChatGPT to identify the risks an identified component poses to a system, using a more detailed request.

Vid 2. ChatGPT documenting risks for an API from a more verbose request

In this experiment I provided some more details about the API component and asked ChatGPT to uncover more risks.

“Identify the risks for a newly developed RESTful API.
– The API can create, read, update and delete data between a front end and a database.
– The API must serve data in English, French and Japanese.
– The API will work for users using the front end via desktop PC and mobile devices.
– The API will need to serve traffic across thousands of requests at the same time.
– Provide details of both functional and non functional risks.”

It responded with 10 risks:

1.  Data integrity.
2.  Authorisation.
3.  Error handling and logging.
4.  Performance.
5.  Localisation.
6.  Scalability.
7.  Security.
8.  Availability.
9.  Compatibility.
10. Documentation.

Again, this is an okay list to start with, but it needs some additional thinking behind it. The list does give us some risks, but there’s not a lot of meaning as to why they’re risky or why we’d need to test for them. It gives us localisation but doesn’t give details about character sets or data transformation (or data standards across different regions).

Interestingly with a more detailed prompt we didn’t get more risks or more detailed risks, but we did get some different ones.

As testers we can use this as a good aide-mémoire, or the starting point of a risk analysis, but we would need to provide our own context and further thinking here.

Experiment 3: Ask ChatGPT to create test session ideas based on a previously identified risk.

Vid 3. ChatGPT writing some tests based on a risk

I used a basic command to ask ChatGPT to “Create exploratory testing sessions to test for the risk of an API failing because incorrect characters were included in the JSON payload” and it returned 3 sessions. Here’s an example of one of the sessions:

Session 1: Basic Character Validation
1. Start by creating a valid JSON payload and send it to the API endpoint.
2. Test various special characters like !@#$%^&*(){}[];:"'<>,.?/|~` in different fields of the payload.
3. Observe the API response for any unexpected behaviour or errors. The API should handle these characters gracefully and not produce any failures.
4. Repeat the above steps with different combinations and variations of special characters.
5. Pay attention to edge cases, such as including special characters at the beginning or end of a field or having consecutive special characters.
6. Document any issues or failures encountered during the testing.

This isn’t an exploratory testing session; it’s a script for character validation. Sessions 2 and 3 are the same script but with a different title:

Session 2: Unicode and Non-ASCII Characters
2. Test various languages and scripts such as Chinese, Arabic, Russian, etc., by including their respective characters in the payload.
3. Include characters from different Unicode blocks like mathematical symbols, emoji, currency symbols, etc.

Session 3: Boundary Testing
2. Test the maximum allowed length of the payload by including a large number of characters in each field.
3. Test the minimum allowed length of the payload by including empty or null values in different fields.

The tests suggested here cover special characters and other languages / Unicode, but this isn’t anything as exciting as the list of naughty strings. There’s no mention of multi-byte characters, right-to-left characters, emoji 🦦 or injections.

We’d have to reformat this (and add a lot of our own knowledge) to create some actual exploratory test sessions.
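If we wanted to turn the character probing from these sessions into something runnable alongside an exploratory session, a rough sketch might look like the one below. It’s Python using the requests library; the endpoint URL and field names are hypothetical, and the strings are just a small hand-picked sample (a fuller set could come from the big list of naughty strings):

```python
import requests

# Hypothetical endpoint and fields - swap in the real API under test.
API_URL = "https://example.test/api/v1/items"

# A small sample of awkward strings: specials, multi-byte, right-to-left,
# emoji and injection-style inputs.
probe_strings = [
    '!@#$%^&*(){}[];:"\'<>,.?/|~`',    # special characters
    "日本語のテキスト",                  # multi-byte characters
    "مرحبا بالعالم",                    # right-to-left text
    "🦦🎉🚀",                           # emoji
    "' OR 1=1 --",                      # SQL-injection style string
    "<script>alert('x')</script>",      # HTML/JS injection style string
]

for value in probe_strings:
    payload = {"name": value, "notes": "character probe"}
    response = requests.post(API_URL, json=payload, timeout=10)
    # We're exploring rather than asserting: log what happens and dig into
    # anything surprising (status codes, mangled data, error bodies).
    print(f"{value!r} -> {response.status_code}")
```

Running something like this gives quick feedback on which characters the API mishandles, which is where the interesting digging starts.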

Experiment 4: Ask ChatGPT to create multiple exploratory test ideas based on a previously identified risk.

Vid 4. ChatGPT writing some tests ideas based on a risk and a prompt for numbers

I asked ChatGPT to “Create 25 exploratory test ideas for the risk that a user may cause an error by inserting an invalid character into an API payload.” and the response was:

1. Test with various special characters (e.g., $, #, @, &, *, etc.) to check how the API handles them.
2. Insert invalid characters at the beginning, middle, and end of the payload to evaluate the API's parsing capabilities.
3. Test with a mix of upper-case and lower-case letters to determine if the API is case-sensitive.
4. Include Unicode characters from different languages to verify how the API handles internationalization.
5. Test with long strings containing a combination of valid and invalid characters to assess the API's input length validation.
6. Insert multiple instances of the same invalid character within the payload to check if the API handles them consistently.
7. Test with control characters (e.g., null, newline, carriage return) to evaluate the API's response to such inputs.
8. Include HTML tags or special sequences (e.g., <script>, <img>, <iframe>) to check if the API properly sanitises the input.
9. Insert a mixture of alphanumeric characters and special characters to test the API's handling of complex payloads.
10. Test with different encodings (e.g., UTF-8, UTF-16, Base64) to validate how the API decodes and interprets the payload.
...

Now we’re cooking! These are some great ideas around the risk and give me something to work from; plus they’re in a format that I can spring further ideas from.
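A couple of those ideas (control characters and alternative encodings) are easy to sketch as quick probes too. Again this is only illustrative, with a hypothetical endpoint and made-up field names:

```python
import base64
import json
import requests

API_URL = "https://example.test/api/v1/items"  # hypothetical endpoint

payload = {"name": "control\x00chars\r\n", "notes": "encoding probe"}
body = json.dumps(payload)

# The same payload presented three different ways - the interesting part is
# how (or whether) the API rejects the unusual ones.
attempts = {
    "utf-8": (body.encode("utf-8"), "application/json; charset=utf-8"),
    "utf-16": (body.encode("utf-16"), "application/json; charset=utf-16"),
    "base64": (base64.b64encode(body.encode("utf-8")), "application/json"),
}

for label, (data, content_type) in attempts.items():
    response = requests.post(
        API_URL, data=data, headers={"Content-Type": content_type}, timeout=10
    )
    print(f"{label}: {response.status_code}")
```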

I tried using this type of query again and got similar responses in the style of tests it created. Here are some of the responses to the request “Create 25 exploratory test ideas for the risk that an API may fail because of the size of a valid JSON payload sent to it.”:

1. Test with a small JSON payload to ensure the API can handle the minimum expected size.
2. Gradually increase the size of the JSON payload to assess the API's scalability and performance limits.
3. Test with a large JSON payload that is just under the maximum allowed size to check if the API handles it correctly.
4. Test with a JSON payload that exceeds the maximum allowed size to validate if the API properly rejects it with an appropriate error message.
5. Test with a JSON payload containing nested structures (e.g., arrays within arrays) to assess the API's ability to handle complex data.
6. Include a large number of key-value pairs within the JSON payload to evaluate the API's ability to process and parse them efficiently.
7. Test with a JSON payload where string values are extremely long to verify if the API can handle such lengthy data.

Again, there are some useful ideas for testing here that I can follow through on and use to spark deeper tests as I go.
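As an illustration of the first few size ideas, a rough sketch could grow the payload until the API starts pushing back. The endpoint and field names are hypothetical, and the size steps would need tuning to the system’s real limits:

```python
import requests

API_URL = "https://example.test/api/v1/items"  # hypothetical endpoint

# Roughly 1 KB, 100 KB, 1 MB and 10 MB of data in a single field.
for size in (1_000, 100_000, 1_000_000, 10_000_000):
    payload = {"name": "size probe", "notes": "x" * size}
    response = requests.post(API_URL, json=payload, timeout=30)
    # Watch for where responses flip from 2xx to 413/400/5xx, and how long
    # each request takes as the payload grows.
    print(f"{size:>10} chars -> {response.status_code} "
          f"in {response.elapsed.total_seconds():.2f}s")
```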

Conclusions and Thoughts

It looks like we could use Generative AI as a good starting point for identifying basic tests and risks for a component or feature. The tests are somewhat basic, but we could use them to start us off (so long as we have the right risks to give it), much like testing flash cards, heuristics and risk storming have done in the past.

I’d be tempted to give the format of “Create <number> exploratory test ideas for the risk that <risk>” to developers during a bug bash or during some paired testing as a way to get them started. This might be a good way to support additional testing of a product in an environment where there aren’t many testers.
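As a rough sketch of how that format could be wrapped up as a helper for a bug bash, the snippet below builds the prompt and sends it via the OpenAI Python SDK. It assumes a v1.x openai client, an API key in the environment, and a model name you may need to change:

```python
from openai import OpenAI

def exploratory_test_ideas(risk: str, count: int = 25) -> str:
    """Build the prompt format from this post and ask the model for test ideas."""
    prompt = f"Create {count} exploratory test ideas for the risk that {risk}."
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption: pick whichever chat model is available
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(exploratory_test_ideas(
    "a user may cause an error by inserting an invalid character into an API payload"
))
```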

As with any tool, the output can be quite shallow: garbage in, garbage out. At a casual glance these look like a good starting point, but we should be wary and ensure we do some additional thinking to tailor the risks / test ideas to our specific needs too.

Using specific and tailored prompts can make Generative AI give us a good first pass for exploratory test ideas, but more experiments are needed to see how far it can go.

One response

  Luc

    The test sessions and exploratory test ideas generated by ChatGPT are definitely helpful, providing you with tangible starting points for testing certain risks. It’s impressive how you experimented with different prompts and obtained diverse results. Overall, it’s wonderful to witness the potential of Generative AI as a tool to support testing efforts and encourage collaboration among developers and testers. Keep up the great work!
