Gap analysis is a practice we use at Diffblue to prioritize the improvements we make to our product, like the quality of the unit tests Diffblue Cover creates. We’ve customized this process to our business, but the basic principles behind it could benefit any organization that is breaking new ground with its product.
Gap Analysis in a Nutshell
Essentially, gap analysis is a self-assessment of what Diffblue Cover is doing well, where we could improve, and which actions to take next using both human and data-driven judgment. Due to the nature of writing unit tests, our problem and solution domains are effectively infinite, so answering the question “what should we do next?” is not a simple task.
Gap analysis allows us to look for patterns so the product management decisions we make are more likely to lead to universally applicable improvements. It also helps us identify recurring problems across multiple projects that are worth prioritizing, because we can resolve them with a small amount of work and achieve a broad impact.
How do we do it?
A couple of times each year, we find a selection of projects that represent a wide variety of Java codebases, to capture the types of projects that our users might encounter. We mainly do this by scouring open-source repositories for well-known Java programs. Next, we run our tool against them to see what types of tests it produces (or doesn’t produce) and why, and we figure out what we could efficiently change to make Cover create those tests. We also use this as an opportunity to identify which types of tests could benefit from looking even more like they were written by a human. The whole process takes between half a sprint and one sprint.
Gap analysis is done manually, and there’s a creative element to it. We don’t just judge the tests: we look at which methods got fewer tests than we would expect, and assess how widely applicable the code pattern is across the projects. If it’s widely applicable, we think about how a human would write a test for those methods, and then how our tool could be improved to do the same. It’s an opportunity to put new creative ideas into our engine. Surprisingly often, a relatively small tweak is all it takes to get a big result, which is rewarding.
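To make that concrete, here is a minimal hypothetical sketch (the class, method, and test below are invented for illustration, not drawn from a real project in our dataset): a method whose argument is a functional interface recurs across many codebases, and a human tester handles it instinctively by passing a lambda.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.function.UnaryOperator;
import org.junit.jupiter.api.Test;

// Hypothetical pattern spotted across several projects: a method whose
// argument is a functional interface.
class Formatter {
    String format(String input, UnaryOperator<String> decorator) {
        return decorator.apply(input.trim());
    }
}

class FormatterTest {
    @Test
    void formatAppliesDecoratorToTrimmedInput() {
        Formatter formatter = new Formatter();

        // A human instinctively supplies a lambda here; teaching the tool
        // the same trick is the kind of small tweak with broad impact.
        assertEquals("[abc]", formatter.format("  abc ", s -> "[" + s + "]"));
    }
}
```

If a pattern like this turns out to be widespread in the dataset, teaching the generator the same trick a human uses is exactly the small-tweak, big-result improvement the exercise is designed to surface.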
A successful gap analysis is one that produces a lot of ideas and concrete, actionable next steps to improve the product—I wouldn’t be surprised if 30-50% of our new small features and improvements in the past few months came out of our spring gap analysis.
The Benefits
Gap analysis is a way of getting some hard data about how widely effective things are, rather than point fixes. When we do this, we’re looking for improvements that are small in scope and development cost but broad in impact. Last time, we created stories that were each completable in a sprint or half a sprint, each brought half a percentage point of new coverage across multiple projects, and together will affect 80% of our customers. For example, Diffblue Cover wasn’t generating assertions for return values of Optional or Functional types. In our gap analysis, we recognized this pattern and spent half a sprint addressing it, and now we can generate tests for these cases. (Support for Optional landed in the 2021.05.01 release, and Functional landed partly in 2021.04.02 and in 2021.06.01.)
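As a rough illustration of the Optional case (the class and tests below are hypothetical and assume JUnit 5; they are not the exact tests Cover emits): the gap was that a generated test could call such a method without asserting anything about the Optional’s contents, where a human would naturally check both presence and value.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.Optional;
import org.junit.jupiter.api.Test;

// Hypothetical class under test: looks up a user's nickname, if one is set.
class NicknameService {
    Optional<String> nicknameFor(String userId) {
        return "u42".equals(userId) ? Optional.of("Ada") : Optional.empty();
    }
}

class NicknameServiceTest {
    @Test
    void nicknameForReturnsValueWhenPresent() {
        NicknameService service = new NicknameService();
        Optional<String> result = service.nicknameFor("u42");

        // The human-style assertions on the Optional's contents that
        // gap analysis showed were missing from generated tests.
        assertTrue(result.isPresent());
        assertEquals("Ada", result.get());
    }

    @Test
    void nicknameForReturnsEmptyWhenAbsent() {
        NicknameService service = new NicknameService();

        assertFalse(service.nicknameFor("unknown").isPresent());
    }
}
```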
Projects we use for this exercise also occasionally end up as catfooding projects, because projects that are interesting during gap analysis typically have features we might not support yet. A project like this often yields more insight during catfooding than a well-covered project with less growth potential would.
How to Start
There are three things to keep in mind when doing gap analysis:
1. Pick the right team
Gap analysis is a dynamic exercise: it requires effective communication to avoid duplicating work by chasing the same pattern from different sources, and the key to doing it successfully is starting with the right people. You need engineers who are engaged in the exercise, and it’s better to have senior developers who can evaluate the quality of the output. We typically work with one engineer from each team to distribute the workload. Our gap analysis teams enjoy the creativity of the exercise and get a kick out of figuring out how to produce new tests.
2. Select the right projects
Your analysts need to know what your dataset is and pick a reasonably constrained set of inputs, because that input is your proxy for the world: if the proxy doesn’t reflect the real world, it can send you in misleading directions. Curating the set of projects you use is important and takes time (which goes back to involving the right people).
3. Ensure the process is time-bound
Gap analysis is a fundamentally human process that can’t be automated, which means it can be time-intensive, and it is often open-ended too. We’ve only scratched the surface of all the ways we could produce tests! This is why it’s best to ensure it’s time-bound: the analysts should try to find the best new ideas for the tool within a predetermined amount of time, ideally producing enough new stories to keep your engineering team busy until the next gap analysis. The right feedback cycle for your organization will depend on how quickly your development is moving.
At Diffblue, we’re moving very fast and doing something nobody has done before: using AI to create Java unit tests. If you’re out in no-man’s land with no map to follow, it’s good to take compass bearings to make sure you stay on track, and that’s what gap analysis does for us.