The Analyses
The code we used for this experiment comes from Chapter 3 of our tutorial on unit testing: How to build a complete test suite. A tool would earn a perfect score by generating a complete test suite on its own. This is a tall order, but setting the bar high shows which tool comes closest.
Each tool is used with its default settings: those it ships with, or those suggested in its ‘getting started’ documentation.
Randoop
We downloaded and extracted Randoop, then set it up according to the documentation. Randoop is provided as a Java application, which has a clear advantage: because it runs in a Java developer’s environment, the JRE/JDK is already available. However, some of the dependencies required to run the tool have to be found separately, including the correct version of JUnit.
SquareTest
This tool was the easiest to get up and running. After installing the product, we simply had to follow the on-screen instructions, with no need to refer to the documentation.
SquareTest is designed to produce a boilerplate test class rather than complete test cases. However, for the code analyzed in this comparison, the boilerplate code was only appropriate for one of the test cases that we were looking to write.
EvoSuite
EvoSuite was the slowest of these tools to get started with. It required:
- Installing the plugin
- Specifying Maven and Java home
- Adding EvoSuite to the pom.xml for test generation
The second and third steps were challenging because they required referring to the docs and hunting around the system to find out where Maven and Java were installed. Typically, IntelliJ or environment variables handle this.
Diffblue Cover
Installation is covered in the Diffblue Cover Documentation. Since everything required to use Cover was contained in one package, it was easier to set up and use than Randoop or EvoSuite and didn’t actually require consulting the documentation.
Diffblue Cover produced the most complete test suite. Though line coverage was slightly lower than EvoSuite’s (94%, compared to 96%), the tests generated by Cover were more meaningful.
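To illustrate why a slightly lower coverage figure can still go with more meaningful tests, here is a minimal sketch in plain Java (no JUnit, so it stays self-contained). It is not output from any of the tools in this comparison. The class, the `applyDiscount` method and its deliberately buggy twin are hypothetical names invented for this example. Both checks execute every line of the code under test, but only one of them verifies the results.

```java
import java.util.function.IntBinaryOperator;

public class CoverageVsAssertions {

    // Hypothetical code under test: a price in cents minus a percentage discount.
    static int applyDiscount(int priceCents, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        return priceCents - priceCents * percent / 100;
    }

    // Same structure with a deliberate bug: the discount is added, not subtracted.
    static int applyDiscountBuggy(int priceCents, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        return priceCents + priceCents * percent / 100;
    }

    // Executes every line of the implementation but checks nothing,
    // so it passes no matter what the implementation returns.
    static boolean coverageOnlyCheck(IntBinaryOperator impl) {
        impl.applyAsInt(1000, 10);
        try {
            impl.applyAsInt(1000, 101);
        } catch (IllegalArgumentException ignored) {
            // The exception is swallowed without being verified.
        }
        return true;
    }

    // Executes the same lines, but verifies the result and the error path.
    static boolean meaningfulCheck(IntBinaryOperator impl) {
        if (impl.applyAsInt(1000, 10) != 900) {
            return false; // wrong discounted price
        }
        try {
            impl.applyAsInt(1000, 101);
            return false; // an out-of-range percentage should have been rejected
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("coverage-only, correct code: "
                + coverageOnlyCheck(CoverageVsAssertions::applyDiscount));
        System.out.println("coverage-only, buggy code:   "
                + coverageOnlyCheck(CoverageVsAssertions::applyDiscountBuggy));
        System.out.println("meaningful, correct code:    "
                + meaningfulCheck(CoverageVsAssertions::applyDiscount));
        System.out.println("meaningful, buggy code:      "
                + meaningfulCheck(CoverageVsAssertions::applyDiscountBuggy));
    }
}
```

In this sketch both checks reach 100% line coverage of the method they are given, yet only the second one fails on the buggy version. A coverage percentage alone would not distinguish between them.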
Diffblue Cover was also the only tool to have an “Assert Suggestion” feature, which provided auto-completion options for asserts that might be appropriate when writing tests.
All four tools on this list provide assistance and value. They generate tests much more quickly than people can write them, can cover most cases, and sometimes even produce tests for edge and corner cases that people wouldn’t have identified. Each capability represents a stepping stone toward even more advanced code creation technology.
AI for unit tests
While the results of this experiment are varied, one tool does stand out. The AI behind Diffblue Cover sets it apart from the others by generating meaningful coverage, i.e. high-quality tests, rather than simply a large number of them.