Copilot vs. Diffblue Cover: The AI unit test showdown

The rise of AI-driven software development tools has dramatically changed how, and how easily, developers write code. GitHub Copilot (Copilot), a popular AI pair programmer from Microsoft and OpenAI, has gained significant attention for its ability to suggest code snippets and functions. The promotional demo on the Copilot website highlights its ability to suggest unit tests for a selected function. While Copilot undoubtedly provides value through code suggestions, one crucial area where developers increasingly seek more robust support from AI solutions is automating test generation and execution at scale.

Copilot vs Diffblue Cover for unit test generation

In this blog, I’ll share the results of an in-depth comparison of GitHub Copilot and Diffblue Cover (Cover), a specialized AI solution designed explicitly for autonomous unit test generation. I’ll evaluate their strengths, their limitations, and how they fit into modern software development workflows, particularly in the context of unit testing Java. My analysis centers on the popular Spring Boot Java Petclinic application as a practical testing ground.

GitHub Copilot Overview

The AI Pair Programmer: GitHub Copilot acts as a highly sophisticated autocomplete engine. It analyzes your code and the surrounding context, and draws on its vast training data to provide real-time suggestions. These suggestions can range from simple variable names to entire code blocks. In essence, Copilot aims to be a helpful assistant, prompting you with ideas and reducing repetitive coding. Under the hood, GitHub Copilot uses a Large Language Model (LLM). As GitHub is part of Microsoft, Copilot is heavily influenced by another Microsoft investment: OpenAI.

Diffblue Cover Overview

Your Autonomous Testing Partner: Diffblue Cover approaches AI coding assistance from a different angle. It is a specialized unit testing solution whose primary goal is to autonomously create complete, ready-to-use unit tests for your codebase. This means Cover analyzes your code structure, identifies potential test cases, and creates test code that aims to achieve meaningful code coverage. Behind the scenes, Diffblue Cover is trained exclusively on the user’s actual codebase using reinforcement learning.

What we tested (comparison criteria)

The points of comparison used focus on the following core aspects of unit testing Java applications:

Ease of Use and Integration: How seamlessly does each tool integrate into your existing development environment?
Test Generation Efficiency: How quickly can each tool produce tests and is this process quicker than writing the tests manually?
Test Quality and Coverage: Do the generated tests effectively exercise your code and provide meaningful coverage?
Catching Regressions: How effectively can the generated tests detect breaking changes introduced by code modifications?
Security and Intellectual Property: What are the implications of using each tool in terms of code privacy and potential IP risks?

Test code base: Spring Boot Java Petclinic

The Spring Boot Petclinic application provides a familiar and representative Java project for our evaluation. It contains controllers, services, and repositories – a typical structure found in countless real-world applications.

I’ll focus on using both tools to generate tests for a specific controller within the project.

HTTP GET Controller Example

Let’s start with the first test generation comparison between GitHub Copilot and Diffblue Cover. Here, we’re generating tests for the /pets/new endpoint of the PetController class:

@Controller
@RequestMapping("/owners/{ownerId}")
class PetController {
     
	// … more code

    @GetMapping("/pets/new")
    public String initCreationForm(Owner owner, ModelMap model) {
       Pet pet = new Pet();
       owner.addPet(pet);
       model.put("pet", pet);
       return VIEWS_PETS_CREATE_OR_UPDATE_FORM;
    }
}

This HTTP GET endpoint renders a Thymeleaf view that includes a form to create or update a pet for the Petclinic.

There’s no computational complexity in these four lines of code: they create a Java object, attach it to the owner, add it to the MVC model and return the MVC view name.

Let’s use this endpoint to generate tests with both tools.

GitHub Copilot Test Results

With the help of the GitHub Copilot IntelliJ IDEA plugin, I started a new chat with the LLM behind Copilot and asked the AI to generate tests.

I started the conversation with the following prompt:

> Please help me write tests for the initCreationForm method of the PetController class

As a result, GitHub Copilot returns:

GitHub Copilot warns that it is "powered by AI, so surprises and mistakes are possible. Make sure to verify any generated code or suggestions."

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.mockito.Mock;
import org.mockito.MockitoAnnotations;
import org.springframework.ui.ModelMap;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.*;

class PetControllerTest {

    @Mock
    private OwnerRepository ownerRepository;

    private PetController petController;

    @BeforeEach
    void setUp() {
        MockitoAnnotations.openMocks(this);
        petController = new PetController(ownerRepository);
    }

    @Test
    void testInitCreationForm() {
        // Arrange
        Owner owner = new Owner();
        owner.setId(1);
        when(ownerRepository.findById(anyInt())).thenReturn(owner);

        ModelMap model = new ModelMap();

        // Act
        String view = petController.initCreationForm(owner, model);

        // Assert
        assertEquals("pets/createOrUpdatePetForm", view);
        assertTrue(model.containsAttribute("pet"));
    }
}

On top of this copy-pastable test code, GitHub Copilot’s response includes further information and the code references it used to generate the test class.

When I paste the suggested code into the project, I immediately get a Java compilation error, so the generated test doesn’t run without modification.

In this example, the LLM forgot to include the static import for the assertTrue() assertion.

Although this is a quick fix, the LLM did not generate a compilable Java class.

After I fixed this error by adding the correct import statement, I got the following test setup:

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.mockito.Mock;
import org.mockito.MockitoAnnotations;
import org.springframework.ui.ModelMap;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;
import static org.mockito.Mockito.*;

class PetControllerTest {

    @Mock
    private OwnerRepository ownerRepository;

    private PetController petController;

    @BeforeEach
    void setUp() {
        MockitoAnnotations.openMocks(this);
        petController = new PetController(ownerRepository);
    }

    @Test
    void testInitCreationForm() {
        // Arrange
        Owner owner = new Owner();
        owner.setId(1);
        when(ownerRepository.findById(anyInt())).thenReturn(owner);

        ModelMap model = new ModelMap();

        // Act
        String view = petController.initCreationForm(owner, model);

        // Assert
        assertEquals("pets/createOrUpdatePetForm", view);
        assertTrue(model.containsAttribute("pet"));
    }
}

GitHub Copilot generated a bare-bones Java unit test, relying only on JUnit and Mockito for the test setup.

Furthermore, it only generated a single test. While we might get more tests if we adjust the initial prompt or reply to the response, it takes additional manual intervention to get a full test suite.

On top of this, the generated unit test invokes the tested method directly. While this works for more algorithmic implementations, there are better options for testing an HTTP controller endpoint: an in-memory HTTP environment lets us verify more than just the method’s return value, including data serialization, HTTP status codes and input validation.

A better approach is to use MockMvc in this scenario.

Nevertheless, the generated test did add meaningful value as it validated the returned controller view name and ensured the model was properly populated.

On a second attempt, I replied with the following prompt:

> Please use @WebMvcTest for it

We get a new test class with a single test using @WebMvcTest:

package org.springframework.samples.petclinic.owner;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.mockito.Mock;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.servlet.WebMvcTest;
import org.springframework.boot.test.mock.mockito.MockBean;
import org.springframework.test.web.servlet.MockMvc;
import org.springframework.ui.ModelMap;

import static org.mockito.Mockito.*;
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.*;

@WebMvcTest(PetController.class)
class PetControllerTest {

    @Autowired
    private MockMvc mockMvc;

    @MockBean
    private OwnerRepository ownerRepository;

    @Test
    void testInitCreationForm() throws Exception {
        // Arrange
        Owner owner = new Owner();
        owner.setId(1);
        when(ownerRepository.findById(anyInt())).thenReturn(owner);

        // Act & Assert
        mockMvc.perform(get("/owners/1/pets/new"))
                .andExpect(status().isOk())
                .andExpect(view().name("pets/createOrUpdatePetForm"))
                .andExpect(model().attributeExists("pet"));
    }
}

This test is a better alternative to the first one, and it compiles and runs without any manual intervention.

Diffblue Cover Test Results

Next, I used the Diffblue Cover IntelliJ IDEA plugin to generate a set of unit tests for this controller endpoint with a single click.

After automatically checking the technical requirements for generating tests, Diffblue Cover took less than a minute to write six unit tests:

@ContextConfiguration(classes = {PetController.class})
@ExtendWith(SpringExtension.class)
@DisabledInAotMode
class PetControllerDiffblueTest {
   @MockBean
   private OwnerRepository ownerRepository;


   @Autowired
   private PetController petController;


   @Test
   void testInitCreationForm() throws Exception {
      // Arrange
      Owner owner = new Owner();
      owner.setAddress("42 Main St");
      owner.setCity("Oxford");
      owner.setFirstName("Jane");
      owner.setId(1);
      owner.setLastName("Doe");
      owner.setTelephone("6625550144");
      when(ownerRepository.findPetTypes()).thenReturn(new ArrayList<>());
      when(ownerRepository.findById(Mockito.<Integer>any())).thenReturn(owner);
      MockHttpServletRequestBuilder requestBuilder = MockMvcRequestBuilders.get("/owners/{ownerId}/pets/new", 1);


      // Act and Assert
      MockMvcBuilders.standaloneSetup(petController)
         .build()
         .perform(requestBuilder)
         .andExpect(MockMvcResultMatchers.status().isOk())
         .andExpect(MockMvcResultMatchers.model().size(3))
         .andExpect(MockMvcResultMatchers.model().attributeExists("owner", "pet", "types"))
         .andExpect(MockMvcResultMatchers.view().name("pets/createOrUpdatePetForm"))
         .andExpect(MockMvcResultMatchers.forwardedUrl("pets/createOrUpdatePetForm"));
   }


   // five more tests
}

Diffblue Cover successfully created the test setup required to run the tests with JUnit Jupiter, along with a minimal Spring TestContext configuration that uses MockMvc to perform HTTP-like requests against an in-memory mocked servlet environment.

On top of this, Diffblue Cover structured each test method according to the Arrange, Act and Assert pattern. This makes the tests much easier to read and understand.

The test name derives from the name of the Java method under test. If there is more than one test for a method, Diffblue Cover appends an incrementing counter to differentiate them.
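The naming pattern can be illustrated with a small sketch. This is not Diffblue Cover’s implementation; `TestNameSketch` and `testNames` are hypothetical names, and the rule is simply inferred from the two generated names shown in this post (`testInitCreationForm` and `testInitCreationForm2`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Diffblue Cover: reproduces the naming
// pattern visible in the generated tests shown in this post.
class TestNameSketch {

    // Returns names such as testInitCreationForm, testInitCreationForm2, ...
    static List<String> testNames(String methodUnderTest, int count) {
        String base = "test"
                + Character.toUpperCase(methodUnderTest.charAt(0))
                + methodUnderTest.substring(1);
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            // The first test carries no suffix; later ones get 2, 3, ...
            names.add(i == 1 ? base : base + i);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(testNames("initCreationForm", 3));
    }
}
```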

All generated tests executed successfully on the first try, so I could check them into our source code repository immediately if needed.

The tests for this HTTP GET endpoint cover different scenarios: the happy path as well as corner cases such as missing query parameters, missing path variables or invalid input data.

As an example, the following test verifies that the endpoint returns the HTTP status code 400 for an invalid path variable:

@Test
void testInitCreationForm2() throws Exception {
   // Arrange
   Owner owner = new Owner();
   owner.setAddress("42 Main St");
   owner.setCity("Oxford");
   owner.setFirstName("Jane");
   owner.setId(1);
   owner.setLastName("Doe");
   owner.setTelephone("6625550144");
   when(ownerRepository.findPetTypes()).thenReturn(new ArrayList<>());
   when(ownerRepository.findById(Mockito.<Integer>any())).thenReturn(owner);
   MockHttpServletRequestBuilder requestBuilder = MockMvcRequestBuilders.get("/owners/{ownerId}/pets/new",
      "Uri Variables", "Uri Variables");


   // Act
   ResultActions actualPerformResult = MockMvcBuilders.standaloneSetup(petController).build().perform(requestBuilder);


   // Assert
   actualPerformResult.andExpect(MockMvcResultMatchers.status().is(400));
}
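Why a 400? The sketch below is a simplified, framework-free stand-in for what most likely happens. It assumes the `{ownerId}` path variable is bound to an integer somewhere in the elided controller code, so a value like "Uri Variables" cannot be converted and the request is rejected before the handler runs. `PathVariableSketch` and `statusFor` are hypothetical names, not Spring APIs.

```java
// Simplified stand-in for the framework's path-variable conversion.
// Assumption: {ownerId} is bound to an int in the real controller.
class PathVariableSketch {

    static final int OK = 200;
    static final int BAD_REQUEST = 400;

    // Mimics binding the {ownerId} URL segment to an int parameter.
    static int statusFor(String ownerIdSegment) {
        try {
            Integer.parseInt(ownerIdSegment);
            return OK;
        } catch (NumberFormatException e) {
            // The value cannot be converted, so the request is rejected
            // before any controller logic runs.
            return BAD_REQUEST;
        }
    }

    public static void main(String[] args) {
        System.out.println(statusFor("1"));
        System.out.println(statusFor("Uri Variables"));
    }
}
```

A test like testInitCreationForm2 documents this behavior, so a later change to how the path variable is handled would surface as a failing test.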

Copilot and Diffblue Cover: detailed comparison

Test Completeness and Compilation

Copilot: A significant point to note is that Copilot does not always generate fully compilable or runnable test code. It makes suggestions that often require additional developer review and input to address syntax errors, missing imports, or incorrect mocking setups.
Diffblue Cover: One of Diffblue Cover’s primary strengths is its ability to autonomously generate syntactically correct and compilable tests. This reduces the initial overhead involved in getting the tests into a working state.

Test Case Coverage and Quality

Copilot: The quality of Copilot’s test suggestions varies. In some cases, it might propose relevant test cases. However, it sometimes also offers overly simplistic tests or misses crucial edge cases that would require significant manual intervention to ensure thorough unit testing.
Diffblue Cover: Diffblue Cover uses several techniques to ensure meaningful test coverage. It analyzes code paths, identifies potential edge cases, and strives to create tests that exercise various conditions within your code.

Mocking Frameworks and Data Generation

Copilot: Copilot’s suggestions include some basic mocking logic, but it often falls short when complex interactions or dependencies are present. A developer will likely need to manually refine or introduce mocking frameworks.
Diffblue Cover: Diffblue Cover integrates tightly with popular mocking frameworks like Mockito. It intelligently generates mocks and stubs for your dependencies, providing a more comprehensive starting point for your tests. Furthermore, Cover often includes meaningful data in its tests, making them more realistic.

Regression Detection

Copilot: Due to the sometimes rudimentary nature of Copilot-assisted tests, their effectiveness in catching regressions is very limited. If a code change breaks underlying functionality that the test doesn’t fully cover, it might still pass without raising any alerts.
Diffblue Cover: Since Diffblue Cover aims for deeper code path coverage, the generated tests catch regressions introduced by code changes. When a breaking change occurs, you would expect related Diffblue Cover tests to fail, signaling the need for test and/or code adjustment.
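To make the regression point concrete, here is a minimal, framework-free sketch. It is not output from either tool: `RegressionSketch`, `original`, `regressed`, `weakCheck` and `strongCheck` are hypothetical names, and a plain `Map` stands in for the MVC model. It shows how a check that only looks at the returned view name keeps passing after a change silently stops populating the model, while a check that also inspects the model fails.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: a plain Map stands in for the Spring MVC model.
class RegressionSketch {

    static final String VIEW = "pets/createOrUpdatePetForm";

    // Stand-in for the original handler: fills the model, returns the view.
    static String original(Map<String, Object> model) {
        model.put("pet", new Object());
        return VIEW;
    }

    // Stand-in for a regressed handler: the model entry was dropped by mistake.
    static String regressed(Map<String, Object> model) {
        return VIEW;
    }

    // Weak check: only verifies the returned view name.
    static boolean weakCheck(Function<Map<String, Object>, String> handler) {
        return VIEW.equals(handler.apply(new HashMap<>()));
    }

    // Stronger check: also verifies that the model was populated.
    static boolean strongCheck(Function<Map<String, Object>, String> handler) {
        Map<String, Object> model = new HashMap<>();
        String view = handler.apply(model);
        return VIEW.equals(view) && model.containsKey("pet");
    }

    public static void main(String[] args) {
        // The weak check still passes on the regressed handler ...
        System.out.println(weakCheck(RegressionSketch::regressed));
        // ... while the stronger check fails and flags the regression.
        System.out.println(strongCheck(RegressionSketch::regressed));
    }
}
```

The more of the observable behavior a test asserts on, the fewer changes can slip past it unnoticed.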

Speed: Delivering Results Faster

Diffblue Cover’s ability to autonomously generate comprehensive test suites can lead to substantial time savings. While Copilot requires an iterative, back-and-forth interaction, Cover produces the bulk of the test code upfront. In a typical scenario, a developer might spend 15 minutes writing a unit test. Cover can drastically reduce this time investment: in the example above, it wrote six tests in less than a minute. This means developers can achieve better coverage in substantially less time.

Scale: CI Integration and Growth

Copilot is primarily geared towards in-IDE interactions. While it can provide value in this context, scaling its usage across an entire codebase, especially within automated CI/CD pipelines, presents challenges. Diffblue Cover, on the other hand, is designed for seamless integration into CI pipelines. Every code change can trigger Cover to regenerate and update your unit test suite, ensuring your coverage keeps pace with development and potential regressions are flagged promptly.

Security: Protecting Your Intellectual Property

GitHub Copilot’s training on vast amounts of public code raises concerns about potential copyright infringement, IP leakage and security risks. Sensitive or proprietary code sent to Copilot’s servers could inadvertently expose vulnerabilities or find its way into future code suggestions. Diffblue Cover addresses this concern by running entirely on-premise – your code stays entirely within your own infrastructure, mitigating IP and security risks. Additionally, Diffblue Cover offers indemnification against IP infringement claims.

Strengths and limitations

GitHub Copilot

Strengths:

Assists developers in writing tests by providing code suggestions and completions.
Seamless integration with most popular IDEs.
Intuitive and easy to use.
Test code suggestions can help developers accelerate the test writing process.

Limitations:

Can generate incomplete or inaccurate tests, requiring manual verification and refinement.
Less effective for comprehensive unit test generation, particularly for complex scenarios.
Tests can miss edge cases or behave inconsistently, which lets future code regressions go undetected.

Diffblue Cover

Strengths:

Autonomous AI-based test generation, reducing manual coding time.
Generates comprehensive and robust test cases covering a diverse range of scenarios and edge cases.
Delivers highly accurate and reliable tests, enhancing code quality.
Facilitates regression prevention with thorough testing and detailed test reports.
Easy integration into existing development workflows and CI/CD pipelines.

Limitations:

Initial learning curve associated with understanding and optimizing test generation parameters.
Generates a larger, more comprehensive set of test cases, so there is more to review should a developer need to.
May exhibit certain limitations when it comes to complex integration and data flow scenarios.

Final thoughts

Both GitHub Copilot and Diffblue Cover bring the power of AI to software development to help developers. GitHub Copilot excels as an assistive sidekick, offering real-time suggestions to augment your coding process. However, when the primary goal is to quickly create comprehensive unit tests for complex applications (with numerous possible test scenarios) and to streamline the testing process for both individual developers and entire development teams, Diffblue Cover wins hands down.

Diffblue Cover empowers developers to bolster their testing practices by testing more thoroughly and much faster while actually reducing the amount of manual testing toil, all whilst safeguarding code security and IP. Its autonomous test generation capability acts as a force multiplier, enabling teams to focus on strategic development while maintaining a robust safety net of unit tests.

I would definitely encourage you to try both Diffblue Cover and GitHub Copilot to experience the difference for yourselves, firsthand. Explore the free trials and see how these tools can streamline your Java testing processes.

Author: Philip Reicks (Independent software engineer and consultant)
