Suddenly every digital product seems to be ‘AI-powered’. In many cases that just means an OpenAI API call has been bolted on, with little functional benefit. But recent advances have enabled a range of genuinely new generative AI for code tools that are helping developers create software more efficiently.
It can be hard to keep up with the pace at which these tools are being introduced and upgraded, and to understand exactly how they differ. As a result, choosing the right ones for your team can be a challenge.
Here are some considerations teams should bear in mind when selecting generative AI for code tools to support software development.
How autonomous is it?
Automation tools remove human error and help developers get more done by sparing them the most tedious, repetitive tasks. So when assessing one, it’s logical to ask how much human input is still needed to use it.
Take a code completion tool like GitHub Copilot. If it’s going to be more than advanced auto-complete (which may still be helpful), Copilot must be ‘prompted’ by the user so that it can suggest code in response. Working out what to put in to get the result you’re looking for can be tricky and may take multiple iterations.
As one developer put it: “The few times I’ve pasted in whole code blocks and asked it to do XYZ, I’ve ended up basically debugging my prompt instead of my code.”
And although Copilot might mean a developer writes less code from scratch, they still have to check everything it suggests. That means spending time interpreting code they didn’t write (rarely easy, especially in existing codebases) and then usually adjusting it, either manually or via more prompts.
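To make that workflow concrete, here’s a hedged sketch (in Java, with invented names) of the comment-as-prompt pattern and the kind of plausible-but-subtly-wrong suggestion a completion tool can produce. It’s illustrative only, not verbatim Copilot output:

```java
import java.util.Arrays;
import java.util.List;

public class AverageExample {

    // The comment below is the 'prompt' a completion tool would work from:
    // Return the average of the values in the list, ignoring nulls.
    static double average(List<Integer> values) {
        // A plausible suggestion: it compiles and looks right, but it
        // divides by the full list size, so nulls still drag the average
        // down. Catching this is exactly the review work described above.
        int sum = 0;
        for (Integer v : values) {
            if (v != null) {
                sum += v;
            }
        }
        return (double) sum / values.size(); // should divide by the non-null count
    }

    public static void main(String[] args) {
        // Ignoring the null, the average of 2 and 6 is 4.0; this prints 2.66...
        System.out.println(average(Arrays.asList(2, null, 6)));
    }
}
```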
AI for code tools clearly have the potential to make developers more productive, and many are excited about what they can offer. But engineering leaders should be aware of the new overhead they may introduce and look for maximum automation for maximum value. Diffblue Cover, for example, works completely autonomously, writing ready-to-run, human-readable Java unit tests without developer input.
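For a sense of what ‘ready-to-run, human-readable’ means in practice, here’s a hedged sketch of the kind of JUnit 5 test such a tool might produce. The class under test (PriceCalculator) is invented for illustration, and real Diffblue Cover output will naturally differ:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Hypothetical class under test, included so the sketch is self-contained.
class PriceCalculator {
    double applyDiscount(double price, double percent) {
        return price - (price * percent / 100.0);
    }
}

// The style of plain, readable unit test an autonomous tool might write:
// arrange inputs, call the method, assert on the observed result.
class PriceCalculatorTest {
    @Test
    void applyDiscountReducesPriceByPercentage() {
        PriceCalculator calculator = new PriceCalculator();

        double result = calculator.applyDiscount(200.0, 10.0);

        assertEquals(180.0, result, 0.0001);
    }
}
```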
How does it scale?
If automation tools require a level of human input, there’s a natural limit to what they can do. Code completion tools like Copilot (or ‘AI pair programmers’, as Microsoft prefers to call them) may help your team increase their output, but the benefit they provide is naturally constrained by the number of sufficiently capable developers available to use them.
To be fair, tools like Copilot are far from unique in this regard; many automation tools still require a level of human intervention. But those with greater autonomy – requiring less human input – scale more easily across organizations. Engineering leaders should consider whether and how tools can be integrated into end-to-end processes for greater value.
Diffblue Cover is not only fully autonomous when used within an IDE; its CLI option also means it can be run across entire codebases consisting of millions of lines of code, without worrying about how many developers are available. Better yet, the CLI means Cover can be fully integrated into a CI pipeline so unit test suites are written and maintained completely automatically every time a code change is made.
Will it expose us to risks?
Many AI tools are based on a machine learning approach called a large language model (LLM). These are trained on vast swathes of data to learn patterns and entity relationships, and then use this information to perform tasks like translating language, creating chatbot conversations, and suggesting code. Great!
Or is it?
Some of these LLMs, like OpenAI’s GPT-4, have come under fire for being closed and opaque. They lack transparency regarding the data they were trained on, the algorithms they use to analyze it and make decisions, and how their outputs are produced. Many also use the inputs users provide (which may be highly commercially sensitive) to further train their models.
The potential business risks of using third-party LLMs are still emerging. Compliance and IP security are certainly key considerations if you don’t know how the information you input will be used. The training data matters too. In code generated by LLMs, for example, how confident can you be that security flaws present in the training data aren’t being propagated into your ‘new’ code, that there are no copyright issues, and that the APIs it suggests even exist?
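To make the security point concrete: string-concatenated SQL is everywhere in public code, so it’s plausible for a model trained on that code to reproduce the vulnerable pattern sketched below. This is a hedged Java illustration with invented names, not output from any specific tool:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class UserLookup {

    // A pattern abundant in public training data, and therefore easy for a
    // model to reproduce: concatenating user input into the SQL string
    // leaves this open to SQL injection (e.g. name = "x' OR '1'='1").
    static ResultSet findUserUnsafe(Connection conn, String name) throws SQLException {
        Statement stmt = conn.createStatement();
        return stmt.executeQuery("SELECT * FROM users WHERE name = '" + name + "'");
    }

    // The safe equivalent: a parameterized query keeps the input as data
    // rather than executable SQL. A reviewer has to know to insist on this.
    static ResultSet findUserSafe(Connection conn, String name) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users WHERE name = ?");
        stmt.setString(1, name);
        return stmt.executeQuery();
    }
}
```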
Then there’s also the matter of where the models run. Many LLMs are hosted in the cloud on servers run by the tool (or back-end model) provider. That means sending your inputs outside of your organization. This situation is evolving fast – Microsoft may offer a private LLM option soon, likely at a high cost, and some tools like TabNine include an on-premises option enabled by a smaller model – but it’s an important factor.
The net? When it comes to LLM-based AI tools in particular, it’s not a plug-and-play scenario. Engineering leaders need to consider these risk factors and more alongside the relevant corporate policies before adopting AI for code tools.
With some AI technology the picture is more straightforward, however. Diffblue Cover uses reinforcement learning instead of relying on LLMs, so no training data is needed. It uses your existing code to write new code, and none of your data ever leaves your development environment.
How quickly can we get started?
It’s relatively easy to install AI for code tools like Copilot and start generating code suggestions. But how quickly can you really get to valuable results?
There’s certainly a learning curve, and it shouldn’t be underestimated. To get the most out of AI code completion tools you really need to understand how the technology works. That understanding gives you a framework for choosing effective inputs and for evaluating the outputs you get back. A lot of trial and error will probably be involved in the initial stages.
The effort required will vary. For low-level tasks it might not be significant, but for anything other than simple or boilerplate tasks (like API calls) the time needed to get up to speed with new tools should not be neglected – especially for more junior developers.
Fully autonomous tools like Cover don’t have this problem, of course, because working code can be written at the click of a button.
There’s no one answer
Development leaders must build a portfolio of multiple tools and capabilities to support AI-augmented software development and testing. There aren’t any ‘silver bullet’ solutions that solve all problems, so it’s critical to look through the hype surrounding the latest AI tools like Copilot and ChatGPT to understand the real pros and cons.
Diffblue Cover provides a fully autonomous solution to a key Java development problem: delivering good unit test coverage without diverting valuable developer time. For more detail on how Cover differs from LLM-based tools you can watch this webinar, try it now for free, or contact us for a demo.