Code generation with Symflower & GitHub Copilot: AI vs Symbolic Execution for Software Testing

AI vs Symbolic Execution for software testing: using Symflower and GitHub Copilot combined

This post takes a look at two incredibly useful tools that transform the way software developers work. Read on to find out about the background and use of GitHub Copilot and Symflower for software testing!

The AI revolution, and especially generative AI, has reached software development. Large Language Models have evolved to a state where they can provide valuable services for developing applications.

Tools like GitHub Copilot can predict and complete lines of code based on the code you’ve already written, or on a prompt. They can even understand and explain the functionality you’re trying to deliver. Some developers are already using such tools to accelerate and simplify their daily workflows.

But generative AI isn’t the only technology that can help improve developer productivity, efficiency, and flow. Symflower generates unit and integration tests using Symbolic Execution, a technology that is very different from Artificial Intelligence based on Large Language Models. Still, there are some similarities in what it can do when compared to Copilot, and as we will see, the two tools can complement each other in a way that gives you the best of both worlds.

Find out how Symflower supports test generation:

Read blog post: Smart Java Unit Test Generation for VS Code and IntelliJ

This article provides a comparison and outlines a potential use case to apply Symbolic Execution alongside generative AI to help deliver better code faster.

Symflower vs GitHub Copilot for generating (testing) code

GitHub Copilot: AI-based code completion

What does GitHub Copilot do?

In essence, Copilot is an AI pair programmer. Developed by GitHub and OpenAI, it provides code completion suggestions based on your existing code or your manually added prompts. Copilot can even suggest entire functions in real time, right in your editor, basically removing repetitive, low-value tasks from your workflow. Its recommendations, provided right within your IDE, are based on your project’s context and style conventions. This helps make sure they are immediately useful and free up your time to focus on business logic rather than the mundane parts of coding.

Limitations of GitHub Copilot

The code snippets that tools such as Copilot create can be quite impressive and can provide a great starting point to iterate on. But underneath, a Large Language Model (LLM) is processing all the input information as text: your program context is analyzed from a language perspective rather than a computing perspective. As such, there is no guarantee that the generated code is correct or that it works at all. The AI tries to work out what kind of code is often used in conjunction with the code you’ve selected (similar to how your phone predicts the next words based on the beginning of a sentence). In addition, LLMs typically produce their output by randomly sampling from the probabilities they predict for possible continuations. Hence their results are not deterministic: the outcome may not be the same if you make the same request twice.
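
To make the point about non-determinism concrete, here is a minimal Java sketch. It is a toy, not how Copilot is implemented: the candidate continuations, their probabilities, and the class name `NextTokenDemo` are all made up for illustration. It contrasts always picking the most likely continuation with sampling one at random in proportion to its probability.

```java
import java.util.Random;

// Toy illustration only: a real LLM scores a huge vocabulary with a neural network.
class NextTokenDemo {
    // Hypothetical candidate continuations and made-up probabilities.
    static final String[] TOKENS = {"return a + b;", "return a - b;", "return a * b;"};
    static final double[] PROBS = {0.6, 0.3, 0.1};

    // Greedy choice: always take the most probable candidate, so the result never changes.
    static String greedy() {
        int best = 0;
        for (int i = 1; i < PROBS.length; i++) {
            if (PROBS[i] > PROBS[best]) {
                best = i;
            }
        }
        return TOKENS[best];
    }

    // Sampling: pick a candidate at random, in proportion to its probability.
    static String sample(Random random) {
        double r = random.nextDouble(); // uniform in [0, 1)
        double cumulative = 0.0;
        for (int i = 0; i < PROBS.length; i++) {
            cumulative += PROBS[i];
            if (r < cumulative) {
                return TOKENS[i];
            }
        }
        return TOKENS[TOKENS.length - 1]; // guard against floating-point rounding
    }

    public static void main(String[] args) {
        Random random = new Random(); // unseeded, so output differs between runs
        System.out.println("greedy:  " + greedy());
        for (int i = 0; i < 5; i++) {
            System.out.println("sampled: " + sample(random));
        }
    }
}
```

Running `main` twice will usually print different sampled lines, while the greedy line stays the same. That is the sense in which the same request can yield different suggestions.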

In terms of testing, Copilot Chat (part of the Copilot X suite) offers a feature to automatically write unit tests for the selected block of code. The tool will try to interpret the code’s logic and generate tests based on it. Bear in mind, however, that Copilot cannot access code in other files: if the selected snippet calls methods defined in a different file, Copilot has to rely on pure guessing.

Privacy

Copilot is an AI tool, so its algorithm is trained using tons of data. Usually, that’s public data found on the internet, and there’s no way to know just what kind of data was used for training. It also means that your own code snippets may or may not be transmitted to GitHub’s servers to improve further suggestions (user reports on this topic contradict each other, and for the base version of GitHub Copilot, there’s no public privacy statement to shed light on the question). If you’re a Copilot for Business user, GitHub says that your data will be discarded after a suggestion is made, and your code snippets will not be used to further train the algorithm.

Language and Tooling Support

On the technical side, GitHub Copilot supports Python, JavaScript, TypeScript, Ruby, Go, C# and C++. Since suggestions are based on the training data used, Copilot may produce fewer or less robust suggestions for programming languages that are insufficiently represented in public repositories. The tool is available for Neovim, JetBrains IDEs, Visual Studio, and Visual Studio Code.

Symflower: Java test generation based on mathematical models

What does Symflower do?

In contrast to Copilot, Symflower specifically focuses on generating test code. It can automatically write unit test templates to be used as boilerplate for further tests, and complete unit test suites with meaningful values. Once a test suite is generated, Symflower also provides inline, test-backed code diagnostics. For instance, if an exception can be triggered, it will be highlighted in your editor, and you can easily run a unit test to reproduce the problem.
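
As a rough illustration of what “an exception can be triggered” means, consider the sketch below. It is plain Java with no test framework and is not actual Symflower output: the class `Initials`, the method `firstInitial`, and the helper `emptyNameThrows` are hypothetical names chosen for this example. A generated unit test would typically express the same check with a framework such as JUnit.

```java
// Plain-Java sketch; names are hypothetical and this is not generated tool output.
class Initials {
    // Method under test: returns the first letter of a name in upper case.
    // It throws StringIndexOutOfBoundsException when the name is empty.
    static char firstInitial(String name) {
        return Character.toUpperCase(name.charAt(0));
    }

    // A reproducing check: calls the method with the problematic input
    // and reports whether the exception was triggered.
    static boolean emptyNameThrows() {
        try {
            firstInitial("");
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstInitial("ada"));
        System.out.println(emptyNameThrows());
    }
}
```

The interesting input here is the empty string: it is easy to forget when writing tests by hand, and it is exactly the kind of value a test-backed diagnostic would point you to.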

Symflower is not an AI-based tool. Rather, it leverages mathematical models with a technique called Symbolic Execution (SE) to generate high-coverage test suites. In essence, the tool uses code analysis to generate boilerplate code, and a complex mathematical model to determine meaningful test values. With Symbolic Execution, Symflower can analyze your code to identify the inputs that lead to the execution of each and every path in your application.

How Symbolic Execution computes input values

In short, Symbolic Execution means that your code is executed with symbolic instead of actual values. All conditions related to these symbolic values are expressed in mathematical constructs, and then the values for fulfilling these conditions are determined. That’s how SE is able to compute values that trigger all possible execution paths. (See a comparison of the various available methods for test value generation in one of our previous posts.)
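
The idea can be sketched in a few lines of Java. This is a simplified illustration under stated assumptions, not Symflower’s implementation: the method `classify`, the class `PathConditionDemo`, and the hand-written path conditions are invented for this example, and the brute-force search in `solve` merely stands in for the constraint solver a real Symbolic Execution engine would use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Simplified sketch of "one path condition per execution path"; all names are hypothetical.
class PathConditionDemo {
    // Method under test with three execution paths.
    static String classify(int x, int y) {
        if (x > 10) {
            if (x + y == 15) {
                return "A";
            }
            return "B";
        }
        return "C";
    }

    // Path conditions over the symbolic inputs x and y, written out by hand here.
    // A real engine derives them automatically from the branches it encounters.
    static Map<String, BiPredicate<Integer, Integer>> pathConditions() {
        Map<String, BiPredicate<Integer, Integer>> paths = new LinkedHashMap<>();
        paths.put("A", (x, y) -> x > 10 && x + y == 15);
        paths.put("B", (x, y) -> x > 10 && x + y != 15);
        paths.put("C", (x, y) -> x <= 10);
        return paths;
    }

    // Stand-in for a constraint solver: searches a small range for values
    // that satisfy the condition. Returns null if none is found in that range.
    static int[] solve(BiPredicate<Integer, Integer> condition) {
        for (int x = -20; x <= 20; x++) {
            for (int y = -20; y <= 20; y++) {
                if (condition.test(x, y)) {
                    return new int[] {x, y};
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, BiPredicate<Integer, Integer>> path : pathConditions().entrySet()) {
            int[] input = solve(path.getValue());
            System.out.println("path " + path.getKey() + ": x=" + input[0] + ", y=" + input[1]
                    + " -> classify returns " + classify(input[0], input[1]));
        }
    }
}
```

Each path gets exactly one concrete input, which is why this approach tends to produce one test per path rather than many overlapping ones. Note that path "A" requires `x + y == 15` with `x > 10`, a combination that random or guessed values could easily miss.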

Limitations of Symflower

The main difference from GitHub Copilot is that Copilot can provide suggestions for production code, whereas Symflower focuses exclusively on testing. Symflower will always provide compiling test templates with all the necessary imports, annotations, object initializations, function calls, asserts, and more.

Test generation with Symbolic Execution, however, is currently a beta feature that can’t be expected to run on all code bases 100% of the time. In cases where test suite generation can be run, in addition to creating a full test suite that covers all possible paths in your code, Symflower will also provide test-backed code diagnostics in real time.

Tackling privacy concerns with local Symbolic Execution

Another difference compared to AI tools is that while Symflower also integrates into your IDE, it runs 100% locally. This means that with Symflower, your code never leaves your computer.

Language and Tooling Support

From a technical point of view, Symflower currently supports Java, Spring, and Spring Boot, and is available for Visual Studio Code, IntelliJ IDEA & Android Studio, and CLI.

The benefits of Symbolic Execution over LLM for generating tests

The benefit of using code analysis to determine possible execution paths is that Symflower “understands” all related conditions and solves them mathematically. What that means is that when the logic can be applied, Symflower’s suggestions are always correct, and the resulting test suite is slim: only essential tests, no redundant cases. Symflower also supports mocking, with mocks for used interfaces generated automatically.

When compared with AI, SE’s mathematical foundations mean that there is no guesswork involved in generating tests: it delivers consistent tests that you can trust. The results provided by Symflower are always deterministic, i.e. no matter how many times you run test generation on the same code, you’ll always get the same results. But applying the logic is quite challenging and there are still some limitations to Symflower’s use. As a fallback option, the tool can always provide test templates where you just have to fill in the right values for your test scenarios.

Another big difference between the two tools is that as Copilot was not trained on your repository, it can sometimes generate non-compiling test cases. That will not happen with Symflower, as it fully relies on the information it finds within your repository.

Finally, an added benefit of Symflower compared to AI is that once a test suite has been generated, the tool can provide test-backed code diagnostics, highlighting potentially unhandled exceptions inline as you code.

Effective code generation: Symflower vs Copilot or Symflower+Copilot?

Rather than alternatives, you should think of Symflower and GitHub Copilot as complementary tools. Let us elaborate!

Generative AI is by definition an “open-minded” solution that’s great for solving creative problems. Testing, on the other hand, is a more linear and “precise” topic that leaves less room for creativity: you’re basically looking to make sure that all that needs to be tested gets thoroughly tested. Using Copilot with Symflower, you can combine the creativity of AI with the “precision” of logic. In this way, Symflower helps you validate the AI’s suggestions and catch unhandled exceptions in real time.

Copilot is great for providing suggestions when coding to accelerate development. Use it to quickly generate production code – then use Symflower to generate the necessary tests to reveal corner cases in the generated code and ensure the AI didn’t make a mistake. Symflower is a specialized tool that does a great job in generating test suites without redundant test cases and with full coverage of all execution paths, including edge cases.
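
Here is a small Java sketch of the kind of corner case this workflow is meant to surface. It is a hypothetical example, not a recorded Copilot suggestion or Symflower result: the class `MidpointDemo` and both methods are made up. The naive version looks correct and passes typical inputs, but fails for large values because the sum overflows the `int` range.

```java
// Hypothetical example of plausible-looking code with a hidden corner case.
class MidpointDemo {
    // Looks fine, but low + high can exceed Integer.MAX_VALUE and wrap around.
    static int midpointNaive(int low, int high) {
        return (low + high) / 2;
    }

    // Overflow-safe variant, assuming 0 <= low <= high.
    static int midpointSafe(int low, int high) {
        return low + (high - low) / 2;
    }

    public static void main(String[] args) {
        // Typical input: both versions agree.
        System.out.println(midpointNaive(2, 8) + " " + midpointSafe(2, 8));
        // Corner case: the naive version returns a negative number because the sum overflows.
        System.out.println(midpointNaive(2_000_000_000, 2_100_000_000));
        System.out.println(midpointSafe(2_000_000_000, 2_100_000_000));
    }
}
```

A test that only uses small values would pass for both versions. A test with values near the upper end of the `int` range exposes the difference, which is why test values matter as much as test structure.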

In cases where you do need to write unit test code manually, Symflower will generate test templates with all the necessary imports, annotations, object initializations, function calls, asserts, and more to save you time and effort.

	GitHub Copilot	Symflower
Underlying technology	LLM (AI based on a Large Language Model)	Symbolic Execution
Scope	Code completion suggestions (production & testing code)	Generating unit tests and templates
Test value generation	Language-based operation: values are not computed but “guessed”	Test values are mathematically computed: always correct
Test coverage	No coverage guarantee	Provides test templates. When test suite generation can be run, Symflower provides 100% test coverage.
Limitations	No guarantee that the generated code will be correct and will work	Only generates tests, not business logic. Generating full test suites may not be available in all cases
Privacy	Your code is transmitted to GitHub servers to generate your custom suggestions	Your code stays local
Deterministic	No	Yes
Use case	Quickly generating production code as a starting point, focused especially on accelerating mundane tasks	Generating test templates and test suites to check AI-generated and human-written code

Ready to try it for yourself? Install Symflower in your chosen IDE and see how it can help you slash the cognitive burden of testing while ensuring the quality of your applications!

2023-10-10
