Programming principle "DRY": Don't Repeat Yourself

How repetition of information can lead to confusion.

This article offers a fresh perspective on the programming principle Don’t Repeat Yourself (DRY) and showcases how we embrace DRY internally at Symflower. DRY is most commonly applied in the following three scenarios, even though we often might not notice that we are actually fulfilling this principle:

Functions: Refactor statement sequences that are repeated over and over into reusable functions.
Inheritance: Eliminate repetition among the implementation of similar classes by properly using inheritance or embedding.
3NF: Eliminate duplication in relational databases by relying on the third normal form.
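
As a minimal Go sketch of the first two scenarios: the repeated normalization statements live in one function, and the shared fields and behavior live in one embedded type. All names here (normalizeName, Entity, User, Project) are made up for illustration and are not taken from the Symflower code base.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeName keeps the trimming and lower-casing logic in one place
// instead of repeating the same statements at every call site.
func normalizeName(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// Entity carries the fields and behavior shared by several types.
type Entity struct {
	ID   int
	Name string
}

// Label is written once and promoted to every type that embeds Entity.
func (e Entity) Label() string {
	return fmt.Sprintf("#%d %s", e.ID, normalizeName(e.Name))
}

// User reuses Entity through embedding rather than copying its members.
type User struct {
	Entity
	Email string
}

// Project reuses Entity in the same way.
type Project struct {
	Entity
	Owner string
}

func main() {
	u := User{Entity: Entity{ID: 1, Name: "  Alice "}, Email: "alice@example.com"}
	p := Project{Entity: Entity{ID: 2, Name: "DRY Demo"}, Owner: "alice"}
	fmt.Println(u.Label())
	fmt.Println(p.Label())
}
```

If the label format ever changes, only Entity.Label needs to be touched, which is the point of having a single authoritative representation.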

The Don’t Repeat Yourself coding principle was first introduced by David Thomas and Andrew Hunt in their book “The Pragmatic Programmer”, which is the source of the following quote.

What is the DRY (Don’t Repeat Yourself) principle in programming?

"Every piece of knowledge must have a single, unambiguous, authoritative representation within a system."

Interestingly, this principle can be applied to both logic and process. When applied to logic, abstraction is introduced to reduce repetition (e.g. of sub-formulas or lemmas). This is where the three aforementioned scenarios fit in. To reduce repetition in processes, however, automation is commonly applied. So, to keep complex systems as simple and maintainable as possible, one tries to get rid of repetition, either through abstraction or through automation.

Ideally, there is no knowledge duplication across our whole system, including DB, backend code, frontend code and even documentation. However, defining things only once can require a lot of automation and code generation. How exactly to embrace DRY always depends on the stack of a project. Also, it might not make sense to aim for a completely DRY environment from the start. Instead, clean things up as soon as you notice that you are doing the same things over and over.

Let’s take a look at the steps we took at Symflower towards a DRY environment during our journey of building our product - a tool that automatically writes Go and Java unit tests for you.

DRYing up the development, testing and production environments

Infrastructure as code enables almost identical environments for development, testing and production.

The production and testing deployments are typically very similar to each other. The same holds for the development environment that is used by the programmers on a project: it must be close to testing and production to ease development and debugging.

Ideally, we do not manually execute the steps for updating and synchronizing the development environment among all developers. First, doing so is a source of errors. Second, it is a waste of time.

At Symflower we use infrastructure as code to have almost identical development, testing and production environments. They mainly vary in the allocated resources. We achieved that mainly by making use of:

Vagrant for building and maintaining virtual software development environments.
Docker/Kubernetes for containers and container management to automate CI/CD.
Various shell scripts to automate the installation and update procedures.

DRYing up development process repetitions

Options for automation in the software development process.

In software development there are the typical steps of code review, testing and deploying. All three of them include repetition that can, to some extent, be automated. Let’s take a look at the most interesting automation steps we employ at Symflower.

Automating code reviews (Linters)

During a code review the reviewer typically takes a look at coding conventions, architecture and implementation details. Adherence to coding conventions in particular can often be checked automatically by a simple tool and therefore needs no human reviewer.

At Symflower we make extensive use of existing linters like staticcheck, errcheck and gofmt for our Go code, and jsfmt or ng-lint for our frontend code. These linters are integrated into our CI/CD pipeline, so code that fails any of their checks cannot be merged.

In addition, we wrote our own linter that checks Symflower-specific coding guidelines like the specific structure of comments, code tags (like TODO or FIXME) or the use of empty lines. This might sound like overkill at first sight. However, programming languages like Go provide packages for easily analyzing source files, so it is straightforward to implement project-specific linting rules. All it takes is a look at the Go AST package (go/ast).
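
To give an idea of how little code such a rule needs, here is a minimal sketch of a project-specific check built on go/parser and go/ast. The rule itself (a TODO or FIXME line comment must end with a period) and the names lintCodeTags and checkComment are hypothetical examples, not Symflower's actual guidelines or linter.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// checkComment applies the hypothetical rule to a single comment. It only
// looks at line comments; block comments are ignored in this sketch.
func checkComment(fset *token.FileSet, c *ast.Comment) (string, bool) {
	text := strings.TrimSpace(strings.TrimPrefix(c.Text, "//"))
	if !strings.HasPrefix(text, "TODO") && !strings.HasPrefix(text, "FIXME") {
		return "", false
	}
	if strings.HasSuffix(text, ".") {
		return "", false
	}
	pos := fset.Position(c.Pos())
	return fmt.Sprintf("%s:%d: code tag comment should end with a period", pos.Filename, pos.Line), true
}

// lintCodeTags parses src and returns one finding per violating comment.
func lintCodeTags(filename string, src string) ([]string, error) {
	fset := token.NewFileSet()
	// parser.ParseComments keeps the comments in the resulting AST.
	file, err := parser.ParseFile(fset, filename, src, parser.ParseComments)
	if err != nil {
		return nil, err
	}

	var findings []string
	for _, group := range file.Comments {
		for _, comment := range group.List {
			if finding, ok := checkComment(fset, comment); ok {
				findings = append(findings, finding)
			}
		}
	}
	return findings, nil
}

func main() {
	src := `package demo

// TODO Remove this workaround.
var a = 1

// FIXME handle negative values
var b = 2
`
	findings, err := lintCodeTags("demo.go", src)
	if err != nil {
		panic(err)
	}
	// Only the FIXME comment is reported, because it lacks the final period.
	for _, finding := range findings {
		fmt.Println(finding)
	}
}
```

A check like this can run in CI next to the off-the-shelf linters, so reviewers no longer have to point out the convention by hand.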

Automated testing

Automating the testing phase of a project is another option to get rid of process repetition. The usual suspects are unit tests, integration tests and system tests, which are present in most state-of-the-art projects. Notable in the Symflower project is our automated migration test. This simple procedure has already caught many bugs before they reached production. It works roughly as follows:

Start up a test environment with the current “master” branch.
Run the system test to have some initial data in the environment.
Run the provided migration steps.
Run the system test again with the code in the current feature branch.
Voila, you have a basic check in place that no migration steps were missing in your feature branch.
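
The four steps above can be sketched as a small orchestration function. This is only an illustration under stated assumptions: the Environment interface, its method names and the in-memory fakeEnv are hypothetical stand-ins for whatever actually starts the environment, runs the system test and applies the migrations in a real project.

```go
package main

import "fmt"

// Environment is a hypothetical abstraction over the test environment.
type Environment interface {
	Start(branch string) error         // start the environment with the code of branch
	RunSystemTest(branch string) error // run the system test against the code of branch
	RunMigrations(branch string) error // apply the migration steps provided by branch
}

// migrationTest runs the four steps and stops at the first failure.
func migrationTest(env Environment, baseBranch, featureBranch string) error {
	if err := env.Start(baseBranch); err != nil {
		return fmt.Errorf("starting environment on %s: %w", baseBranch, err)
	}
	if err := env.RunSystemTest(baseBranch); err != nil {
		return fmt.Errorf("creating initial data on %s: %w", baseBranch, err)
	}
	if err := env.RunMigrations(featureBranch); err != nil {
		return fmt.Errorf("running migrations of %s: %w", featureBranch, err)
	}
	if err := env.RunSystemTest(featureBranch); err != nil {
		return fmt.Errorf("system test on %s after migration: %w", featureBranch, err)
	}
	return nil
}

// fakeEnv simulates an environment with a single schema version number.
type fakeEnv struct {
	schema     int            // schema version of the stored data
	needs      map[string]int // schema version the code of each branch expects
	migrations map[string]int // schema version reached by a branch's migration steps
}

func (e *fakeEnv) Start(branch string) error {
	e.schema = e.needs[branch]
	return nil
}

func (e *fakeEnv) RunSystemTest(branch string) error {
	if e.needs[branch] != e.schema {
		return fmt.Errorf("code expects schema %d, data has schema %d", e.needs[branch], e.schema)
	}
	return nil
}

func (e *fakeEnv) RunMigrations(branch string) error {
	if version, ok := e.migrations[branch]; ok {
		e.schema = version
	}
	return nil
}

func main() {
	needs := map[string]int{"master": 1, "feature": 2}

	// The feature branch changed the schema but ships no migration step.
	missing := &fakeEnv{needs: needs}
	fmt.Println("without migration:", migrationTest(missing, "master", "feature"))

	// With the migration step in place the check passes.
	complete := &fakeEnv{needs: needs, migrations: map[string]int{"feature": 2}}
	fmt.Println("with migration:", migrationTest(complete, "master", "feature"))
}
```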

Automated deployments

Deploying is another step that is done repeatedly, so it should be automated. When merging to “master” in the Symflower project, not only is our production environment updated automatically, but so are our extensions for editors like Visual Studio Code (VS Code) and JetBrains' IntelliJ IDEA, which are automatically published to their respective marketplaces. Of course, this automation takes some upfront investment, but deploying manually is again a source of errors and, in the long run, automating once is faster than deploying manually every time. This is especially true when the deployment happens weekly, daily or, as in our case, whenever a change gets merged.

DRYing up your code base

Finally, when looking at the code base of a project, there are sources of redundancy that can be eliminated but that go beyond what methods and inheritance can address. This is where code generation comes in.

Code generation can be applied whenever we find ourselves typing the same code over and over. A typical target for code generation is the code written for forms in a web frontend. Take a look at our blog post on How to auto-generate advanced forms using Formly, which outlines how forms can be auto-generated for Angular frontends.

Another option is code generation for data structures that all have similar methods but cannot be refactored with object orientation. To generate code in Go, templating can be combined with the package that we also use to write our linters: the Go AST package.
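
A minimal sketch of that combination could look as follows: go/ast collects the struct types of a source file, text/template renders one method per struct, and go/format cleans up the result. The generated FieldNames method and the names structInfo, collectStructs and generate are assumptions chosen for illustration, not what Symflower generates.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
	"text/template"
)

// structInfo is what the template needs to know about one struct type.
type structInfo struct {
	Name   string
	Fields []string
}

// collectStructs parses src and returns all struct types with their named fields.
func collectStructs(src string) ([]structInfo, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var infos []structInfo
	ast.Inspect(file, func(n ast.Node) bool {
		spec, ok := n.(*ast.TypeSpec)
		if !ok {
			return true
		}
		st, ok := spec.Type.(*ast.StructType)
		if !ok {
			return true
		}
		info := structInfo{Name: spec.Name.Name}
		for _, field := range st.Fields.List {
			for _, name := range field.Names {
				info.Fields = append(info.Fields, name.Name)
			}
		}
		infos = append(infos, info)
		return true
	})
	return infos, nil
}

// methodTemplate renders one hypothetical FieldNames method per struct.
var methodTemplate = template.Must(template.New("methods").Parse(`// Code generated by this sketch. DO NOT EDIT.

package {{.Package}}
{{range .Structs}}
// FieldNames returns the names of the fields of {{.Name}}.
func ({{.Name}}) FieldNames() []string {
	return []string{ {{range .Fields}}"{{.}}", {{end}} }
}
{{end}}`))

// generate returns gofmt-formatted source with the generated methods.
func generate(pkg string, src string) ([]byte, error) {
	structs, err := collectStructs(src)
	if err != nil {
		return nil, err
	}
	data := struct {
		Package string
		Structs []structInfo
	}{pkg, structs}
	var buf bytes.Buffer
	if err := methodTemplate.Execute(&buf, data); err != nil {
		return nil, err
	}
	// format.Source fails if the rendered code is not valid Go.
	return format.Source(buf.Bytes())
}

func main() {
	src := "package model\n\ntype User struct {\n\tID   int\n\tName string\n}\n"
	out, err := generate("model", src)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```

In a real project the output would be written to a file next to the input and regenerated whenever the structs change, for example through a go:generate directive.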

In order to test the Symflower product we generate test fixtures, which encode the results of our analysis. Adding these fixtures to version control allows us to find behavior changes without writing a dedicated test. The cost here is that simple changes to our analysis can result in large changes to the fixtures, but that’s not a problem because we can always regenerate them.
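
The fixture idea can be sketched in a few lines of Go. This is an assumption about how such a check might look, not Symflower's implementation: the helper checkFixture and the file name analysis.golden are hypothetical, and the "analysis result" is just a byte slice.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// checkFixture compares got with the fixture stored at path. With update
// set, the fixture is rewritten instead, which is how fixtures get
// regenerated after an intended behavior change.
func checkFixture(path string, got []byte, update bool) error {
	if update {
		return os.WriteFile(path, got, 0o644)
	}
	want, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("reading fixture: %w", err)
	}
	if !bytes.Equal(got, want) {
		return fmt.Errorf("output differs from fixture %s:\n got: %q\nwant: %q", path, got, want)
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "fixtures")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "analysis.golden")

	// Record the current result as the fixture (this file goes into version control).
	if err := checkFixture(path, []byte("result: 42\n"), true); err != nil {
		panic(err)
	}
	// Unchanged behavior matches the fixture.
	if err := checkFixture(path, []byte("result: 42\n"), false); err != nil {
		panic(err)
	}
	// A behavior change is reported without a dedicated test.
	if err := checkFixture(path, []byte("result: 43\n"), false); err != nil {
		fmt.Println("behavior change detected:", err)
	}
}
```

In a test suite the update switch is typically wired to a command-line flag, so regenerating all fixtures is a single command and the resulting diff can be reviewed like any other change.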

Lastly, you can consider using a tool such as Symflower to autonomously generate Java unit tests as well as Go unit tests.

Wrapping up: using the DRY principle in software development

Of course, not all the steps we described in this article, from “DRYing up the environment” to “code generation”, were present at the start of the Symflower project. They were added step by step as we realized that each bit of automation would pay off eventually. A good reference for figuring out whether automating a task pays off is the XKCD comic “Is It Worth the Time?”.

2022-04-29