If you read my article on why rewriting applications from scratch is almost always a bad idea and agreed with it, this article is a short guideline on how to approach refactoring legacy applications without hindering your team’s ability to release new features.
Find the biggest pain point.
Look at the codebase and ask yourself: what is the one thing that, if improved, would make everything else much easier to deal with?
You probably already know what needs to be refactored. It gets in your way all the time. It’s that class that keeps on changing, or that method hoarding conditional after conditional, growing into a big bowl of spaghetti. That’s where you should start.
Avoid focusing on auxiliary things, even if they are easy wins. Decluttering the registration and authentication process, or how a user updates their profile, just isn’t that important. These areas are unlikely to change in the foreseeable future. Ignore them. Focus on what’s changing now.
Yes, yes. I know. It’s the same old, boring refactoring quote by Kent Beck you see everywhere. But there’s a good reason it has been going around for so long. It works.
The biggest pain point might be too big.
In the past, whenever I started refactoring something, I would pretty much always end up going on a refactoring spree, touching half of the codebase in one go. Avoid this. The chances of getting away without breaking anything are close to zero. Something terrible is bound to happen when you fail to keep your refactoring scope small. Not only that, but your colleagues will in no way appreciate your pull requests touching 30+ files. 🙂
One way to avoid going on a refactoring spree is to take the biggest pain point, split it into smaller, manageable pain points, and work your way up from there.
Make use of static code analysis.
Run a static code analysis tool to get some things out of the way right from the start. There are a lot of great CLI tools out there, but I recommend going with either an IDE plugin (SonarLint) or something more robust like SonarQube.
Running a static code analysis tool allows you to quickly identify and safely refactor things like unreachable code, unused variables, commented-out code, redundant variables and jumps, and more.
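To make this concrete, here is a small, made-up Python function showing the kind of findings such tools typically report, followed by a cleaned-up version. The function names and the pricing rule are hypothetical, and the exact warnings you get depend on the tool and its rule set.

```python
def shipping_cost_before(weight_kg, express):
    """Hypothetical legacy function, left as found."""
    base = 5.0
    discount = 0.1  # unused variable: assigned but never read
    # old_cost = weight_kg * 2.5  (commented-out code)
    if express:
        cost = base + weight_kg * 2.0
        return cost
        cost = cost * 1.2  # unreachable: it follows a return
    else:  # redundant branch: the "if" above always returns
        cost = base + weight_kg * 1.0
        return cost


def shipping_cost_after(weight_kg, express):
    """Same observable behavior with the dead weight removed."""
    base = 5.0
    rate = 2.0 if express else 1.0
    return base + weight_kg * rate
```

None of these removals change what the function returns, which is why they are safe to do before any tests exist.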
It won’t help you fix everything, but you’ll be off to a good start.
Writing your test suite
While a plugin like sonarlint will provide some useful suggestions on how to clean up the code a bit, from now on, every change you make should have a test backing it.
Treat your application like a black box. Focus on what goes in and what comes out. Unit tests are sexy and all, but they are neither helpful nor feasible enough when you are dealing with an untested legacy application where everything is tightly coupled together. Write tests that prove the system works as a whole.
Unless you know each and every requirement, it’s a bad idea to write tests randomly, without any strategy whatsoever. Doing so makes it that much more likely that you will miss something and break the application. You can’t afford to test every execution path either, as it would be incredibly time-consuming.
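As a rough sketch of what a black-box test can look like, the example below feeds a request into an entry point and checks only the response. `handle_order` and its request format are hypothetical stand-ins. In a real project you would import your existing entry point instead of defining one.

```python
# Hypothetical stand-in for a tangled legacy entry point.
def handle_order(request):
    items = request.get("items", [])
    total = 0
    for item in items:
        total += item["price"] * item["qty"]
    if request.get("coupon") == "WELCOME":
        total = total - 10
    if total < 0:
        total = 0
    return {"status": "ok" if items else "empty", "total": total}


def test_order_with_coupon():
    request = {"items": [{"price": 20, "qty": 2}], "coupon": "WELCOME"}
    # Only the input and the output are inspected, never the internals.
    assert handle_order(request) == {"status": "ok", "total": 30}


def test_empty_order():
    assert handle_order({"items": []}) == {"status": "empty", "total": 0}
```

Because the tests know nothing about how the total is computed, they should keep passing no matter how the internals get reshuffled later.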
What I like to do is go with the next best thing, which is cyclomatic complexity based testing.
Cyclomatic complexity is a software metric used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program’s source code. (Wikipedia)
While cyclomatic complexity is a useful metric to watch in order to keep your code at a reasonable level of complexity, it can also help you figure out which tests to write to make sure every line of code is covered. This strategy is especially useful when dealing with methods that are hard to follow, with lots of conditionals and never-ending nesting levels.
Following this approach ensures our tests cover every line of code, but that doesn’t necessarily mean everything works as it should. Still, it goes a long way in that direction.
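Here is a minimal sketch of the idea, using a hypothetical function with three independent `if` statements. For structured code like this, the cyclomatic complexity is the number of decision points plus one, so 3 + 1 = 4. That gives four test cases: one baseline path, plus one path per decision with only that decision flipped. Testing every combination would take eight.

```python
def loyalty_discount(is_member, total, has_coupon):
    """Hypothetical legacy rule with three decision points."""
    discount = 0
    if is_member:        # decision 1
        discount += 5
    if total > 100:      # decision 2
        discount += 10
    if has_coupon:       # decision 3
        discount += 3
    return discount


# One baseline path, then one decision flipped at a time.
BASIS_CASES = [
    ((False, 50, False), 0),    # baseline: every condition false
    ((True, 50, False), 5),     # flips decision 1
    ((False, 150, False), 10),  # flips decision 2
    ((False, 50, True), 3),     # flips decision 3
]


def test_basis_paths():
    for args, expected in BASIS_CASES:
        assert loyalty_discount(*args) == expected
```

These four cases execute every line and every branch of the function at least once, but they say nothing about combinations such as a member who also has a coupon.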
Force yourself to ignore the weird stuff.
Sometimes when writing tests for a legacy application, you will stumble upon some weird cases: execution paths that might happen but make no sense, or paths that should happen but never do. Resist the temptation to do anything about them. Keep your eyes on the ball and finish what you started. Test the code as it is.
While writing those tests, you will gain a significant amount of inside knowledge on how the code works. Once your test suite is ready, take a step back and see if everything makes sense. Ask a knowledgeable product owner to help you clear any doubts you might have.
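A test that pins down one of those weird cases might look like the sketch below. The function and its behavior are invented for illustration. The test asserts what the code does today, not what it arguably should do, and records the open question for later.

```python
def legacy_total(price, qty):
    # Hypothetical legacy behavior: nothing guards against a negative quantity.
    return price * qty


def test_negative_quantity_is_preserved():
    # Looks wrong, but this is what the code does today. Pin it, do not fix it.
    # Open question for the product owner: should this input be rejected?
    assert legacy_total(10, -2) == -20
```

If the product owner later confirms it is a bug, changing the behavior becomes a deliberate change with its own test update, separate from the refactoring.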
The refactoring phase
There isn’t much one can say here. Refactor the code in whatever way makes the most sense to you and your team. Y’all know the SOLID principles. Y’all know the patterns. Use them.
The only advice I can confidently give is, once you are done refactoring, log the shit out of everything. Storing logs of what goes in and what goes out is the minimum you should be doing. Log everything that could help you repair things in case something was missed during the refactoring process.
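One lightweight way to log what goes in and what comes out is a decorator built on Python’s standard `logging` module. This is only a sketch: the logger name, the `log_io` decorator and the `apply_discount` function are assumptions made for the example, and in real code you would probably want to redact sensitive arguments before logging them.

```python
import functools
import logging

# Hypothetical logger name; use whatever fits your project.
logger = logging.getLogger("refactoring")


def log_io(func):
    """Log the arguments, the return value and any exception of func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("%s called with args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback along with the message.
            logger.exception("%s raised", func.__name__)
            raise
        logger.info("%s returned %r", func.__name__, result)
        return result
    return wrapper


@log_io
def apply_discount(total, percent):
    # Hypothetical refactored function.
    return total - total * percent / 100
```

Wrapping the entry points of the refactored code this way means that, when something was missed, the logs show the exact input that triggered it.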
Pick important battles, but pick battles you can win. Build on top of them and clean your application one small refactoring after another. The path from an old, untested legacy codebase to one that you can live with is long and strenuous, but not impossible.