published on January 27, 2018
tags: programming process research

What Research Says About Code Review

Recently I’ve been part of more code reviews, and part of an increasing number of conversations about them. I’ve started to wonder: What impact do code reviews really have? What makes for an effective code review? I certainly feel like it results in improved code, and obviously that’s the opinion of the development community at large as well. However, what something feels like and what its effects really are can be quite different, so I looked for empirical studies of code review - and found some very interesting results.

What does code review accomplish?

It’s crucial to understand what people expect out of code reviews. Whether or not code reviews are valuable, if the value they provide doesn’t match what people expect from them, they may be perceived as not valuable.

Microsoft and the University of Lugano studied the expectations and outcomes¹ of code review. The clear top two reasons given for code reviews are finding defects and code improvement followed closely by finding alternative solutions. Most of the remainder of reasons given include some form of spreading knowledge, whether within a team or across teams.

Finding defects is relatively obvious - this means finding defects before they are found via manual testing. Code improvement is, in this case, defined as making sure the code follows team standards, improving readability, adding or changing comments, and changes which improve the “quality” of the code without actually making logic changes. Alternative solutions refers to suggestions of other, ideally better, implementations and architectural designs.

But what does it actually accomplish, and how well does that match up with what we want out of it? That’s answered not only in the paper linked above but also by some further studies² ³ from other sources. There are a few interesting results:

Code reviews don’t result in as many defect fixes as expected.
Code improvement is the biggest outcome of code reviews.
Knowledge transfer is much more important than we expect.

While code reviews do find some defects, and reviewed code is significantly less likely to have defects than unreviewed code³, review comments that result in defect fixes aren’t the most common. However, knowledge transfer is much more important than we give it credit for: Intuitively, code modified by developers who had modified that same code before had a lower incidence of defects. Counterintuitively, the same is true of developers who have previously reviewed changes to that code².

Additionally, code reviews result in significant improvements to the readability and subjective “quality” of the code being reviewed³. That said, they rarely result in major architectural changes, at least over the short term studied by the papers mentioned above - it is possible that more widely distributed knowledge of the code influences long-term architectural decisions, although I haven’t been able to find if this has been studied.

What makes for a good code review?

Now that we know what to expect from code reviews, how can we get the best results out of them? That’s also been studied at Microsoft⁴, as well as touched on in some of the previously mentioned papers. Here are some of the most important points.

If a reviewer has experience (any experience!) with the code in question, the more likely they are to make useful comments. The difference between “no experience” and “some experience” is the biggest³ ⁴ - how much experience has relatively little impact. A reviewer who has made changes to the code under review once is nearly as effective at finding defects and making useful comments as a developer who has changed that code ten times, but almost twice as effective as someone who has never seen the code before. Even having previously reviewed code makes a reviewer more effective at future reviews on that code.

Small code reviews are better than large ones - the larger the number of files included in the review, the less likely the review was to produce useful comments⁴. Small, focused sets of changes are easier to understand, and therefore easier to spot problems in. That’s not to say that large changes are not worth reviewing, but it may be more effective to break them up into smaller pieces to review, or to remove less relevant changes from the review in order to reduce the overall size.

Finally, although most code review comments have a negative tone, those which are less negative are more likely to be useful. In other words, do your best to keep it positive while still making suggestions for improvements. In my opinion, this is a good policy to follow anyway - staying positive and professional helps everyone do their best work.

Takeaways (tl;dr)

While code reviews may find defects, their biggest effects are in spreading knowledge of code and making code more readable.
Keep code reviews small in terms of the number changed files.
Include at least one person who has previously changed or reviewed the code in question.
Keep your review comments as positive as you can while still being constructive.

Lastly, if this information sounds interesting, I encourage you to read the papers linked below! They’re all well-written and fairly easy to understand, (although “Revisiting Code Ownership…” has some bits I admit to skimming regarding exactly how they did their analysis) and there are some further points that I haven’t covered here that may be useful to you.