Software Teams Should Require Two Peer Reviews For Code Merges
TL;DR: Analysis of 70 software peer reviews reveals that the first code reviewer contributed important feedback to 100% of code reviews, the second reviewer added important feedback to 65 (93%) of code reviews, but the third reviewer added significant feedback to only 3 (4%) reviews. Code reviews are an important part of the software development process, and for optimal returns on time invested, teams should use two approving reviewers per code merge.
Peer code reviews are a standard practice in software engineering, supported through lightweight (asynchronous) processes, such as GitHub’s pull requests (PRs). Despite wide adoption, the return on investment of peer reviews is not widely understood. In my discussions with hundreds of engineers, I discovered that most didn’t know why their team required one, or two reviews per PR, and whether increasing the number of reviewers would be a good time investment. Back in 2013, while working as a software engineer at Amazon, I attempted to answer this question for my team. Initially I researched various studies, articles, and books, but after failing to find metrics that would answer the question conclusively, I did a quick analysis of my team’s code reviews. Below is the summary of my findings.
Code reviews belonged to a team of eight software development engineers (SDEs), building services using Java. Team maintained ~85% unit test coverage, and a ~100% integration test coverage of all major functionality. We used continuous integration, and ran our tests before raising code reviews. Team had two peculiarities which enabled me to find a rich data set for this analysis. The first was that team’s seniority was relatively high compared to other teams, as we had one junior, six mid-level, and one senior engineer. Typical team composition at that time was to have ~50% of junior SDEs on a team. The second peculiarity was that, as a group, we had a strong bias for code reviews, with a median count of three reviewers per PR. This is unusual because, based on my discussions with dozens of engineering managers, most teams do one to two peer reviews per PR, but no other team I’ve come across has done three reviews as a general course of practice.
Seventy code reviews were selected based on the following criteria:
1. PR had 3 approving reviewers.
2. PR had 50 to 400 changed lines of code.
3. PR included business logic changes (no style, or config only changes).
In examining of code comments, I tried to focus on the ones which pointed to significant issues, which team would not readily take on as technical debt. This is a translation of the filter I applied to identify “significant comments”:
1. Logic errors that result in unexpected behavior (ie: null pointer exception or not satisfying a business requirement).
2. Significant architectural issue requiring code reorganization (ie: to remove duplication of code), or re-design to address issues with implementation approach (ie: a non-scalable solution).
Results of Analysis
70 (100%) of PRs had at least one significant comment.
65 (93%) of PRs had a significant comment from the 2nd reviewer.
3 (4%) of PRs had a significant comment from the 3rd reviewer.
Not surprisingly, due to complexity of selected PRs, all 70 reviews contained at least one important comment. Surprisingly, 93% of the time, a second reviewer (based on the timestamp on the feedback), contributed at least one important comment that was missed by the first reviewer. The third reviewer seldomly provide additional value to the review.
Despite having robust testing coverage, our team was regularly finding merge-blocking defects through the code review process. Based on the analysis of data, two code reviews is the optimal number of reviews technology teams should enforce in their code review process.
Several years after I completed this analysis, I discovered other studies which provided further insight. Convergent Contemporary Software Peer Review Practices paper in particular summarized several studies and converged on the idea that “two reviewers find an optimal number of defects.”
If your team is currently enforcing a one-ship-it code review policy, consider changing this practice to add a second reviewer as an efficient way to increase the quality of your output. Be careful when you tinker with the code review process however, as this will likely increase the review turnaround time, and may create frustration with developers. Consider coupling this change with additional process improvements to achieve the optimal code review experience (good reference here is Google’s Engineering Practice).