At Bitbucket (and throughout Atlassian) we are constantly dogfooding our own products. This helps us flesh out requirements and find bugs. So, naturally, we host the Bitbucket code in Bitbucket as well. We use pull requests to review team members’ code before merging it in and deploying. And, like many of you, we have been wanting inline comments on pull requests for a long time.
As we developed the inline comment feature for pull requests and commits, we discovered a couple of problems while dogfooding. The first was related to the comment drift algorithm, and the second related to performance. Keep reading to learn how we solved these problems!
When you are doing code reviews, the ability to leave comments on a particular line is not a new feature. Atlassian Crucible has had inline comments in reviews since 2007, and I’m sure there have been other examples before that as well.
The lack of inline comments was the primary complaint from other Atlassian developers when they moved their projects to Bitbucket. Many others held off on moving their projects to Bitbucket for this reason as well. We at Bitbucket felt the pain of not having inline comments every day. But we also had a pretty good idea of how we wanted it to work, from using Crucible reviews for years. We knew what we liked about Crucible, and what we wanted to do differently.
We thought a lot about Stash, which you can think of as Bitbucket for the Enterprise offering Git repository management behind the firewall. It is a great option for larger teams that have outgrown cloud code hosting services like Bitbucket, and want to bring their repositories inside of their enterprise infrastructure. While Bitbucket and Stash serve slightly different audiences, we wanted to be sure you could move between the two products without a steep learning curve. The new UI was one area that we wanted to make consistent, but we also wanted to be sure commenting on pull requests worked consistently.
To that end, Nic Venegas and I traveled to Sydney to work closely with the Stash developers for a few weeks. This trip helped us immensely learn the intricacies of each others’ products and achieve a level of consistency between the two products. Working with the Stash team naturally made us think more about the problems that we were trying to solve from a different angle, and helped us build a more robust product.
For example, here are some specific things we aligned on:
- Tabbed view on the pull request (there are some differences on exactly what is in each tab)
- Participants list (though Stash calls it ‘Reviewers’)
- ‘Approve’ functionality
- Comment drift algorithm (see the next section for more details)
High Plains Drifter
Probably the most complex part about inline comments on pull requests was handling ‘drift’ correctly. Drift (as we call it; I don’t know if there is a standardized word for it) is when additional commits amend a pull request after an initial comment is left on a line. This might change the line number of the commented-on line, and, of course, even if the line number changes, the comment should still appear in the right place.
Other times, new changes might change or remove a commented-on line, so the commented-on line no longer exists in the diff (at least, not as it was when the comment was made). We refer to these comments as ‘eclipsed’, since the new changes cover the line they were anchored to. We still show those comments in the Activity tab, with the diff as it was before the comment was eclipsed.
If keeping all this drift stuff straight sounds complicated, that’s because it is. We caught a few bugs that annoyed us EVERY DAY while we used the feature we were developing. One particularly nasty bug that caused the drift to be calculated incorrectly when the destination branch was merged back into the feature branch (or fork) after a comment was made. This is a pretty common thing to do, especially if conflicts arise in the pull request. The bug was tricky to resolve because we needed to re-think the way we were performing the diffs used to calculate drift. Instead of just relying on the diffs of the commits made, we needed to take the diff of the merges that would be applied if you were to merge the pull request. Big thanks to the Stash team for helping us figure out the cause of, and solution to, this problem. We caught this bug because we were bitten by it while using the feature we were developing.
The solution to the drift problem can be explained using the diagram above. If you aren’t interested in the nitty-gritty details, you can just skip ahead to the next section.
The first case to consider is when the source branch advances, which means that more changes are committed to the source branch of the pull request, and the pull request is edited to include these changes. Consider the diagram on the left. The original version of the pull request is to merge commit D into the main branch (which includes commits A, B, and C). The merge originally shown was M1. Now, commit E has been added to the pull request, and merge M2 is what should be shown.
The easy case, which we call the ‘fast-forward’ case, is when the comments are on files that are not touched by commit E. Since there is no further change to these files, their line numbers are unchanged, so we just need to update the comment objects in our database to confirm that they are relevant to the new revision anchors (E and C, in this case).
However, when the files commented on are touched by commit E, things get trickier. For comments on removed lines (colored red on the pull request view) and context lines (lines that were not changed at all in the pull request, but are shown to give the reviewer context), we can just fast-forward them, since these lines are coming from the main branch version of the file, not the new version described in the pull request. On the other hand, for comments on ‘added’ lines (colored green on the pull request view), we need to consider the ‘meta-diff’, or the diff between merges M1 and M2. For each hunk in this meta-diff, if it is before the line commented on, the system will drift the comment by net total of how many lines were added or removed in that hunk. If the hunk overlaps the commented line, then the comment is eclipsed. If the hunk is after the commented line, then it has no affect on the comment.
The next case to consider is when the destination branch advances (the right side of the diagram above). Suppose that now a new commit, F, was pushed to the destination branch. We can still fast-forward any comments that were not touched by commit F. The meta-diff between merges M3 and M4 is still used to calculate drift on comments on added lines. However, now we need to consider the diff represented by commit F to calculate the drift for comments on removed and context lines.
The bug that we encountered originally occurred in a case like the one pictured on the left; we were using the new commit’s diff instead of the meta-diff. This worked fine, until we merged the default branch back into our feature branch and updated the Pull Request (as you would to resolve conflicts, for example). This would be a case of the source branch advancing, but the diff from commit E wouldn’t properly express the drift.
Speed Matters
We are constantly striving to improve the speed and performance of pull requests (and all of Bitbucket). In this process, we have made significant progress (while we were still dogfooding the feature internally). Our dogfooding server mirrors our production environment, but it is not nearly as powerful. So, code that will be slow in production will be REALLY slow there. As a result, we feel the pain of a slow site, and we work to improve it, long before the code ever makes it to production.
With all of the various diff comparisons we needed to do to render a pull request (especially on the activity tab, if there were a lot of eclipsed comments), there was a lot of work for the code to do. We ran into a few performance issues while we were developing this feature, and some of these issues were improved by code changes, while others were solved by adding caching. Some other improvements are still in the works, and should hit production in the next few days.
Specifically, we:
- Reduced the number of SQL queries needed to render the page by making sure we used Django’s select_related function wherever appropriate.
- Simplified logic used to build up the activity tab events that were displayed when commits were added to the pull request. Before we simplified this logic, it generated many more diffs than were necessary, so fixing that improved the performance immensely.
- Added template caching to activity tab items. This improvement adds to the complexity of invalidating the cache when appropriate, but it can make the page load quickly.
- Separated out the slow-loading part the main diff from quickly-loading parts, like general comments. This gets something onto your screen quickly, even if the time to getting all the content to your screen is unchanged.
Increase the Quality of Your Code Today
With all this dogfooding, it might seem like we were just trying to solve our own problem, since on our team, we constantly use pull requests to review all code changes. Our team policy is that at least two team members need to ‘approve’ a pull request before we merge the changes in. That seems to work pretty well for us, but your team might be different. We wanted to make our solution useful to as many teams as possible. Many smaller teams find pull requests too heavy-weight, and prefer to just review commits instead. So, we made sure to include inline comments on commits as well. Hopefully this allows enough flexibility in the tool that your team can find a system that works.
So, check it out! Try browsing a few pull requests that are out there now, and get a feel for them. Create a pull request the next time you want to merge your code into your team’s main branch, and get a few extra pairs of eyes on it. I’m sure you will be able to find problems sooner, and fix them more easily!