Ever since around 2009-2010, developers have been engaging in an increasingly vocal debate about version control systems. I attribute this to the hugely popular rise of the distributed version control systems (DVCSs), namely Git and Mercurial. From my understanding, DVCSs in general are more powerful than nondistributed VCSs, because DVCSs can act just like nondistributed VCSs, but not vice versa. So, ultimately, all DVCSs give you extra flexibility, if you want it.
For various reasons, there is still an ongoing debate as to why one should use, for example, Git over Subversion (SVN). I will not address why there are still adamant SVN (or, gasp, CVS) users in the face of the rising tsunami tidal wave of DVCS adherence. Instead, I will talk about things I like about Git, because I’ve been using it almost daily for nearly three years now. My intention is not to add more flames to the ongoing debate, but to give the curious, version control virgins out there (these people do exist!) a brief rundown of why I like using Git. Hopefully, this post will help them ask the right questions before choosing a VCS to roll out in their own machines.
1. Git detects corrupt data.
Git uses an internal data structure to keep track of the repo. These objects, which are highly optimized data structures called blobs, are hashed with the SHA-1 algorithm. If suddenly a single byte gets corrupt (e.g., mechanical disk failure), Git will know immediately. And, in turn, you will know immediately.
Check out this quote from Linus Torvalds’ Git talk back in 2007:
“If you have disc corruption, if you have RAM corruption, if you have any kind of problems at all, git will notice them. It’s not a question of if. It’s a guarantee. You can have people who try to be malicious. They won’t succeed. You need to know exactly 20 bytes, you need to know 160-bit SHA-1 name of the top of your tree, and if you know that, you can trust your tree, all the way down, the whole history. You can have 10 years of history, you can have 100,000 files, you can have millions of revisions, and you can trust every single piece of it. Because git is so reliable and all the basic data structures are really really simple. And we check checksums. And we don’t check some UDP packet checksums that is a 16-bit sum of all the bytes. We check checksums that is considered cryptographically secure.
[I]t’s really about the ability to trust your data. I guarantee you, if you put your data in git, you can trust the fact that five years later, after it is converted from your harddisc to DVD to whatever new technology and you copied it along, five years later you can verify the data you get back out is the exact same data you put in. And that is something you really should look for in a source code management system.”
(BTW, Torvalds, opinionated as he is, has a very high signal-to-noise ratio and I highly recommend all of his talks.)
2. It’s distributed.
Because it is based on a distributed model of development, merging is easy. In fact, it is automatic, if there are no conflicting changes between the two commits to be merged. In practice, merge conflicts only occur as a result of poor planning. Sloppy developers, beware!
Another benefit of its distributed model is that it naturally lends itself to the task of backing up content across multiple machines.
3. It’s fast.
I can ask Git if any tracked files in a repo have been edited/changed with just one command: git diff. And it needs but a split second, even if my $PWD is not the repo’s root directory or if there are hundreds and thousands of tracked files littered across everywhere (because Git doesn’t think in terms of files, remember?)
4. It gives me surgical precision before and after committing changes.
Several things help me keep my commits small, and sane. The biggest factor is the index concept. Apparently, Git is the only VCS that gives you this tool! After editing your files, you go back and select only those chunks you want to be in the commit with git add -p. This way, you are free to change whatever you think is necessary in your files, without any nagging idea in the back of your mind going, “Hey, is this change exactly what you need/intended for your next commit?”
The other big factor is the rebase command. With rebase, I can do pretty much anything I want with my existing commits. I can reorder them. I can change their commit messages (known as amending). I can change the commits themselves (i.e., change the diffs). I can change 4 tiny commits into a single commit (known as squashing). I can even delete a commit (as long as the later commits do not rely on it). Essentially, you can rewrite your commits in any you like. This way, you can sanitize your commits in a logical way, regardless of work history.
I could go on, but the remaining points don’t have as much “oomph” as the ones listed already. I fear that I am unable to see many of the “problems” with Git’s methodology and workflow, because I had the (un?)fortunate experience of learning Git as my first and only VCS. I learned concepts like the index, rebasing, committing, amending, branching, merging, pulling, and pushing all for the first time from Git. I also learned how to use Git by typing the core Git commands into a terminal (since I’m in there all the time anyway), so I have not been biased in favor of GUI-only operation (these days, tig is the only GUI tool I use — and only as a brief sanity check at that). Then again, I’ve never suffered data corruption, lost branches, or anything like that, so I’m probably doing things the right way in this whole VCS thingamajig.
Oh, and here are some aliases I use for Git:
alias g='git' alias gdf="[[ \$(git diff | wc -l) -gt 0 ]] && git diff || echo No changes" alias gdfc="[[ \$(git diff --cached | wc -l) -gt 0 ]] && git diff --cached || echo Index empty" alias gst='git status' alias gbr='git branch' alias gcm='git commit' alias gco='git checkout' alias glg='git log' alias gpl='git pull' alias gps='git push'