Things I Like About Git

Ever since around 2009-2010, developers have been engaging in an increasingly vocal debate about version control systems. I attribute this to the hugely popular rise of the distributed version control systems (DVCSs), namely Git and Mercurial. From my understanding, DVCSs in general are more powerful than nondistributed VCSs, because DVCSs can act just like nondistributed VCSs, but not vice versa. So, ultimately, all DVCSs give you extra flexibility, if you want it.

For various reasons, there is still an ongoing debate as to why one should use, for example, Git over Subversion (SVN). I will not address why there are still adamant SVN (or, gasp, CVS) users in the face of the rising tsunami tidal wave of DVCS adherence. Instead, I will talk about things I like about Git, because I’ve been using it almost daily for nearly three years now. My intention is not to add more flames to the ongoing debate, but to give the curious, version control virgins out there (these people do exist!) a brief rundown of why I like using Git. Hopefully, this post will help them ask the right questions before choosing a VCS to roll out in their own machines.

1. Git detects corrupt data.

Git uses an internal data structure to keep track of the repo. These objects, which are highly optimized data structures called blobs, are hashed with the SHA-1 algorithm. If suddenly a single byte gets corrupt (e.g., mechanical disk failure), Git will know immediately. And, in turn, you will know immediately.

Check out this quote from Linus Torvalds’ Git talk back in 2007:

“If you have disc corruption, if you have RAM corruption, if you have any kind of problems at all, git will notice them. It’s not a question of if. It’s a guarantee. You can have people who try to be malicious. They won’t succeed. You need to know exactly 20 bytes, you need to know 160-bit SHA-1 name of the top of your tree, and if you know that, you can trust your tree, all the way down, the whole history. You can have 10 years of history, you can have 100,000 files, you can have millions of revisions, and you can trust every single piece of it. Because git is so reliable and all the basic data structures are really really simple. And we check checksums. And we don’t check some UDP packet checksums that is a 16-bit sum of all the bytes. We check checksums that is considered cryptographically secure.

[I]t’s really about the ability to trust your data. I guarantee you, if you put your data in git, you can trust the fact that five years later, after it is converted from your harddisc to DVD to whatever new technology and you copied it along, five years later you can verify the data you get back out is the exact same data you put in. And that is something you really should look for in a source code management system.”

(BTW, Torvalds, opinionated as he is, has a very high signal-to-noise ratio and I highly recommend all of his talks.)

2. It’s distributed.

Because it is based on a distributed model of development, merging is easy. In fact, it is automatic, if there are no conflicting changes between the two commits to be merged. In practice, merge conflicts only occur as a result of poor planning. Sloppy developers, beware!

Another benefit of its distributed model is that it naturally lends itself to the task of backing up content across multiple machines.

3. It’s fast.

I can ask Git if any tracked files in a repo have been edited/changed with just one command: git diff. And it needs but a split second, even if my $PWD is not the repo’s root directory or if there are hundreds and thousands of tracked files littered across everywhere (because Git doesn’t think in terms of files, remember?)

4. It gives me surgical precision before and after committing changes.

Several things help me keep my commits small, and sane. The biggest factor is the index concept. Apparently, Git is the only VCS that gives you this tool! After editing your files, you go back and select only those chunks you want to be in the commit with git add -p. This way, you are free to change whatever you think is necessary in your files, without any nagging idea in the back of your mind going, “Hey, is this change exactly what you need/intended for your next commit?”

The other big factor is the rebase command. With rebase, I can do pretty much anything I want with my existing commits. I can reorder them. I can change their commit messages (known as amending). I can change the commits themselves (i.e., change the diffs). I can change 4 tiny commits into a single commit (known as squashing). I can even delete a commit (as long as the later commits do not rely on it). Essentially, you can rewrite your commits in any you like. This way, you can sanitize your commits in a logical way, regardless of work history.

Other Thoughts

I could go on, but the remaining points don’t have as much “oomph” as the ones listed already. I fear that I am unable to see many of the “problems” with Git’s methodology and workflow, because I had the (un?)fortunate experience of learning Git as my first and only VCS. I learned concepts like the index, rebasing, committing, amending, branching, merging, pulling, and pushing all for the first time from Git. I also learned how to use Git by typing the core Git commands into a terminal (since I’m in there all the time anyway), so I have not been biased in favor of GUI-only operation (these days, tig is the only GUI tool I use — and only as a brief sanity check at that). Then again, I’ve never suffered data corruption, lost branches, or anything like that, so I’m probably doing things the right way in this whole VCS thingamajig.

Oh, and here are some aliases I use for Git:

alias g='git'
alias gdf="[[ \$(git diff | wc -l) -gt 0 ]] && git diff || echo No changes"
alias gdfc="[[ \$(git diff --cached | wc -l) -gt 0 ]] && git diff --cached || echo Index empty"
alias gst='git status'
alias gbr='git branch'
alias gcm='git commit'
alias gco='git checkout'
alias glg='git log'
alias gpl='git pull'
alias gps='git push'

Unified Configuration File Setup Across Multiple Machines – Revisited

My last post discussed how to get your config files, known as “dotfiles,” synchronized across multiple machines using a rudimentary makefile and git. I said that I hoped of achieving the “one folder for all config files” dream. I have achieved it, and it is pretty simple. Also, I will discuss how to handle files with passwords in them, and some other thoughts on this setup.

Keep track of system files, too

Essentially, my last post left out all but the system configuration files, such as /etc/fstab and the like. The /etc folder is owned by root, as well as the /boot folder. My first approach was to simply replace all such files with more symlinks, which would point to files owned by the normal user of the system. This approach had its drawbacks: (1) not all system files are symlinkable (e.g., /etc/sudoers is particularly security-conscious), and (2) the idea of deleting system files and replacing them with symlinks, on its face, sounded like I was setting myself up for a big grand screw-up.

So I thought: “Well, since system files are seldom ever edited anyway, why not just back them up periodically?” And that’s what I did instead. Now my makefile, as discussed in the previous post, has a section like this:

# copy contents of system files to keep track of them
syscopy:
ifeq ('$(HOSTNAME)','exelion')
	cat /boot/grub/menu.lst >       /home/shinobu/syscfg/sys/boot-grub-menu.lst-exelion
	cat /etc/X11/xorg.conf >        /home/shinobu/syscfg/sys/etc-X11-xorg.conf-exelion
	cat /etc/fstab >                /home/shinobu/syscfg/sys/etc-fstab-exelion
	cat /etc/hosts >                /home/shinobu/syscfg/sys/etc-hosts-exelion
	cat /etc/inittab >              /home/shinobu/syscfg/sys/etc-inittab-exelion
	cat /etc/makepkg.conf >         /home/shinobu/syscfg/sys/etc-makepkg.conf-exelion
	cat /etc/rc.conf >              /home/shinobu/syscfg/sys/etc-rc.conf-exelion
	cat /etc/rc.local >             /home/shinobu/syscfg/sys/etc-rc.local-exelion
	cat /etc/rc.local.shutdown >    /home/shinobu/syscfg/sys/etc-rc.local.shutdown-exelion
	cat /etc/yaourtrc >             /home/shinobu/syscfg/sys/etc-yaourtrc-exelion
	cat /etc/sudoers >              /home/shinobu/syscfg/sys/etc-sudoers-exelion # requires superuser privileges to read!
else
	cat /boot/grub/menu.lst >       /home/shinobu2/syscfg/sys/boot-grub-menu.lst-luxion
	cat /etc/X11/xorg.conf >        /home/shinobu2/syscfg/sys/etc-X11-xorg.conf-luxion
	cat /etc/fstab >                /home/shinobu2/syscfg/sys/etc-fstab-luxion
	cat /etc/hosts >                /home/shinobu2/syscfg/sys/etc-hosts-luxion
	cat /etc/inittab >              /home/shinobu2/syscfg/sys/etc-inittab-luxion
	cat /etc/makepkg.conf >         /home/shinobu2/syscfg/sys/etc-makepkg.conf-luxion
	cat /etc/network.d/luxion-wired > /home/shinobu2/syscfg/sys/etc-network.d-luxion-wired
	cat /etc/network.d/luxion-wireless-home-nopassword > /home/shinobu2/syscfg/sys/etc-network.d-luxion-wireless-home-nopassword
	cat /etc/rc.conf >              /home/shinobu2/syscfg/sys/etc-rc.conf-luxion
	cat /etc/rc.local >             /home/shinobu2/syscfg/sys/etc-rc.local-luxion
	cat /etc/rc.local.shutdown >    /home/shinobu2/syscfg/sys/etc-rc.local.shutdown-luxion
	cat /etc/yaourtrc >             /home/shinobu2/syscfg/sys/etc-yaourtrc-luxion
	cat /etc/sudoers >              /home/shinobu2/syscfg/sys/etc-sudoers-luxion
endif

I have in my /etc/rc.local the command “make -f /path/to/the/above/makefile -B syscopy“. So every time my system boots up, all of the config files are copied into their backup-equivalents in the syscfg/sys folder. Since git tracks changes in the syscfg folder, only changes in the config files are detected and tracked as changes (i.e., git doesn’t track changes in file modification times, which is a good thing here for our purposes — otherwise git would be saying that every time we boot up all of our system files have changed!). So now all of my system config files are tracked passively (by merely reading off them). Of course, if I manually edit a system file, I can still call make -B syscopy myself manually, and then run git diff in the syscfg folder to track those changes, and then git commit to solidify those changes into the git history.

For config files with passwords in them

DO NOT EVER PUT CONFIG FILES WITH PASSWORDS INTO YOUR TRACKED DOTFILES FOLDER! Not only does this mean that your password, in plain text, is tracked by git, but that should you ever change your password, git will notice the changes and track them as well! This will give anyone who gets access to your git repo a complete, timestamped history of your passwords for your applications (like icecast, irssi, etc.) So to get around this problem, I have it set up so that I have a copy of the password-containing config file, minus the passwords in them. Whenever I make changes to the original password-containing files, I update the changes into the copies, and then track these copies in git, not the originals.

Not all config files need symlinks

In my last post, I discussed how creating symlinks via commands in the makefile was the key to this whole setup. But for some (smarter) applications, symlinks are not needed, since they can intelligently be told which config file to use. Alpine, icecast, irssi, and mpd are like this, so I just have config files for them inside my syscfg folder, and just run these apps (which are all autostarted for me each time on boot) with commandline parameters pointing to the non-default config file locations.