Things I Like About Git

Ever since around 2009-2010, developers have been engaging in an increasingly vocal debate about version control systems. I attribute this to the rapid rise in popularity of distributed version control systems (DVCSs), namely Git and Mercurial. From my understanding, DVCSs in general are more powerful than nondistributed VCSs, because a DVCS can act just like a nondistributed VCS, but not vice versa. So, ultimately, all DVCSs give you extra flexibility, if you want it.

For various reasons, there is still an ongoing debate as to why one should use, for example, Git over Subversion (SVN). I will not address why there are still adamant SVN (or, gasp, CVS) users in the face of the rising tide of DVCS adoption. Instead, I will talk about things I like about Git, because I’ve been using it almost daily for nearly three years now. My intention is not to add more flames to the ongoing debate, but to give the curious version control virgins out there (these people do exist!) a brief rundown of why I like using Git. Hopefully, this post will help them ask the right questions before choosing a VCS to roll out on their own machines.

1. Git detects corrupt data.

Git keeps track of a repo with an internal object database. File contents are stored as objects called blobs, and every object is named by the SHA-1 hash of its contents. If even a single byte gets corrupted (e.g., by a mechanical disk failure), Git will know immediately. And, in turn, you will know immediately.
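
You can see this content-addressing at work yourself; git hash-object prints the SHA-1 name an object would get, and git fsck re-verifies the whole object database. A minimal sketch, run inside any repo:

echo 'hello' | git hash-object --stdin    # prints the 40-hex-digit SHA-1 name of that blob
git fsck --full                           # rehashes every object; a flipped byte in
                                          # .git/objects shows up here as a corrupt object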

Check out this quote from Linus Torvalds’ Git talk back in 2007:

“If you have disc corruption, if you have RAM corruption, if you have any kind of problems at all, git will notice them. It’s not a question of if. It’s a guarantee. You can have people who try to be malicious. They won’t succeed. You need to know exactly 20 bytes, you need to know 160-bit SHA-1 name of the top of your tree, and if you know that, you can trust your tree, all the way down, the whole history. You can have 10 years of history, you can have 100,000 files, you can have millions of revisions, and you can trust every single piece of it. Because git is so reliable and all the basic data structures are really really simple. And we check checksums. And we don’t check some UDP packet checksums that is a 16-bit sum of all the bytes. We check checksums that is considered cryptographically secure.

[I]t’s really about the ability to trust your data. I guarantee you, if you put your data in git, you can trust the fact that five years later, after it is converted from your harddisc to DVD to whatever new technology and you copied it along, five years later you can verify the data you get back out is the exact same data you put in. And that is something you really should look for in a source code management system.”

(BTW, Torvalds, opinionated as he is, has a very high signal-to-noise ratio and I highly recommend all of his talks.)

2. It’s distributed.

Because it is based on a distributed model of development, merging is easy. In fact, it is automatic, if there are no conflicting changes between the two branches being merged. In practice, merge conflicts mostly occur as a result of poor planning. Sloppy developers, beware!
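
For example, merging a short-lived topic branch whose edits don’t overlap with anything on master takes nothing more than the commands themselves (the branch and file names here are made up):

git checkout -b bugfix                      # start a topic branch
# ...edit src/parser.c, then:
git commit -am 'fix off-by-one in parser'
git checkout master
git merge bugfix                            # merges automatically when nothing conflicts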

Another benefit of its distributed model is that it naturally lends itself to the task of backing up content across multiple machines.

3. It’s fast.

I can ask Git whether any tracked files in a repo have been changed with just one command: git diff. And it needs but a split second, even if my $PWD is not the repo’s root directory or if there are hundreds of thousands of tracked files scattered everywhere (Git thinks in terms of whole-tree snapshots, not individual files).

4. It gives me surgical precision before and after committing changes.

Several things help me keep my commits small and sane. The biggest factor is the index concept. As far as I know, no other VCS gives you quite this tool! After editing your files, you go back and select only those chunks you want to be in the commit with git add -p. This way, you are free to change whatever you think is necessary in your files, without any nagging idea in the back of your mind going, “Hey, is this change exactly what you need/intended for your next commit?”
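
A rough sketch of that workflow (the file name and commit message are hypothetical):

# Hack freely on foo.c, then stage only the hunks meant for this commit:
git add -p foo.c       # answer y/n for each hunk
git diff --cached      # review exactly what will be committed (the index)
git diff               # anything you answered 'n' to stays here for a later commit
git commit -m 'refactor widget setup'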

The other big factor is the rebase command. With rebase, I can do pretty much anything I want with my existing commits. I can reorder them. I can change their commit messages (known as amending). I can change the commits themselves (i.e., change the diffs). I can change 4 tiny commits into a single commit (known as squashing). I can even delete a commit (as long as the later commits do not rely on it). Essentially, you can rewrite your commits in any way you like. This way, you can sanitize your commits in a logical way, regardless of the order in which the work actually happened.
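
All of this goes through interactive rebase. A sketch, assuming the last four commits are the ones being cleaned up:

git rebase -i HEAD~4
# In the editor that opens, change the leading word on each line:
#   pick   = keep the commit as-is (reorder the lines to reorder commits)
#   reword = keep the diff, edit the commit message
#   edit   = stop there so the commit itself can be changed
#   squash = fold the commit into the one above it
# Deleting a line drops that commit entirely.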

Other Thoughts

I could go on, but the remaining points don’t have as much “oomph” as the ones listed already. I fear that I am unable to see many of the “problems” with Git’s methodology and workflow, because I had the (un?)fortunate experience of learning Git as my first and only VCS. I learned concepts like the index, rebasing, committing, amending, branching, merging, pulling, and pushing all for the first time from Git. I also learned how to use Git by typing the core Git commands into a terminal (since I’m in there all the time anyway), so I have not been biased in favor of GUI-only operation (these days, tig, a text-mode repository browser, is the only extra tool I use, and only as a brief sanity check at that). Then again, I’ve never suffered data corruption, lost branches, or anything like that, so I’m probably doing things the right way in this whole VCS thingamajig.

Oh, and here are some aliases I use for Git:

alias g='git'
alias gdf="[[ \$(git diff | wc -l) -gt 0 ]] && git diff || echo No changes"
alias gdfc="[[ \$(git diff --cached | wc -l) -gt 0 ]] && git diff --cached || echo Index empty"
alias gst='git status'
alias gbr='git branch'
alias gcm='git commit'
alias gco='git checkout'
alias glg='git log'
alias gpl='git pull'
alias gps='git push'

Unified Configuration File Setup Across Multiple Machines

SUMMARY: This post shows you how to sync multiple configuration files across multiple hosts with git and a makefile.

INTRODUCTION

If you have 20 different so-called ‘dotfiles’ like me, they can get difficult to keep track of. It can be even more difficult if you have multiple computers that you use often, and if you want them all to be updated to your latest settings.

For myself, I need to keep track of:

  • .gitconfig
  • .zshrc
  • .zsh (folder)
  • .vim (folder)
  • .vimrc
  • .gvimrc
  • .vimperatorrc
  • .Xdefaults
  • .boxes
  • .xmonad (folder)
  • .xmonad/init.sh (custom init script that XMonad is told to call in the startup hook)
  • .xmonad/xmonad.hs
  • shellscripts (folder which has a growing number of custom shell scripts that I like to use every now and then, or at least keep as a reference on both my desktop and laptop)
  • .xinitrc

Of course, this list will grow over time, as I start to learn more things and begin using more programs. What I want to do is (1) copy these files over to any other host that I use/own, and automatically sync back to all the other machines any changes that I make on any one particular machine; and (2) have a unified config file structure, with a directory named after the application/setting, and a simple file called ‘cfg’ for the config file (which will be symlinked to what the application thinks is the true, appropriate location of the config file). The great thing is that there is a simple, durable solution for both of these concerns: git and make. So enough blabbering, let’s get to it!

Step 1: Put all config files into a new directory

It doesn’t matter where your new directory (let’s call it syscfg) is located. Move all of your config files that you want to keep track of into this directory. I suggest you rename all of them to fit some kind of unified naming scheme, and take note of what their former names/destinations used to be. For example, I keep all of my vim things in syscfg/vim (instead of the default ~/.vim), including a file called cfg that acts as my .vimrc.
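
As a concrete sketch of that renaming step (the exact paths are just examples):

mkdir ~/syscfg
mv ~/.vim    ~/syscfg/vim           # the old .vim directory becomes syscfg/vim
mv ~/.vimrc  ~/syscfg/vim/cfg       # .vimrc becomes syscfg/vim/cfg
mv ~/.gvimrc ~/syscfg/vim/cfg-gui   # .gvimrc becomes syscfg/vim/cfg-gui
mv ~/.zsh    ~/syscfg/zsh
mv ~/.zshrc  ~/syscfg/zsh/cfg       # ...and so on for the rest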

Per-host (host-specific) settings

I highly suggest that you make all of your config files, such as your .xinitrc (or any other script of that sort), include per-host settings. Otherwise, you will have the same config settings on all of your systems! I.e., you’d want your .xinitrc to have something like:

do-something-universal-here
HOSTNAME=$(hostname)
case "${HOSTNAME}" in
    hostname1)
        do-something-here
        ;;
    hostname2)
        do-something-else-here
        ;;
    *) # catches all other hostnames
        do-something-for-any-other-host-here
        ;;
esac
do-something-else-universal-here

The above syntax is for bash scripts (files that start with a “#!/bin/bash” at the very first line). If you are using zsh, you could also use this syntax for creating aliases that are specific to a certain host (e.g., my laptop doesn’t have 2 hard drives, so it doesn’t have aliases that point to my mount directory).
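
In a zsh config, that might look something like this (the hostnames are the ones from my makefile below; the alias and mount point are made up):

case "$(hostname)" in
    exelion)                      # desktop: has the second drive mounted
        alias store='cd /mnt/store'
        ;;
    luxion)                       # laptop: no second drive, so no such alias
        ;;
esac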

If your config file is painful to work with in implementing per-host settings (my xmonad.hs file is like this), you can still achieve per-host settings by making 2 config files, for example cfg-host1 and cfg-host2, and symlinking the correct one to .xmonad/xmonad.hs. You determine the correct config file to symlink in the makefile. Read on.

Step 2: Create a makefile

Your syscfg directory should now have a clean, uniform structure for all of your config files. Now, let’s create a makefile so that the program make can install and uninstall the symlinks as necessary. Here’s what my makefile looks like:

CFGROOT := $(shell pwd)
HOSTNAME := $(shell hostname)
all: boxes git shellscripts vim vimperatorrc xdefaults xinitrc xmonad zsh
boxes:
	ln -fs $(CFGROOT)/boxes/cfg ${HOME}/.boxes
git:
	ln -fs $(CFGROOT)/git/cfg ${HOME}/.gitconfig
shellscripts:
	ln -fs $(CFGROOT)/shellscripts ${HOME}/shellscripts
vim:
	ln -fs $(CFGROOT)/vim ${HOME}/.vim
	ln -fs $(CFGROOT)/vim/cfg ${HOME}/.vimrc
	ln -fs $(CFGROOT)/vim/cfg-gui ${HOME}/.gvimrc
vimperatorrc:
	ln -fs $(CFGROOT)/vimperatorrc/cfg ${HOME}/.vimperatorrc
xdefaults:
	ln -fs $(CFGROOT)/xdefaults/cfg ${HOME}/.Xdefaults
xinitrc:
	ln -fs $(CFGROOT)/xinitrc/cfg ${HOME}/.xinitrc
xmonad:
	ln -fs $(CFGROOT)/xmonad ${HOME}/.xmonad
	ln -fs $(CFGROOT)/xmonad/init ${HOME}/.xmonad/init.sh
#--------------------------------------------------------------------------------------------------#
# Since it's really painful to do a unified config file across multiple hosts in XMonad v. 0.8.1,  #
# I have to do it this way.                                                                        #
#--------------------------------------------------------------------------------------------------#
ifeq ('$(HOSTNAME)','exelion')
	ln -fs $(CFGROOT)/xmonad/cfg ${HOME}/.xmonad/xmonad.hs
else
	ln -fs $(CFGROOT)/xmonad/cfg-luxion ${HOME}/.xmonad/xmonad.hs
endif

zsh:
	ln -fs $(CFGROOT)/zsh ${HOME}/.zsh
	ln -fs $(CFGROOT)/zsh/cfg ${HOME}/.zshrc

uninstall:
	rm ${HOME}/.boxes
	rm ${HOME}/.gitconfig
	rm ${HOME}/.vim
	rm ${HOME}/.vimrc
	rm ${HOME}/.gvimrc
	rm ${HOME}/shellscripts
	rm ${HOME}/.vimperatorrc
	rm ${HOME}/.Xdefaults
	rm ${HOME}/.xinitrc
	rm ${HOME}/.xmonad/init.sh
	rm ${HOME}/.xmonad/xmonad.hs
	rm ${HOME}/.xmonad
	rm ${HOME}/.zsh
	rm ${HOME}/.zshrc

This is where symlinks reveal their beauty. From what I know of Windows XP (and my knowledge is very limited because I hate M$ with a passion), you cannot do something like this. Anyway, the above is fairly obvious and straightforward, isn’t it? All this does is create symlinks, and remove them if desired. Since they are symlinks, you can still do something like “vim ~/.vimrc”, and vim will read (assuming again that our directory is syscfg) syscfg/vim/cfg, with all of the pretty syntax highlighting and so on. The -f flag for the ln command simply makes it create the symlink even if the symlink already exists at the destination. See man ln for more info.

If you run make -B all, it will create all the symlinks defined under the keyword all. (The -B flag, for forcing make to run, is required here given our situation with symlinks.) Alternatively, you can select only those config files you wish to install; e.g., make -B vim for installing only vim’s config file symlinks, or make -B vim zsh for vim’s and zsh’s. Lastly, running make uninstall removes all of the symlinks from your system. Experiment to your delight. (Make sure to have the lines with ln and rm start with a TAB character, as make will otherwise throw an error.)

Also, note that the names of the files inside syscfg do not matter at all, since it’s really the makefile with its symlinks that takes care of all the proper “dotfile” namings.

Lastly, since these are all symlinks, you can have in your, say, .gvimrc file, a line that says “source ~/.vim/cfg”, and it will still work since the .vim directory is symlinked to your syscfg/vim (i.e., you don’t have to refer to symlinks once you make the symlinks). This is just a long-winded way of saying that using this symlink approach preserves all of your old config file paths from within your config files.

Step 3: Fire up git

Now, fire up git, add your config files, and sync it across all your computers! Use my post here to do this. The only thing to note here for our purposes is that the makefile is stored under syscfg, and that syscfg is where git should be initialized (with git init). Also, only add the config files and any other files that the config files depend on. An example of a file you should NOT add to git is any sort of history file, such as zsh’s history file (specified with the HISTFILE option in zsh’s config file — in our case, syscfg/zsh/cfg), since you’d want different session history files for different machines. Another example would be vim’s session files for the session manager plugin. If you add such temporary history files (or any other file that the application automatically makes changes to), you will make git track these changes (very doable, but utterly worthless)! On the other hand, you’d probably want to add simple script-like files that your configs depend on, such as vim’s various plugins (mere .vim text files in the syscfg/vim directory), or even irssi’s perl plugins, if you use irssi (I don’t use IRC on my laptop, hence its exclusion from my sample syscfg and makefile above).
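
For completeness, the repository setup itself is only a handful of commands run inside syscfg (the zsh/history path in the .gitignore line is purely hypothetical, for the case where a history file does end up living inside syscfg):

cd ~/syscfg
git init
echo 'zsh/history' > .gitignore     # example: keep any per-machine history files untracked
git add .
git commit -m 'initial import of config files'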

With git taking care of the syncing, you now have complete revision history, as well as guaranteed config file integrity across all of your systems. It’s only a matter of cloning, then simply pushing and pulling for all of your config file syncing needs. Personally, I have it set up so that I just do “sl” to ssh into my laptop (sl is aliased to the unbearably long “ssh username@192.168.0.102”; no password since I have ssh set up that way; again, see my post above to do this), then “d sy[TAB]” (“d” is aliased in my syscfg/zsh/cfg to mean “cd”, and I only have one directory starting with “sys” so I can use zsh’s TAB completion to do the rest), and then “gpl” (extremely shortened form for “git pull” — again, see my post on git to make git accept this instead of “git pull origin master”). Yes, all I do is sl, d sy[TAB], gpl, and my laptop is synced. No more checking/rechecking manually whether certain symlinks exist on my laptop, and whether certain vim plugins already exist on it. I wish I had thought of this sooner, as it would have saved me a lot of time.

Reminders and other tips

  • Make sure that all of your crucial config files (like .xinitrc) work properly before implementing this setup! (It’s not fun fixing things in the virtual console a la CTRL+ALT+.)
  • Make sure to have per-host settings in each of your config files (or, failing that, have your makefile link intelligently to different config files on a per-host basis)
  • In my vim config files above, you’ll notice that I symlink my cfg-gui to .gvimrc. The actual cfg-gui file looks very simple, with a “source ~/.vim/cfg”, and all the gvim-specific commands following that. Make sure to guard any autocommands so that they are loaded only once; otherwise gvim will not work properly. (I must say, I only use vim now, except when I feel like seeing 16+ million colors (GTK) as opposed to 256 (urxvt).)
  • The makefile, and its contents, can be scripted in a different programming language if you don’t want to use make. I’ve noticed that some people use ruby to do this. But it could also be python, perl, bash, or any other script.
  • Since we’re going to end up putting most of our config files into syscfg, it wouldn’t be a bad idea to also add files that not every host actually needs (see my note on irssi above). You’d just put an if-statement in your makefile to skip these files for certain hosts. The benefit to this approach is that you would end up with ONE git repo for ALL of your config files. Even though I’m not at this stage yet, I feel myself inevitably being pulled toward this path. I want complete revision history for my /etc/X11/xorg.conf, /etc/fstab, /etc/sudoers, and even /boot/grub/menu.lst files, if it’s possible to do so. It’s probably a security risk to symlink to these destinations (file permissions, which git isn’t good at, at least according to what I heard from Linus’s Google Tech Talk from 2007), but I’m the only human who has access to (and cares about) the config files on my desktop/laptop anyway. I’ll update this post if I end up achieving this “one config directory to rule them all” dream.

This guide was prepared with the help of various internet websites (google is your friend), and also especially this site.

Standalone Developer and Git: How to Sync Your Local Repo Across Multiple Machines with a Remote Repository and SSH on Linux

You are a programmer, and you’ve been using git recently after you watched Linus Torvalds speak bitterly against CVS and SVN at Google’s Tech Talk last year. And, although you have git working nicely on your main box, you have multiple computers at your disposal (in your home), and you would like it if you could sync your main git repo from your main box.

Enter the remote git repository. (From now on, I’ll call my main box the remote box, and all newly to-be-synced machines local boxes.)

First, the requirements: apt-get install ssh. Do this on all your local boxes. To work without passwords every time you invoke ssh with git in the future, follow these steps. When it tells you to do ssh-keygen -t dsa, realize that you have to run this command on your local box (not your remote box), and NOT while logged into an SSH session. I.e., run the ssh-keygen command from your terminal. Also, you only need to follow the instructions until it says “OpenSSH to Commercial SSH”.
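
The gist of that key setup, condensed (ssh-copy-id ships with OpenSSH; if it’s not available, append the contents of the .pub file to ~/.ssh/authorized_keys on the remote box by hand):

# Run on each local box, from a normal terminal (not inside an SSH session):
ssh-keygen -t dsa                   # accept the default file; an empty passphrase means no prompts
ssh-copy-id username@remote-box     # installs your public key on the remote box
ssh username@remote-box             # should now log you in without asking for a password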

So now that you got SSH working smoothly across all your local boxes, it’s time to set up the git repository (“repo”) on your remote box.

  1. Either get your hands on the remote box, or use your newly-acquired SSH powers to log into it, and create two new directories, /home/<username>/git, and /home/<username>/git/myappname.git. Do it in Thunar or a terminal or wherever.
  2. Now, go into the myappname.git directory and type git init --bare. This will create a “bare” repo that will eventually track all the changes. Think of it as the new “head” or “master” copy, from which all of your machines will eventually pull. I will call this the remote repo from now on.
  3. I’ve thus far assumed that you only have 1 repository for your 1 application that you have been developing, alone, on your remote box. So before we do anything, make sure you and I are on the same page: you have the newly created bare repo from step #2, and you also have a local folder with all your source code (which served as your one and only git repo before reading this blog post). So far, so good. Now, go to this folder containing the existing git repo for your project. From here, you need to tell git that you’d like to sync this local repo with your newly created remote repo (myappname.git from step #1). The idea is to bring the remote repo up to date with all the changes you’ve made so far in your local repo, with git-push, and then, from all your local boxes, do a git-pull. So from your local repo containing all your source code, type git remote add origin /home/<username>/git/myappname.git. If you decided to go the SSH route in step #1, the command would be git remote add origin ssh://username@serverURL/home/<username>/git/myappname.git. Here, origin is the nickname of the remote repo, and will be used later on with git-push and git-pull.
  4. Now, let’s actually get this remote repo up-to-date with our local repo! First, make sure that you’re on the master branch (or any other branch that’s nice and clean). We will push this branch to our remote repo. Type git-push origin master (or git push origin master if you don’t like hyphens).
  5. Almost there! Right now, your remote repo has everything you need to create a copy of the just-pushed-repo-from-step-4 on all your local boxes. Let’s do that now. From your local box, type git clone ssh://username@serverURL/home/<username>/git/myappname.git. This will create a directory named myappname under your current directory, with all the source code and goodies!
  6. Lastly, let’s edit our .git/config file on ALL MACHINES (both remote and local boxes) by adding the following lines (a command-line way of doing the same thing is sketched just after this list):
    [branch "master"]
        remote = origin
        merge = refs/heads/master
    
  7. That’s it! Now, from your local box, work and commit away!! When you’re done, be sure to push to the remote repo with git push origin master (or whatever branch name you are working on). For more info, type git remote show origin, and git will tell you which branches are tracked remotely. For our scenario, it would show the master branch as being tracked remotely. And, don’t forget to do git pull when you’re on a different box, to make sure you’re starting your work from the most recent changes.
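
The edit in step 6 can also be done with git config instead of a text editor; a sketch, run inside the repo on each box:

git config branch.master.remote origin
git config branch.master.merge refs/heads/master
# After this, a plain 'git pull' on the master branch knows to fetch from
# origin and merge origin's master.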

So there you go. I read these links to write this post: ssh, git remote repos, default git push and pull. Now, all of your local boxes will function as essentially 1 repo, as long as you git push from 1 local box, and then immediately git pull from all the rest of your boxes. You also get the added benefit of having your source code copied across many machines — a good way of preventing data loss. You could probably even write a bash script so that, from 1 machine, you could execute git pull across all your other boxes to get updated, using SSH.
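
That last idea might look something like this (the hostnames and the checkout path are placeholders):

#!/bin/bash
# Run 'git pull' in the project checkout on every other box.
for box in laptop.local desktop.local; do
    ssh username@"$box" 'cd ~/src/myappname && git pull origin master'
done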

UPDATE October 27, 2008: Here is a transcript of the speech that Torvalds gave at the Tech Talk, for those of you who hate flash movies (like me).