One of the most important things you can do as a software developer is not only to use a source control system, but to use that system properly. We don’t advocate one technology over another here at Connect Think; we simply try to find the right tool for the job. Some of our projects use SVN, and recently we started using Git for others. This blog post will focus on Git, and how we have found success using it.
What is Git?
Git is a distributed source control system. Distributed means just that: there is no central server, as there is with SVN or CVS. You have a full copy of the repository on your local machine. You can make commits, create branches, merge branches, tag commits, etc. all right on your machine without an internet connection. This is extremely powerful.
When you want to play nice with others, Git allows you to add remote repositories to your repository. There is nothing “special” about any of the remote repositories. They could be a central server (we use BeanstalkApp.com which rocks!), another developer’s computer, or a deployment target such as Heroku.com. When you are ready, you can push and pull code to and from any remote repository you have configured.
The Command Line is your Friend
I believe this to my core. If you use a GUI to do your source control operations, you will never fully understand what the utility is doing. It is important with any source control system to know exactly what it is you are doing with your code. Source control systems are extremely powerful tools, but with that power comes some level of complication.
Git’s command structure is pretty easy:
git command remote branch
First in the structure is the command. There are a lot of commands for git, but here are the basic ones:
- git clone: Clones a repository into a newly created directory, creates remote-tracking branches for each branch in the cloned repository (visible using git branch -r), and creates and checks out an initial branch that is forked from the cloned repository’s currently active branch.
- git init: Creates an empty Git repository, which is basically a .git directory with subdirectories for objects, refs/heads, refs/tags, and template files. An initial HEAD file that references the HEAD of the master branch is also created.
- git branch: Creates, lists, or deletes branches of the repository.
- git checkout: Updates files in the working tree to match the version in the index or the specified tree. If no paths are given, git checkout will also update HEAD to set the specified branch as the current branch.
There are more commands to come shortly.
The second part is the remote. This indicates which of the named remotes the command should run against. If it is omitted, commands that talk to a remote (such as push and pull) fall back to the remote configured for the current branch, usually origin; all other commands run on the local repository.
Finally, the last part delineates the branch on which to run the command. If it is omitted, the currently checked out branch will be used.
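To make the basic commands concrete, here is a minimal sketch that runs them against a throwaway repository. The directory names (project, project-copy), the bugfix branch, and the demo identity are all made up for illustration; nothing here is specific to our projects.

```shell
set -e                                   # stop at the first failing command
work=$(mktemp -d)                        # scratch directory for the demo
cd "$work"

# git init: create an empty repository in ./project
git init -q project
cd project
git symbolic-ref HEAD refs/heads/master  # name the first branch "master" on any Git version
git config user.name "Demo User"         # placeholder identity for this demo
git config user.email "demo@example.com"
echo "hello" > README
git add README
git commit -q -m "Add README"

# git branch: create a branch, then list the branches
git branch bugfix
git --no-pager branch

# git checkout: switch the working tree and HEAD to another branch
git checkout -q bugfix
current=$(git rev-parse --abbrev-ref HEAD)
echo "now on: $current"
git checkout -q master

# git clone: copy the repository, remote-tracking branches included
cd "$work"
git clone -q project project-copy
cd project-copy
git --no-pager branch -r

# command, remote, branch: fetch the master branch from the remote named origin
git fetch -q origin master

# git branch -d: delete the bugfix branch in the original repository
cd "$work/project"
git branch -d bugfix
```

The git fetch line is the one that uses all three parts of the structure: fetch is the command, origin is the remote that git clone set up, and master is the branch.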
In all source control systems, files that have been modified are tracked as modified until they are committed. SVN tracks these changed files, and you can commit all of them at once or name individual files to commit. What it does not give you is a separate step for assembling a commit before you make it.
To address this shortcoming, Git introduces the idea of a commit index (also called the staging area). This is a collection of modifications that are ready to be committed together as a group. Essentially, files have four statuses in Git:
- Untracked: An untracked file is one that Git doesn’t know about. It has never been committed to the repository. This is most likely a new file in your project.
- Unmodified: An unmodified file is tracked by Git, but it remains in the same state as it was at the time of its last commit. There is no difference between it and the file in the repository.
- Modified – Unstaged: Modified files are just that: they have been changed since the last commit. When you make a change Git marks the file as modified, but it is not queued for commit until it is added to the index.
- Modified – Staged: A staged file is one that has been modified and added to the git index for commit. Once the commit is complete, all files in the index are returned to the Unmodified state and the index is cleared.
This is a very powerful feature for Git. It is important to understand this workflow when working with Git commits.
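The four statuses are easy to watch from the command line. The sketch below uses a scratch repository with made-up file names (tracked.txt, new.txt) and git status --short, which prints a two-character code in front of each file that is not in the Unmodified state.

```shell
set -e                                   # stop at the first failing command
work=$(mktemp -d)                        # scratch directory for the demo
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"         # placeholder identity for this demo
git config user.email "demo@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"       # tracked.txt is now Unmodified

echo "notes" > new.txt                   # Untracked: Git has never seen this file
echo "v2" > tracked.txt                  # Modified, Unstaged
git status --short
#  M tracked.txt
# ?? new.txt

git add tracked.txt                      # Modified, Staged: queued in the index
git status --short
# M  tracked.txt
# ?? new.txt

git commit -q -m "Update tracked.txt to v2"
git status --short                       # tracked.txt is Unmodified again
# ?? new.txt
```

Note that new.txt stays untracked through both commits: only what has been added to the index goes into a commit.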
No matter the source control system you are using, LEAVE GOOD COMMENTS! A good comment should include what has changed, why it changed, and any other information another developer would need at a glance. Another developer could be you in a couple of months or years, so be nice to your future self and leave good comments.
Here are examples from our own developers:
- Good Comment:
- “Fix syntax errors again for SMTP settings”
- We know what the fix was, and what part of the system the fix affected
- Short, simple, to the point
- Bad Comment:
- “minor bug”
- We have no idea what functionality the bug was discovered in, if it fixed an open issue in our bug tracker, or anything about the code change
The Distributed Part
We talked earlier about how Git is a distributed source control system. You can connect your repository to other repositories on other machines. It could be a central source control system like Beanstalk, a deployment target like Heroku, or another developer’s computer. There are a few commands that allow you to get code to and from those remote repositories:
- git push: Updates remote refs using local refs, while sending objects necessary to complete the given refs.
- git pull: Incorporates changes from a remote repository into the current branch. A default git pull will do a merge as well.
- Can be used with the --rebase option to rebase instead of merge
- Destroy local changes and reset to pre-commit status
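Here is a rough sketch of push and pull between two developers. To keep it runnable on one machine, a local bare repository (central.git) stands in for a hosted server, and the developers alice and bob are hypothetical; with a real server you would pass its URL to git remote add and git clone instead.

```shell
set -e                                   # stop at the first failing command
work=$(mktemp -d)                        # scratch directory for the demo
cd "$work"

# Stand-in for the central server: a bare repository with no working tree
git init -q --bare central.git
git -C central.git symbolic-ref HEAD refs/heads/master

# Alice starts the project locally, adds the remote, and pushes
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master  # name the first branch "master" on any Git version
git config user.name "Alice"
git config user.email "alice@example.com"
git remote add origin "$work/central.git"
echo "one" > one.txt
git add one.txt
git commit -q -m "Add one.txt"
git push -q origin master

# Bob clones, commits, and pushes his change
cd "$work"
git clone -q central.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
echo "two" > two.txt
git add two.txt
git commit -q -m "Add two.txt"
git push -q origin master

# Alice pulls Bob's change (a plain pull merges; here it is a fast-forward)
cd "$work/alice"
git pull -q origin master

# Both commit again; Bob pushes first
echo "three" > three.txt
git add three.txt
git commit -q -m "Add three.txt"
cd "$work/bob"
echo "four" > four.txt
git add four.txt
git commit -q -m "Add four.txt"
git push -q origin master

# Alice replays her commit on top of Bob's instead of merging, then pushes
cd "$work/alice"
git pull -q --rebase origin master
git push -q origin master
git --no-pager log --oneline
```

Because Alice used --rebase for the second pull, the shared history ends up as four commits in a straight line, with no merge commit.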
Rebase vs Merge
High among the many great features of source control systems is the ability to have multiple developers working on the same project at the same time. They can make changes to source and commit it back to the repository when they are ready. Inevitably, during those edits, two developers will edit the same file. Source control systems will attempt to merge the changes together, resulting in a file that reflects both sets of changes. Often these merges result in conflicts, which must be resolved by a developer reading the code and fixing the conflicts by hand. Git attempts to approach the merge in a smarter fashion, and introduces a new merge tool, called rebase.
When attempting to merge two separate changes to a single file, traditional source control systems would compare the two files, detect the differences, and output a merge of the difference. Git approaches merges a bit differently, using the last known common ancestor of the two files as a third point in the merge. This three-way difference allows the merge routine to be smarter about how to merge the changes together, hopefully resulting in fewer conflicts.
A few notes on Git’s merge implementation. When you do a git merge (which is implicit with git pull), the merge occurs and, unless Git can simply fast-forward the branch, a new merge commit is created in the repository. This is an important note, as frequent merges could clutter the log with unnecessary commit messages.
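A small sketch shows the extra commit. It assumes a scratch repository in which master and a hypothetical feature branch have each gained one commit, so the two branches have diverged and the merge cannot be a fast-forward.

```shell
set -e                                   # stop at the first failing command
work=$(mktemp -d)                        # scratch directory for the demo
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master  # name the first branch "master" on any Git version
git config user.name "Demo User"         # placeholder identity for this demo
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base.txt"

# One commit on the feature branch...
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature.txt"

# ...and one on master, so the branches diverge
git checkout -q master
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Add fix.txt"

# The merge records a new commit with two parents
git merge -q -m "Merge feature into master" feature
git --no-pager log --oneline --graph
```

The log now holds four commits: the three we wrote plus the merge commit that ties the two lines of history together.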
Git also offers another way to combine branches, called rebasing. When you rebase a branch onto another, Git takes the commits that are unique to your branch and plays them back one by one on top of the other branch’s latest commit. Rebasing is a powerful command, and has some interesting side effects:
- Rebase rewrites the branch’s history, so all commits from both branches look linear in that branch’s log.
- Once the rebase is complete, there is no new log entry created. Rebases cannot be detected after the fact in the branch’s log.
Rebases are a great way to reduce merge conflicts and to keep commit messages inline in a single history. Rebasing has drawbacks as well, so it should be used carefully.
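For comparison, here is the same kind of diverged history combined with a rebase instead of a merge. As before, this is a sketch on a scratch repository, and the feature branch and file names are made up.

```shell
set -e                                   # stop at the first failing command
work=$(mktemp -d)                        # scratch directory for the demo
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master  # name the first branch "master" on any Git version
git config user.name "Demo User"         # placeholder identity for this demo
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base.txt"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature.txt"

git checkout -q master
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Add fix.txt"

# Replay the feature commit on top of master's latest commit
git checkout -q feature
git rebase -q master

# master can now simply be moved forward; no merge commit is needed
git checkout -q master
git merge -q --ff-only feature
git --no-pager log --oneline --graph
```

The log shows three commits in a straight line and nothing that records the rebase itself. The replayed commit has a new hash, which is why rewriting commits that other developers have already pulled deserves care.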
Branching is an important part of any source control system. When new development work needs to be done, it is often important to quarantine the changes into their own branch. When they are ready for release, the changes can be merged back into the main branch.
Git does branching in an extremely lightweight, elegant way. The ideal git workflow often includes several branches. Some are short lived, such as a bug fix branch. Some live much longer, like a release branch. There are two ways to approach branch management in your project: Feature Branching and Continuous Integration. Both have advantages and drawbacks, and often the best option depends on the project.
Regardless of which option is chosen, we recommend that you only allow production code in your main branch (master, trunk, etc). Each time you are working on a set of new functionality, create a branch for it.
Feature branching creates a lot of short lived branches in your repository. First, you create a branch off of the master for the set of changes you are about to make. We call this the version branch. When working on a specific feature, the developer would create a branch off of the version branch for that feature. Once development for that feature is complete, the developer would merge their changes back into the version branch. Any changes made on the version branch would then be merged into the other ongoing feature branches by their developers.
If the need for a hotfix arises in production, a hotfix branch would be made from the master branch. Once the fix is deployed to production, the hotfix is merged back into the master and then pulled to all of the version and feature branches currently under development.
Once the version is completed and ready for production, the version branch is merged back into master.
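The whole cycle can be sketched in one local repository. The branch names below (version-1.1, feature-login, hotfix-smtp) are only illustrative, and the merges use --no-ff so that each one leaves a visible merge commit; that is our choice for the demo, not a requirement of the workflow.

```shell
set -e                                   # stop at the first failing command
work=$(mktemp -d)                        # scratch directory for the demo
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master  # name the first branch "master" on any Git version
git config user.name "Demo User"         # placeholder identity for this demo
git config user.email "demo@example.com"
echo "app" > app.txt
git add app.txt
git commit -q -m "Initial release"

# Version branch off master, feature branch off the version branch
git checkout -q -b version-1.1 master
git checkout -q -b feature-login version-1.1
echo "login" > login.txt
git add login.txt
git commit -q -m "Add login form"

# Feature complete: merge it back into the version branch
git checkout -q version-1.1
git merge -q --no-ff -m "Merge feature-login into version-1.1" feature-login
git branch -d feature-login

# Hotfix: branch from master, fix, merge back into master...
git checkout -q -b hotfix-smtp master
echo "smtp" > smtp.txt
git add smtp.txt
git commit -q -m "Fix syntax error in SMTP settings"
git checkout -q master
git merge -q --no-ff -m "Merge hotfix-smtp into master" hotfix-smtp

# ...then bring the fix into the version branch under development
git checkout -q version-1.1
git merge -q -m "Merge master (hotfix-smtp) into version-1.1" master
git branch -d hotfix-smtp

# Version ready for production: merge it into master
git checkout -q master
git merge -q --no-ff -m "Release version 1.1" version-1.1
git --no-pager log --oneline --graph
```

At the end master contains both the login feature and the SMTP hotfix, and the short lived branches have been deleted.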
Developing this way has a couple of advantages:
- It allows developers to work on a feature in isolation, only committing it back to the group when it is complete
- Master and Version branches only contain completed features and bug fixes
- Allows for feature “cherry-picking”
- If a release is being readied, but a specific feature is not ready, it can be easily left behind without affecting the rest of the release
Feature branching also has its drawbacks:
- Merging larger changesets adds a level of complication
- The effects of code changes by one developer are not felt by other developers on the project until much later in the development process
Continuous Integration aims to keep all the developers in lockstep throughout the process. Essentially, CI states that each developer should commit new code early and often, at minimum once a day. That new code is then pulled into the other developers’ work so the effects of changes can be felt immediately across the entire codebase.
With CI, a version branch would be created from the master. Each developer would commit to and pull from the version branch on a daily basis. When all development is complete, the version branch is merged back into master.
Just like with feature branching, if the need for a hotfix arises in production, a hotfix branch would be made from the master branch. Once the fix is deployed to production, the hotfix is merged back into master and then pulled into the version branch currently under development.
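A day in the life of the CI workflow might look like the sketch below. It assumes two hypothetical developers, alice and bob, sharing a version-1.1 branch through a local bare repository that stands in for the central server. Using pull --rebase for the daily sync is our choice for the example; a plain pull works too and records merge commits instead.

```shell
set -e                                   # stop at the first failing command
work=$(mktemp -d)                        # scratch directory for the demo
cd "$work"
git init -q --bare central.git           # stand-in for the central server
git -C central.git symbolic-ref HEAD refs/heads/master

# Alice publishes master and creates the shared version branch from it
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master  # name the first branch "master" on any Git version
git config user.name "Alice"
git config user.email "alice@example.com"
git remote add origin "$work/central.git"
echo "app" > app.txt
git add app.txt
git commit -q -m "Initial release"
git push -q origin master
git checkout -q -b version-1.1
git push -q origin version-1.1

# Bob joins, commits to the version branch, syncs, and pushes
cd "$work"
git clone -q central.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
git checkout -q -b version-1.1 origin/version-1.1
echo "search" > search.txt
git add search.txt
git commit -q -m "Add search page skeleton"
git pull -q --rebase origin version-1.1
git push -q origin version-1.1

# Alice does the same: commit, pick up Bob's work, push
cd "$work/alice"
echo "login" > login.txt
git add login.txt
git commit -q -m "Add login form skeleton"
git pull -q --rebase origin version-1.1
git push -q origin version-1.1

# All development complete: merge the version branch back into master
git checkout -q master
git merge -q --no-ff -m "Release version 1.1" version-1.1
git push -q origin master
```

Each developer integrates the other's work before pushing, so incompatible changes surface at the pull step rather than at release time.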
CI has some great advantages:
- Changes from each developer are integrated with all other developers’ work and felt immediately
- Incompatible code is found early and often
- A “current” build with all completed functionality is always available for tests, demonstrations, or release.
Just like everything else, CI has its own drawbacks:
- Incomplete code can negatively affect the other developers on the project
- It is harder to deactivate features that won’t make the final release
Before you go downloading any of the tools listed here, reread the section about knowing what you are doing. Learn the command line. Love the command line. Then use one of these tools to supplement your work.
- Mac OSX
- For more information on Git, check out the docs posted online: http://git-scm.com/docs
- For more on Feature Branching in Git, check out the git workflow: http://nvie.com/posts/a-successful-git-branching-model/
- For more reading on Continuous Integration, it doesn’t get any better than Martin Fowler (even if you don’t do CI, read Martin’s stuff anyway): http://www.martinfowler.com/articles/continuousIntegration.html
The model described above is a derivative of the above link.