Notes on Using Git for an Introduction to Software Engineering

Git is a popular distributed version control system used by many software projects. It keeps track of all the different versions of the files and folders in a project in a way that lets multiple people copy and share them. It was originally written by Linus Torvalds, best known as the creator of Linux, who built it as the version control system for the Linux kernel source code.

Software

These notes assume you are using the following software:

  • Ubuntu Linux. You can run Ubuntu as a virtual machine (e.g. using VirtualBox) on a Windows or Mac computer. Software such as SFU Vault or Dropbox is an easy way to share files between the virtual machine and your other computers.

  • Git. To install git on Ubuntu Linux, use the following command in the terminal:

    $ sudo apt-get install git
    

    After it installs, check that it works by running this command:

    $ git --version
    git version 2.17.1
    
  • Sublime Merge. This is optional, but many people like it (or similar visual tools). Sublime Merge is a graphical Git browsing tool that gives a visual overview of your Git repositories, and helps with branching, merging, and committing.

Reference Book

We’ll be referring to the free book Pro Git in these notes. Be sure to bookmark it, and download a copy of the PDF.

Git Basics

Please read Chapter 1.1 of Pro Git to get a good overview of version control software and what it does.

Essentially, you can think of a Git repository as a super-charged file system. A regular file system stores only the current files and folders, while Git also saves all previously committed files and folders. You can (relatively!) easily check out a previous version of the entire file system (using branches or tags), and Git provides commands for viewing, updating, and processing all the versions of your work.

In addition to the file objects and folder objects that appear in a regular file system, a Git repository also stores the following kinds of objects:

  • commits
  • branches
  • tags

Every Git repository stores these objects (and files and folders) in a local database that lives in a hidden folder named .git. The objects are, at least in the original version of Git, indexed by their SHA-1 hash code. SHA-1 is a cryptographic hash function that, for all practical purposes, assigns a unique hash code to every object that a repository stores. When you are searching or reviewing a Git repository, you will often come across these codes. In 2017, security researchers demonstrated that it is possible (although quite expensive) to create two different files with the same SHA-1 code. Since Git assumes all hashes are unique, putting two such files into a Git repository could corrupt it. Thus, more recent versions of Git allow for more secure hash functions (such as SHA-256) in case you are concerned about this possibility.
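To make the idea of hash codes concrete, here is a minimal sketch of how Git computes the ID of a file's contents. It assumes a repository using the default SHA-1 object format and the sha1sum tool from GNU coreutils; the file name greeting.txt and the scratch directory are made up for the illustration.

```shell
set -e
cd "$(mktemp -d)"          # scratch directory for the demo
git init -q

printf 'hello\n' > greeting.txt

# Ask Git for the ID it would assign to this file's contents.
blob_id=$(git hash-object greeting.txt)

# Recompute it by hand: SHA-1 of the header "blob <size>\0" followed by
# the 6 bytes of content.
manual_id=$(printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1)

echo "git:    $blob_id"
echo "manual: $manual_id"
```

The two printed codes should match, which shows that an object's ID depends only on its contents: identical files always get the same ID.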

Git is a distributed version control system (as opposed to a centralized version control system, such as Subversion). That means every developer keeps their own personal copy of the entire repository that they can modify in any way they wish. When they are ready to share their code with the rest of the team, they “push” their code to a shared central Git repository.

Typically, a team will have one main shared repository on a website (such as GitHub). Any team member can copy from this main shared version of the repository, or upload new changes to it. Changes are usually uploaded as branches that are then reviewed by the rest of the team before being merged into the main repository branch.
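The copy-and-push workflow can be tried on a single machine. The sketch below uses a local bare repository as a stand-in for the shared repository that would normally live on a site such as GitHub. The directory names (shared.git, alice, bob), the file notes.txt, and the user identity are all hypothetical.

```shell
set -e
work=$(mktemp -d)          # scratch directory for the demo
cd "$work"

# Stand-in for the team's main shared repository.
git init -q --bare shared.git
git --git-dir=shared.git symbolic-ref HEAD refs/heads/master

# One developer copies the shared repository and works locally.
git clone -q shared.git alice 2>/dev/null   # hide the "empty repository" warning
cd alice
git symbolic-ref HEAD refs/heads/master     # use the branch name from these notes
git config user.name "Alice Example"        # placeholder identity
git config user.email "alice@example.com"
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Upload the new commit to the shared repository.
git push -q origin master

# A second developer copies the shared repository and sees the change.
cd "$work"
git clone -q shared.git bob
cat bob/notes.txt
```

In a real team, the push would usually go to a separate branch that is reviewed before being merged, as described above.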

Files: Committed, Modified, and Staged

Please read Chapter 1.3 of Pro Git.

Files in a Git repository can be in one of three states: committed, modified, or staged. It’s important to understand these states because many Git commands refer to them.

A committed file is one that is stored in the local Git database as a Git object.

A modified file is one that is changed in some way, but has not yet been stored as an object in the local Git database.

A staged file is a modified file that has been marked to go into the local Git database as part of the next commit.

The general idea is that you check out some files from the local Git database, and then do some work on them, e.g. edit them, rename them, add new files, etc. All these edited files are considered modified. When you are ready, you tell Git which modified files you want to put into the local database as a new commit. The files that you tell Git to prepare for being committed are said to be staged files. Usually you want to stage all modified files, and so Git makes it easy to do that. But, if you only want to stage some files, then Git lets you pick and choose which files to add.
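Here is a small sketch of the modified/staged distinction in the terminal. The file names a.txt and b.txt, the user identity, and the scratch directory are made up for the illustration.

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # call the first branch master, as in these notes
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Start with two committed files.
echo "one" > a.txt
echo "two" > b.txt
git add a.txt b.txt
git commit -q -m "Add a.txt and b.txt"

# Edit both files: they are now modified.
echo "one, edited" > a.txt
echo "two, edited" > b.txt

# Stage only a.txt; b.txt stays modified but unstaged.
git add a.txt
git status --short

staged=$(git diff --cached --name-only)    # files that would go into the next commit
unstaged=$(git diff --name-only)           # modified files that are not staged
echo "staged: $staged"
echo "unstaged: $unstaged"
```

To stage every modified file at once, you can run git add -A instead of naming the files individually.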

Using Sublime Merge, you can see which files are unstaged, and also how they differ from the currently committed version of the file. This can be very useful for understanding exactly what the file differences are.

Commits

Adding staged files to the local Git database is called making a commit. Git saves all commits, and so you can go back to and review any previous commit.

One important part of a commit is the commit message. This is a message that explains the commit for developers, and is typically a short and straightforward description. Writing good commit messages takes practice: they should not be too short, or too long, or too vague, or too specific.
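As a rough illustration, the following sketch makes two commits with short messages and then looks back at the earlier version of a file. The file name readme.txt, the commit messages, and the user identity are hypothetical.

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # call the first branch master, as in these notes
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# First commit: stage the file, then commit it with a message.
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "Add readme with greeting"

# Second commit: -a stages all modified tracked files before committing.
echo "hello again" > readme.txt
git commit -q -a -m "Extend greeting in readme"

count=$(git rev-list --count HEAD)         # number of commits so far
old=$(git show HEAD~1:readme.txt)          # the file as it was one commit ago
echo "commits: $count"
echo "previous contents: $old"
```

Here HEAD~1 means "the commit just before the current one", so the old contents are still available even though the file has since changed.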

Sublime Merge is a visual tool that lets you see the details of your repository commits. This can be especially useful for big/complex repositories with many branches.

You can also see commit information using the git log command in the terminal, e.g.:

$ git log
commit ca82a6dff817ec66f44342007202690a93763949
Author: Scott Chacon <schacon@gee-mail.com>
Date:   Mon Mar 17 21:52:11 2008 -0700

    changed the version number

commit 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
Author: Scott Chacon <schacon@gee-mail.com>
Date:   Sat Mar 15 16:40:33 2008 -0700

    removed unnecessary test

commit a11bef06a3f659402fe7563abf99ad00de2209e6
Author: Scott Chacon <schacon@gee-mail.com>
Date:   Sat Mar 15 10:31:28 2008 -0700

    first commit

The log command has a huge number of options, some of which you can read about in Chapter 2.3 of Pro Git.

Notice that each commit has some standard information:

  • A unique commit id, e.g. ca82a6dff817ec66f44342007202690a93763949. This is the unique hash code that Git uses to identify the commit.
  • The author of the changes. Note that this may, or may not, be the same person who commits the change. For example, in multi-person projects, one person might fix a bug, and another person might review the fix and commit it to the main repository.
  • The date and time of the commit.

Git also provides numerous commands for searching through commits, which can be useful in large projects.
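For example, git log can filter commits by message text or by author. The sketch below is only an illustration: the file app.txt, the commit messages, and the two identities are invented. It also shows the author/committer distinction mentioned above by using the --author option of git commit.

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # call the first branch master, as in these notes
git config user.name "Example User"        # placeholder identity (the committer)
git config user.email "user@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "first commit"

echo "v2" > app.txt
git commit -q -a -m "changed the version number"

# Commit a change written by someone else: the author differs from the committer.
echo "v3" > app.txt
git commit -q -a -m "fix crash on empty input" --author="Bob Example <bob@example.com>"

git log --oneline                          # one line per commit

# Search commit messages for a word.
matches=$(git log --format=%s --grep="version")
echo "matching message: $matches"

# Search by author; %an is the author name and %cn the committer name.
bob_author=$(git log --format=%an --author="Bob")
git log --format='author=%an committer=%cn' --author="Bob"
```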

Branching and Merging

Branching and merging are two of the key features of Git. At any time, you can make a branch of your repository. A branch is, conceptually, a fresh copy of the entire repository, i.e. a snapshot of all of the files. For efficiency, files that have not changed are not copied, and instead a reference to the file is stored. For most developers, these underlying details don’t matter, and each branch acts like a complete independent copy of all other branches.

One important Git branch you should know about is master. When you create a new empty repository, Git traditionally names the default branch master (newer setups, including repositories created on GitHub, often use main instead). Typically, the main versions of your files and folders are stored in this master branch, and other branches are used for testing out new features, fixing bugs, etc.

Some of the things branches can be used for are:

  • Fixing a bug. For example, you could make a new branch off of master called “bugfix”. In the bugfix branch, you can safely test any changes without altering the master branch.
  • Adding a feature. For every new feature that you add to your system, it is typical in Git to create a brand new branch for it. This lets you safely implement the feature without causing problems in the master branch.
  • Preparing a release. When it comes time to release the next version of your software, you could do this by making a special “release branch”.
  • Giving a demo. Sometimes you might be asked to give a demo of your software, maybe at a meeting or for a conference. You could create a new demo branch that lets you show what you want to show without messing up the main branch.

Essentially, any time you want to make a change to code in the project, you should consider making a branch so that the change is isolated from the rest of the project.
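The following sketch shows the isolation that a branch gives you, using the “bugfix” branch name from the list above. The file app.txt, its contents, and the user identity are made up for the illustration.

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # call the first branch master, as in these notes
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "stable" > app.txt
git add app.txt
git commit -q -m "first commit"

# Create a branch named bugfix and switch to it in one step.
git checkout -q -b bugfix
echo "patched" > app.txt
git commit -q -a -m "Fix bug"

# Switch back: master still has the original file.
git checkout -q master
cat app.txt

git branch                                 # list branches; * marks the current one
```

The change made on bugfix does not appear on master until the two branches are merged.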

Merging refers to combining two, or more, branches. For example, after you fix and test a bug in a branch, you can merge your changes back into the master branch. Git is very good at merging: it combines changes automatically when it can, and when two branches change the same part of a file in incompatible ways, it stops and reports a merge conflict for you to resolve.

Branching and merging work so well in Git that they are used extremely frequently by many developers. Typically, every bug-fix or feature is created in its own branch.
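A minimal sketch of the bug-fix workflow, ending with a merge back into master, might look like this. The branch name bugfix, the file app.txt, and the user identity are hypothetical.

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # call the first branch master, as in these notes
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "stable" > app.txt
git add app.txt
git commit -q -m "first commit"

# Do the fix on its own branch.
git checkout -q -b bugfix
echo "patched" > app.txt
git commit -q -a -m "Fix bug"

# Merge the fix back into master.
git checkout -q master
git merge -q bugfix
cat app.txt

# The branch is no longer needed once it has been merged.
git branch -q -d bugfix
```

Because master did not change while the fix was being made, Git can do this merge by simply moving master forward to the bugfix commit, without creating an extra commit.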

Branching in More Detail

Please read Chapter 3.1 (Branching in a Nutshell).

Branches in Git are lightweight. That means creating a branch is a relatively inexpensive operation, even in repositories with thousands of files.

Essentially, a Git branch is a pointer to a commit object. A commit object, in turn, contains a pointer to a tree object (which stores the folder structure of the repository) and back-pointers to its immediate parent commits, along with metadata such as the time of the commit and the author. Creating a new branch in Git does not cause any files to be copied.

Note that you can use the back-pointers on the commit objects to trace through the history of a branch.
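You can inspect these pointers yourself with git cat-file, which prints the raw contents of an object. The sketch below assumes a throwaway repository with two commits; the file app.txt and the user identity are made up.

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # call the first branch master, as in these notes
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "first commit"
echo "v2" > app.txt
git commit -q -a -m "second commit"

# Print the latest commit object: a tree line, a parent line,
# author and committer lines, then the commit message.
git cat-file -p HEAD

tree_id=$(git rev-parse 'HEAD^{tree}')     # the tree this commit points to
parent_id=$(git rev-parse HEAD~1)          # the previous commit (the back-pointer)
git cat-file -t "$tree_id"                 # reports the object's type
```

Following the parent line from commit to commit is exactly how git log walks back through the history of a branch.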

Git uses the special pointer named HEAD to point to the branch you are currently working on. Figures 12 and 13 in Chapter 3.1 (Branching in a Nutshell) <https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell> make this clear, and you should study those carefully.

Switching to a new branch is done by making the HEAD pointer refer to a new branch object. You do this using the git checkout command. Figures 15 and 16 in Chapter 3.1 (Branching in a Nutshell) <https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell> make this clear.
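A quick way to watch HEAD move is git symbolic-ref, which prints the branch that HEAD currently refers to. In this sketch the branch name testing follows the Pro Git chapter; the file app.txt and the user identity are hypothetical.

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # call the first branch master, as in these notes
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "first commit"

git symbolic-ref HEAD                      # prints refs/heads/master

# Create a new branch and switch to it: HEAD now refers to testing.
git checkout -q -b testing
git symbolic-ref HEAD                      # prints refs/heads/testing

# No files were copied: both branches point at the same commit.
git rev-parse master testing
```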

An Example of Branching and Merging

Please read Chapter 3.2 (Git Branching, Basic Branching and Merging). This section works through a realistic example of how you might use Git in a real project.

It shows good examples of branching and merging. It gives examples of both a fast-forward merge (where Git simply moves the current branch pointer forward to the most recent commit) and a three-way merge (which combines the two branch tips using their common ancestor). It also shows what happens when there is a merge conflict, and points out that when you are finished with a branch you can delete it.
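The sketch below sets up a three-way merge: both master and a second branch gain a commit after they split, so Git has to create a new merge commit with two parents. The branch name feature, the file names, and the user identity are made up for the illustration, and the two branches touch different files so that no conflict occurs.

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # call the first branch master, as in these notes
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "first commit"

# Work on a feature branch...
git checkout -q -b feature
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# ...while master also moves on.
git checkout -q master
echo "hotfix" > hotfix.txt
git add hotfix.txt
git commit -q -m "Add hotfix"

# The branches have diverged, so this merge creates a merge commit.
git merge -q -m "Merge branch 'feature'" feature

parents=$(git show -s --format=%P HEAD | wc -w)
echo "parents of merge commit: $parents"

git branch -q -d feature                   # finished with the branch
```

If both branches had edited the same lines of the same file, the merge command would instead stop and ask you to resolve the conflict by hand.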

This is very practical stuff, and it is worth the time to read through this section carefully. Try to understand all the different kinds of Git objects and how they relate. Having a clear mental model of how Git works is very useful when repositories become more complex, or if you run into problems.

Tagging

Please read Chapter 2. (Tagging).

Git lets you tag commits in your repository as being important. For example, you could use tags to mark versions of your software, e.g. maybe “v1” for version 1, “v1.1” for an update to version 1, etc.

There are two kinds of tags in Git: lightweight tags and annotated tags. You will usually want annotated tags since they store more useful information. Essentially, an annotated tag is a pointer to a specific commit, plus other information like an annotation message, the name of the person who made the tag, the date it was made, and so on.

You can create an annotated tag in the terminal like this:

$ git tag -a v1.4 -m "my version 1.4"

This creates a tag named v1.4, which stores the message “my version 1.4”.

Please see Chapter 2. (Tagging) for examples of how to do things like list all tags, show info about (an annotated) tag, and delete a tag.

Note that deleting a tag does not delete any of the files associated with the tag. It just deletes the tag object.
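As a rough sketch, the common tag operations look like this in a throwaway repository. The tag name and message follow the example above; the file app.txt and the user identity are made up.

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # call the first branch master, as in these notes
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "v1.4 code" > app.txt
git add app.txt
git commit -q -m "first commit"

# Create an annotated tag on the current commit.
git tag -a v1.4 -m "my version 1.4"

git tag                                    # list all tags
git show -s v1.4                           # show the tagger, date, message, and tagged commit
tag_type=$(git cat-file -t v1.4)           # an annotated tag is stored as its own object
echo "object type: $tag_type"

# Delete the tag: the commit and its files are untouched.
git tag -d v1.4
ls
```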

Just as with branches, you can use the checkout command to check out the version of your repository associated with a particular tag, e.g.:

$ git checkout v1.1
...

In practice, one good way to use tags is to mark important versions and releases of your software.
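Here is a small sketch of checking out a tagged version and then returning to the latest code. One thing to be aware of: after checking out a tag, HEAD points directly at a commit rather than at a branch (Git calls this a “detached HEAD”), so you should switch back to a branch before making new commits. The file app.txt, its contents, and the user identity are hypothetical.

```shell
set -e
repo=$(mktemp -d)                          # scratch directory for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # call the first branch master, as in these notes
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "release 1.1" > app.txt
git add app.txt
git commit -q -m "Prepare version 1.1"
git tag -a v1.1 -m "version 1.1"

echo "work in progress" > app.txt
git commit -q -a -m "Start next version"

# Check out the tagged version: the files go back to how they were.
git checkout -q v1.1
at_tag=$(cat app.txt)
echo "at v1.1: $at_tag"

# HEAD is not on any branch right now.
git symbolic-ref -q HEAD || echo "HEAD is detached"

# Return to the latest code on master.
git checkout -q master
cat app.txt
```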