Conversational Git

Objectives

  • Participate in collaborative development by copying Git repositories using the git clone command.
  • View historical changes in your Git repositories using git log.
  • Navigate the history of your Git repositories using git reflog and git checkout.
  • Restore saved versions of files using git checkout.

When you travel to a country where you don’t speak the native language, there is often no time to learn it properly. If you are to accomplish anything on your own, you need to know a few essential words and phrases. The same is true for Git. This lesson won’t teach you to become a Git expert. Instead, we want you to understand some of the vocabulary and to communicate your desires to Git and your actions to others. Along the way, we’ll introduce a few Git concepts (nouns), commands (verbs), and command-line arguments (adverbs). In most cases, the English meaning of a word will help you recall its meaning to Git. Please keep in mind, though, that Git uses some of its verbs and nouns very differently from other revision control systems.

Copying Repositories (git clone)

The first concept we introduce is the repository. The repository contains a directory of files and folders and their revisions going back to its creation.

We will start with a small repository with only a few commits that has been created for you to practice with.

We will copy the bio-pipeline repository from GitHub user ahmadia using our second Git command, the clone command.

When we execute git clone, the command copies another repository, including its full history. By default, it creates a Git repository of the same name inside the directory where you ran the command. The command (and its successful output) should look similar to this:

$ git clone https://github.com/ahmadia/bio-pipeline.git
Cloning into 'bio-pipeline'...
remote: Counting objects: 41, done.
remote: Compressing objects: 100% (36/36), done.
remote: Total 41 (delta 19), reused 23 (delta 4)
Unpacking objects: 100% (41/41), done.
Checking connectivity... done
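If the default directory name does not suit you, git clone accepts an optional second argument naming the directory to create. The following is a minimal sketch, assuming a throwaway local repository stands in for the GitHub one so that it runs without a network connection; scratch, source-repo, and my-copy are hypothetical names that are not part of bio-pipeline.

```shell
# Placeholder identity so the scratch commit works on any machine.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# A throwaway directory; source-repo and my-copy are hypothetical names.
scratch=$(mktemp -d)

# Build a tiny repository to clone from (it stands in for a remote URL).
git init -q "$scratch/source-repo"
echo "sample data" > "$scratch/source-repo/data.txt"
git -C "$scratch/source-repo" add data.txt
git -C "$scratch/source-repo" commit -q -m "Added data file"

# The optional second argument to git clone names the new directory.
git clone -q "$scratch/source-repo" "$scratch/my-copy"
ls -a "$scratch/my-copy"
```

The rest of this lesson uses the default name, bio-pipeline.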

You can now enter the repository (which is also a directory on your file system) by typing:

$ cd bio-pipeline

If we now type ls, we see that the repository has some code and a few data files.

$ ls
2013-05-24-2760-2763.txt Lumi.2761.csv            Lumi.2763.csv
Lumi.2760.csv            Lumi.2762.csv            python_pipeline.ipy

If we add the -a flag to show everything, we can see that Git has created a hidden directory called .git:

$ ls -a
.                        2013-05-24-2760-2763.txt Lumi.2762.csv
..                       Lumi.2760.csv            Lumi.2763.csv
.git                     Lumi.2761.csv            python_pipeline.ipy

Git stores information about the project in this special sub-directory. If we ever delete it, we will lose our local copy of the project’s history, and any changes or commits we had not published yet.

Viewing History (git log)

We are looking at the latest revision, also referred to in the Git documentation as a commit, of the bio-pipeline repository. If we want to see the name of this revision, we use the log command. By default, when we execute git log, it gives us information about this revision and every revision made before it. We use the command-line argument --max-count 1 to tell Git that we only want to see the current one.

$ git log --max-count 1
commit 61fd2bcece2126cdd8ee24f40a04c18d39403022
Author: Aron Ahmadia <aron@ahmadia.net>
Date:   Tue Jun 4 10:59:21 2013 -0400

    Made fixes to Python pipeline

Our fingers are starting to get sore from all of this typing. Luckily, -n is a common shorthand for “number of” in programming and at the command line. To save a few keystrokes, we will instead type:

$ git log -n 1

which is equivalent to the previous command.

The output of the command provides a summary of the revision Aron Ahmadia committed to the repository on June 4, 2013. The line: Made fixes to Python pipeline is Aron’s commit message.

The alphabet soup of digits and letters starting with 61fd2 is called a hash. The hash uniquely identifies this revision, and was automatically generated by Git as the final step of the commit process. We can think of the hash as an identifier permanently affixed to this exact version of the code and data.
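Because the hash is so long, Git also accepts a shortened form, as long as it is unambiguous. Here is a small sketch of that idea, assuming a scratch repository with placeholder contents; the --format option is an extra git log argument not covered in this lesson, used here only to print the hash by itself.

```shell
# Placeholder identity and scratch repository; run this in a separate shell
# from the lesson so your bio-pipeline session is left alone.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
cd "$(mktemp -d)"
git init -q .
echo "sample data" > data.txt
git add data.txt
git commit -q -m "Added data file"

# %H prints the full hash, %h an abbreviated one.
full=$(git log -n 1 --format=%H)
short=$(git log -n 1 --format=%h)
echo "full:  $full"
echo "short: $short"

# Asking for the log of the short hash finds the same revision.
resolved=$(git log -n 1 --format=%H "$short")
echo "found: $resolved"
```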

Revision Ancestry

Each revision’s parent is the previous version of the code and data, and immediately precedes it in history. We can see each revision’s parents as output from git log by adding the --parents flag.

$ git log --parents -n 1
commit 61fd2bcece2126cdd8ee24f40a04c18d39403022 8595b710e3be4b2bf01d51a1c55842510b82ff87
Author: Aron Ahmadia <aron@ahmadia.net>
Date:   Tue Jun 4 10:59:21 2013 -0400

    Made fixes to Python pipeline

Notice that the parent revision is referred to only by its hash. Since the hash uniquely identifies this revision, this is the only information we need to look up the state of the repository when the parent revision was created.
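To make this concrete, the sketch below looks up a parent revision using nothing but its hash. It assumes a scratch repository with two placeholder commits; the --format option (with %P for parent hashes and %s for the commit message's first line) is an extra git log argument not covered in this lesson.

```shell
# Placeholder identity and scratch repository; run this in a separate shell
# from the lesson so your bio-pipeline session is left alone.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
cd "$(mktemp -d)"
git init -q .
echo "sample data" > data.txt
git add data.txt
git commit -q -m "Added data file"
echo "more data" >> data.txt
git add data.txt
git commit -q -m "Added more data"

# With --parents, the commit line lists this revision's hash, then its parent's.
git log --parents -n 1

# %P prints only the parent hash.
parent=$(git log -n 1 --format=%P)

# The parent hash alone is enough to look that revision up.
git log -n 1 --oneline "$parent"
parent_subject=$(git log -n 1 --format=%s "$parent")
```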

Usually, we are interested in how a revision differs from its parent. We can see this by adding the -p flag (or --patch) to the git log command.

After you enter the command below, you will see the changes in this revision presented in diff format. Since the output does not fit on one screen, Git will pipe it into a pager (by default, less). You can scroll through the log using the up and down arrow keys. When you are done, just press q.

$ git log -n 1 -p
commit 61fd2bcece2126cdd8ee24f40a04c18d39403022
Author: Aron Ahmadia <aron@ahmadia.net>
Date:   Tue Jun 4 10:59:21 2013 -0400

    Made fixes to Python pipeline

diff --git a/python_pipeline.ipy b/python_pipeline.ipy
new file mode 100644
index 0000000..ab9e62b
--- /dev/null
+++ b/python_pipeline.ipy
@@ -0,0 +1,44 @@
+%pylab
+import numpy as np
+
+f = open('Lumi.2760.csv')
+g = f.readlines()
+f.close()
...

The output is slightly cryptic because it is intended to be read by machines as well as humans. The differences, also known as the diff, tell you how each file changed from its previous version to this one.

In general, lines starting with a single ‘+’ were added, and lines starting with a single ‘-’ were removed. Lines without an initial ‘+’ or ‘-’ are present in both versions and are provided as context to help you understand the changes.

The diff headers in the output:

diff --git a/python_pipeline.ipy b/python_pipeline.ipy
new file mode 100644
index 0000000..ab9e62b
--- /dev/null
+++ b/python_pipeline.ipy

summarize the differences between the previous version of the file and its new version.

Since python_pipeline.ipy was a file new to the repository, we see this special line:

--- /dev/null

This indicates that there was no previous file, and this file is new. The following line:

+++ b/python_pipeline.ipy

tells you that the new file is named python_pipeline.ipy.

The numbers between the @@ markers tell you which lines were changed:

@@ -0,0 +1,44 @@

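In this case, -0,0 means that no lines came from a previous version, and +1,44 means that the new version contains 44 lines starting at line 1. In other words, Aron created a new file and added lines 1-44.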

The lines following, preceded by a +, are the contents of the new file he added, python_pipeline.ipy.
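You can reproduce these headers on a small scale. The sketch below, which assumes a scratch repository and a hypothetical three-line file called notes.txt, commits a brand-new file and prints its patch; since the file has three lines, the hunk header reads @@ -0,0 +1,3 @@.

```shell
# Placeholder identity and scratch repository; run this in a separate shell
# from the lesson so your bio-pipeline session is left alone.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
cd "$(mktemp -d)"
git init -q .
echo "sample data" > data.txt
git add data.txt
git commit -q -m "Added data file"

# Add a brand-new file with exactly three lines (notes.txt is a made-up name).
printf 'line one\nline two\nline three\n' > notes.txt
git add notes.txt
git commit -q -m "Added notes"

# Capture the patch for the latest revision and print it.
patch=$(git log -n 1 -p)
echo "$patch"
```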

Here are two more useful arguments to git log:

  • --oneline - Prints only the first few characters of the hash and the first line of the commit message in each revision.
  • --stat - Prints out a summary of files changed in each revision.

If we use them together, we see a nice text summary of how the repository has changed since it was created.

$ git log --stat --oneline
61fd2bc Made fixes to Python pipeline
 python_pipeline.ipy | 44 ++++++++++++++++++++++++++++++++++++++++++++
 python_pipeline.py  | 49 -------------------------------------------------
 2 files changed, 44 insertions(+), 49 deletions(-)
8595b71 first pass at making a pipeline
 python_pipeline.py | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)
d7c2a9d Merge branch 'add_2763'
a396b40 added Lumi 2763
 Lumi.2763.csv | 10 ++++++++++
 1 file changed, 10 insertions(+)
ef023fe added Lumi 2762
 Lumi.2762.csv | 10 ++++++++++
 1 file changed, 10 insertions(+)
779f888 Added Lumi 2761
 Lumi.2761.csv | 10 ++++++++++
 1 file changed, 10 insertions(+)
cbd6ff5 Added data file
 2013-05-24-2760-2763.txt | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 Lumi.2760.csv            | 10 ++++++++++
 2 files changed, 60 insertions(+)

Time Travel (git checkout)

Git can’t really travel through time, but it does allow us to inspect its repositories as they looked in the past. Imagine that Git automatically prints out all of your code, prose, and data (the contents of your repository) and binds them into a complete book any time you want it to. Imagine also that, instead of asking a friendly librarian, you have to ask Git to retrieve revisions of your book for you. Git happily does this when you tell it to check out your revisions.

Let’s see what the repository looked like when it was first created by giving git checkout the first four characters of the hash of the oldest commit in our history:

$ git checkout cbd6
Note: checking out 'cbd6'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using `-b` with the checkout command again. Example:

  git checkout -b new_branch_name

HEAD is now at cbd6ff5... Added data file
$ ls
2013-05-24-2760-2763.txt Lumi.2760.csv

We’ll explain the detached HEAD message in the next section. For now, note that the contents of the directory have changed.

Let’s restore the original revision by finding the right hash in git log.

$ git log --oneline
cbd6ff5 Added data file

Uh-oh. By default, git log only shows the history leading up to the revision we currently have checked out, so the later revisions are missing.

Don’t worry, we only need to add the --all flag to see all of the repository’s available history:

$ git log --oneline --all
61fd2bc Made fixes to Python pipeline
8595b71 first pass at making a pipeline
d7c2a9d Merge branch 'add_2763'
a396b40 added Lumi 2763
ef023fe added Lumi 2762
779f888 Added Lumi 2761
cbd6ff5 Added data file
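The difference between the two listings can be checked by counting lines. This is a rough sketch on a scratch repository with two placeholder commits; the --format option is an extra git log argument not covered in this lesson, used here to pick out the oldest hash.

```shell
# Placeholder identity and scratch repository; run this in a separate shell
# from the lesson so your bio-pipeline session is left alone.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
cd "$(mktemp -d)"
git init -q .
echo "sample data" > data.txt
git add data.txt
git commit -q -m "Added data file"
echo "more data" >> data.txt
git add data.txt
git commit -q -m "Added more data"

# git log lists newest first, so the oldest revision is on the last line.
first=$(git log --format=%h | tail -n 1)
git checkout -q "$first"

git log --oneline          # only the checked-out revision and its ancestors
git log --oneline --all    # every revision reachable from any branch

visible=$(git log --oneline | grep -c .)
everything=$(git log --oneline --all | grep -c .)
echo "plain log: $visible revision(s), with --all: $everything"
```

In the scratch repository, as in bio-pipeline, the --all listing includes the revision we started from.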

This is enough to go back to where we were, but let’s use this as an opportunity to introduce another Git feature.

Reflog

We’d like to go back to (or, check out) the most recent revision. We could use the output of git log --all, but there is a better Git command for navigating project history, the reference log, or reflog.

Git keeps a reference log for you that includes the revisions you have checked out. The currently checked out revision is referred to as, for no particularly great reason, the HEAD. Every time you use git checkout, HEAD moves to the new commit, and the reflog gets another entry.

We can use the git reflog command to access this history and see where we are, and where we’ve been in the history of our repository.

$ git reflog
cbd6ff5 HEAD@{0}: checkout: moving from master to cbd6
61fd2bc HEAD@{1}: clone: from https://github.com/ahmadia/bio-pipeline.git

By default, git reflog outputs one line of text for each time HEAD has moved. The last move was caused by our checkout command, and moved us to the revision identified by cbd6ff5. The first column of each row is the revision HEAD pointed to after that move, so the first column of the second row tells us which revision we were on before we called git checkout.

Since git reflog reports our actions going backwards in time, the first row contains our current revision, and the second row is one checkout back, where we started.

Let’s go back to the revision we started at.

$ git checkout 61fd
Previous HEAD position was cbd6ff5... Added data file
HEAD is now at 61fd2bc... Made fixes to Python pipeline
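The names in the second column of the reflog, such as HEAD@{1}, can be used wherever Git expects a revision, so the same move is possible without copying a hash. The sketch below tries this on a scratch repository with two placeholder commits; the --format option is an extra git log argument not covered in this lesson.

```shell
# Placeholder identity and scratch repository; run this in a separate shell
# from the lesson so your bio-pipeline session is left alone.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
cd "$(mktemp -d)"
git init -q .
echo "sample data" > data.txt
git add data.txt
git commit -q -m "Added data file"
echo "more data" >> data.txt
git add data.txt
git commit -q -m "Added more data"

# Remember where we are, then check out the oldest revision.
latest=$(git log -n 1 --format=%H)
first=$(git log --format=%H | tail -n 1)
git checkout -q "$first"
git reflog

# HEAD@{1} names the revision HEAD pointed to one move ago.
git checkout -q 'HEAD@{1}'
current=$(git log -n 1 --format=%H)
echo "back at: $current"
```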

Checkpoint 1

  • [1A] Explain the two lines of output from git checkout to your neighbor.
  • [1B] Predict the output of git reflog if you call it now. Try it.
  • [1C] Try calling the command git checkout -. Can you explain what it does to your neighbor? (You may need to call it multiple times and inspect the reflog each time).

Checkpoint 2

At some point in the project’s history, Aron replaced the file python_pipeline.py with python_pipeline.ipy.

  • [2A] See if you can identify the commit where Aron added python_pipeline.py.
  • [2B] Check out that commit and view python_pipeline.py in an editor.

Undoing Mistakes (git checkout)

git checkout is Git’s Swiss Army Knife. It does slightly different things, depending on how it’s called.

We just showed you how to restore the entire directory to a previous state, but git checkout also allows us to just restore a specific file.

Let’s practice by doing something dangerous. First, let’s make sure you’re on the most recent revision.

$ git checkout 61fd

Then, go ahead and remove Lumi.2763.csv.

$ rm Lumi.2763.csv
$ ls Lumi.2763.csv
ls: Lumi.2763.csv: No such file or directory

There are a number of ways to accidentally corrupt, modify, overwrite, or destroy files. Here, we use the rm command to simulate a catastrophic deletion of our valuable data.

Fortunately, since our copy of Lumi.2763.csv was committed to the repository, it is as easy as pie to restore it.

$ git checkout Lumi.2763.csv
$ ls Lumi.2763.csv
Lumi.2763.csv

In fact, so long as an undamaged copy of our Git repository exists somewhere, we will always be able to recover lost or damaged files.
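If you would like to rehearse the recovery somewhere safe first, here is a minimal sketch on a scratch repository with a hypothetical file called data.csv. It also uses the slightly longer form git checkout -- data.csv, where the -- tells Git that what follows is a file path and not the name of a revision.

```shell
# Placeholder identity and scratch repository; run this in a separate shell
# from the lesson so your bio-pipeline session is left alone.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
cd "$(mktemp -d)"
git init -q .
echo "sample data" > data.csv
git add data.csv
git commit -q -m "Added data file"

# Simulate the accident.
rm data.csv

# Restore the committed copy; -- separates file paths from revision names.
git checkout -- data.csv
restored=$(cat data.csv)
echo "restored contents: $restored"
```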

Checkpoint 3

  • [3A] Modify a file, save the changes, then use git checkout to recover the version stored in history.
  • [3B] Explain to your neighbor which version of foo.txt is recovered when the user types git checkout foo.txt in a Git repository.