Introduction to Version Control using Git#

Overview

Questions:

How do I use git to keep a record of my project?

Objectives:

Explain the purpose of version control.
Introduce common git commands.
Understand how to create a commit.
Understand how to view diffs and see previous versions of files.

Follow Along with This Lesson

To follow along with this lesson, you can complete the previous lessons, or you can download a pre-made workshop repository that is at the starting point.

You will need to make sure that you have git installed and configured, as described in the set-up instructions.

SHELL

git clone https://github.com/MolSSI-Education/molecool.git
cd molecool
git checkout git-start
git switch -c main

You can also download the pre-made workshop repository as a zip file. If downloading as a zip file, you will need to initialize git in the repository and make an initial commit in order to use git.

What is version control?#

Version control keeps a complete history of your work on a given project. It facilitates collaboration, because everyone can work freely on a part of the project without overwriting others’ changes. You can move between past versions and roll back when needed. You can also review the history of your project through commit messages that describe changes in the source code, and see exactly what was modified in any given commit, who made each change, and when.

This is greatly beneficial whether you are working independently or within a team.

git vs. GitHub

git is the software used for version control, while GitHub is a hosting service. You can use git locally (without using an online hosting service), or you can use it with other hosting services such as GitLab or BitBucket.

Other examples of version control software include Subversion (svn) and Mercurial (hg).

Making Commits#

You should have git installed and configured from the setup instructions.

In this section, we are going to edit files in the Python package that we created earlier and use git to track those changes.

First, use a terminal to cd into the top directory of the local repository (the outer molecool directory).

In order for git to keep track of your project, or any changes in your project, you must first tell it that you want it to do this. You must manually create checkpoints in your project if you wish to have points to return to. If you were not using the CookieCutter, you would first have to initialize your project (i.e. tell git that you were working on a project) using the command git init.

When we ran the CMS CookieCutter, it actually initialized git for us, added our files, and made a commit (how convenient!). We can see this by typing the following into the terminal on Linux or Mac:

SHELL

ls -la

Here, the -la says that we want to list the files in long format (-l), and show hidden files (-a).

You should see .git in the output. .git is a directory where git stores the repository data. This is one way to see that we are in a git repository.
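If you prefer not to look for the hidden directory, you can also ask git directly whether you are inside a repository. The following is a minimal sketch that assumes only that git is installed. It works in a scratch directory created with mktemp (a hypothetical location, not part of molecool), so it does not touch your project.

```shell
# Create an empty scratch directory and move into it.
scratch=$(mktemp -d)
cd "$scratch"

# Outside a repository, git rev-parse fails, so we fall back to "false".
before=$(git rev-parse --is-inside-work-tree 2>/dev/null || echo false)
echo "before git init: $before"

# After initializing, git reports that we are inside a work tree.
git init --quiet
after=$(git rev-parse --is-inside-work-tree)
echo "after git init: $after"
```

Running git rev-parse --is-inside-work-tree from the molecool directory should likewise print true.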

Next, type

SHELL

git status

OUTPUT

On branch main
nothing to commit, working tree clean

This tells us that we are on the main branch (more about branching later) and that no files have been changed since the last commit.

Next, type

SHELL

git log

You will get output resembling the following. This is your git commit log. Whenever you make a version, or checkpoint, of your project, you can see information about that checkpoint using the git log command. The CookieCutter has already made a commit and written a message for you, and that commit is the first entry in the log.

OUTPUT

commit 25ab1f1a066f68e433a17454c66531e5a86c112d (HEAD -> main, tag: 0.0.0)
Author: Your Name <your_email@something.com>
Date:   Mon Feb 4 10:45:26 2019 -0500

    Initial commit after CMS Cookiecutter creation, version X.X

Your version number for the Cookiecutter will depend on when you ran the Cookiecutter and the current released version.

Each line of this log tells you something important about the commit, or checkpoint, that exists for the project. In the first line,

commit 25ab1f1a066f68e433a17454c66531e5a86c112d (HEAD -> main, tag: 0.0.0)

you have a unique identifier for the commit (25ab1…). You can use this hexadecimal string to reference this checkpoint.

Then, git records the name of the author who made the change.

Author: Your Name <your_email@something.com>

This should be your information. This way, anyone who downloads this project can see who made each commit. Note that this name and email address match what you specified when you configured git in the setup. The name and email address you gave to the CookieCutter have no effect here.

Next, it lists the date and time the commit was made.

Date:   Mon Feb 4 10:45:26 2019 -0500

Finally, there will be a blank line followed by a commit message.

    Initial commit after CMS Cookiecutter creation, version X.X

The commit message is written by whoever made the commit and should describe the change that took place when the commit was made. This commit message was written by the CookieCutter for you.

When we have more commits (or versions) of our code, git log will show a history of these commits, and they will all have the same format discussed above. Right now, we have only one commit: the one created by the CMS CookieCutter.
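To pick out individual fields of a log entry, git log accepts a --format option. The sketch below builds a scratch repository with a single commit (the directory, file contents, and author details are placeholders, not your molecool repository) and prints the author and message of the most recent commit.

```shell
# Build a scratch repository with one commit (placeholder author details).
scratch=$(mktemp -d)
cd "$scratch"
git init --quiet
git config user.name "Your Name"
git config user.email "your_email@something.com"
echo "molecool" > README.md
git add README.md
git commit --quiet -m "Initial commit"

# Print selected fields of the most recent commit (-1 limits the log to one entry):
# %an = author name, %ae = author email, %s = first line of the commit message.
summary=$(git log -1 --format='%an <%ae>: %s')
echo "$summary"

# %h = abbreviated commit ID (the value differs for every commit).
short_id=$(git log -1 --format='%h')
echo "abbreviated ID: $short_id"
```

The same git log -1 --format=... commands can be run inside molecool to inspect the commit made by the CookieCutter.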

Viewing previous versions#

If you need to check out a previous version,

SHELL

git checkout COMMIT_ID

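This will temporarily set the files in the repository to their state at the specified commit ID. Git will tell you that you are in a “detached HEAD” state, which is expected when you check out a commit rather than a branch.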

Let’s check out the version before the most recent edit we made to the README.

SHELL

git log --oneline

OUTPUT

d857c74 (HEAD -> main) add information about dependencies to readme
3c0e1c6 update readme to have instructions for developmental install
116f0cf (tag: 0.0.0) Initial commit after CMS Cookiecutter creation, version 1.1

In this log, the commit ID is the first number on the left.

To revert to the version of the repository where we first edited the readme, use the git checkout command with the appropriate commit ID.

SHELL

git checkout 3c0e1c6

If you now view your README.md, it has reverted to the previous version of the file.

To return to the most recent point,

SHELL

git checkout main
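If you only want to read an old version of a single file, git show can print it without changing any files in your working directory. The following is a minimal sketch in a scratch repository (the directory, file contents, and author details are placeholders). It records the ID of the first commit and later uses it to print the old README.md.

```shell
# Build a scratch repository with two versions of README.md.
scratch=$(mktemp -d)
cd "$scratch"
git init --quiet
git config user.name "Your Name"
git config user.email "your_email@something.com"

echo "first version" > README.md
git add README.md
git commit --quiet -m "add readme"
# Remember the abbreviated ID of the first commit (normally read from git log).
first_id=$(git rev-parse --short HEAD)

echo "second version" > README.md
git commit --quiet -am "update readme"

# Print README.md as it was at the first commit; the syntax is COMMIT_ID:path.
old=$(git show "$first_id:README.md")
echo "old contents: $old"

# The file in the working directory is untouched.
current=$(cat README.md)
echo "current contents: $current"
```

Because nothing is checked out, there is no need to return to main afterwards.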

Exercise - Creating a Repository#

Exercise

What list of commands would mimic what the CMS CookieCutter did when it created the repository and made the first commit? (Hint - to initialize a repository, you use the command git init.)

Solution

To recreate the CMS CookieCutter’s first commit,

SHELL

git init
git add .
git commit -m "Initial commit after CMS Cookiecutter creation, version 1.0"

The first line initializes the git repository. The second line stages all new and modified files in the current directory (and its subdirectories) for the next commit, and the third line commits these files and writes the commit message.

Exploring git history#

When working on a project, it is easy to forget exactly what changes we have made to a file. To check this, do

SHELL

git diff HEAD README.md

We should get a blank result. “HEAD” is referencing the most recent commit. Since we committed our changes to README.md, there is no difference to show.

Open your README.md and add the following line to the end of it.

This line doesn't add any value.

Save that file and run the same command.

SHELL

git diff HEAD README.md

OUTPUT

diff --git a/README.md b/README.md
index 94e0b50..a68f349 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,8 @@ This package requires the following:
   - numpy
   - matplotlib

+This line doesn't add any value.
+
 ### Copyright

To compare against the commit just before the most recent commit, add “~1” to the end of “HEAD”:

SHELL

git diff HEAD~1 README.md

OUTPUT

diff --git a/README.md b/README.md
index e778cd4..94e0b50 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,10 @@ This repository is currently under development. To do installation in development mode, download this repository and type

`pip install -e .`

in the repository directory.

+This package requires the following:
+  - numpy
+  - matplotlib
+
 ### Copyright

If we want to compare against a specific commit, we can first do git log to find the commit’s ID, and then do:

SHELL

git diff COMMIT_ID README.md

Another problem that we sometimes encounter is wanting to undo all of our changes to a particular file. This can be done with

SHELL

git checkout HEAD README.md

If you open README.md you will see that it has reverted to the content from the most recent commit.

Of course, you could also replace HEAD here with HEAD~1 or a specific commit ID.
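The edit-and-undo cycle described above can be tried safely in a scratch repository. This is a minimal sketch under that assumption (the directory, file contents, and author details are placeholders). It uses git diff HEAD --name-only, which lists only the names of files that differ from the most recent commit.

```shell
# Build a scratch repository with one committed README.md.
scratch=$(mktemp -d)
cd "$scratch"
git init --quiet
git config user.name "Your Name"
git config user.email "your_email@something.com"
echo "molecool" > README.md
git add README.md
git commit --quiet -m "add readme"

# Make an unwanted edit, then list the files that differ from the last commit.
echo "This line doesn't add any value." >> README.md
changed=$(git diff HEAD --name-only)
echo "changed before undo: $changed"

# Restore README.md from the most recent commit, discarding the edit.
git checkout HEAD README.md
remaining=$(git diff HEAD --name-only)
echo "changed after undo: ${remaining:-nothing}"
```

Keep in mind that the discarded edit was never committed, so git cannot bring it back.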

Adding data#

We now have a package with some functions. Let’s add the data from our starting material to our package as well. We will add this to the molecool/tests/data directory. Although it would be a best practice to add these files through a branch, we will add them directly to the main branch for simplicity.

Assuming that you ran the cookiecutter from the starting material directory,

SHELL

cp -r ../data molecool/tests/

Then, commit the change:

SHELL

git add .
git commit -m "add data to package"

Ignoring files - .gitignore#

Sometimes while you work on a project, you may end up creating some temporary files. For example, if your text editor is Emacs, you may end up with lots of files called <filename>~. By default, Git notices every file in the repository directory, including these, and reports them as untracked. This tends to be annoying since it means that any time you do git status, all of these unimportant files show up.

We are now going to find out how to tell Git to ignore these files so that it doesn’t keep telling us about them every time we do git status. Even if you aren’t working with Emacs, someone else working on your project might. So let’s do the courtesy of telling Git not to track these temporary files. First, let’s ensure that we have a few dummy files. Make empty files called testing.txt~ and README.md~ in your repository using your text editor of choice.

While we’re at it, also make some other files that aren’t important to the project. Make files called calculation.in and calculation.out in molecool/data using your text editor of choice.

Now check what Git says about these files:

SHELL

git status

OUTPUT

On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	README.md~
	molecool/data/calculation.in
	molecool/data/calculation.out
	testing.txt~

nothing added to commit but untracked files present (use "git add" to track)

Now we will make Git stop telling us about these files.

Earlier, when we looked at the hidden files, you may have noticed a file called .gitignore. The CookieCutter created this for us. GitHub also has built-in .gitignore files you can add when creating an empty repository.

This file tells git which types of files we would like to ignore (thus the name .gitignore).

Look at the contents of .gitignore:

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
#  Usually these files are written by a Python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

...

Git looks at .gitignore and ignores any files or directories that match one of the lines. Add the following to the end of .gitignore:

# emacs
*~

# temporary data files
*.in
*.out

Now run git status again. Notice that the files we created are no longer listed by Git.

SHELL

git status

OUTPUT

On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   .gitignore

no changes added to commit (use "git add" and/or "git commit -a")

We want these additions to .gitignore to become a permanent part of the repository:

SHELL

git add .gitignore
git commit -m "Ignores Emacs temporary files and input/output files from calculations."

One nice feature of .gitignore is that it prevents us from accidentally adding a file that shouldn’t be part of the repository. For example:

SHELL

git add molecool/data/calculation.in

OUTPUT

The following paths are ignored by one of your .gitignore files:
molecool/data/calculation.in
Use -f if you really want to add them.

It is possible to override this with the -f option for git add.
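If you are unsure why a file is being ignored, git check-ignore can report the pattern responsible. The sketch below recreates the patterns added above in a scratch repository (the directory and the file notes.txt are placeholders) and asks git which rule matches calculation.out.

```shell
# Build a scratch repository containing only the ignore patterns added above.
scratch=$(mktemp -d)
cd "$scratch"
git init --quiet
printf '%s\n' '# emacs' '*~' '# temporary data files' '*.in' '*.out' > .gitignore

# Create two files that should be ignored and one that should not.
touch README.md~ calculation.out notes.txt

# -v prints the file, line number, and pattern that caused the match.
match=$(git check-ignore -v calculation.out)
echo "$match"

# git status --short lists untracked files with a leading "??";
# the ignored files do not appear.
untracked=$(git status --short)
echo "$untracked"
```

Here the match comes from line 5 of .gitignore, the *.out pattern, while notes.txt still shows up as untracked.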

Final Repository State

You can see the final state of the repository after this section here.

You can also download a zip of the repository here.

Key Points#


Git provides a way to track changes in your project.
Git is software for version control and is separate from GitHub.
