Introduction to Git
This post is the note on the DataCamp course led by Greg Wilson, the Co-founder of Software Carpentry.
Git is a modern version control tool created by Linus Torvalds in 2005, now it is very popular with data scientists and software developers. It can keep track of changes to files, notice conflicts between changes made by different people, and synchronize files between different computers.
A repository is the combination of two parts: the files and directories, and their historical information that Git records which are called
.git located in the root directory.
git status— check the status of your repository
git diff filename— show you the changes
git add filename— add a file to the staging area
git diff -r HEAD path/to/file— compare the state of your files with those in the staging area, the
-rflag means “compare to a particular revision”, and
HEADmeans “the most recent commit”
nano filename— use Nano to edit
git commit -m "some message in quotes"— commit the changes in the staging area with a log message,
git commit --amend -m "new message"change a commit message
git log— view a repository’s history,
git log -3 filenameshow the last three commits involving a specific file
Git uses a three-level structure for information stored by each commit.
- A commit contains metadata such as the author, the commit message, and the time the commit happened.
- A tree tracks the names and locations in the repository when that commit happened.
- A blob (short for binary large object) contains a compressed snapshot of the contents of the file when the commit happened.
Looking at the diagram
SVG (zoom for better clarity), first in the oldest (top) commit, there were two files tracked by the repository, then
draft.md were changed in the middle commit, so the blobs are shown next to that commit.
data/northern.csv didn’t change in that commit, so the tree links to the blob from the previous commit. Reusing blobs between commits help make common operations fast and minimize storage space.
A hash is a unique identifier for every commit, which enables Git to share data efficiently between repositories.
The special label
HEADis another way to identify a specific commit. It always refers to the most recent commit. The label
HEAD~1then refers to the commit before it, while
HEAD~2refers to the commit before that, and so on.
git annotate file— show who made the last change to each line of a file and when.
git diff ID1..ID2— show the changes between two commits,
..is a pair of dots.
.gitignorefile in the root directory tells Git to ignore certain files.
git clean— only works on untracked files,
git clean -nshow a list of files whose history Git is not currently tracking,
git clean - fdelete those files for good.
git config --list— see what the settings are with one of three additional options:
--system— every user on this computer
--global— every one of your projects
--local— one specific project
git config -- global setting value— change a configuration value for all of your projects on a particular computer.
git reset HEAD— unstage the additions,
git resetunstage everything.
git checkout -- filename— discard the changes that have not yet been staged,
git checkout -- .revert all files in the current directory.
git reset HEAD path/to/file git checkout -- path/to/file
git checkout, you can undo changes to a file that you staged changes to.
- You can think of committing as saving your work, and checking out as loading that saved version.
git checkout ID filenamewould replace the current version of a file with the version that
IDidentified. Notice that this is the same syntax that you used to undo the unstaged changes, except
--has been replaced by
Working with branches
Branches allow you to have multiple versions of your work and let you track each version systematically. A commit will have two parents when branches are being merged, that’s why Git needs both trees and commits.
git branch— list all of the branches in a repository
git diff branch-1..branch-2— show the difference between two branches
git checkout branch-name— switch to a branch
git checkout -b branch-name— create a branch
git merge source destination— merge one branch (source) to another (destination)
git init project-name— create a repository for a new project in the current working directory
git init /path/to/project— convert existing projects into repositories
git clone URL— clone a repository,
git clone /existing/projectuse a path,
git clone /existing/project newprojectnamecall the clone something else
git remote— list the names of its remotes,
git remote -v(“v” for “verbose”) show the remote’s URLs
git remote add remote-name URL— add more remotes
git remote rm remote-name— remove existing ones
git pull remote branch— get everything in
branchin the remote repository identified by
remoteand merges it into the current branch of your local repository,
git pullis a combination of
git push remote-name branch-name— push the changes you have made locally into a remote repository
- Step-by-step guide to contributing on GitHub
- Pro Git book
- http://git.io/sheet — a list of cool features of Git and GitHub
- Learn Git Branching — the visual and interactive way