Questions you were too embarrassed to ask about Git

2/25/2020 03:12:00 PM

I think one of the challenges with helping people learn Git is that the people who are experienced enough to write a tutorial have long since forgotten what they struggled with when they first started learning. This post tries to fill that gap by answering questions I've seen newcomers to Git struggle with. It is particularly geared towards writers and researchers who want to start using version control for their projects, something I highly recommend. This post isn't a tutorial, but rather an FAQ to supplement one.


Which version control to choose?

I frequently see folks worrying about how to evaluate various options for version control. That's understandable but I think unwarranted: there really is one obvious choice.

Git has between 80 and 90 percent of market share. Most of the rest are using Subversion. There are advantages and disadvantages to each: Subversion is centralized, Git is decentralized; Subversion is easier to use but requires significant infrastructure, Git is less intuitive but requires no infrastructure. But I'll make this simple: everything works with Git, only some things work with Subversion.

It is pointless to resist—just use git.

Is Git the same thing as GitHub?

No. Git is the program that runs on your computer; GitHub is one of several websites you can push your Git repository to if you want to make it easy for the public to view and copy your work. GitHub uses Git, but it did not make Git.

Do I have to use Git with a service like GitHub?

No. When you push a git repository to GitHub, GitHub's copy of the repository is called a "remote repository," but you can have other remote repositories in addition to or instead of GitHub. These can be on any computer or storage device, even something as dumb as a USB thumb drive.

Normally when you create a new git repository you use the command git init. If you want to create a remote repository on a local or network drive, it's almost the same command: git init --bare. You then connect this remote repository to your local repository by navigating to your local repository and running the command

git remote add origin C:/path/to/remote/repository/folder

In this command, origin is the name you are giving the remote repository. It can be anything, but origin is the conventional default. Replace C:/path/to/remote/repository/folder with the actual path to the remote you created with git init --bare, but note the forward slashes: bash, unlike Windows, doesn't like backslashes.

To check that it successfully added the remote, run git remote -v and it will print all of the remote repositories. You can then push and pull from the remote exactly as you would with GitHub.
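Here is a minimal sketch of the whole round trip, assuming a throwaway temp folder stands in for the USB or network drive. The folder names, file name, and author details are made up for illustration.

```shell
set -eu

# Hypothetical scratch locations; in practice "remote" could be a USB or network drive.
work="$(mktemp -d)"
remote="$work/remote.git"
local_repo="$work/paper"

# Create the bare repository that will act as the remote.
git init -q --bare "$remote"

# Create an ordinary local repository with one commit.
git init -q "$local_repo"
cd "$local_repo"
git symbolic-ref HEAD refs/heads/master   # name the first branch master on any machine
git config user.name "Example Author"     # placeholder identity for this demo repo only
git config user.email "author@example.com"
echo "first draft" > paper.txt
git add paper.txt
git commit -q -m "Add first draft"

# Connect the local repository to the remote, check it, and push.
git remote add origin "$remote"
git remote -v
git push -q -u origin master

# The bare repository now holds the same commit.
git --git-dir="$remote" log --oneline master
```

Nothing here touches the network, which is the point: the "remote" is just another folder.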

Like I said: Git requires zero infrastructure.

I tried 'git log' and now I can't get out

If there are enough entries in the git log, they won't all fit in the console window, so git hands the text to a special program called a pager (usually less) that lets you scroll through it. Yes, it's super dumb given that the console itself also scrolls. But old-timey programmers like it. Just press q to leave the log and return to the normal command line interface.
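If you would rather skip the pager entirely, git has a switch for that. The sketch below builds a throwaway repository (the file name and commit messages are made up) and prints its log straight to the console.

```shell
set -eu

# Throwaway repository with three commits, just so there is a log to show.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"     # placeholder identity for this demo repo only
git config user.email "author@example.com"
for n in 1 2 3; do
  echo "draft $n" > paper.txt
  git add paper.txt
  git commit -q -m "Draft $n"
done

# Print the whole log directly, one line per commit, with no pager to quit out of.
git --no-pager log --oneline

# Or only look at the most recent two commits.
git --no-pager log --oneline -n 2
```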

Why don't ctrl+c and ctrl+v work in bash?

Pasting into a command prompt is pretty dangerous! If you copied from a website, the website could have placed anything it wanted on your clipboard, and if it included a return character, the command would have executed before you had a chance to see it.

That said, the reason the keyboard shortcuts don't work is that bash has its own shortcuts and can't always resolve the ambiguity. Ctrl+C, for example, is the bash shortcut to kill the currently active process. You can copy and paste by right-clicking in the console.

What is "detached head," how did I get there, and how do I get out of it?

Detached HEAD means you have checked out a specific commit directly instead of a branch, so HEAD is not attached to any branch. It usually happens when you check out an older commit by its hash or a tag. The way to get out is to check out a branch, such as git checkout master. If you made changes while in the detached HEAD state, you will need to stash them, or commit them on a new branch, before switching.

You could also create a new branch from the detached HEAD state using git checkout -b myBranch. That reattaches HEAD, this time to a feature branch that starts at whatever old commit you had checked out. You can then merge all or some of these old versions of files back into another branch.
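To make this concrete, here is a small sketch that gets into a detached HEAD on purpose and then leaves it both ways. The repository, file name, and commit messages are invented for the example.

```shell
set -eu

# Throwaway repository with two versions of one file.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master on any machine
git config user.name "Example Author"     # placeholder identity for this demo repo only
git config user.email "author@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Version 1"
echo "v2" > notes.txt
git commit -q -am "Version 2"

# Checking out an older commit directly is what detaches HEAD.
git checkout -q HEAD~1
git symbolic-ref -q HEAD || echo "HEAD is detached"

# Way out 1: go back to a branch.
git checkout -q master
git symbolic-ref --short HEAD

# Way out 2: detach again, then turn the old commit into a new branch.
git checkout -q HEAD~1
git checkout -q -b myBranch
git symbolic-ref --short HEAD
```

git symbolic-ref -q HEAD fails quietly when HEAD is detached and prints the branch reference otherwise, which makes it a handy check.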

What is a successful branching model?

I'm not entirely sure if this reply to my request was sincere or joking.

After all, it is the topic of what is perhaps the most famous blog post in all of gitdom. This post is geared at academic researchers rather than software developers, so I probably would not recommend the git-flow branching model.

Instead I'd recommend having two types of branches:

  • master branch - there's only one of these and it's permanent
  • feature branches - you'll have many, and they're temporary

master branch

Think of the latest commit on master as the current state of your project. When you submit your paper to a journal, the version you submit will be whatever's sitting at the latest commit on master.

However, you should never make any change at all to any document on the master branch. That's what feature branches are for.

Feature branches

Instead of making changes directly on the master branch you create a feature branch from master with the command

git checkout -b myBranch

but replace myBranch with a name that describes what changes you plan to make. Make as many commits as you want on the feature branch. When you're happy with the changes and get to a state where the code runs and the paper compiles, merge it back into the master branch

git checkout master
git merge --no-ff myBranch

You'll be prompted to add a commit message. Save the message and close the editor, and git will finish the merge. When you're satisfied it merged correctly, delete the feature branch

git branch -d myBranch

It is customary at this point to tag your commit in master:

git tag version-1.0.0

The idea with this branching pattern is that feature branches are where you do your work, and the master branch always represents a stable version of the project where the code runs without errors, the paper compiles, and everything is documented in a reasonable way.
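Put together, one full cycle of this pattern might look like the sketch below. The branch name add-methods-section, the file, and the messages are made up, and I pass the merge message with -m so no editor opens in the middle of the script.

```shell
set -eu

# Throwaway repository standing in for your project.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master on any machine
git config user.name "Example Author"     # placeholder identity for this demo repo only
git config user.email "author@example.com"
echo "Introduction" > paper.txt
git add paper.txt
git commit -q -m "Start paper"

# Do the work on a feature branch.
git checkout -q -b add-methods-section
echo "Methods" >> paper.txt
git commit -q -am "Draft methods section"

# Merge it back into master, delete the branch, and tag the result.
git checkout -q master
git merge -q --no-ff -m "Merge add-methods-section into master" add-methods-section
git branch -q -d add-methods-section
git tag version-1.0.0

# The graph shows the feature branch joining master at the merge commit.
git --no-pager log --oneline --graph
```

The --no-ff flag is what forces a separate merge commit, so the history keeps a record that the work happened on its own branch.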

What if a change is made to master after I created a feature branch?

It's easy to think of git commits as just snapshots of your files at a particular point in time. That's how they're stored, but when you merge, git compares each branch against the last commit the two had in common and combines the changes made since then. That means if a change has been made to the master branch, you can merge the changes from master into your feature branch.

git checkout myBranch
git merge --no-ff master

and then when you're ready you can still merge the feature branch back into master.
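As a sketch of that sequence, assume master gets a fix to an R script while the feature branch is busy with the paper. All file names and contents here are invented, and the two branches touch different files so nothing conflicts.

```shell
set -eu

# Throwaway repository with a paper and an analysis script.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master on any machine
git config user.name "Example Author"     # placeholder identity for this demo repo only
git config user.email "author@example.com"
echo "Introduction" > paper.txt
echo "x <- 1" > analysis.R
git add paper.txt analysis.R
git commit -q -m "Start project"

# Work begins on the feature branch.
git checkout -q -b myBranch
echo "Methods" >> paper.txt
git commit -q -am "Draft methods section"

# Meanwhile, master changes.
git checkout -q master
echo "y <- 2" >> analysis.R
git commit -q -am "Extend analysis on master"

# Bring master's change into the feature branch.
git checkout -q myBranch
git merge -q --no-ff -m "Merge master into myBranch" master
cat analysis.R

# Later, merge the feature branch back into master as usual.
git checkout -q master
git merge -q --no-ff -m "Merge myBranch into master" myBranch
```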

This approach works well for text files, like R scripts, LaTeX documents, and Stata do-files. It works less well for things like Word documents and Excel files because their internal structures are too complex for Git to merge changes within the file. Instead, Git treats the whole Excel file as one unit: if only one branch changed it, the merge takes that version wholesale, and if both branches changed it, you have to pick one version or the other. This isn't bad for most research projects, but it means git is not the right tool for crowd-sourcing an Excel file.


What is a merge conflict and how do I resolve it?

When you merge two branches, git combines the newest changes to each file in each branch. But sometimes you've made different changes to the same line in the same file in both branches, so git doesn't know which one to use. In these cases, git throws a merge conflict.

Git will tell you which files had conflicts. Open those files and you'll see blocks like

<<<<<<< HEAD
text from current branch
=======
text from incoming branch
>>>>>>> myBranch

Replace this entire block with whichever version you want to keep and save the file. When you've done this for all the conflicts run

git add .
git commit -m "Merge myBranch into master"

but of course replace the commit message with something appropriate. Once the commit happens the merge conflict will disappear.
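If you want to see a conflict safely before you hit one for real, the sketch below manufactures one in a throwaway repository and resolves it. The file and its sentences are made up. Instead of editing the file by hand, it uses git checkout --theirs to take the incoming branch's version whole; --ours would keep the current branch's version instead.

```shell
set -eu

# Throwaway repository where both branches change the same line.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master on any machine
git config user.name "Example Author"     # placeholder identity for this demo repo only
git config user.email "author@example.com"
echo "The sky is blue." > paper.txt
git add paper.txt
git commit -q -m "Start paper"

git checkout -q -b myBranch
echo "The sky is grey." > paper.txt
git commit -q -am "Reword on myBranch"

git checkout -q master
echo "The sky is clear." > paper.txt
git commit -q -am "Reword on master"

# The merge stops and reports the conflict (and exits with an error code).
git merge --no-ff myBranch || echo "merge stopped on a conflict"

# List the conflicted files, then look at the markers git wrote.
git --no-pager diff --name-only --diff-filter=U
cat paper.txt

# Keep the incoming (myBranch) version of the file and finish the merge.
git checkout --theirs paper.txt
git add paper.txt
git commit -q -m "Merge myBranch into master"
```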

Visual Studio and other tools have pretty good tooling for handling merge conflicts. I recommend them if you have access to them.

What's the difference between bash, shell, console, terminal, and command prompt? What do these mean?

The simplified "lie-to-children" answer is that these are all terms for a command prompt. The longer, more correct answers are:

  • terminal - the command line program where you type input text and it does things and prints output text
  • console - the physical thing that a terminal runs in. Console is also the term for a kind of furniture, which is where the name comes from
  • shell - a user interface that runs other programs inside it. In the Linux community, "shell" is commonly used as a metonym for the command-line interface
  • bash - what Linux and Unix call their main kind of command-line shell. You can also install it on Windows, which the Git installer does by default
  • command prompt - what Windows calls its most commonly used command-line shell

The relevance of all this to Git is that most advice recommends that you use the command line interface, and that you do so in bash specifically. This makes a certain amount of sense because Git was invented by the same guy who invented Linux. And while the command line is easy for lots of git operations, I would suggest that you find some good graphical tools for things like comparing differences between commits or resolving merge conflicts.

I solved a merge conflict but the command prompt still says "merging." Now what?

A merge is basically just a special kind of commit. You just need to commit in order to complete the merge:

git add .
git commit -m "Merge X into Y"
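One way to see what "merging" means here: while a merge is unfinished, git keeps a reference called MERGE_HEAD, and it goes away once you commit. The sketch below is only an illustration in a throwaway repository with made-up file contents. It edits the file to a resolved state, checks for MERGE_HEAD, commits, and checks again.

```shell
set -eu

# Throwaway repository with a conflict between master and myBranch.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master on any machine
git config user.name "Example Author"     # placeholder identity for this demo repo only
git config user.email "author@example.com"
echo "original wording" > paper.txt
git add paper.txt
git commit -q -m "Start paper"
git checkout -q -b myBranch
echo "branch wording" > paper.txt
git commit -q -am "Reword on myBranch"
git checkout -q master
echo "master wording" > paper.txt
git commit -q -am "Reword on master"
git merge myBranch || echo "merge stopped on a conflict"

# Resolve the conflict by writing the final text over the markers.
echo "final wording" > paper.txt

# The file is fixed, but git still considers the merge unfinished.
if git rev-parse -q --verify MERGE_HEAD > /dev/null; then
  echo "still merging"
fi

# Committing is what completes the merge.
git add paper.txt
git commit -q -m "Merge myBranch into master"

if git rev-parse -q --verify MERGE_HEAD > /dev/null; then
  echo "still merging"
else
  echo "merge finished"
fi
```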

I really screwed up my git repository. Is there a way to blow up the repository but keep the current files?

Yes. Make sure your operating system is set to show hidden files and folders—the git repository exists in a hidden folder named .git. If you delete it, the git repository will be gone, but your locally checked out files will remain. Remember that this is not reversible.
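From bash, the same thing can be sketched as below, in a throwaway project with a made-up file. Copying the .git folder somewhere else first is my own precaution, not something git requires, and the backup location is hypothetical.

```shell
set -eu

# Throwaway project standing in for the repository you want to reset.
project="$(mktemp -d)"
backup="$(mktemp -d)"                     # hypothetical place to keep a safety copy
cd "$project"
git init -q
git config user.name "Example Author"     # placeholder identity for this demo repo only
git config user.email "author@example.com"
echo "draft" > paper.txt
git add paper.txt
git commit -q -m "Add draft"

# ls -a shows hidden entries, including the .git folder.
ls -a

# Optional precaution: keep a copy of the history before destroying it.
cp -R .git "$backup/git-backup"

# Remove the repository; the checked-out files stay where they are.
rm -rf .git
ls -a
```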