Git

The impact Git has had on our world is significant. This tool has enabled a level of collaboration that was not possible before and has changed how we work. There are several key reasons:

  1. Git is distributed. Any repository can pull and merge from any other repository, so changes can flow along any route. This allows communities to self-organize as needed.
  2. Git is typically used in a pull (vs push) model, where upstream maintainers pull changes from contributors rather than a select group with write access pushing changes to the repo. This allows anyone to easily make changes and propose that they be included in the upstream version, yet gives maintainers control over what goes into the repo they maintain. The pull flow also encourages code review and collaboration, as each pull request needs to be merged by a maintainer. This leads to higher quality work.
  3. Branches are very cheap, and merging works fairly well. Again, this is a big improvement on the previous generation of tools (for example, Subversion), where branching and merging were painful. This allows code changes to be easily integrated from multiple sources.

Metadata

Git, together with hosting platforms such as GitHub, GitLab, and Gitea (which provide pull requests), adds the following metadata to your source code, design files, and documentation:

  • commit messages
  • history of every change
  • tags for various releases
  • pull requests (to group commits)
  • conversations in pull requests

This metadata is simple, but allows for powerful workflow processes.

Parts of a Git System

The block diagram below shows the various parts of a Git system.

Git system block diagram

Typically there is a Git server and a developer's local workspace, but there can also be additional remotes (shown in blue). While this may seem a bit complicated, it is very powerful and allows flows like:

  1. clone an open source project
  2. modify the source code
  3. push the source code to your own Git server (a second remote) and collaborate with several developers.
  4. continue to fetch changes from the original (OSS) repository and merge with your changes.
  5. once finished, send a pull request to the upstream OSS maintainer to pull your changes from your Git server.
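
The first four steps can be sketched with local bare repositories standing in for the two servers. This is a minimal sketch under stated assumptions, not a recommended setup: the directory names (upstream.git, team.git, seed, work), the remote name team, and the branch name feature are all hypothetical. Step 5 happens in the hosting application and is not shown.

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)

# Stand-ins for the two servers: an upstream OSS repo and your own Git server.
git init -q --bare "$demo/upstream.git"
git -C "$demo/upstream.git" symbolic-ref HEAD refs/heads/master
git init -q --bare "$demo/team.git"

# Seed upstream with one commit (plays the role of the OSS maintainer).
git init -q "$demo/seed"
cd "$demo/seed"
git symbolic-ref HEAD refs/heads/master
echo "v1" > README
git add README
git commit -q -m "initial upstream commit"
git push -q "$demo/upstream.git" master

# 1. Clone the open source project; its remote is named "origin".
git clone -q "$demo/upstream.git" "$demo/work"
cd "$demo/work"

# 2. Modify the source code on a branch.
git checkout -q -b feature
echo "my change" > feature.txt
git add feature.txt
git commit -q -m "add feature"

# 3. Add your own server as a second remote and push there.
git remote add team "$demo/team.git"
git push -q -u team feature

# 4. Upstream moves on; fetch its changes and merge them with yours.
cd "$demo/seed"
echo "v2" > README
git commit -q -am "upstream update"
git push -q "$demo/upstream.git" master
cd "$demo/work"
git fetch -q origin
git merge -q --no-edit origin/master
git push -q team feature

git remote -v
```

The final command lists both remotes, origin and team, each with its fetch and push URL.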

The Pull Request Flow

Git pull requests are central to effective collaboration (as mentioned before). The basic flow of a pull request is shown below.

Git Pull Request Flow

Several points to note:

  1. The branching, development, and commits typically happen in your local Git workspace.
  2. The pull request is created in GitHub/Gitea/GitLab, and discussion happens in that interface.
  3. If active development is happening on master, it is recommended to run git fetch and then git merge origin/master occasionally to prevent merge headaches at the end.

Most Git hosting applications/providers offer an option to protect the master/main branch from direct commits. This encourages developers to work on branches and use pull requests to merge their work.

Workspace and remote repos

Git is a powerful tool, and like most powerful tools, it is easy to shoot yourself in the foot and get things in a mess if you don't understand how it works. The most fundamental concept to understand is that Git keeps a full repository in your cloned workspace. You can also have one or more remote repositories specified. The state of your local repository may or may not match the remote repository.

The following commands change the state of your local repo:

  • commit
  • merge
  • add

The following commands interact with remotes:

  • push
  • pull
  • fetch

Remotes may be changed by other people while you are working, which is why the state of a remote may not match your local repo. A pull (or a fetch followed by a merge) combines the changes in a remote repo with your local repo. After your local branch has been merged with the remote, you can push it to the remote.
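
As a rough illustration of local and remote state drifting apart, the sketch below uses a local bare repository as the remote and two hypothetical workspaces, alice and bob. It counts how many commits the local branch is ahead of and behind origin/master after a fetch, then merges and pushes.

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)

git init -q --bare "$demo/remote.git"
git -C "$demo/remote.git" symbolic-ref HEAD refs/heads/master

# First workspace: create the initial commit and publish it.
git init -q "$demo/alice"
cd "$demo/alice"
git symbolic-ref HEAD refs/heads/master
echo "one" > a.txt
git add a.txt
git commit -q -m "first commit"
git remote add origin "$demo/remote.git"
git push -q -u origin master

# Second workspace: someone else pushes while you are working.
git clone -q "$demo/remote.git" "$demo/bob"
cd "$demo/bob"
echo "two" > b.txt
git add b.txt
git commit -q -m "bob's commit"
git push -q origin master

# Back in the first workspace: a local commit the remote does not have.
cd "$demo/alice"
echo "three" > c.txt
git add c.txt
git commit -q -m "alice's commit"

# fetch updates origin/master without touching your own branch.
git fetch -q origin
ahead=$(git rev-list --count origin/master..master)
behind=$(git rev-list --count master..origin/master)
echo "before merge: ahead=$ahead behind=$behind"

# Merge the remote changes, then push the combined result.
git merge -q --no-edit origin/master
git push -q origin master
ahead_after=$(git rev-list --count origin/master..master)
behind_after=$(git rev-list --count master..origin/master)
echo "after push: ahead=$ahead_after behind=$behind_after"
```

Before the merge the local branch is one commit ahead and one behind; after the merge and push, both counts are zero.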

Branches

Another key concept of Git is branches. This is one thing Git does extremely well, and it really differentiates Git from previous version control systems like Subversion. When you create a branch (git checkout -b <new branch> or git switch -c <new branch>), it exists in your local repo, but not in the remote. To push your branch from the local repo to the remote, you need to do something like:

git push -u origin HEAD

The -u option tells Git to set the remote branch as the upstream of your local branch, so that on future pull/push operations Git knows which remote branch to synchronize the local branch with. Multiple remotes can exist, so Git needs to know which remote should be used.
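
A small sketch of this behavior, assuming a local bare repository as origin and a hypothetical branch name my-feature: the branch is absent from the remote until it is pushed, and after the push with -u the local branch has an upstream.

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)

git init -q --bare "$demo/remote.git"
git init -q "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/master
echo "hello" > file.txt
git add file.txt
git commit -q -m "initial commit"
git remote add origin "$demo/remote.git"
git push -q -u origin master

# A new branch exists only in the local repo until it is pushed.
git checkout -q -b my-feature
echo "work" >> file.txt
git commit -q -am "feature work"
before=$(git ls-remote --heads origin my-feature | wc -l | tr -d ' ')

# -u records origin/my-feature as the upstream of the local branch.
git push -q -u origin HEAD
after=$(git ls-remote --heads origin my-feature | wc -l | tr -d ' ')
upstream=$(git rev-parse --abbrev-ref 'my-feature@{upstream}')

echo "branch on remote before push: $before, after push: $after"
echo "upstream of my-feature: $upstream"
```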

Working with branches

If you are working on a branch, please fetch and merge origin/master before doing work. Otherwise, you may end up with a messy PR containing many files that did not really change, which makes it difficult to review.

Also, you should only commit the files you are working on changing. Again, if you commit everything that happens to be touched, this makes the PR hard to review. One flow is:

  • git pull
  • git merge origin/master
  • do your work
  • git add <only files you intended to change>
  • git commit
  • git push

Now you may be left with some files that got changed by the process of doing work, but you don't want to commit. At this point (once everything is safely committed), do a:

git reset --hard

This discards all uncommitted changes to tracked files in your workspace. Untracked files are left in place.
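
The selective commit followed by a reset can be sketched as follows. The file names main.c, build.log, and notes.tmp are hypothetical stand-ins for an intended change, a file touched as a side effect, and an untracked scratch file.

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)

git init -q "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/master
echo "code v1" > main.c
echo "generated v1" > build.log
git add main.c build.log
git commit -q -m "initial commit"

# Doing work touches both tracked files, but only main.c is intended.
echo "code v2" > main.c
echo "generated v2" > build.log
echo "scratch" > notes.tmp    # untracked file

# Stage and commit only the file you meant to change.
git add main.c
git commit -q -m "update main.c"
files_in_commit=$(git diff-tree --no-commit-id --name-only -r HEAD)
echo "files in commit: $files_in_commit"

# Discard the remaining uncommitted changes to tracked files.
git reset -q --hard
git status --short
```

After the reset, build.log is back to its committed content, main.c keeps the committed change, and git status --short still lists notes.tmp as untracked.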

If your branch has not been worked on in a while, it is cleanest to just reset it to the current master before doing work. Again, the goal is to minimize unnecessary merges:

  • git checkout <your branch>
  • git reset --hard origin/master (this unconditionally forces your branch to match the master branch, discarding any commits that exist only on your branch)
  • git push -f (force push your branch)

Force push should be disabled on the master branch in your remote repos.
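
A sketch of resetting a stale branch, assuming a local bare repository as origin and a hypothetical branch named old-branch. Note that the stale commit is thrown away by the reset, which is the intent here. The sketch adds a git fetch first so that origin/master is current.

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)

git init -q --bare "$demo/remote.git"
git init -q "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/master
echo "base" > file.txt
git add file.txt
git commit -q -m "base"
git remote add origin "$demo/remote.git"
git push -q -u origin master

# A stale branch with an old commit, already pushed with an upstream set.
git checkout -q -b old-branch
echo "stale" > stale.txt
git add stale.txt
git commit -q -m "stale work"
git push -q -u origin HEAD

# Meanwhile master moves on.
git checkout -q master
echo "newer" >> file.txt
git commit -q -am "newer master"
git push -q origin master

# Reset the stale branch to the current master and force-push it.
git fetch -q origin
git checkout -q old-branch
git reset -q --hard origin/master
git push -q -f    # relies on the upstream recorded by the earlier -u

git log --oneline -n 1
```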

Again, the goal is to have commits/PRs that are easy to review.

If you only want to add a few of your commits to a PR, you can create a new branch, and then cherry-pick the few commits you want in the PR:

  • git checkout -b <new branch> origin/master
  • git cherry-pick <hash>
  • git cherry-pick <another hash>
  • git push -u origin HEAD
  • (click link to create new PR)
  • etc
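
The cherry-pick flow can be tried end to end with the sketch below. It assumes a local bare repository as origin; the branch names wip and pr-branch and the commit names fix-a, experiment, and fix-b are hypothetical. Creating the PR itself happens in the hosting application and is not shown.

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)

git init -q --bare "$demo/remote.git"
git init -q "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt
git add base.txt
git commit -q -m "base"
git remote add origin "$demo/remote.git"
git push -q -u origin master

# A working branch with three commits; only two belong in the PR.
git checkout -q -b wip
for name in fix-a experiment fix-b; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "$name"
done
hash_a=$(git rev-parse HEAD~2)
hash_b=$(git rev-parse HEAD)

# New branch from origin/master, then pick just the wanted commits.
git checkout -q -b pr-branch origin/master
git cherry-pick "$hash_a" > /dev/null
git cherry-pick "$hash_b" > /dev/null
git push -q -u origin HEAD

picked=$(git rev-list --count origin/master..HEAD)
echo "commits on pr-branch: $picked"
git log --format=%s origin/master..HEAD
```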

Getting change notifications

With GitHub, GitLab, and Gitea, you can configure whether you receive email notifications on repository activity. This is done by clicking the watch or bell button on the repository home page. Different hosting services/applications offer different levels of watching, but if you select the highest level, you will receive emails on commits to pull requests.

You can protect the master/main branch so changes cannot be pushed to it. This forces all developers to work on branches and merge changes using pull requests (use the PR flow described above in the Working with branches section). It is important to create the PR before committing any changes to the branch. Then anyone watching the repo will receive email notifications for every commit.

Rewriting history

Git is such a powerful tool that it even allows you to rewrite history, at least in your own Git repository. This is very handy, as you can do work in your local repo and then remove commits, merge commits, etc. This process can be used to clean up your commits before you push them to a remote. The following command can be used to do this:

git rebase -i HEAD~10

This command starts an interactive rebase of the last 10 commits. You can delete, merge (squash), re-order, and edit commits.
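
Normally the rebase opens a todo list in your editor. To make the idea reproducible without an editor, the sketch below uses Git's GIT_SEQUENCE_EDITOR environment variable to point at a small helper script that edits the todo list automatically. The helper squash-all.sh is hypothetical and only illustrates squashing; it keeps the first "pick" and turns the later ones into "fixup".

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)

git init -q "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt
git add base.txt
git commit -q -m "base"

# Three small commits that should become one.
for n in 1 2 3; do
  echo "step $n" > "step$n.txt"
  git add "step$n.txt"
  git commit -q -m "step $n"
done

# Hypothetical helper that stands in for editing the todo list by hand.
cat > "$demo/squash-all.sh" <<'EOF'
#!/bin/sh
# Keep the first "pick"; turn every later "pick" into "fixup".
awk 'NR > 1 && $1 == "pick" { $1 = "fixup" } { print }' "$1" > "$1.new"
mv "$1.new" "$1"
EOF
chmod +x "$demo/squash-all.sh"

# Interactive rebase of the last 3 commits, with the helper as the editor.
GIT_SEQUENCE_EDITOR="$demo/squash-all.sh" git rebase -q -i HEAD~3

total=$(git rev-list --count HEAD)
subject=$(git log -1 --format=%s)
echo "commits after rebase: $total, last subject: $subject"
```

The three "step" commits collapse into a single commit that carries the message of the first one, while all three files remain in the tree.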

You can also throw away the last X commits.

  • git log
  • note the hash you want to return to
  • git reset --hard <hash to return to>

This throws away any commits after <hash>. Be very careful with this as you can throw away useful work.
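
A minimal sketch of this reset in a throwaway repository. The commit names one, two, and three are hypothetical, and HEAD~2 stands in for the hash you would normally note from the git log output.

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)

git init -q "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/master
for name in one two three; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "$name"
done

# Inspect the history and choose the commit to return to.
git log --oneline
target=$(git rev-parse HEAD~2)    # stand-in for the hash noted from git log

# Throw away every commit after the target.
git reset -q --hard "$target"
remaining=$(git rev-list --count HEAD)
echo "commits remaining: $remaining"
```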

After your local commit history looks clean, you can then push it to the remote repo for review and merging.

Recovering lost commits

Occasionally you may lose a commit. This may happen if you commit a change when you are not on any branch (a detached HEAD, which is common when using submodules). Git does not throw commits away immediately, so you can usually recover the commit by running git reflog to find its hash and then cherry-picking it after you switch to a branch.
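
The recovery can be sketched as follows, assuming a throwaway repository in which a commit is deliberately made on a detached HEAD. In real use you would read the hash from the git reflog output; here HEAD@{1}, the position of HEAD just before the last checkout, happens to be the lost commit.

```shell
set -eu
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d)

git init -q "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/master
echo "base" > base.txt
git add base.txt
git commit -q -m "base"

# Commit while not on any branch (detached HEAD).
git checkout -q --detach
echo "lost" > lost.txt
git add lost.txt
git commit -q -m "commit on detached HEAD"

# Switching back to a branch leaves that commit unreachable from any branch.
git checkout -q master

# The reflog still records where HEAD has been.
git reflog -n 3
lost=$(git rev-parse 'HEAD@{1}')

# Bring the lost commit onto the current branch.
git cherry-pick "$lost" > /dev/null
recovered=$(git log -1 --format=%s)
echo "recovered: $recovered"
```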