Git
Contents
- Metadata
- Parts of a Git System
- The Pull Request Flow
- Workspace and remote repos
- Branches
- Working with branches
- Getting change notifications
- Rewriting history
- Recovering lost commits
The impact Git has had on our world is significant. This tool has enabled a level of collaboration that was not possible before and has changed how we work. There are several key reasons:
- Git is distributed. Any repository can pull and merge from any other repository. Thus, changes can flow in any route. This allows communities to self-organize as needed.
- Git is typically used in a pull (vs. push) model, where upstream maintainers pull changes from contributors rather than a select group with write access pushing changes to the repo. This allows anyone to easily make changes and propose that they be included upstream, yet gives maintainers control over what goes into the repo they maintain. The pull flow also encourages code review and collaboration, as each pull request needs to be merged by a maintainer. This leads to higher quality work.
- Branches are very cheap, and merging works fairly well. This is a big improvement on the previous generation of tools (for example, Subversion), where branching and merging were painful. This allows code changes to be easily integrated from multiple sources.
Metadata
Git adds the following metadata to your source code, design files, and documentation:
- commit messages
- history of every change
- tags for various releases
- pull requests (to group commits)
- conversations in pull requests
This metadata is simple, but allows for powerful workflow processes.
Parts of a Git System
The block diagram below shows the various parts of a Git system.
Typically there is a Git server and a developer's local workspace, but there can also be additional remotes (shown in blue). While this may seem a bit complicated, it is very powerful and allows flows like:
- clone an open source project
- modify the source code
- push the source code to your own Git server (a second remote) and collaborate with several developers.
- continue to fetch changes from the original (OSS) repository and merge with your changes.
- once finished, send a pull request to the upstream OSS maintainer to pull your changes from your Git server.
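The flow above can be sketched in a throwaway sandbox. This is only an illustration under stated assumptions: local bare repositories stand in for the upstream server and your own server, and the names seed, upstream.git, team.git, team, and my-feature are hypothetical. The final pull request step happens in the hosting application and is not shown.

```shell
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the upstream OSS project (normally a URL, assumed here).
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master   # name the first branch master
echo "v1" > seed/README
git -C seed add README
git -C seed commit -q -m "upstream: initial commit"
git clone -q --bare seed upstream.git

# Stand-in for your own Git server (the second remote).
git init -q --bare team.git

# 1. Clone the open source project.
git clone -q upstream.git work
cd work

# 2. Modify the source code on a branch.
git checkout -q -b my-feature
echo "my change" > feature.txt
git add feature.txt
git commit -q -m "add feature"

# 3. Push to your own server, registered as a second remote named "team".
git remote add team "$sandbox/team.git"
git push -q -u team my-feature

# Meanwhile, the upstream project moves on.
echo "v2" > "$sandbox/seed/README"
git -C "$sandbox/seed" commit -q -am "upstream: update README"
git -C "$sandbox/seed" push -q "$sandbox/upstream.git" master

# 4. Fetch from the original repository and merge with your changes.
git fetch -q origin
git merge -q -m "merge upstream changes" origin/master

git remote -v   # lists both origin and team
```

After the merge, the workspace contains both the upstream update and the local feature, and either remote can be pushed to or fetched from independently.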
The Pull Request Flow
Git pull requests are central to effective collaboration (as mentioned before). The basic flow of a pull request is shown below.
Several points to note:
- The branching, development, and commits typically happen in your local Git workspace.
- The pull request is created in GitHub/Gitea/GitLab, and discussion happens in that interface.
- If active development is happening on master, it is recommended to git fetch and git merge origin/master occasionally to prevent merge headaches at the end.
Most Git hosting applications/providers offer an option to protect the master/main branch from direct commits. This encourages developers to work on branches and use pull requests to merge their work.
Workspace and remote repos
Git is a powerful tool, and like most powerful tools it is easy to shoot yourself in the foot and get things in a mess if you don't understand how it works. The most fundamental concept to understand is that Git has a full repository in your cloned workspace. You can also have one or more remote repositories specified. The state of your local repository may or may not match the remote repository.
The following commands change the state of your local repo:
commit
merge
add
The following commands interact with remotes:
push
pull
fetch
Remotes may be changed by other people while you are working, which is why the state of the remote may not match your local repo. A pull or merge can be used to combine the changes in a remote repo with your local repo. After a local repo is fully merged with a remote, you can then push your local repo branch to the remote.
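The difference between local and remote state can be made visible with a small experiment. The sketch below assumes a sandbox with a local bare repository standing in for the remote; the names seed, remote.git, mine, and theirs are hypothetical.

```shell
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master   # name the first branch master
echo one > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "initial commit"
git clone -q --bare seed remote.git

# Two workspaces cloned from the same remote: yours and a colleague's.
git clone -q remote.git mine
git clone -q remote.git theirs

# The colleague commits and pushes, so the remote changes.
echo two >> theirs/file.txt
git -C theirs commit -q -am "colleague's change"
git -C theirs push -q origin master

# You commit locally; nothing is sent to the remote yet.
cd mine
echo mine > mine.txt
git add mine.txt
git commit -q -m "my change"

# fetch updates origin/master without touching your own branch.
git fetch -q origin
# Left number: commits only on your branch. Right number: commits only on the remote.
divergence=$(git rev-list --left-right --count master...origin/master)
echo "$divergence"

# Merge the remote's work; after that the push is accepted.
git merge -q -m "merge origin/master" origin/master
git push -q origin master
```

Until the merge, the two histories have diverged by one commit each, which is exactly the situation in which a plain push would be rejected.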
Branches
Another key concept of Git is branches. This is one thing Git does extremely well and really differentiates it from previous version control systems like Subversion. When you create a branch (git checkout -b <new branch> or git switch -c <new branch>), it exists in your local repo, but not in the remote. To push your branch from the local repo to the remote, you need to do something like:
git push -u origin HEAD
The -u option tells Git to connect your local branch with the remote branch so that on future pull/push operations, Git knows which remote to synchronize the local branch with. Multiple remotes can exist, so Git needs to know which one should be used.
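One way to see what -u records is to ask Git for the branch's upstream after the push. This is a sketch in a sandbox; the local bare repository remote.git and the branch name my-feature are hypothetical stand-ins.

```shell
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master   # name the first branch master
echo one > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "initial commit"
git clone -q --bare seed remote.git
git clone -q remote.git work
cd work

# A new branch exists only in the local repo at first.
git checkout -q -b my-feature
echo change > feature.txt
git add feature.txt
git commit -q -m "add feature"
git ls-remote --heads origin my-feature   # prints nothing: not on the remote yet

# Push the current branch and record origin/my-feature as its upstream.
git push -q -u origin HEAD

# The upstream is now configured, so a bare git pull or git push knows where to go.
upstream=$(git rev-parse --abbrev-ref '@{upstream}')
echo "$upstream"
```

The same information is stored in the repository configuration under branch.my-feature.remote and branch.my-feature.merge.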
Working with branches
If you are working on a branch, please merge origin/master before doing work. Otherwise, you may end up with a messy PR where tons of files are in the PR that did not really change. This makes things difficult to review.
Also, you should only commit the files you are working on changing. Again, if you commit everything that happens to be touched, this makes the PR hard to review. One flow is:
- git pull
- git merge origin/master
- do your work
- git add <only files you intended to change>
- git commit
- git push
Now you may be left with some files that got changed by the process of doing work, but you don't want to commit. At this point (once everything is safely committed), do a:
git reset --hard
This resets your workspace, discarding all uncommitted changes to tracked files.
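A minimal sketch of the selective commit followed by a reset, run in a sandbox. The file names main.c and build.log are hypothetical; build.log plays the role of a tracked file that changes as a side effect of doing work.

```shell
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q work
cd work
git symbolic-ref HEAD refs/heads/master   # name the first branch master
echo "source v1" > main.c
echo "generated v1" > build.log
git add main.c build.log
git commit -q -m "initial commit"

# Doing work touches both files, but only main.c is an intended change.
echo "source v2" > main.c
echo "generated v2 (rebuilt)" > build.log

# Stage and commit only the file you meant to change.
git add main.c
git commit -q -m "update main.c"

# build.log still shows as modified at this point.
git status --short

# Once everything you care about is committed, restore tracked files to HEAD.
git reset -q --hard
git status --short   # prints nothing: the workspace is clean again
```

Note that git reset --hard leaves untracked files alone, so it only cleans up changes to files Git already knows about.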
If your branch has not been worked on in a while, it is cleanest to just reset it to the current master before doing work. Again, the goal is to minimize unnecessary merges:
- git checkout <your branch>
- git reset --hard origin/master (this unconditionally forces your branch to match the master branch)
- git push -f (force push your branch)
Force push should be disabled on the master branch in your remote repos.
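The reset and force push can be tried safely in a sandbox. In this sketch a local bare repository (remote.git) stands in for the remote and old-feature is a hypothetical stale branch. Be aware that the "stale work" commit is dropped from the branch, which is the intended effect here.

```shell
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master   # name the first branch master
echo one > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "initial commit"
git clone -q --bare seed remote.git
git clone -q remote.git work
cd work

# A stale branch with an old commit, already pushed.
git checkout -q -b old-feature
echo stale > stale.txt
git add stale.txt
git commit -q -m "stale work"
git push -q -u origin old-feature

# Meanwhile master advances on the remote.
git checkout -q master
echo two >> file.txt
git commit -q -am "newer master work"
git push -q origin master

# Reset the stale branch to the current master and force push it.
git checkout -q old-feature
git fetch -q origin
git reset -q --hard origin/master
git push -q -f
```

Afterwards old-feature and master point at the same commit both locally and on the remote, so new work on the branch starts from a clean base.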
Again, the goal is to have commits/PRs that are easy to review.
If you only want to add a few of your commits to a PR, you can create a new branch, and then cherry-pick the few commits you want in the PR:
- git checkout -b <new branch> origin/master
- git cherry-pick <hash>
- git cherry-pick <another hash>
- git push -u origin HEAD
- (click link to create new PR)
- etc
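The cherry-pick steps above, sketched end to end in a sandbox. The branch names scratch and for-review are hypothetical, and a local bare repository (remote.git) stands in for the hosting server, so the "create new PR" step is not shown.

```shell
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master   # name the first branch master
echo one > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "initial commit"
git clone -q --bare seed remote.git
git clone -q remote.git work
cd work

# A working branch with three commits, only two of which belong in the PR.
git checkout -q -b scratch
for n in 1 2 3; do
  echo "change $n" > "change$n.txt"
  git add "change$n.txt"
  git commit -q -m "change $n"
done

# Note the hashes of the commits you want.
first=$(git rev-parse scratch~2)   # "change 1"
third=$(git rev-parse scratch)     # "change 3"

# New branch from origin/master, then pick just those commits.
git checkout -q -b for-review origin/master
git cherry-pick "$first" > /dev/null
git cherry-pick "$third" > /dev/null
git push -q -u origin HEAD

git log --oneline origin/master..HEAD   # the two commits that would be in the PR
```

The scratch branch is left untouched, so the commit that was not picked is still available for a later PR.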
Getting change notifications
With GitHub, GitLab, and Gitea, you can configure whether you receive email notifications on repository activity. This is done by clicking the watch or bell button on the repository home page. Different hosting services/applications offer different levels of watching, but if you select the highest level, you will receive emails on commits to pull requests.
You can protect the master/main branch so changes cannot be pushed to it. This forces all developers to work on branches and merge changes using pull requests (use the PR flow described above in the Working with branches section). It is important to create the PR before committing any changes to the branch. Then anyone watching the repo will receive email notifications for every commit.
Rewriting history
Git is such a powerful tool that it even allows you to rewrite history -- well, at least in your Git repository. This is very handy as you can do work in your local repo, remove commits, merge commits, etc. This process can be used to clean up your commits before you push them to a remote. The following command can be used to do this:
git rebase -i HEAD~10
This command starts an interactive rebase of the last 10 commits, in which you can delete, merge (squash), re-order, and edit commits.
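An interactive rebase normally opens the todo list in your editor. To make the effect reproducible, the sketch below scripts the edit instead: it assumes a sed command set through GIT_SEQUENCE_EDITOR as a stand-in for the manual change of the second line from pick to fixup. The commit messages and file name are hypothetical.

```shell
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q work
cd work
git symbolic-ref HEAD refs/heads/master   # name the first branch master
echo base > file.txt
git add file.txt
git commit -q -m "base"
echo feature >> file.txt
git commit -q -am "add feature"
echo fix >> file.txt
git commit -q -am "fix typo in feature"

# Rebase the last 2 commits. The sed command plays the role of the editor:
# it changes the second todo line from "pick" to "fixup", which folds the
# fix into "add feature" and keeps that commit's message.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/fixup/'" GIT_EDITOR=true \
  git rebase -i HEAD~2

git log --oneline   # two commits remain: "add feature" on top of "base"
```

When run by hand, you would simply type git rebase -i HEAD~2 and make the same change in the editor that opens.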
You can also throw away the last X commits:
- git log
- note hash you want to return to
- git reset --hard <hash to return to>
This throws away any commits after <hash>. Be very careful with this as you can throw away useful work.
After your local commit history looks clean, you can then push it to the remote repo for review and merging.
Recovering lost commits
Occasionally you may lose a commit. This may happen if you commit a change when you are not on any branch (common when using submodules). Git does not throw commits away immediately, so fortunately you can usually recover the commit by typing git reflog and then cherry-picking the lost commits after you switch to a branch.
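The recovery can be rehearsed in a sandbox. This sketch deliberately commits on a detached HEAD, switches back to a branch, and then uses the reflog to find the commit again. The file name lost.txt and the commit messages are hypothetical; in real use you would read the hash from the git reflog output rather than rely on HEAD@{1}.

```shell
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
# Throwaway identity so commits work without any global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q work
cd work
git symbolic-ref HEAD refs/heads/master   # name the first branch master
echo one > file.txt
git add file.txt
git commit -q -m "first"

# Detach HEAD (the state a submodule checkout is usually in) and commit there.
git checkout -q --detach HEAD
echo two > lost.txt
git add lost.txt
git commit -q -m "work done on no branch"

# Switching back to a branch leaves that commit unreachable from any branch.
git checkout -q master

# The reflog still records every position HEAD has been at.
git reflog -n 3

# HEAD@{1} is where HEAD pointed before the last checkout: the lost commit.
lost=$(git rev-parse 'HEAD@{1}')
git cherry-pick "$lost" > /dev/null

git log --oneline   # the recovered work is now on master
```

Because the reflog is local to each repository, this only works in the workspace where the commit was originally made.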