On the subject of Git

Published in Technology

Ville Saalo

Senior Software Architect

Ville has been programming since primary school. He started using Git in 2012, and in 2022, he pushed approximately 1250 commits to different repositories. Ville's favourite pet is a cat.

March 22, 2023 · 8 min read time

Git is a popular piece of software among developers. But what is it? This time we are going back to basics and taking a look at what Git is and how it is used. The first part of the post will be for you if you do not know what Git is, and you can safely skip to the second part if you have already used Git.

Background

So, what is Git? It is a version control application that allows you to track changes to your files. Word and other Office suite tools also have such functionality built-in, but those are limited to the Office file formats. A general-purpose version control application allows the tracking of changes over any type of file, such as .java, .ts, or .jsx source files that developers work with. These files are just plain text with no formatting, although editors do colour the text, i.e. add syntax highlighting to them.

Git was invented by Linus Torvalds (yes, the one who created Linux, too) in 2005, because he was not happy with any of the existing version control systems. BitKeeper, CVS, SVN, Monotone and others all had their own problems: some were too slow, some were proprietary, some were too centralised, and so on. Git scales well to large projects and a large number of developers. However, companies and individuals who work with massive binary files, such as 3D models, textures, and uncompressed audio, may want to steer clear of Git.

In practice, Git maintains all versions of all the files it tracks in a repository. When you want to save a file, you commit it into the version control with a helpful comment as metadata. With Git, this only stores the file in your local repository, so you still need to push it to the remote repository. That way, other developers working on the project can access and see your changes by pulling the data from the repository. The repository may be hosted on any server, and there are popular hosting services for Git, such as GitHub, where Nitor also hosts several open-source projects.
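To make the commit, push, and pull cycle concrete, here is a minimal sketch that runs entirely on one machine. The names "alice", "bob", "remote.git", and "notes.txt" are made up for the illustration, and a local bare repository stands in for a hosted remote such as GitHub.

```shell
# Throwaway sandbox: a bare repository plays the role of the remote server.
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q --bare remote.git

# Alice clones the (still empty) remote, commits locally, then pushes.
git clone -q remote.git alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "first note" > notes.txt
git add notes.txt
git commit -q -m "docs: add first note"   # stored only in Alice's local repository
git push -q origin HEAD                   # now the remote has it too

# Bob clones the remote and gets Alice's commit.
cd "$sandbox"
git clone -q remote.git bob

# Alice publishes another change; Bob picks it up by pulling.
cd "$sandbox/alice"
echo "second note" >> notes.txt
git commit -q -a -m "docs: add second note"
git push -q origin HEAD
cd "$sandbox/bob"
git pull -q --ff-only
cat notes.txt
```

Until Alice pushes, her commits exist only in her own clone, which is why Bob sees nothing new before the push.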

When you want to participate in a project that you do not have write access to, you can still create a copy of the repository for yourself by forking it. You can then make changes to this fork and create a pull request, which proposes merging your changes from one branch of code into another, either within the same repository or between two forks of the same repository.

Version control repositories can be visualized with many tools. Here is a still image of the nFlow process engine repository visualized with Gource, which actually creates videos of the repository being edited over time.

Usage and best practices

OK, enough of the theory; let’s get to the practice. I will focus on command line interface (CLI) usage here because the CLI version of Git works consistently everywhere and always has all the features available. I still do use the Git browser and annotation features in IntelliJ IDEA daily, as well as the good old gitk application.

Micro-commits and commit messages

First, I would like to discuss micro-commits and commit messages. In essence, it is a good practice to limit the scope of your commits, so that each one only does a single logical thing. If you cannot succinctly describe what your commit is about in your commit message, it is fair to assume there is too much going on.

When implementing a new feature, I often find myself splitting it into multiple commits so that the first commit only applies some cosmetic changes to the file. After that, I create one commit where I fix a typo and another where I do a refactoring that will help me with the new feature. Finally, I add the feature and the tests for it. Each of these commits is supported by a commit message that follows a set format, for example a keyword prefix such as “refactor:” or “feature:”. The Conventional Commits specification is a nice idea for standardized commit messages. If you work with an issue tracker such as Jira, you will also want the issue ID in the commit message.

To enforce a commit message standard, you may want to set up a Git hook: a script that Git executes automatically when you commit and that can abort the commit if the rules are not followed. The commit-msg hook receives the message and is the natural place for message checks, whereas the pre-commit hook runs before the message is written and is typically used to run linters or other tools that enforce code style or quality. However, similar results can be achieved with continuous integration pipelines that are automatically triggered for pull requests. Also, hook scripts may cause issues if the developers in the team are using different operating systems.
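As a rough sketch of such a check, the script below installs a commit-msg hook in a throwaway repository and tries it with one bad and one good message. The accepted prefixes, the optional issue ID in parentheses, and the "ABC-123" ID are assumptions made for the example, not a complete implementation of the Conventional Commits specification.

```shell
# Throwaway repository for trying the hook out.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Install a commit-msg hook; Git passes it the path of the message file as $1.
mkdir -p .git/hooks
cat > .git/hooks/commit-msg <<'EOF'
#!/bin/sh
# Accept only subjects such as "fix: ..." or "feature(ABC-123): ..."
pattern='^(feat|feature|fix|refactor|docs|test|chore)(\([A-Za-z0-9-]+\))?: .+'
if ! head -n 1 "$1" | grep -Eq "$pattern"; then
    echo "commit message must start with a type prefix, e.g. 'fix: ...'" >&2
    exit 1
fi
EOF
chmod +x .git/hooks/commit-msg

# A message without a prefix is rejected; a prefixed one goes through.
if git commit -q --allow-empty -m "changed some stuff" 2>/dev/null; then
    echo "unexpected: commit was accepted"
else
    echo "rejected: changed some stuff"
fi
git commit -q --allow-empty -m "fix(ABC-123): correct typo in greeting"
git log --format=%s
```

Hooks live in each clone's .git directory and are not shared by cloning, so every developer has to install them, which is one reason a CI check is a useful backstop.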

But anyway, wait, why is it a good practice to do micro-commits? I’m glad you asked!

Benefits of micro-commits

Code reviews become easier. If I can trust that the author is only formatting a file when they say so, I can essentially skip reviewing that commit. It is also much easier for the reviewer when there are no unrelated changes, such as renaming of variables, taking place in a file within the commit that actually adds a feature or fixes a bug.
Merges become easier. Possible conflicts are easier to solve when the scope is limited.
Debugging becomes easier, too. Sometimes you find out about a regression bug and must find the culprit with git bisect. Few things are more discouraging than when bisect lands on a mega-commit that touches everything and does everything at once.
Reverting and cherry-picking single commits becomes easier, too.
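To illustrate the bisect point, here is a small sketch, assuming a made-up history of eight tiny commits in which the fifth one introduces a regression. The file name "app.txt" and the "BUG" marker are placeholders for a real failing test.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Eight small commits; the fifth one quietly introduces a regression.
for i in 1 2 3 4 5 6 7 8; do
    echo "change $i" >> app.txt
    if [ "$i" -eq 5 ]; then
        echo "BUG" >> app.txt
    fi
    git add app.txt
    git commit -q -m "change $i"
done

# Mark the newest commit as bad and the very first one as good, then let Git
# run the check on each candidate: exit code 0 means good, 1 means bad.
first_commit=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first_commit" >/dev/null
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null

# While the bisect session is open, refs/bisect/bad names the culprit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $culprit"
```

Because every commit here does one small thing, the commit that bisect lands on already tells you most of what went wrong.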

Isn’t it laborious to do all of these separate commits, then? It certainly requires discipline, but being able to add only parts of a file into the staging area at once helps. Instead of “git add -u”, do “git add -u -p”, and Git will ask you about staging each separate changed chunk in your files. This may help you separate different types of changes from each other so there is no need to go back to your editor and revert anything there.

Merging and reordering commits

All that being said, some projects still prefer a squash merge. This literally means squashing all your feature branch commits together into one big commit before merging it into the main branch and deleting the feature branch. This works too, but it is much more difficult to see what has actually been done later on. I believe the main motivation behind a squash merge is that if you have not structured your changes into micro-commits, then one big jumble of changes is better than multiple big jumbles of changes. However, to clear up those messy commits, Git does allow you to rewrite the history and I, for one, do this shamelessly all the time in my feature branches. Commits can be reordered, merged, and removed in the history freely when you are working on your branch alone.

Let’s say you want to work on issue #123, which adds a thingamabob into the application. You would create a new branch from the main branch with git checkout -b 123-thingamabob and start making changes into it. Sometime later, you notice that you have made an error a few commits back in commit 1a2bcf3. What to do? You create a fix and commit it with git commit --fixup=1a2bcf3. After that, you would reorder and squash the fix with git rebase -i --autosquash main, rebasing your branch on top of the main branch while at it. This even opens up your default text editor and shows you instructions on what else can be done with your commit history.

Closing the editor merges the fix commit into the original commit as if that commit had always been correct in the first place. Brilliant. If you had that original faulty commit pushed to the remote repository, you would just push your new version of the history over it with git push --force. However, you should protect your main development branch from force pushes since those will overwrite the commit history, and you might end up losing code by accident!
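The fixup workflow can be tried end to end with the sketch below. The file names and commit messages are invented, and setting GIT_SEQUENCE_EDITOR to ":" is an assumption made so that the rebase accepts the generated todo list without opening an editor; interactively you would simply close the editor.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main     # name the initial branch "main" regardless of the local default
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "app" > README.txt
git add README.txt
git commit -q -m "chore: initial commit"

# Feature branch with two commits; the first one contains a mistake.
git checkout -q -b 123-thingamabob
echo "thingamabbo" > thing.txt
git add thing.txt
git commit -q -m "feature: add thingamabob"
faulty=$(git rev-parse --short HEAD)
echo "how to use it" > docs.txt
git add docs.txt
git commit -q -m "docs: describe thingamabob"

# Fix the mistake and record it as a fixup of the faulty commit.
echo "thingamabob" > thing.txt
git commit -q -a --fixup="$faulty"

# Reorder and squash; ":" accepts the proposed todo list unchanged.
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash main

git log --oneline main..HEAD
```

After the rebase, the branch should again contain just the two original commits, with the corrected content folded into the first one.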

Aliases

After using Git for a while, you will notice that you repeat a certain set of commands over and over again, and that many of them are actually rather slow to type or hard to remember. It’s time to add shell aliases for those commands. There is a nice collection of aliases at the Oh My Zsh project, for starters, although I have customized some of them. For example, “asd” is easier to type than “gds”, which stands for “git diff --staged”, a command I use all the time to see exactly which changes are staged for the next commit. I also have an “nb” alias for “new branch”, i.e. “git checkout -b” (or “git switch -c” in recent versions of Git!), “gitundo” for “git reset HEAD~”, etc. Come up with ones that suit your workflows the best.
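As a starting point, the aliases mentioned above could be defined like this. It is a sketch for a shell startup file such as ~/.zshrc or ~/.bashrc, and the comments describe my reading of each command rather than any official naming.

```shell
# Alias names taken from the text above; pick whatever is fastest for you to type.
alias asd='git diff --staged'     # show what is staged for the next commit
alias nb='git switch -c'          # new branch; use 'git checkout -b' on older Git
alias gitundo='git reset HEAD~'   # undo the last commit but keep its changes in the working tree
```

After reloading the shell, "nb 123-thingamabob" would create and switch to a new branch in one short command.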

Fun with Git

Git commits are identified by a SHA-1 hash, which is a string of 40 hexadecimal digits. The hash is calculated over a snapshot of the project's files (referenced through a tree object), the hash of the previous commit, and some more metadata such as the author, the timestamps, and the commit message. There are some applications, such as gitbrute (source), that tweak that metadata, like the timestamps, by brute force until the commit hash starts with a given prefix. So if you wanted, you could, for example, prefix each commit with the relevant issue tracker id.😃
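You can verify how the hash is put together yourself. The sketch below recomputes a commit's ID by hand, assuming a repository that uses the default SHA-1 object format and a system where the sha1sum tool is available (on macOS, shasum would be the closest equivalent).

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "feature: say hello"

# The raw commit object: a tree hash (the file snapshot), parent hashes if any,
# author and committer lines with timestamps, and the message.
git cat-file commit HEAD > commit.raw
cat commit.raw

# Git hashes the header "commit <size in bytes>" plus a NUL byte, followed by that content.
size=$(wc -c < commit.raw | tr -d ' ')
computed=$( { printf 'commit %s\0' "$size"; cat commit.raw; } | sha1sum | cut -d ' ' -f 1 )

echo "computed: $computed"
echo "git says: $(git rev-parse HEAD)"
```

Since the timestamps are part of the hashed content, changing them changes the hash, and that is what makes a brute-force search for a chosen prefix possible.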

Another fun feature is related to GitHub, more specifically the contribution graph that shows your activity visually. Someone figured out that by carefully crafting commits with certain dates, you can use this graph as a canvas for art. The art can be created in a new repository, and it can be removed by just deleting that repository. Online generators, like this one, exist, and the results can be pretty nice:

Conclusions

By now, you should know what Git is if you did not know that before. If you did, I hope you learned a new trick or two. The one thing to remember is that you can actually only learn these things thoroughly by doing, so read those man pages and google for Stack Overflow questions and answers. If you can think of it, Git can probably do it.

PS. What kind of name is “Git” anyway? Read the story behind it here.
