Introduction
In this post, we will take a look at Git's data model.
Git's data model differs from that of many other version control systems (VCSs) in how it stores data. Traditionally, a VCS stores an initial version of each file, followed by a list of patches for each new version of the file:
Git is different. Instead of an initial file and a list of patches, Git records a snapshot of all tracked files together with their paths relative to the repository root, that is, the tracked part of the filesystem tree. Each commit records the full tree state. If a file does not change between commits, Git does not store the file again; the new snapshot simply links to the object that already exists. To optimize storage, Git also compresses objects and stores them in packfiles, which reduces redundancy and improves performance. This is shown in the diagram below, where you can see the state of the files after every commit/version.
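To make the "link instead of a second copy" idea concrete, here is a minimal Python sketch of a content-addressed store. It assumes SHA-1 object IDs and models only blobs; the ObjectStore class and the snapshot helper are hypothetical names invented for illustration, and compression and packfiles are left out entirely.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 ID Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy store (hypothetical): one entry per distinct content."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        # Identical content maps to the same ID, so it is kept only once.
        self.objects.setdefault(oid, content)
        return oid


def snapshot(store: ObjectStore, files: dict) -> dict:
    """Hypothetical helper: map every path to the ID of its content."""
    return {path: store.put(data) for path, data in files.items()}


store = ObjectStore()
v1 = snapshot(store, {"README": b"hello\n", "notes.txt": b"draft\n"})
v2 = snapshot(store, {"README": b"hello\n", "notes.txt": b"final\n"})

print(v1["README"] == v2["README"])  # True: both snapshots link to one blob
print(len(store.objects))            # 3 distinct contents across 2 snapshots
```

Both snapshots list README, but the store holds its content once; only notes.txt, which changed, adds a new object.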
The way Git references files and directories is directly built into the data model. In short, the Git data model can be summarized as shown in the following diagram:
The commit object points to the root tree. The root tree points to subtrees and files. Branches and tags point to a commit object, and the HEAD object points to the branch that is currently checked out. In cases where Git is in a detached HEAD state, HEAD will point directly to a commit rather than a branch. For every commit, the full tree state and snapshot are identified by the root tree.
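The chain described above (HEAD, branch, commit, root tree, subtrees, files) can be sketched as a toy in-memory model in Python. All IDs here are made-up labels rather than real hashes, and the dictionary layout is an assumption chosen for readability, not Git's storage format.

```python
# Toy object database; keys are placeholder labels, not real object IDs.
objects = {
    "c1": {"type": "commit", "tree": "t_root", "parents": []},
    "t_root": {"type": "tree", "entries": {"README": "b1", "src": "t_src"}},
    "t_src": {"type": "tree", "entries": {"main.c": "b2"}},
    "b1": {"type": "blob", "data": b"docs\n"},
    "b2": {"type": "blob", "data": b"int main(void) { return 0; }\n"},
}

# Branches and tags are names that point to a commit.
refs = {"refs/heads/main": "c1", "refs/tags/v1.0": "c1"}

# HEAD normally names a branch; in detached state it holds a commit ID.
head = "ref: refs/heads/main"


def resolve_head(head: str, refs: dict) -> str:
    """Return the commit ID that HEAD currently designates."""
    if head.startswith("ref: "):
        return refs[head[len("ref: "):]]
    return head  # detached HEAD: the value is already a commit ID


def list_files(tree_id: str, prefix: str = "") -> list:
    """Walk a tree recursively and return every file path below it."""
    paths = []
    for name, oid in sorted(objects[tree_id]["entries"].items()):
        if objects[oid]["type"] == "tree":
            paths += list_files(oid, prefix + name + "/")
        else:
            paths.append(prefix + name)
    return paths


commit = objects[resolve_head(head, refs)]
print(list_files(commit["tree"]))  # ['README', 'src/main.c']
```

Starting from the root tree of a commit is enough to reach every file in that snapshot, which is why the root tree identifies the full tree state.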
Git's objects
There are four types of objects in Git:
- Blobs (files)
- Trees (directories)
- Commits
- Tags
To view the objects in the Git database, we first need a repository to examine. For this, we will clone an example repository:
The commit object
Git's special HEAD object always points to the current snapshot/commit, so we can use it as the target to inspect the latest commit:
This is the subject line of the commit message. It should be followed by a blank line and then the body, which explains the commit. It's like an email with a subject and a body to attract people's attention to the subject.
The cat-file command with the -p option prints the object given on the command line. In this case, HEAD points to master (or main), which, in turn, points to the most recent commit on the branch.
We can now see the commit object, consisting of the root tree (tree), the parent commit object ID (parent), the author and timestamp information (author), the committer and timestamp information (committer), and the commit message.
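The layout just described, header lines followed by a blank line and then the message, is simple enough to parse by hand. The following Python sketch assumes that simple layout; the sample text uses placeholder IDs and a made-up identity, and multi-line headers such as signatures are not handled.

```python
def parse_commit(raw: str) -> dict:
    """Split the text of a commit object into header fields and message."""
    # Headers and message are separated by the first blank line.
    headers, _, message = raw.partition("\n\n")
    commit = {"parents": [], "message": message}
    for line in headers.split("\n"):
        key, _, value = line.partition(" ")
        if key == "parent":
            # A merge commit has several parent lines; a root commit has none.
            commit["parents"].append(value)
        else:
            commit[key] = value
    return commit


# Sample commit text (placeholder IDs, made-up author).
raw = (
    "tree 1111111111111111111111111111111111111111\n"
    "parent 2222222222222222222222222222222222222222\n"
    "author Jane Doe <jane@example.com> 1700000000 +0100\n"
    "committer Jane Doe <jane@example.com> 1700000000 +0100\n"
    "\n"
    "Subject line\n"
    "\n"
    "Body text.\n"
)

commit = parse_commit(raw)
print(commit["tree"])
print(commit["parents"])
print(commit["message"].split("\n")[0])  # the subject line
```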
The tree object
To see the tree object, we can run:
Alternatively, we can use git cat-file with the tree ID:
We can also specify that we want the tree object from the commit pointed to by HEAD by running:
The special notation HEAD^{tree} means that, starting from the given reference (here HEAD), Git recursively dereferences the object until a tree object is found.
The generic form of this notation is <rev>^{<type>}, which returns the first object of type <type> found by recursively dereferencing from <rev>.
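The recursive dereferencing can be sketched in a few lines of Python. This is a toy model with made-up IDs, and peel is a hypothetical helper name: a tag is followed to the object it tags, and a commit is followed to its root tree when a tree is requested.

```python
# Toy object database with placeholder IDs.
objects = {
    "tag1": {"type": "tag", "object": "c1"},
    "c1": {"type": "commit", "tree": "t1"},
    "t1": {"type": "tree"},
}


def peel(oid: str, wanted: str) -> str:
    """Dereference oid until an object of the wanted type is found."""
    while True:
        obj = objects[oid]
        if obj["type"] == wanted:
            return oid
        if obj["type"] == "tag":
            oid = obj["object"]      # a tag points to the tagged object
        elif obj["type"] == "commit" and wanted == "tree":
            oid = obj["tree"]        # a commit points to its root tree
        else:
            raise ValueError(f"{oid} cannot be peeled to a {wanted}")


print(peel("tag1", "commit"))  # c1
print(peel("tag1", "tree"))    # t1
print(peel("c1", "tree"))      # t1
```

Asking for a type that cannot be reached, such as a commit starting from a tree, raises an error in this sketch.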
From the tree object, we can see its contents. Each entry lists the file mode (type and permissions), the object type (tree or blob), the object ID, and the pathname:
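As a rough illustration of how those four fields fit together, the Python sketch below parses entries in the printed form, where mode, type, and ID are separated by spaces and the pathname follows a tab. The sample lines use placeholder IDs, and parse_tree_line is a hypothetical helper.

```python
# Common entry modes and what they stand for.
MODES = {
    "040000": "directory",
    "100644": "regular file",
    "100755": "executable file",
    "120000": "symbolic link",
    "160000": "submodule (gitlink)",
}


def parse_tree_line(line: str) -> dict:
    """Parse one printed tree entry: '<mode> <type> <id>\\t<path>'."""
    # The path comes after a tab, so it may itself contain spaces.
    meta, _, path = line.partition("\t")
    mode, obj_type, oid = meta.split(" ")
    return {
        "mode": mode,
        "kind": MODES.get(mode, "unknown"),
        "type": obj_type,
        "id": oid,
        "path": path,
    }


# Sample entries with placeholder IDs.
sample = [
    "100644 blob " + "a" * 40 + "\tREADME.md",
    "040000 tree " + "b" * 40 + "\tsrc",
]

for line in sample:
    entry = parse_tree_line(line)
    print(entry["kind"], entry["type"], entry["path"])
```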
To investigate the blob (file) object, we can use:
or alternatively:
The content of text-file.txt is displayed:
This is simply the content of the file, which we can also get by running:
The branch object
A branch is not really an object like the others. It is a reference, so git cat-file cannot print the branch itself; given a branch name, it prints the commit the branch points to.
To inspect a branch inside the .git folder, we can check the refs/heads/main file:
However, a more general approach to list references is:
To verify that this is the latest commit, we can run:
We can also see that HEAD is pointing to the active branch by using:
The branch object is a reference to a commit, identified by its hash (SHA-1 by default, but Git also supports SHA-256).
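To illustrate how little a branch actually is, the Python sketch below builds a throwaway directory that imitates the two relevant files in a .git folder and follows HEAD to a commit ID. It assumes loose reference files only (references stored in packed-refs are not handled), and the commit ID is a placeholder.

```python
import os
import tempfile


def read_ref(git_dir: str, name: str) -> str:
    """Follow a reference file until it yields an object ID."""
    with open(os.path.join(git_dir, name)) as f:
        value = f.read().strip()
    if value.startswith("ref: "):
        # Symbolic reference: HEAD usually names the checked-out branch.
        return read_ref(git_dir, value[len("ref: "):])
    return value  # a plain reference holds the commit ID itself


with tempfile.TemporaryDirectory() as git_dir:
    commit_id = "3" * 40  # placeholder, not a real commit
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "refs", "heads", "main"), "w") as f:
        f.write(commit_id + "\n")
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/main\n")

    resolved = read_ref(git_dir, "HEAD")
    branch_value = read_ref(git_dir, "refs/heads/main")

print(resolved == commit_id)  # True
```

A detached HEAD fits the same function: the HEAD file then contains a commit ID directly, so no further lookup happens.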
The tag object
The last object to analyze is the tag object. There are three types of tags:
- Lightweight tag (just a label)
- Annotated tag
- Signed tag
To list all tags in the repository:
To take a closer look at the v1.0 tag, we can use:
This will display:
- The ID of the object being tagged (usually a commit, but it can also be a tree, a blob, or another tag)
- The object's type
- The tag name
- The tagger and timestamp
- The tag message
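The fields listed above map directly onto the text of an annotated tag object. The Python sketch below parses such text under the same simple layout assumption used for commits (header lines, a blank line, then the message); the sample uses a placeholder ID and a made-up tagger.

```python
def parse_tag(raw: str) -> dict:
    """Split the text of an annotated tag object into fields and message."""
    headers, _, message = raw.partition("\n\n")
    tag = {"message": message}
    for line in headers.split("\n"):
        key, _, value = line.partition(" ")
        tag[key] = value
    return tag


# Sample tag text (placeholder ID, made-up tagger).
raw = (
    "object 4444444444444444444444444444444444444444\n"
    "type commit\n"
    "tag v1.0\n"
    "tagger Jane Doe <jane@example.com> 1700000000 +0100\n"
    "\n"
    "Release 1.0\n"
)

tag = parse_tag(raw)
print(tag["type"], tag["tag"])  # commit v1.0
print(tag["object"])
```

A lightweight tag has no such object at all: its reference holds the commit ID directly, which is why only annotated and signed tags carry a tagger and a message.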