What Is a Git Object?

Introduction

In this post, we will take a look at Git's data model.

Git's data model differs from that of other common version control systems (VCSs) in the way it handles data. Traditionally, a VCS stores its data as an initial file followed by a list of patches, one for each new version of the file.

Git is different. Instead of an initial file plus a list of patches, Git records a snapshot of all the files it tracks, together with their paths relative to the repository root, that is, the tree of tracked files in the filesystem. Each commit records the full tree state. If a file does not change between commits, Git does not store its content again; the new snapshot simply points to the object that is already stored. To save space, Git also compresses its objects and packs them into packfiles, where similar objects can be stored as deltas of one another. This is shown in the diagram below, where you can see how the files look after every commit/version.

The way Git references files and directories is directly built into the data model. In short, the Git data model can be summarized as shown in the following diagram:

The commit object points to the root tree. The root tree points to subtrees and files (blobs). Branches and tags point to a commit object, and HEAD, a special reference rather than an object, points to the branch that is currently checked out. When Git is in a detached HEAD state, HEAD points directly to a commit rather than to a branch. For every commit, the root tree identifies the full snapshot of the tracked files.
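To make these relationships concrete, here is a toy Python model of the object graph. It is a sketch under simplifying assumptions: the serialization inside put() is hypothetical (not Git's real on-disk format), trees are flat with no subdirectories, and the file contents are made up. What it does show is that objects are keyed by a hash of their content, so an unchanged file is stored once and shared by both snapshots, and that HEAD names a branch, which names a commit, which names a root tree.

```python
import hashlib

# Toy object database: object ID -> (type, content). Each ID is stored once.
objects = {}
# Branches map a ref name to a commit ID; HEAD names a branch (symbolic ref).
refs = {}
HEAD = "refs/heads/main"


def put(kind, content):
    """Store an object under the SHA-1 of a simplified serialization."""
    # Hypothetical serialization for illustration only; Git's real format differs.
    oid = hashlib.sha1(f"{kind}:{content!r}".encode()).hexdigest()
    objects.setdefault(oid, (kind, content))
    return oid


def commit(files, parent=None):
    """Snapshot every file: one blob per file, one flat root tree, one commit."""
    tree = tuple(sorted((path, put("blob", data)) for path, data in files.items()))
    commit_id = put("commit", (put("tree", tree), parent))
    refs[HEAD] = commit_id  # advance the checked-out branch
    return commit_id


def files_at(ref):
    """Walk ref -> commit -> root tree -> blobs and return the snapshot."""
    _, (tree_id, _parent) = objects[refs[ref]]
    _, entries = objects[tree_id]
    return {path: objects[blob_id][1] for path, blob_id in entries}


c1 = commit({"README.md": "# Demo\n", "text-file.txt": "test\n"})
c2 = commit({"README.md": "# Demo\n", "text-file.txt": "test\nAnother line\n"}, parent=c1)

blobs = [oid for oid, (kind, _) in objects.items() if kind == "blob"]
print(len(blobs))                       # 3: README.md's blob is shared by both commits
print(files_at(HEAD)["text-file.txt"])  # content of text-file.txt in the latest snapshot
```

The second commit's tree differs from the first only in the entry for text-file.txt; the README.md entry reuses the existing blob ID.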

Git's objects

There are four types of objects in Git:

Blobs (files)
Trees (directories)
Commits
Tags (annotated tags)
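Whatever its type, every object is stored under an ID derived from its content. A minimal Python sketch, assuming the default SHA-1 object format: Git hashes a header made of the type name, a space, the content length in bytes, and a NUL byte, followed by the raw content. Because the type is part of the header, an empty blob and an empty tree get different IDs.

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to an object of obj_type."""
    # Header: "<type> <size in bytes>\0", followed by the raw content.
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


print(object_id("blob", b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (empty file)
print(object_id("tree", b""))                # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (empty tree)
print(object_id("blob", b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

The last value can be compared with the output of `echo 'test content' | git hash-object --stdin`, which computes the same blob ID without writing anything to the repository.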

To view the objects in the Git database, we first need a repository to examine. For this, we will clone an example repository:

git clone https://github.com/devtutorialio/git-tutorial.git
cd git-tutorial

The commit object

Git's special HEAD reference always points to the currently checked-out commit (usually through a branch), so we can use it as the target to inspect the latest commit:

git cat-file -p HEAD

This is the subject line of the commit message. It should be followed by a blank line and then the body, which explains the commit. It's like an email with a subject and a body to attract people's attention to the subject.

The cat-file command with the -p option pretty-prints the content of the object given on the command line, formatted according to the object's type. In this case, HEAD points to master (or main), which, in turn, points to the most recent commit on the branch.

We can now see the commit object, consisting of the root tree (tree), the ID of the parent commit (parent; the very first commit has none, and a merge commit has several), the author and timestamp information (author), the committer and timestamp information (committer), and the commit message.
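The text printed by git cat-file -p HEAD is a set of header lines, a blank line, and the message. A minimal Python sketch of splitting it apart follows. The sample commit is hypothetical (IDs, names, and timestamps are made up), and multi-line headers such as gpgsig are skipped for brevity.

```python
def parse_commit(raw: str) -> dict:
    """Split `git cat-file -p <commit>` output into header fields and message."""
    headers, _, message = raw.partition("\n\n")  # the first blank line ends the headers
    commit = {"parents": [], "message": message}
    for line in headers.splitlines():
        if line.startswith(" "):
            continue  # continuation of a multi-line header such as gpgsig (skipped)
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # none for a root commit, 2+ for a merge
        else:
            commit[key] = value
    return commit


# Hypothetical commit text; the IDs, names, and timestamps are made up.
sample = (
    "tree 9c1f3e0c2a7b5d4e6f8091a2b3c4d5e6f7a8b9c0\n"
    "parent 5e8b1d2c3a4f5e6d7c8b9a0f1e2d3c4b5a6f7e8d\n"
    "author Jane Doe <jane@example.com> 1700000000 +0000\n"
    "committer Jane Doe <jane@example.com> 1700000000 +0000\n"
    "\n"
    "Add text-file.txt\n"
)
commit = parse_commit(sample)
print(commit["tree"], commit["parents"], commit["message"].strip())
```

Collecting parents into a list, rather than a single field, keeps root commits and merge commits in the same shape.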

The tree object

To see the tree object, we can run:

git ls-tree HEAD

Alternatively, we can use git cat-file with the tree ID:

git cat-file -p a4dff7886e4f47e053814b5ebc630dc4862550bd

We can also specify that we want the tree object from the commit pointed to by HEAD by running:

git cat-file -p HEAD^{tree}

The special notation HEAD^{tree} means: start from the given reference (here, HEAD) and recursively dereference the object it points to until a tree object is found.

The generic form of this notation is <rev>^{<type>}, which dereferences the object at <rev> recursively until an object of type <type> is found (for example, HEAD^{commit} or v1.0^{tree}).
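The recursive dereferencing ("peeling") behind <rev>^{<type>} can be sketched in a few lines of Python. This is a toy model, assuming a hypothetical in-memory store where each object records only the one object it leads to: a tag leads to the object it tags, a commit leads to its root tree, and trees lead nowhere further. The IDs are made-up labels.

```python
# Hypothetical object store: each ID maps to (type, ID of the object it leads to).
OBJECTS = {
    "tag-v1.0": ("tag", "commit-1"),   # an annotated tag points at a commit
    "commit-1": ("commit", "tree-1"),  # a commit points at its root tree
    "tree-1": ("tree", None),          # nothing further to follow
}


def peel(oid, wanted):
    """Dereference oid until an object of type `wanted` is found, like <rev>^{<type>}."""
    while True:
        obj_type, target = OBJECTS[oid]
        if obj_type == wanted:
            return oid
        if target is None:
            raise ValueError(f"{oid} cannot be peeled to a {wanted}")
        oid = target


print(peel("tag-v1.0", "commit"))  # commit-1, like v1.0^{commit}
print(peel("tag-v1.0", "tree"))    # tree-1, like v1.0^{tree}
```

Asking for a type that cannot be reached, such as a commit starting from a tree, raises an error, much as git rev-parse refuses an impossible peel.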

The tree object lists one entry per file or subdirectory: the mode (file type and permissions), the object type (blob or tree), the object ID, and the path name:

Mode     Type   Object ID (SHA-1)                          Name
100644   blob   d5150a516b4047a78d8d8a72c6e2534e6113238f   README.md
040000   tree   84abc91594f48e92bcf80c042ed5d3bf47d0acec   new_folder
100644   blob   d24c974270585e981e4bbeb3fe977b838e3f8816   text-file.txt
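git ls-tree and git cat-file -p show a readable rendering of the tree; the stored object itself is binary. A minimal Python sketch of that layout, assuming SHA-1 IDs: each entry is the mode in ASCII, a space, the name, a NUL byte, and the 20 raw bytes of the object ID. Directories are stored with mode 40000 (the leading zero in 040000 is added by the display), and entries are sorted by name, with directory names compared as if they ended in /.

```python
import hashlib


def encode_tree(entries):
    """Serialize (mode, name, hex_id) entries into a raw tree object body."""
    # Git sorts entries by name, comparing directories as if suffixed with "/".
    ordered = sorted(entries, key=lambda e: e[1] + ("/" if e[0] == "40000" else ""))
    return b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
        for mode, name, hex_id in ordered
    )


def decode_tree(body):
    """Parse a raw tree object body back into (mode, name, hex_id) entries."""
    entries, i = [], 0
    while i < len(body):
        nul = body.index(b"\0", i)
        mode, name = body[i:nul].decode().split(" ", 1)
        entries.append((mode, name, body[nul + 1 : nul + 21].hex()))  # 20-byte SHA-1
        i = nul + 21
    return entries


def tree_id(body):
    """Hash the body with a "tree <size>\\0" header, as for any other object."""
    return hashlib.sha1(f"tree {len(body)}\0".encode() + body).hexdigest()


# The three entries from the listing above, with the directory mode in raw form.
entries = [
    ("100644", "README.md", "d5150a516b4047a78d8d8a72c6e2534e6113238f"),
    ("40000", "new_folder", "84abc91594f48e92bcf80c042ed5d3bf47d0acec"),
    ("100644", "text-file.txt", "d24c974270585e981e4bbeb3fe977b838e3f8816"),
]
body = encode_tree(entries)
print(decode_tree(body) == entries)  # True: the encoding round-trips
print(tree_id(body))                 # ID of a tree holding exactly these entries
```

In a repository whose root tree holds exactly these three entries, the printed ID would equal the output of git rev-parse HEAD^{tree}.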

To investigate the blob (file) object, we can use:

git show d24c974270585e981e4bbeb3fe977b838e3f8816

or alternatively:

git cat-file -p d24c974270585e981e4bbeb3fe977b838e3f8816

The content of text-file.txt is displayed:

test
Another line
Another line 1

This is simply the content of the file, which we can also get by running:

cat text-file.txt
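Before it is packed, each object lives in its own file under .git/objects, named after its ID (the first two hex digits as the directory, the remaining 38 as the file name) and compressed with zlib. A freshly cloned repository usually keeps its objects in a packfile instead, so this approach only applies to loose objects, such as those written by git hash-object -w. The following is a minimal Python round trip, assuming a temporary directory stands in for .git and using sample file content.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(git_dir, obj_type, content):
    """Store content as a zlib-compressed loose object and return its ID."""
    data = f"{obj_type} {len(content)}\0".encode() + content
    oid = hashlib.sha1(data).hexdigest()
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(data))
    return oid


def read_loose_object(git_dir, oid):
    """Decompress a loose object and return (type, content)."""
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    with open(path, "rb") as f:
        data = zlib.decompress(f.read())
    header, _, content = data.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError(f"corrupt object {oid}")
    return obj_type, content


with tempfile.TemporaryDirectory() as git_dir:  # stands in for .git
    oid = write_loose_object(git_dir, "blob", b"test\nAnother line\nAnother line 1\n")
    print(oid[:2], oid[2:])                 # directory and file name under objects/
    print(read_loose_object(git_dir, oid))  # ('blob', b'test\nAnother line\nAnother line 1\n')
```

This is essentially what git cat-file -p does for a loose blob: locate the file by ID, inflate it, strip the header, and print the content.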

The branch object

Unlike the other objects, a branch is not really a Git object at all: it is a reference. Running git cat-file on a branch name prints the commit the branch points to, not the branch itself.

To inspect a branch inside the .git folder, we can check the refs/heads/main file:

cat .git/refs/heads/main

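However, references can also be stored together in the .git/packed-refs file instead of as individual files, so a more general approach to list references is: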

git show-ref

To verify that this is the latest commit, we can run:

git log -1

We can also see that HEAD is pointing to the active branch by using:

cat .git/HEAD
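Reading HEAD and the branch files by hand can be sketched in Python. This minimal sketch, assuming only the plain-text formats shown above plus .git/packed-refs, follows symbolic references (ref: ...) until it reaches an object ID; a loose ref file takes precedence over an entry in packed-refs. Real Git handles more cases, so treat it as illustrative. The commit ID in the demo is made up.

```python
import os
import tempfile


def read_packed_ref(git_dir, name):
    """Look up name in .git/packed-refs, which holds "<id> <refname>" lines."""
    path = os.path.join(git_dir, "packed-refs")
    if os.path.isfile(path):
        with open(path) as f:
            for line in f:
                if line.startswith(("#", "^")):
                    continue  # header comment, or peeled ID of the tag on the line above
                oid, _, ref = line.strip().partition(" ")
                if ref == name:
                    return oid
    raise KeyError(name)


def resolve(git_dir, name="HEAD"):
    """Follow symbolic refs from name until an object ID is reached."""
    while True:
        path = os.path.join(git_dir, name)
        if os.path.isfile(path):  # a loose ref wins over packed-refs
            with open(path) as f:
                value = f.read().strip()
        else:
            value = read_packed_ref(git_dir, name)
        if not value.startswith("ref: "):
            return value  # an object ID; a detached HEAD lands here directly
        name = value[len("ref: "):]


with tempfile.TemporaryDirectory() as git_dir:  # stands in for .git
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/main\n")
    with open(os.path.join(git_dir, "packed-refs"), "w") as f:
        f.write("# pack-refs with: peeled fully-peeled sorted\n")
        f.write("5e8b1d2c3a4f5e6d7c8b9a0f1e2d3c4b5a6f7e8d refs/heads/main\n")
    print(resolve(git_dir))  # 5e8b1d2c3a4f5e6d7c8b9a0f1e2d3c4b5a6f7e8d
```

The same lookup is what git rev-parse HEAD performs, in far more robust form.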

A branch is therefore just a reference to a commit: a file containing that commit's ID, which is a hash (SHA-1 by default, although Git also supports SHA-256 repositories).

The tag object

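The last object to analyze is the tag object. There are three types of tags, but only annotated and signed tags are stored as tag objects; a lightweight tag is just a reference, like a branch: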

- Lightweight tag (just a label)

- Annotated tag

- Signed tag

To list all tags in the repository:

git tag

To take a closer look at the v1.0 tag, we can use:

git show v1.0

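For an annotated tag, this prints the tag name, the tagger, the date, and the message, followed by the tagged commit. To print the raw tag object itself, we can use:

git cat-file -p v1.0

The tag object contains: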

- The ID of the object being tagged (usually a commit, but a tree, a blob, or another tag is also possible)

- The tagged object's type

- The tag name

- The tagger and timestamp

- The tag message
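Because only annotated and signed tags are objects, the three kinds can be told apart by what the tag reference points to. A minimal Python sketch, assuming you already know the type of the object the reference points at (git cat-file -t v1.0 prints it) and have the raw tag text from git cat-file -p v1.0: a lightweight tag points straight at a commit, while an annotated tag points at a tag object whose headers match the list above. Detecting a signed tag by looking for an appended PGP or SSH signature block is a simplification, and the sample tag text is made up.

```python
SIGNATURE_MARKERS = ("-----BEGIN PGP SIGNATURE-----", "-----BEGIN SSH SIGNATURE-----")


def parse_tag(raw):
    """Split `git cat-file -p <tag>` output into header fields and message."""
    headers, _, message = raw.partition("\n\n")
    tag = dict(line.split(" ", 1) for line in headers.splitlines())
    tag["message"] = message
    return tag


def tag_kind(target_type, raw_tag=""):
    """Classify a tag from the type of object its reference points at."""
    if target_type != "tag":
        return "lightweight"  # the ref points directly at a commit (or other object)
    message = parse_tag(raw_tag)["message"]
    if any(marker in message for marker in SIGNATURE_MARKERS):
        return "signed"  # simplified check: signature block appended to the message
    return "annotated"


# Hypothetical tag object text; the ID, tagger, and message are made up.
sample = (
    "object 5e8b1d2c3a4f5e6d7c8b9a0f1e2d3c4b5a6f7e8d\n"
    "type commit\n"
    "tag v1.0\n"
    "tagger Jane Doe <jane@example.com> 1700000000 +0000\n"
    "\n"
    "Release 1.0\n"
)
print(parse_tag(sample)["tag"], parse_tag(sample)["type"])  # v1.0 commit
print(tag_kind("commit"), tag_kind("tag", sample))          # lightweight annotated
```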
