Introduction
In this post, we will take a look at Git's data model.
Git's data model differs from that of many other common version control systems (VCSs) in the way it handles its data. Traditionally, a VCS stores an initial version of each file, followed by a list of patches for each new version of the file.
Git is different: instead of a file plus a list of patches, each commit records a snapshot of all the files tracked by Git and their paths relative to the repository root, that is, the full tree state. If a file does not change between commits, Git does not store the file again; the new snapshot simply links to the copy that is already stored. The diagram below shows how the files look after every commit/version.
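The snapshot idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Git's implementation: the store dictionary, the snapshot helper, and the file contents are hypothetical. Only the blob ID computation (SHA-1 over a "blob <size>" header, a NUL byte, and the content) mirrors what Git does in a SHA-1 repository.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Return the ID Git gives a blob: SHA-1 of b'blob <size>' + NUL + content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Toy object store (hypothetical): blob ID -> content.
store = {}

def snapshot(files: dict) -> dict:
    """Record every tracked file and return a path -> blob ID mapping."""
    tree = {}
    for path, content in files.items():
        oid = blob_id(content)
        store.setdefault(oid, content)  # stored once, however often it is referenced
        tree[path] = oid
    return tree

# Two commits: README.md is unchanged, text-file.txt is edited.
commit1 = snapshot({"README.md": b"# Demo\n", "text-file.txt": b"version 1\n"})
commit2 = snapshot({"README.md": b"# Demo\n", "text-file.txt": b"version 2\n"})

# The unchanged file keeps the same blob ID in both snapshots.
print(commit1["README.md"] == commit2["README.md"])
# Four file entries across two snapshots, but only three stored blobs.
print(len(store))
```

Both snapshots describe the full tree, yet the unchanged README.md costs nothing extra because the second snapshot reuses the existing blob ID.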
The way Git references files and directories is directly built into the data model. In short, the Git data model can be summarized as shown in the following diagram:
The commit object points to the root tree. The root tree points to subtrees and files. Branches and tags point to a commit object, and HEAD points to the branch that is currently checked out. So, for every commit, the full tree state (the snapshot) is identified by the root tree.
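To make the chain of pointers concrete, here is a minimal in-memory sketch. The class names, the example.txt file inside new_folder, and the file contents are assumptions made for illustration; real Git links objects by ID rather than by Python references, and an annotated tag points to its commit through a tag object.

```python
from dataclasses import dataclass, field

@dataclass
class Blob:
    content: bytes

@dataclass
class Tree:
    entries: dict = field(default_factory=dict)  # name -> Blob or Tree

@dataclass
class Commit:
    tree: Tree
    parents: list = field(default_factory=list)

def list_paths(tree, prefix=""):
    """Walk a root tree and yield every file path in the snapshot."""
    for name, entry in sorted(tree.entries.items()):
        if isinstance(entry, Tree):
            yield from list_paths(entry, prefix + name + "/")
        else:
            yield prefix + name

root = Tree({
    "README.md": Blob(b"# Demo\n"),
    "new_folder": Tree({"example.txt": Blob(b"hypothetical file\n")}),
    "text-file.txt": Blob(b"some text\n"),
})
commit = Commit(tree=root)

# Branches and tags are names that point to a commit; HEAD names a branch.
refs = {"refs/heads/master": commit, "refs/tags/v1.0": commit}
HEAD = "refs/heads/master"

# HEAD -> branch -> commit -> root tree -> every file in the snapshot.
current = refs[HEAD]
print(list(list_paths(current.tree)))
```

Following HEAD to the branch, the branch to the commit, and the commit to its root tree is enough to enumerate the whole snapshot.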
Git's objects
There are four types of objects in Git:
- Files, or blobs as they are also called in the Git context
- Directories, or trees in the Git context
- Commits
- Tags
To view the objects in the Git database, we first need a repository to examine. For this recipe, we will clone an example repository in the following location:
The commit object
Git's special HEAD reference always points to the current snapshot/commit, so we can use it as the target when asking for the commit that we want to look at:
This is the subject line of the commit message. It should be followed by a blank line and then the body, which is this text. Here, you can use multiple paragraphs to explain your commit. It's like an email with a subject and a body to try to attract people's attention to the subject.
The cat-file command with the -p option pretty-prints the object given on the command line. In this case, that is HEAD, which points to master, which, in turn, points to the most recent commit on the branch.
We can now see the commit object, consisting of the root tree (tree), the parent commit object's ID (parent), the author and timestamp information (author), the committer and timestamp information (committer), and the commit message.
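The layout of a commit object is simple enough to parse by hand: header lines of the form "key value", a blank line, and then the message. The following is a rough sketch, assuming the text printed by git cat-file -p for a commit. The parse_commit name, the parent ID, and the author details in the sample are hypothetical; only the tree ID is taken from this post.

```python
def parse_commit(text: str) -> dict:
    """Split pretty-printed commit text into header fields and the message."""
    head, _, message = text.partition("\n\n")
    commit = {"tree": None, "parents": [], "author": None,
              "committer": None, "message": message}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # merge commits have several parents
        elif key in ("tree", "author", "committer"):
            commit[key] = value
        # Other headers (for example a signature) are ignored in this sketch.
    return commit

# Sample input: the parent ID and the identity lines are made up for illustration.
sample = (
    "tree b10410e60c633997adcebc0f3f36be552816159b\n"
    "parent 0000000000000000000000000000000000000001\n"
    "author Jane Doe <jane@example.com> 1700000000 +0100\n"
    "committer Jane Doe <jane@example.com> 1700000000 +0100\n"
    "\n"
    "This is the subject line of the commit message.\n"
    "\n"
    "Body text.\n"
)

parsed = parse_commit(sample)
print(parsed["tree"])
print(parsed["parents"])
```

Keeping parents as a list reflects that a root commit has none and a merge commit has more than one.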
The tree object
To see the tree object, we can run the same command with the tree ID (b10410e60c633997adcebc0f3f36be552816159b) as the target:
We can also ask for the tree object from the commit pointed to by HEAD by running git cat-file -p HEAD^{tree}, which gives the same result as the previous command. The special notation HEAD^{tree} means that, starting from the reference given (HEAD), Git recursively dereferences the object until a tree object is found.
The first tree object found is the root tree of the commit pointed to by the master branch, which is pointed to by HEAD. The generic form of the notation is <rev>^{<type>}, which returns the first object of <type>, searching recursively from <rev>.
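The recursive dereferencing can be pictured as a short loop. This is only a sketch of the idea, with a hypothetical in-memory object table and made-up IDs such as "tag1"; it is not how Git resolves revisions internally.

```python
# Toy object table (hypothetical IDs): ID -> (type, ID of the object it points to).
objects = {
    "tag1": ("tag", "commit1"),      # an annotated tag points to a commit
    "commit1": ("commit", "tree1"),  # a commit points to its root tree
    "tree1": ("tree", None),         # nothing further to dereference
}

def peel(oid, wanted, objects):
    """Follow tag -> commit -> tree links until an object of type `wanted` is found."""
    while True:
        obj_type, target = objects[oid]
        if obj_type == wanted:
            return oid
        if target is None:
            raise ValueError(f"cannot peel {oid} to a {wanted}")
        oid = target

print(peel("tag1", "tree", objects))    # tag -> commit -> tree
print(peel("tag1", "commit", objects))  # stops at the commit
```

Asking for a type that cannot be reached, such as a commit starting from a tree, raises an error in this sketch.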
From the tree object, we can see what it contains: the file type/permissions, type (tree/blob), ID, and pathname:
| Type/Permissions (mode) | Object type | ID/SHA-1 | Pathname |
|---|---|---|---|
| 100644 | blob | c4ba002b551ac221671c93919cff9e561a33dc18 | README.md |
| 040000 | tree | 84abc91594f48e92bcf80c042ed5d3bf47d0acec | new_folder |
| 100644 | blob | 9daeafb9864cf43055ae93beb0afd6c7d144bfa4 | text-file.txt |
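The same listing is easy to process in a script. As a sketch, assuming the text format that git cat-file -p prints for a tree (mode, type, and ID separated by spaces, then a tab and the pathname), the entries above can be parsed like this. The TreeEntry and parse_tree_listing names are hypothetical.

```python
from typing import NamedTuple

class TreeEntry(NamedTuple):
    mode: str
    kind: str   # "blob" or "tree"
    oid: str
    path: str

def parse_tree_listing(text: str) -> list:
    """Parse lines of the form '<mode> <type> <id>' + TAB + '<path>'."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        meta, path = line.split("\t", 1)  # the path may itself contain spaces
        mode, kind, oid = meta.split(" ")
        entries.append(TreeEntry(mode, kind, oid, path))
    return entries

listing = (
    "100644 blob c4ba002b551ac221671c93919cff9e561a33dc18\tREADME.md\n"
    "040000 tree 84abc91594f48e92bcf80c042ed5d3bf47d0acec\tnew_folder\n"
    "100644 blob 9daeafb9864cf43055ae93beb0afd6c7d144bfa4\ttext-file.txt\n"
)

entries = parse_tree_listing(listing)
subtrees = [entry.path for entry in entries if entry.kind == "tree"]
print(subtrees)
```

Splitting on the tab first keeps pathnames with spaces intact.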
The blob object
Now, we can investigate the blob (file) object. We can do this using the same command, giving the blob ID as the target for the text-file.txt file:
The content of the file is text-file.txt
This is simply the content of the file, which we can also get by running a normal cat text-file.txt command.
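The blob ID is derived purely from the file's content, not from its name. The sketch below shows how a loose object is encoded: a "<type> <size>" header, a NUL byte, and the content are hashed to get the ID and zlib-compressed for storage. It assumes a SHA-1 repository and unpacked (loose) objects; the function names are hypothetical.

```python
import hashlib
import zlib

def encode_object(obj_type: str, content: bytes):
    """Return (object ID, compressed bytes) in the loose-object format."""
    raw = obj_type.encode() + b" " + str(len(content)).encode() + b"\0" + content
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)

def decode_object(compressed: bytes):
    """Inverse of encode_object: return (type, content)."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content

def loose_path(oid: str) -> str:
    """Loose objects live under .git/objects/<first 2 hex chars>/<remaining chars>."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"

oid, data = encode_object("blob", b"example content\n")
print(loose_path(oid))
print(decode_object(data))
```

Because the pathname is not part of the hashed data, two files with identical content share one blob; the names live in the tree object.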
The branch object
The branch object is not really like the other Git objects; you can't print it using the cat-file command as we can with the others (given a branch name, cat-file prints the commit that the branch points to). Instead, we can take a look at the branch inside the .git folder, where the whole Git repository is stored. If we open the text file .git/refs/heads/master, we can see the commit ID that the master branch points to. We can do this using cat, as follows:
We can verify that this is the latest commit by running git log -1:
We can also see that HEAD is pointing to the active branch by using cat with the .git/HEAD file:
A branch is simply a pointer to a commit, identified by its SHA-1 hash.
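The two cat commands can be combined into a small helper. This is a sketch under stated assumptions: the read_head name is hypothetical, refs are assumed to be stored as individual files (refs that Git has packed into .git/packed-refs are not handled), and the demo builds a fake .git directory with a made-up commit ID instead of using a real repository.

```python
import tempfile
from pathlib import Path

def read_head(git_dir):
    """Return (branch ref or None, commit ID) for the repository at git_dir."""
    git_dir = Path(git_dir)
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]                   # e.g. refs/heads/master
        commit_id = (git_dir / ref).read_text().strip()
        return ref, commit_id
    return None, head                               # detached HEAD holds a commit ID

# Demo against a fake .git directory; the commit ID is made up.
with tempfile.TemporaryDirectory() as tmp:
    fake_git = Path(tmp)
    (fake_git / "refs" / "heads").mkdir(parents=True)
    (fake_git / "HEAD").write_text("ref: refs/heads/master\n")
    (fake_git / "refs" / "heads" / "master").write_text("1" * 40 + "\n")
    print(read_head(fake_git))
```

Pointing read_head at a real repository's .git folder should give the same information as the two cat commands, as long as the branch ref has not been packed.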
The tag object
The last object to be analyzed is the tag object. There are three different kinds of tags: a lightweight tag (just a label), an annotated tag, and a signed tag. In the example repository, there are two annotated tags:
Let's take a closer look at the v1.0 tag:
As you can see, the tag consists of an object—which, in this case, is the latest commit on the master branch—the object's type (commits, blobs, and trees can be tagged), the tag name, the tagger and timestamp, and finally the tag message.