How does Git store branches?

Question


Suppose I have two branches: master and dev. The first contains a file named "1.txt" with the content

"Hello, world"

The second contains a file "1.txt" with the content

"Goodbye, world!!"

Where and how will git store the different copies of the file? I mean, where exactly in the .git folder?

Answer 1

tor*_*rek 14

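Git doesn't exactly store files. What Git stores are objects.

Branches don't contain files either. Branch names, like master or dev, store a commit hash ID.

The key to understanding this is a bit circular: you only really understand it once you understand it. :-) But to get started, think of Git as storing commit objects, and as being centered around the concept of commits.

A commit is one of these Git objects. There are four kinds of objects: commits, trees, blobs, and tags. Trees and blobs are used to build up commits. Tag objects are used for annotated tags, but don't worry about those yet.

So Git is all about storing commits, and commits save files for you (through those tree and blob objects). But a commit is not the files themselves: it's more of a wrapper. What goes into a commit is: your name (as the author), your email address, and the time you made the commit; the hash ID of the commit's parent commit; your commit log message; and the hash ID of a tree object that remembers which files went into the commit.

So you might think that the tree object contains your files, but it doesn't either! Instead, the tree object contains the names of the files, along with the hash IDs of blob objects. It is these blob objects that hold your files.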

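All Git objects are named by a big ugly hash ID

The name of a commit, or of any other Git object, is written as a 40-character hash ID such as d35688db19c9ea97e9e2ce751dc7b47aee21636b. You have probably seen these in git log output, or as the shortened versions shown when you run other Git commands.

These hash IDs are impossible for humans to use in any practical way, so Git provides a way to turn short, meaningful names into big ugly hash IDs. These names come in many forms, but the first one you use is the branch name.

This means that if you have two branch names, master and dev, what these names actually store is a hash ID each.

Git uses the hash ID to find a commit object. Each commit object then stores a tree ID. Git uses that to find the tree object. The tree object contains (among other things) names, such as 1.txt, paired with blob hash IDs. Git uses the blob hash ID to find the blob object, and the blob object stores the complete contents of the file.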
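
The lookup chain described above can be modeled in a few lines of JavaScript. This is only a sketch: the plain objects `refs` and `objects`, the function `readFile`, and the short IDs such as `c1` and `t1` are made up for illustration and are not Git's real data structures or real hashes.

```javascript
// Toy model of the lookup chain: branch name -> commit -> tree -> blob.
// All IDs are invented for illustration; real ones are 40-character hashes.
const refs = { master: "c1", dev: "c2" };

const objects = {
  c1: { type: "commit", tree: "t1", parent: null, message: "first" },
  c2: { type: "commit", tree: "t2", parent: "c1", message: "second" },
  t1: { type: "tree", entries: { "1.txt": "b1" } },
  t2: { type: "tree", entries: { "1.txt": "b2" } },
  b1: { type: "blob", content: "Hello, world" },
  b2: { type: "blob", content: "Goodbye, world!!" },
};

// Follow the chain for one file name on one branch.
function readFile(branch, name) {
  const commit = objects[refs[branch]];     // branch name stores a commit ID
  const tree = objects[commit.tree];        // commit stores a tree ID
  const blob = objects[tree.entries[name]]; // tree pairs names with blob IDs
  return blob.content;                      // blob holds the file contents
}

console.log(readFile("master", "1.txt")); // Hello, world
console.log(readFile("dev", "1.txt"));    // Goodbye, world!!
```

Note that the same file name, 1.txt, appears in both trees; only the blob ID it is paired with differs.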

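So that's how Git stores files

Where and how will git store the different copies of the file? I mean, where exactly in the .git folder?

When you run git add 1.txt and then commit it, Git makes a blob to hold all the contents of 1.txt. The new blob has some hash ID. Let's say it starts with 1234567.... Git stores the actual contents in compressed form in .git/objects/12/34567..., along with some up-front bits that identify the object type as a blob.

If you then change 1.txt, and git add and git commit again, you get a new blob with a new ID. Let's say it starts with fedcba9.... This object goes into .git/objects/fe/dcba9....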
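
To make the "up-front bits" and the path layout concrete, here is a rough sketch in JavaScript (Node.js). It assumes a repository using SHA-1 object IDs and the loose-object format, where a blob is the header `blob <size>`, a NUL byte, and then the contents. The helper names `encodeBlob`, `blobId`, and `loosePath` are invented for this example; Git itself does this internally, and may later move objects into pack files.

```javascript
const crypto = require("node:crypto");
const zlib = require("node:zlib");

// A loose blob is the header "blob <size in bytes>\0" followed by the contents.
function encodeBlob(content) {
  const body = Buffer.from(content, "utf8");
  return Buffer.concat([Buffer.from(`blob ${body.length}\0`, "utf8"), body]);
}

// The object ID is the SHA-1 of header plus contents.
function blobId(content) {
  return crypto.createHash("sha1").update(encodeBlob(content)).digest("hex");
}

// The first two hex digits name the directory, the remaining 38 name the file.
function loosePath(id) {
  return `.git/objects/${id.slice(0, 2)}/${id.slice(2)}`;
}

const id = blobId("hello world\n");
console.log(id);            // 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
console.log(loosePath(id)); // .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad

// What lands on disk is the zlib-compressed form of header plus contents.
const onDisk = zlib.deflateSync(encodeBlob("hello world\n"));
console.log(zlib.inflateSync(onDisk).toString("utf8").split("\0")[0]); // blob 12
```

The first ID printed is the same one that `git hash-object` reports for a file containing "hello world" plus a newline, which is a handy way to check the sketch against a real Git installation.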

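Of course, to store these blobs, Git must also write tree objects and commit objects. If you are on branch dev when Git writes out the new commit, Git will change the name dev to store the new commit hash ID.

Commits form a chain

To be able to find the commit that was the tip of dev just before this one, Git writes the new commit with the previous dev tip commit ID as its parent.

Suppose that instead of big ugly hash IDs, we give each commit a single letter, starting from A and counting up. This is a lot easier to draw, though of course we would run out of letters after just 26 commits. :-)

Let's start with a repository that has just one commit:

A   <-- master

The branch name master stores A, so that we know the commit is named A.

That's not very interesting, so let's make a new commit B:

A <-B   <-- master

Now the name master stores the letter B. The commit itself, the B object, has inside it the ID of commit A.

To make another new commit on master, we assign it a new hash C, write a commit object with the appropriate log message and tree and so on, and make C's parent be B:

A <-B <-C

and then we write C into master:

A <-B <-C   <-- master

What this means is that a branch name, such as master, simply points to the tip commit of the branch. In a sense, the branch itself is the chain of commits, starting with the latest and working backwards.

Note that Git's internal arrows all point backwards. Git always does everything backwards, starting from the latest.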
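
The backward walk can be sketched as follows. This toy model uses the single-letter commit names from the drawings and assumes every commit has at most one parent (real merge commits have more); `commits`, `branches`, and `history` are names chosen for this example only.

```javascript
// Each commit records only its parent, so history can only be walked backwards.
const commits = {
  A: { parent: null }, // the very first commit has no parent
  B: { parent: "A" },
  C: { parent: "B" },
};
const branches = { master: "C" };

// Start at the tip named by the branch and follow parent links to the root.
function history(branch) {
  const result = [];
  for (let id = branches[branch]; id !== null; id = commits[id].parent) {
    result.push(id);
  }
  return result;
}

console.log(history("master")); // [ 'C', 'B', 'A' ]
```

Nothing in commit A knows about B or C, which is why the branch name has to remember the newest commit rather than the oldest.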

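We can make this more interesting by creating a new branch, dev. Initially, dev points to the same commit as master:

A--B--C   <-- dev (HEAD), master

We have added this funny notation, (HEAD), to remember which branch name we are using.

Now let's make a new commit, D, as usual. The new commit gets its author and log message as always, and stores the current commit's hash ID C as its parent, but now we must update a branch name to point to D. Which branch name should we update? That's where HEAD comes in: it tells us which one to update!

A--B--C   <-- master
       \
        D   <-- dev (HEAD)

So now dev identifies commit D, while master still identifies C.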
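
Here is a small sketch of that update rule, again as a toy model rather than Git's actual implementation. The `repo` object and the `commit` function are hypothetical; the point is only that the new commit's parent is the current tip, and that the branch HEAD is attached to is the only name that moves.

```javascript
// Toy repository state matching the drawing: dev and master both at C,
// with HEAD attached to dev.
const repo = {
  commits: { A: { parent: null }, B: { parent: "A" }, C: { parent: "B" } },
  branches: { master: "C", dev: "C" },
  head: "dev",
};

// Making a commit: the parent is the current tip, and only the branch
// that HEAD is attached to is updated.
function commit(repo, id) {
  const branch = repo.head;
  repo.commits[id] = { parent: repo.branches[branch] };
  repo.branches[branch] = id;
}

commit(repo, "D");
console.log(repo.branches);         // { master: 'C', dev: 'D' }
console.log(repo.commits.D.parent); // C
```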

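And that's how branches grow

This is the first major secret to understanding Git. Git doesn't store files, it stores commits. Commits form chains. Those chains are the history in a Git repository.

Git uses branch names to remember the latest, or tip, commits. Those tip commits let us find the older commits. If we add a new commit E to master, we get:

A--B--C--E   <-- master
       \
        D   <-- dev

We can now see visually that master and dev join at commit C.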
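
The join point can also be found mechanically by walking both chains backwards. The sketch below is a simplified illustration assuming single-parent commits; `ancestors` and `joinPoint` are hypothetical helper names (Git's own tool for this is the git merge-base command, which handles the general case).

```javascript
// The graph from the drawing: C has two children, D (on dev) and E (on master).
const commits = {
  A: { parent: null },
  B: { parent: "A" },
  C: { parent: "B" },
  D: { parent: "C" },
  E: { parent: "C" },
};
const branches = { master: "E", dev: "D" };

// Collect every commit reachable from a tip by following parent links.
function ancestors(tip) {
  const seen = new Set();
  for (let id = tip; id !== null; id = commits[id].parent) seen.add(id);
  return seen;
}

// Walk back from one tip until we reach a commit the other tip can also reach.
function joinPoint(a, b) {
  const reachable = ancestors(branches[a]);
  for (let id = branches[b]; id !== null; id = commits[id].parent) {
    if (reachable.has(id)) return id;
  }
  return null; // the two branches share no history
}

console.log(joinPoint("master", "dev")); // C
```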

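Running git checkout <branch-name> tells Git to extract the commit at the tip of the branch, using the commit to find the tree to find the blobs to get all the files. Then, as the last step of git checkout with a branch name, Git attaches HEAD to that branch name, so that it knows which branch name to update when we add a new commit.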

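I've been using git for almost ten years, and this answer lit a light bulb in my head that I didn't even know existed. Glad I stumbled across it. Thanks @torek. (3 upvotes)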

Answer 2

JDB*_*JDB 6

Torek has an excellent answer that I'm not going to try to replicate... but if it's still confusing to you, then let me try to demonstrate how it works with JavaScript. I'm going to simplify things a bit, so this isn't an exact implementation of Git in JS, but it's close enough to understand some of the fundamentals.

Create a File

A file is made up of two distinct parts: the actual contents of the file, and the metadata about that file (its name and mode). Let's define the contents and store them so that we can reference them later:

let allTheThings = {}; // object store, keyed by (abbreviated) SHA1 hash
allTheThings['06f19763'] = "blob " + "Hello, world";


The keys here are the SHA1 hashes of the values. This is a really important concept going forward... everything in git is a SHA1 hash of something. You can generate these hashes yourself using any SHA1 tool you want (I used an online tool).

I truncated the hash value to the first 8 characters for brevity. When working in git, you can truncate as much as you want so long as git is still able to uniquely identify an object. Usually 8 characters is enough (the odds of two objects having the same first 8 characters are really, really small), so that's what you'll see in most examples and even in much of the documentation.
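
As a rough illustration of what "uniquely identify" means, the sketch below resolves an abbreviated ID against a list of known full IDs and refuses when the prefix matches more than one. The function `resolvePrefix` and the padded IDs are made up for this example and say nothing about how Git implements the lookup.

```javascript
// Resolve an abbreviated ID against the set of known full IDs.
function resolvePrefix(ids, prefix) {
  const matches = ids.filter((id) => id.startsWith(prefix));
  if (matches.length === 0) throw new Error(`unknown object: ${prefix}`);
  if (matches.length > 1) throw new Error(`ambiguous prefix: ${prefix}`);
  return matches[0];
}

// Made-up 40-character IDs: an 8-character start padded with zeros.
const ids = [
  "06f19763" + "0".repeat(32),
  "06f1aaaa" + "0".repeat(32),
  "3e103e35" + "0".repeat(32),
];

console.log(resolvePrefix(ids, "06f197") === ids[0]); // true

try {
  resolvePrefix(ids, "06f1"); // matches the first two IDs
} catch (err) {
  console.log(err.message); // ambiguous prefix: 06f1
}
```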

Create a Tree

Cool... so now we've got the contents. But we want the other half of the file now... its name. To do that, we need to create a tree object that basically replicates a folder/directory.

allTheThings['5e91b67a'] = "tree " + "100644 blob 06f19763 file1.txt";


This tree object says that the file contents referenced by 06f19763 (or "Hello, world") are named file1.txt and are read/writable (the 100644 is based on Unix modes -- this one means that file1.txt is a normal, non-executable file).

In addition to files, trees can contain other trees, which is how we can create directories of arbitrary depth.
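
A sketch of how nested trees give you directories, in the same simplified spirit as the rest of this answer. The `store` object, its readable IDs such as `t_root`, the extra docs/notes.txt file, and the `lookup` function are all invented for illustration.

```javascript
// A tree entry can point at a blob (a file) or at another tree (a directory).
const store = {
  t_root: { type: "tree", entries: { "file1.txt": "b_hello", docs: "t_docs" } },
  t_docs: { type: "tree", entries: { "notes.txt": "b_notes" } },
  b_hello: { type: "blob", content: "Hello, world" },
  b_notes: { type: "blob", content: "some notes" },
};

// Resolve a slash-separated path by descending one tree per path component.
function lookup(treeId, path) {
  let id = treeId;
  for (const part of path.split("/")) {
    const obj = store[id];
    if (!obj || obj.type !== "tree" || !Object.hasOwn(obj.entries, part)) {
      return null; // no such file or directory
    }
    id = obj.entries[part];
  }
  return store[id];
}

console.log(lookup("t_root", "file1.txt").content);      // Hello, world
console.log(lookup("t_root", "docs/notes.txt").content); // some notes
console.log(lookup("t_root", "docs/missing.txt"));       // null
```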

Create a Commit

Each commit contains a reference to a tree, representing the root directory of the repo. In our example, file1.txt is located in the root and is the only file in the repo. So let's create a commit:

allTheThings['a9d13be8'] =
    "commit\n" +
    "tree 5e91b67a\n" +
    "author JD <email> 1508777071\n" +
    "committer JD <email> 1508777071\n" +
    "\n" +
    "Commit message";


The commit points to our tree, and includes some additional info like author of the commit and a commit message.

A branch is pretty much just a convenient name for a commit. When you update a branch, you're just creating a new commit then resetting the branch to point to it.

So where is everything?

All the things we've created so far are stored in the allTheThings object, so they're all stored together. We can tell what everything is based on the prefixes ("blob", "tree" and "commit"). Every entry is keyed off the hash of its contents, which is virtually guaranteed to be unique. Whenever we change the contents of a file, a file name, a commit message, etc., we change the hash, but the original object is still there and can still be referenced by other objects (trees, commits, etc.).

For example, if we update the file we end up with new hash ids up the entire chain:

allTheThings['3e103e35'] = "blob " + "Goodbye, world!!";
allTheThings['05abc8ab'] = "tree " + "100644 blob 3e103e35 file1.txt";
allTheThings['a5944bfa'] =
    "commit\n" +
    "tree 05abc8ab\n" +
    "author JD <email> 1508777071\n" +
    "committer JD <email> 1508777071\n" +
    "\n" +
    "Commit message";


Notice how, even though the file name and commit message/author/etc did not change, the change to the contents of file1 caused a chain reaction the entire way up to the commit:

06f19763 => 3e103e35 (the contents changed...)
5e91b67a => 05abc8ab (so the content reference in the tree changed)
a9d13be8 => a5944bfa (so the tree reference in the commit changed)
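
If you want to watch the chain reaction happen rather than take the hashes on faith, here is a small runnable sketch for Node.js. It keeps the simplified object format used in this answer (not Git's real one), so the keys it produces are not expected to match the ones shown here or real Git IDs; `put` and `snapshot` are hypothetical helpers.

```javascript
const crypto = require("node:crypto");

const allTheThings = {};

// Store a value under the first 8 hex digits of its SHA-1 and return the key.
function put(value) {
  const key = crypto.createHash("sha1").update(value).digest("hex").slice(0, 8);
  allTheThings[key] = value;
  return key;
}

// Build blob -> tree -> commit for one version of file1.txt.
function snapshot(contents) {
  const blob = put("blob " + contents);
  const tree = put("tree 100644 blob " + blob + " file1.txt");
  const commit = put("commit\ntree " + tree + "\n\nCommit message");
  return { blob, tree, commit };
}

const v1 = snapshot("Hello, world");
const v2 = snapshot("Goodbye, world!!");

// Only the file contents changed, yet the tree and commit keys changed too.
console.log(v1.tree !== v2.tree, v1.commit !== v2.commit); // true true
console.log(Object.keys(allTheThings).length);             // 6
```

Calling snapshot("Hello, world") a second time would add nothing new: identical contents hash to identical keys, so the existing objects are simply reused.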


All six objects exist in our allTheThings object, happily living right next to each other:

allTheThings = {
    "06f19763": "blob Hello, world",
    "3e103e35": "blob Goodbye, world!!",
    "5e91b67a": "tree 100644 blob 06f19763 file1.txt",
    "05abc8ab": "tree 100644 blob 3e103e35 file1.txt",
    "a9d13be8": "commit\ntree 5e91b67a\nauthor JD <email> 1508777071\ncommitter JD <email> 1508777071\n\nCommit message",
    "a5944bfa": "commit\ntree 05abc8ab\nauthor JD <email> 1508777071\ncommitter JD <email> 1508777071\n\nCommit message",
};


Finally, your master branch points to a9d13be8, while your dev branch points to a5944bfa.

In real git, these objects are stored in the .git directory as individual (compressed) files (.git/objects/12/34567... as Torek said), but it's the same concept.

Because a git repo can contain so many objects, the leading two characters of the hash are used to subdivide files into directories, to ensure that the maximum file count in a directory isn't exceeded (especially on older systems). It's tempting to think that these prefixes have more meaning than that, such as object type, but they don't.

And that's pretty much it. Files, trees, commits, and a few other things, are all considered Git Objects and are lumped together inside the objects directory. You can use plumbing commands to work directly with these objects and extract them for use, but it's almost always much easier to use the many porcelain commands to work with them indirectly.

Archived: 8 years, 1 month ago
Views: 2269
Last activity: 8 years, 1 month ago