How to update a git shallow clone?

Question

Hib*_*u57 25 git git-clone

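Background

(For the tl;dr, see #questions below.)

I have several shallow clones of git repositories. I use shallow clones because they are much smaller than full (deep) clones. Each one is created with something like git clone --single-branch --depth 1 <git-repo-url> <dir-name>.

This works fine, but I don't see how to update such a clone.

When I clone by a tag, updating is not meaningful, since a tag is a frozen point in time (as I understand it). In that case, if I want to update, it means I want to clone by a different tag, so I just rm -rf <dir-name> and clone again.

Things get more complicated when I have cloned the HEAD of the master branch and later want to update it.

I tried git pull --depth 1, but although I do not intend to push anything to the remote repository, it complains that it doesn't know who I am.

I tried git fetch --depth 1, but although it seems to update something, I checked and the clone is not up to date (some files in the remote repository have different content from the ones in my clone).

Following /sf/answers/1435601401/, I tried git fetch --depth 1; git reset --hard origin/master, but there are two problems: first, I don't understand why git reset is needed; second, although the files seem to be up to date, some old files remain, and git clean -df does not delete them.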

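Question

Given a clone created with git clone --single-branch --depth 1 <git-repo-url> <dir-name>, how can I update it to achieve the same result as rm -rf <dir-name>; git clone --single-branch --depth 1 <git-repo-url> <dir-name>? Or is rm -rf <dir-name> followed by a fresh clone the only way?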

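Note

This is not a duplicate of "How to update a shallow cloned submodule without increasing main repo size", because the answer there does not meet my expectations and I am using plain repositories, not submodules (which I don't know about).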

Answer 1

tor*_*rek 51

[slightly reworded and formatted] Given a clone created with git clone --single-branch --depth 1 url directory, how can I update it to achieve the same result as rm -rf directory; git clone --single-branch --depth 1 url directory?

Note that --single-branch is the default when using --depth 1. The (single) branch is the one you give with -b. There's a long aside that goes here about using -b with tags but I will leave that for later. If you don't use -b, your Git asks the "upstream" Git—the Git at url—which branch it has checked-out, and pretends you used -b thatbranch. This means that it is important to be careful when using --single-branch without -b to make sure that this upstream repository's current branch is sensible, and of course, when you do use -b, to make sure that the branch argument you give really does name a branch, not a tag.
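The first point can be checked locally. The sketch below is only an illustration under assumptions: the upstream is a throwaway repository in a temporary directory, reached through a file:// URL so that --depth is honored, and the names demo, upstream and shallow are made up for the example.

```shell
set -eu
work=$(mktemp -d)                      # scratch directory, delete it when done
up="$work/upstream"                    # hypothetical stand-in for the real upstream
git init -q "$up"
git -C "$up" symbolic-ref HEAD refs/heads/master
git -C "$up" config user.name demo
git -C "$up" config user.email demo@example.com
git -C "$up" commit -q --allow-empty -m "first"
git -C "$up" branch feature/tall       # a second upstream branch
git -C "$up" commit -q --allow-empty -m "second"

# No --single-branch and no -b: --depth 1 alone narrows the clone to the
# branch the upstream has checked out (master here).
git clone -q --depth 1 "file://$up" "$work/shallow"

# The configured fetch refspec names just that one branch.
refspec=$(git -C "$work/shallow" config --get remote.origin.fetch)
echo "fetch refspec: $refspec"

# Remote-tracking names: origin/master, but no origin/feature/tall.
git -C "$work/shallow" branch -r
```

Passing --no-single-branch together with --depth 1 is the documented way to get shallow copies of every branch instead.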

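The simple answer is basically this one, with two small changes:

Following /sf/answers/1435601401/, I tried git fetch --depth 1; git reset --hard origin/master, but there are two problems: first, I don't understand why git reset is needed; second, although the files seem to be up to date, some old files remain, and git clean -df does not delete them.

The two small changes are: make sure you use origin/branchname, and add -x (git clean -d -f -x or git clean -dfx) to the git clean step. As for why, that gets a bit more complicated.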
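Put together, the update is three commands: fetch, reset, clean. The script below is a sketch that rehearses them against a throwaway upstream, so it can be run without touching a real repository. The temporary directories, the file names, the demo identity and the choice of master as the branch are all assumptions made for the example.

```shell
set -eu
work=$(mktemp -d)                        # scratch directory, delete it when done
up="$work/upstream"                      # hypothetical stand-in for the real upstream
git init -q "$up"
git -C "$up" symbolic-ref HEAD refs/heads/master
git -C "$up" config user.name demo
git -C "$up" config user.email demo@example.com
echo "version 1" > "$up/keep.txt"
echo "obsolete"  > "$up/old.txt"
echo "*.log"     > "$up/.gitignore"
git -C "$up" add .
git -C "$up" commit -q -m "B"

git clone -q --single-branch --depth 1 "file://$up" "$work/clone"

# Upstream moves on: one file changes, one is deleted.
echo "version two" > "$up/keep.txt"
git -C "$up" rm -q old.txt
git -C "$up" commit -q -am "G"

# Local clutter that Git was never told about.
echo junk > "$work/clone/scratch.txt"    # untracked
echo junk > "$work/clone/build.log"      # ignored via .gitignore

# The update itself.
cd "$work/clone"
branch=master                            # use the branch you actually cloned
git fetch -q --depth 1                   # bring over only the new tip commit
git reset -q --hard "origin/$branch"     # move branch, index and work-tree to it
git clean -q -dfx                        # drop untracked AND ignored leftovers
git status --short                       # prints nothing when the tree is clean
```

On a real clone only the last five commands matter. Note that git clean -dfx deletes ignored files too, which is what matching a fresh clone requires but may not be what you want if the ignored files are expensive build outputs.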

What's going on

Without --depth 1, the git fetch step calls up the other Git and gets from it a list of branch names and corresponding commit hash IDs. That is, it finds a list of all the upstream's branches and their current commits. Then, because you have a --single-branch repository, your Git throws out all but the single branch, and brings over everything Git needs to connect that current commit back to the commit(s) you already have in your repository.

With --depth 1, your Git doesn't bother connecting the new commit to older historical commits at all. Instead, it obtains just the one commit and the other Git objects needed to complete that one commit. It then writes an additional "shallow graft" entry to mark that one commit as a new pseudo-root commit.

Regular (non-shallow) clone and fetch

These are all related to how Git behaves when you're using a normal (non-shallow, non-single-branch) clone: git fetch calls up the upstream Git, gets a list of everything, and then brings over whatever you don't already have. This is why an initial clone is so slow, and a fetch-to-update is usually so fast: once you get a full clone, the updates rarely have very much to bring over: maybe a few commits, maybe a few hundred, and most of those commits don't need much else either.

The history of a repository is formed from the commits. Each commit names its parent commit (or for merges, parent commits, plural), in a chain that goes backwards from "the latest commit", to the previous commit, to some more-ancestral commit, and so on. The chain eventually stops when it reaches a commit that has no parent, such as the first commit ever made in the repository. This kind of commit is a root commit.
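The parent chain and the root commit can be looked at directly. This is a small sketch on a scratch repository with four empty commits; the repository path and the demo identity are assumptions made for the example.

```shell
set -eu
work=$(mktemp -d)                        # scratch directory, delete it when done
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
for msg in first second third fourth; do
    git commit -q --allow-empty -m "$msg"
done

# Each commit with the hash of its parent; the root has an empty parent column.
git log --format='%h  parent: %p  (%s)'

# Root commits are the ones with no parents at all.
roots=$(git rev-list --max-parents=0 HEAD)
oldest=$(git rev-list HEAD | tail -n 1)  # last commit reached walking backwards
total=$(git rev-list --count HEAD)
echo "root commit: $roots"
```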

That is, we can draw a graph of commits. In a really simple repository the graph is just a straight line, with all the arrows pointing backwards:

o <- o <- o <- o   <-- master

The name master points to the fourth and latest commit, which points back to the third, which points back to the second, which points back to the first.

Each commit carries with it a complete snapshot of all the files that go in that commit. Files that are not at all changed are shared across these commits: the fourth commit just "borrows" the unchanged version from the third commit, which "borrows" it from the second, and so on. Hence, each commit names all the "Git objects" that it needs, and Git either finds those objects locally—because it already has them—or uses the fetch protocol to bring them over from the other, upstream Git. There's a compression format called "packing", and a special variant for network transfer called "thin packs", that allows Git to do this even better/fancier, but the principle is simple: Git needs all, and only, those objects that go with the new commits it's picking up. Your Git decides whether it has those objects, and if not, obtains them from their Git.

A more-complicated, more-complete graph generally has several points where it branches, some where it merges, and multiple branch names pointing to different branch tips:

        o--o   <-- feature/tall
       /
o--o--o---o    <-- master
    \    /
     o--o      <-- bug/short

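Here the branch bug/short has been merged back into master, while the branch feature/tall is still under development. The name bug/short can (probably) now be deleted entirely: if we are done making commits on it, we no longer need it. The commit at the tip of master names two previous commits, including the commit at the tip of bug/short, so by fetching master we will also fetch the bug/short commits.

Note that the simple graph and the slightly more complicated one each have just one root commit. That is typical: every repository that has commits has at least one root commit, because the very first commit is always a root commit; but most repositories also have only one root commit. You can, however, have several different root commits, as in this graph: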

 o--o
     \
o--o--o   <-- master

or this one:

 o--o     <-- orphan

o--o      <-- master

In fact, the graph with just the one name master may well have been made by merging orphan into master and then deleting the name orphan.

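Grafts and replacements

Git has for a long time had (possibly shaky) support for grafts, which was superseded by (much better, actually solid) support for generic replacements. To grasp them concretely, we need to add to the above the notion that each commit has its own unique ID. These IDs are the big ugly 40-character SHA-1 hashes, face0ff... and so on. In fact, every Git object has a unique ID, but for graph purposes, all we care about are the commits.

For drawing graphs, those big hash IDs are too painful to use, so we can use one-letter names A through Z instead. Let's use this graph again, but put in the one-letter names: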

        E--H   <-- feature/tall
       /
A--B--D---G    <-- master
    \    /
     C--F      <-- bug/short

Commit H refers back to commit E (E is H's parent). Commit G, which is a merge commit—meaning it has at least two parents—refers back to both D and F, and so on.

Note that the branch names, feature/tall, master, and bug/short, each point to one single commit. The name bug/short points to commit F. This is why commit F is on branch bug/short ... but so is commit C. Commit C is on bug/short because it is reachable from the name. The name gets us to F, and F gets us to C, so C is on branch bug/short.

Note, however, that commit G, the tip of master, gets us to commit F. This means that commit F is also on branch master. This is a key concept in Git: commits may be on one, many, or even no branches. A branch name is merely a way to get started within a commit graph. There are other ways, such as tag names, refs/stash (which gets you to the current stash: each stash is actually a couple of commits), and the reflogs (which are normally hidden from view as they are normally just clutter).
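Reachability is easy to test on a scratch copy of the lettered graph. The sketch below rebuilds it with empty commits (the repository path and the demo identity are assumptions) and then asks which branches contain commits F and C.

```shell
set -eu
work=$(mktemp -d)                        # scratch directory, delete it when done
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
c() { git commit -q --allow-empty -m "$1"; }

c A; c B
git branch bug/short                     # will grow C and F from B
c D
git branch feature/tall                  # will grow E and H from D
git checkout -q bug/short;    c C; c F
git checkout -q feature/tall; c E; c H
git checkout -q master
git merge -q -m G bug/short              # merge commit G with parents D and F

F=$(git rev-parse bug/short)
C=$(git rev-parse bug/short~1)

# Branch names from which F is reachable: both bug/short and master.
git branch --contains "$F"
# C is reachable the same way, through F.
git branch --contains "$C"
```

git merge-base --is-ancestor does the same test for a single commit and a single name and reports the result through its exit status, which is handier in scripts.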

This also, however, gets us to grafts and replacements. A graft is just a limited kind of replacement, and shallow repositories use a limited form of graft.¹ I won't describe replacements fully here as they are a bit more complicated, but in general, what Git does for all of these is to use the graft or replacement as an "instead-of". For the specific case of commits, what we want here is to be able to change—or at least, pretend to change—the parent ID or IDs of any commit ... and for shallow repositories, we want to be able to pretend that the commit in question has no parents.
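To see the "pretend this commit has no parents" idea in isolation, git replace --graft can be used on a scratch repository. This illustrates the general replacement mechanism, not the way shallow clones record their boundary, and it assumes a Git new enough to have git replace --graft. The repository path and the demo identity are made up for the example.

```shell
set -eu
work=$(mktemp -d)                        # scratch directory, delete it when done
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
for msg in first second third fourth; do
    git commit -q --allow-empty -m "$msg"
done

tip=$(git rev-parse HEAD)
git replace --graft "$tip"               # no parents listed: tip now acts as a root

seen=$(git rev-list --count HEAD)                        # history as Git shows it
real=$(git --no-replace-objects rev-list --count HEAD)   # underlying history
echo "with replacement: $seen commit(s), underlying history: $real commit(s)"

git replace -d "$tip" >/dev/null         # remove the replacement again
restored=$(git rev-list --count HEAD)
```

The original commit object is never modified: the replacement lives under refs/replace/ and is consulted "instead of" the original whenever history is walked.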

¹The way shallow repositories use the graft code is not shaky. For the more general case, I recommended using git replace instead, as that also was and is not shaky. The only recommended use for grafts is—or at least was, years ago—to put them in place just long enough to run git filter-branch to copy an altered—grafted—history, after which you should just discard the grafted history entirely. You can use git replace for this purpose as well, but unlike grafts, you can use git replace permanently or semi-permanently, without needing git filter-branch.

Making a shallow clone

To make a depth-1 shallow clone of the current state of the upstream repository, we will pick one of the three branch names—feature/tall, master, or bug/short—and translate it to a commit ID. Then we will write a special graft entry that says: "When you see that commit, pretend that it has no parent commits, i.e., is a root commit."

Let's say we pick master. The name master points to commit G, so to make a shallow clone of commit G, we obtain commit G from the upstream Git as usual, but then write a special graft entry that claims commit G has no parents. We put that into our repository, and now our graph looks like this:

G   <-- master, origin/master

Those parent IDs are still actually inside G; it's just that every time we have Git use or show us the history, it immediately "grafts" nothing-at-all on, so that G seems to be a root commit, for history tracking purposes.
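A depth-1 clone of a scratch upstream shows both halves of this: the history walk stops at one commit, yet the commit object still records its parent. The sketch assumes the usual on-disk layout, where the shallow boundary is listed in .git/shallow, and a Git of version 2.15 or later for --is-shallow-repository. The paths and the demo identity are made up for the example.

```shell
set -eu
work=$(mktemp -d)                        # scratch directory, delete it when done
up="$work/upstream"                      # hypothetical stand-in for the real upstream
git init -q "$up"
git -C "$up" symbolic-ref HEAD refs/heads/master
git -C "$up" config user.name demo
git -C "$up" config user.email demo@example.com
for msg in first second third; do
    git -C "$up" commit -q --allow-empty -m "$msg"
done

git clone -q --depth 1 "file://$up" "$work/shallow"
cd "$work/shallow"

git rev-parse --is-shallow-repository    # reports whether the clone is shallow
boundary=$(cat .git/shallow)             # commit(s) treated as pseudo-roots
visible=$(git rev-list --count HEAD)     # commits reachable through history
echo "pseudo-root: $boundary, visible commits: $visible"

# The raw commit object still carries its parent line.
git cat-file -p HEAD | grep '^parent '
```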

Updating a shallow clone we made earlier

But what if we already have a (depth-1 shallow) clone, and we want to update it? Well, that's not really a problem. Let's say we made a shallow clone of the upstream back when master pointed to commit B, before the new branches and the bug fix. That means we currently have this:

B   <-- master, origin/master

While B's real parent is A, we have a shallow-clone graft entry saying "pretend B is a root commit". Now we git fetch --depth 1, which looks up the upstream's master—the thing we call origin/master—and sees commit G. We grab commit G from the upstream, along with its objects, but deliberately don't grab commits D and F. We then update our shallow-clone graft entries to say "pretend G is a root commit too":

B   <-- master

G   <-- origin/master

Our repository now has two root commits: The name master (still) points to commit B, whose parents we (still) pretend are non-existent, and the name origin/master points to G, whose parents we pretend are non-existent.

This is why you need `git reset`

In a normal repository, you might use git pull, which really is git fetch followed by git merge. But git merge requires history, and we have none: we have faked Git out with pretend root commits, and they have no history behind them. So we must use git reset instead.
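The state right after the fetch, before any reset, can be inspected on a scratch setup. In this sketch (the paths and the demo identity are assumptions) master and origin/master end up as two unrelated one-commit histories, so Git finds no common ancestor to merge from.

```shell
set -eu
work=$(mktemp -d)                        # scratch directory, delete it when done
up="$work/upstream"                      # hypothetical stand-in for the real upstream
git init -q "$up"
git -C "$up" symbolic-ref HEAD refs/heads/master
git -C "$up" config user.name demo
git -C "$up" config user.email demo@example.com
git -C "$up" commit -q --allow-empty -m "A"
git -C "$up" commit -q --allow-empty -m "B"

git clone -q --depth 1 "file://$up" "$work/shallow"   # shallow clone at B

git -C "$up" commit -q --allow-empty -m "D"           # upstream moves on
git -C "$up" commit -q --allow-empty -m "G"

cd "$work/shallow"
git fetch -q --depth 1

local_tip=$(git rev-parse master)         # still the old commit
remote_tip=$(git rev-parse origin/master) # the newly fetched commit
echo "master:        $local_tip"
echo "origin/master: $remote_tip"

# Both look like root commits, and they share no history.
if git merge-base master origin/master >/dev/null; then
    echo "histories connect"
else
    echo "no common ancestor: nothing for git merge to work from"
fi
```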

What git reset does is a bit complicated, because it can affect up to three different things: a branch name, the index, and the work-tree. We have already seen what the branch names are: they simply point to a (one, specific) commit, which we call the tip of the branch. That leaves the index and work-tree.

The work-tree is easy to explain: it's where all your files are. That's it: no more and no less. It's there so that you can actually use Git: Git is all about storing every commit ever made, forever, so that they can all be retrieved. But they're in a format useless to mere mortals. To be used, a file—or more typically, a whole commit's worth of files—has to be extracted into its normal format. The work-tree is where that happens, and then you can work on it and make new commits using it too.

The index is a bit harder to explain. It's something peculiar to Git: other version control systems don't have one, or if they have something like it, they don't expose it. Git does. Git's index is essentially where you keep the next commit to make, but that means that it starts out holding the current commit that you have extracted into the work-tree, and Git uses that to make Git fast. We'll say more about this in a bit.

What git reset --hard does is to affect all three: branch name, index, and work-tree. It moves the branch name so that it points to a (probably different) commit. Then it updates the index to match that commit, and updates the work-tree to match the new index.
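All three effects can be observed in a scratch repository with no remote involved. The sketch below dirties both the index and the work-tree and then resets to the previous commit; the file name and the demo identity are assumptions made for the example.

```shell
set -eu
work=$(mktemp -d)                        # scratch directory, delete it when done
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

echo "v1" > file.txt
git add file.txt
git commit -q -m "first"
echo "version 2" > file.txt
git commit -q -am "second"

echo "scribble" > file.txt               # dirty the work-tree ...
git add file.txt                         # ... and the index

target=$(git rev-parse HEAD~1)
git reset -q --hard "$target"

branch_tip=$(git rev-parse master)       # 1. the branch name moved
git diff --cached --quiet && echo "index matches the commit"     # 2. index
git diff --quiet && echo "work-tree matches the index"           # 3. work-tree
content=$(cat file.txt)
echo "file.txt now contains: $content"
```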

Hence:

git reset --hard origin/master

tells Git to look up origin/master. Since we ran our git fetch, that now points to commit G. Git then makes our master—our current (and only) branch—also point to commit G, and then updates our index and work-tree. Our graph now looks like this:

B   [abandoned - but see below]

G   <-- master, origin/master

Now master and origin/master both name commit G, and commit G is the one checked-out into the work-tree.

Why you need `git clean -dfx`

The answer here is a bit complicated, but usually it's "you don't" (need to git clean).

When you do need git clean, it is because you—or something you ran—added files to your work-tree that you have not told Git about. These are untracked and/or ignored files. Using git clean -df will remove untracked files (and empty directories); adding -x will also remove the ignored files.

For more about the difference between "untracked" and "ignored", see this answer.
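The difference between -df and -dfx shows up as soon as a file matches .gitignore. This is a sketch on a scratch repository; the file names, the *.log ignore pattern and the demo identity are assumptions made for the example.

```shell
set -eu
work=$(mktemp -d)                        # scratch directory, delete it when done
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
echo "*.log" > .gitignore
echo "int main(void) { return 0; }" > main.c
git add .
git commit -q -m "tracked files"

echo x > notes.txt                       # untracked
mkdir scratch
echo y > scratch/tmp.txt                 # untracked directory
echo z > build.log                       # ignored

git clean -n -dfx                        # dry run: lists what would be removed

git clean -q -df                         # untracked files and directories only
if [ -e build.log ]; then ignored_survived=yes; else ignored_survived=no; fi
if [ -e notes.txt ] || [ -d scratch ]; then untracked_survived=yes; else untracked_survived=no; fi
echo "after -df:  ignored file survived=$ignored_survived, untracked survived=$untracked_survived"

git clean -q -dfx                        # now the ignored file goes too
```

Running the -n dry run first is a cheap safeguard, since git clean deletes files that Git has no copy of.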

Why you don't need `git clean`: the index

I mentioned above that you usually don't need to run git clean. This is because of the index. As I said earlier, Git's index is mainly "the next commit to make". If you never add your own files—if you are just using git checkout to check out various existing commits that you have had all along, or that you have added with git fetch; or if you are using git reset --hard to move a branch name and also switch the index and work-tree to another commit—then whatever is in the index right now is there because an earlier git checkout (or git reset) put it in the index, and also into the work-tree.

In other words, the index has a short—and fast for Git to access—summary or manifest describing the current work-tree. Git uses that to know what is in the work-tree now. When you ask Git to switch to another commit, via git checkout or git reset --hard, Git can quickly compare the existing index to the new commit. Any files that have changed, Git must extract from the new commit (and update the index). Any files that are newly added, Git must also extract (and update the index). Any files that are gone—that are in the existing index, but not in the new commit—Git must remove ... and that's what Git does. Git updates, adds, and removes those files in the work-tree, as directed by the comparison between the current index, and the new commit.
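The manifest role of the index can be watched with git ls-files. In this sketch (scratch repository, made-up file names and demo identity) a reset between two commits adds and removes work-tree files on its own, with no git clean involved.

```shell
set -eu
work=$(mktemp -d)                        # scratch directory, delete it when done
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

echo a > a.txt
echo b > b.txt
git add .
git commit -q -m "first: a.txt and b.txt"

git rm -q b.txt
echo c > c.txt
git add c.txt
git commit -q -m "second: a.txt and c.txt"

echo "index before the reset:"
git ls-files                             # the manifest of the current commit

git reset -q --hard HEAD~1               # switch index and work-tree to "first"

echo "index after the reset:"
git ls-files
ls                                       # b.txt is back, c.txt has been removed
```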

What this means is that if you do need git clean, you must have done something outside Git that added files. These added files are not in the index, so by definition, they are untracked and/or ignored. If they are merely untracked, git clean -f would remove them, but if they are ignored, only git clean -fx will remove them. (You want -d just to remove directories that are or become empty during the cleaning.)

Abandoned commits and garbage collection

I mentioned, and drew in the updated shallow graph, that when we git fetch --depth 1 and then git reset --hard, we wind up abandoning the previous depth-1 shallow graph commit. (In the graph I drew, this was commit B.) However, in Git, abandoned commits are rarely truly abandoned—at least, not right away. Instead, some special names like ORIG_HEAD hang on to them for a while, and each reference—branches and tags are forms of reference—carries with it a log of "previous values".

You can display each reflog with git reflog refname. For instance, git reflog master shows you not only which commit master names now, but also which commits it has named in the past. There is also a reflog for HEAD itself, which is what git reflog shows by default.
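After a fetch-and-reset update, the previous tip is still reachable through these names. The sketch below checks that on a scratch shallow clone; the paths and the demo identity are assumptions, and it relies on reflogs being enabled, which is the default for a non-bare repository.

```shell
set -eu
work=$(mktemp -d)                        # scratch directory, delete it when done
up="$work/upstream"                      # hypothetical stand-in for the real upstream
git init -q "$up"
git -C "$up" symbolic-ref HEAD refs/heads/master
git -C "$up" config user.name demo
git -C "$up" config user.email demo@example.com
git -C "$up" commit -q --allow-empty -m "A"
git -C "$up" commit -q --allow-empty -m "B"

git clone -q --depth 1 "file://$up" "$work/shallow"
git -C "$up" commit -q --allow-empty -m "G"   # upstream moves on

cd "$work/shallow"
old=$(git rev-parse master)
git fetch -q --depth 1
git reset -q --hard origin/master

git reflog master                        # newest entry first, then the clone entry
prev=$(git rev-parse 'master@{1}')       # where master pointed before the reset
orig_head=$(git rev-parse ORIG_HEAD)     # left behind by git reset
echo "abandoned commit $old is still named by master@{1} and ORIG_HEAD"
```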

Reflog entries eventually expire. Their exact duration varies, but by default they are eligible for expiration after 30 days in some cases and 90 days in others. Once they do expire, those reflog entries no longer protect abandoned commits (or, for annotated tag references, the annotated tag object—tags are not supposed to move, so this case is not supposed to occur, but if it does—if you force Git to move a tag—it's just handled in the same way as all other references).

Once any Git object—commit, annotated tag, "tree", or "blob" (file)—is really unreferenced, Git is allowed to remove it for real.² It's only at this point that the underlying repository data for the commits and files goes away. Even then, it only happens when something runs git gc. Thus, a shallow repository updated with git fetch --depth 1 is not quite the same as a fresh clone with --depth 1: the shallow repository probably has some lingering names for the original commits, and won't remove the extra repository objects until those names expire or are otherwise cleared-out.
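If the leftover objects matter, the lingering names can be cleared by hand and garbage collection forced. The sequence below is a commonly used sketch, not something the answer prescribes: it expires every reflog entry immediately, drops ORIG_HEAD, and prunes without the usual grace period. Whether every old object is gone afterwards may depend on the Git version, so the script reports what it finds instead of assuming. The paths and the demo identity are made up for the example.

```shell
set -eu
work=$(mktemp -d)                        # scratch directory, delete it when done
up="$work/upstream"                      # hypothetical stand-in for the real upstream
git init -q "$up"
git -C "$up" symbolic-ref HEAD refs/heads/master
git -C "$up" config user.name demo
git -C "$up" config user.email demo@example.com
git -C "$up" commit -q --allow-empty -m "A"
git -C "$up" commit -q --allow-empty -m "B"

git clone -q --depth 1 "file://$up" "$work/shallow"
git -C "$up" commit -q --allow-empty -m "G"   # upstream moves on

cd "$work/shallow"
old=$(git rev-parse master)
git fetch -q --depth 1
git reset -q --hard origin/master

# Clear the names that still protect the abandoned commit, then collect.
git reflog expire --expire=now --all
git update-ref -d ORIG_HEAD || true
git gc -q --prune=now

if git cat-file -e "$old" 2>/dev/null; then
    echo "old commit is still present"
else
    echo "old commit has been pruned"
fi
remaining=$(git rev-list --count HEAD)
```

This throws away the safety net that reflogs provide, so it only makes sense for clones that carry no local work.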

²Besides the reference check, objects get a minimum time before they expire as well. The default is two weeks. This prevents git gc from deleting temporary objects that Git is creating, but has yet to establish a reference to. For instance, when making a new commit, Git first turns the index into a series of tree objects which refer to each other but have no top-level reference. Then it creates a new commit object that refers to the top-level tree, but nothing yet refers to the commit. Last, it updates the current branch name. Until that last step finishes, the trees and new commit are unreachable!

Special considerations for `--single-branch` and/or shallow clones

I noted above that the name you give to git clone -b can refer to a tag. For normal (non-shallow or non-single-branch) clones, this works just as one would expect: you get a regular clone, and then Git does a git checkout by the tag name. The result is the usual detached HEAD, in a perfectly ordinary clone.

With shallow or single-branch clones, however, there are several unusual consequences. These are all, to some extent, a result of Git letting the implementation show

Great answer! Could you post a tl;dr at the top that highlights the commands to run for a shallow clone of master? (3 upvotes)

Archived:	8 years, 10 months ago
Views:	8289
Last activity:	6 years, 4 months ago
