Using the GitHub API to fetch a specific remote commit (and its depth) in a shallow clone?

sda*_*aau 5 git github

Let's say I need to get https://github.com/mozilla/gecko-dev at revision/commit hash 042b84a. Right now, the whole repository is (see "See the size of a GitHub repo before cloning it?"):

wget -qO- https://api.github.com/repos/mozilla/gecko-dev | grep size
#  "size": 3891062,  # this is in kB

...which is a bit too much for me. So I thought I would get a shallow clone; that alone downloads nearly 400 MB:

git clone --depth 1 https://github.com/mozilla/gecko-dev
# remote: Counting objects: 231302, done.
# Receiving objects: 100% (231302/231302), 392.95 MiB

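Now, this cloned HEAD, and I cannot get directly from here to 042b84a, especially with the git client I use, version 1.9.1 (see "How to shallow clone a specific commit with depth 1?", "How do I fetch a remote branch after I cloned the repo with git clone --depth 1", and "Git: get a specific revision of a git repository with depth 1"). Apparently, short of unshallowing the repo (which downloads as much as a full clone anyway), the best I can do is to slowly increase the depth.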

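I wasn't sure whether "depth" simply corresponds to the number of commits between HEAD and a given revision. "Get git sha depth on a remote branch" notes that for a full clone, you can do: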

git rev-list HEAD ^042b84a --count

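...so this would imply that "depth" is indeed the number of commits between HEAD and the given revision. However, there is no obvious way to query it from a remote repository with git.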

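So it would be cool to find the depth of 042b84a relative to the current HEAD before doing a full clone or a depth increase. I thought that using the GitHub API from the command line might help, since the repository is hosted on GitHub. So I tried: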

cd gecko-dev

wget -qO- https://api.github.com/repos/mozilla/gecko-dev/commits/042b84a | grep date
#      "date": "2017-04-27T07:18:07Z"

curl -sI 'https://api.github.com/repos/mozilla/gecko-dev/commits?sha=042b84a' | grep last
# Link: <https://api.github.com/repositories/13509108/commits?sha=042b84a&page=2>; rel="next", <https://api.github.com/repositories/13509108/commits?sha=042b84a&page=17756>; rel="last"

wget -qO- 'https://api.github.com/repos/mozilla/gecko-dev/commits?sha=042b84a&page=17756' | grep '^    "sha"' | wc -l
# 5

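Since the sha parameter is "SHA or branch to start listing commits from", and the GitHub API notes that "a call to list GitHub's public repositories provides paginated items in sets of 30", here we have 17756 pages, of which page 17756 has 5 results. So, do we have 17755*30+5 = 532655 commits between 042b84a and HEAD?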
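The page arithmetic above can be sketched as a small shell snippet. The helper names last_page and estimate_total are made up for illustration, and the page size of 30 is an assumption taken from the quoted API note; the result is only the number of items the API listed, which may or may not be what git means by "depth".

```shell
# Hypothetical helper: extract the page number of the rel="last" link
# from a GitHub "Link:" response header.
last_page() {
  printf '%s\n' "$1" | sed -n 's/.*[?&]page=\([0-9][0-9]*\)>; rel="last".*/\1/p'
}

# Hypothetical helper: total number of listed items, given the last page
# number, the item count on that last page, and the page size
# (assumed to be 30 when not given).
estimate_total() {
  echo $(( ($1 - 1) * ${3:-30} + $2 ))
}

# Link header copied from the curl output above.
link='Link: <https://api.github.com/repositories/13509108/commits?sha=042b84a&page=2>; rel="next", <https://api.github.com/repositories/13509108/commits?sha=042b84a&page=17756>; rel="last"'

pages=$(last_page "$link")
estimate_total "$pages" 5   # prints 532655
```

In practice the header would come from curl -sI and the last-page item count from the wget | grep | wc -l call, as in the commands above.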

So then I run this, but:

git fetch --progress --depth=532655
# error: RPC failed; result=18, HTTP code = 200
# fatal: The remote end hung up unexpectedly

...the call fails.

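Is it possible, with git client 1.9, to somehow extend this shallow clone to include revision 042b84a without cloning all 4 GB of data, by using some of the repository data provided by the GitHub API?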


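EDIT: Got somewhere with this, but there is still no definite answer. First, a depth of 532655 is suspicious given the distance between now (January 2018) and the April 2017 commit. So I tried looking up the commits since that date: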

curl -sI 'https://api.github.com/repos/mozilla/gecko-dev/commits?since=2017-04-27T07:18:07Z' | grep last
# Link: <https://api.github.com/repositories/13509108/commits?since=2017-04-27T07%3A18%3A07Z&page=2>; rel="next", <https://api.github.com/repositories/13509108/commits?since=2017-04-27T07%3A18%3A07Z&page=1267>; rel="last"
wget -qO- 'https://api.github.com/repos/mozilla/gecko-dev/commits?since=2017-04-27T07:18:07Z&page=1267' | grep '^    "sha"' | wc -l
# 18
wcalc 1266*30+18
# = 37998
git fetch -v --progress --depth=37998
# POST git-upload-pack (419 bytes)
# error: RPC failed; result=18, HTTP code = 200
# fatal: The remote end hung up unexpectedly

So, counting from the date, we get 37998 commits (or depth), but even that fetch call fails.

So, knowing that the number of commits is at least in the thousands, I tried increasing the depth slowly:

git fetch -vvvv --progress --depth=1000 origin
# remote: Counting objects: 53595, done.
# remote: Compressing objects: 100% (24434/24434), done.
# remote: Total 53595 (delta 43532), reused 36280 (delta 28120), pack-reused 0
# Receiving objects: 100% (53595/53595), 16.14 MiB | 409.00 KiB/s, done.
# Resolving deltas: 100% (43532/43532), completed with 10563 local objects.
# From https://github.com/mozilla/gecko-dev
#  = [up to date]      master     -> origin/master
git log --oneline | wc -l
# 7492

git fetch -vvvv --progress --depth=2000 origin
# remote: Counting objects: 140804, done.
# remote: Compressing objects: 100% (54300/54300), done.
# Receiving objects: 100% (140804/140804), 57.13 MiB | 404.00 KiB/s, done.
# remote: Total 140804 (delta 114158), reused 106827 (delta 84436), pack-reused 0
# Resolving deltas: 100% (114158/114158), completed with 20700 local objects.
# From https://github.com/mozilla/gecko-dev
#  = [up to date]      master     -> origin/master
git log --oneline | wc -l
# 18137

...and finally in a loop:

i=2000; until git show 042b84a; do i=$((i+1000)); echo "depth $i"; git fetch --depth=$i ; done
# fatal: ambiguous argument '042b84a': unknown revision or path not in the working tree.
# Use '--' to separate paths from revisions, like this:
# 'git <command> [<revision>...] -- [<file>...]'
# depth 3000
# remote: Counting objects: 136434, done.
# remote: Compressing objects: 100% (47014/47014), done.
# remote: Total 136434 (delta 108858), reused 110481 (delta 86139), pack-reused 0
# Receiving objects: 100% (136434/136434), 71.36 MiB | 403.00 KiB/s, done.
# Resolving deltas: 100% (108858/108858), completed with 13997 local objects.
# fatal: ambiguous argument '042b84a': unknown revision or path not in the working tree.
# Use '--' to separate paths from revisions, like this:
# 'git <command> [<revision>...] -- [<file>...]'
# depth 4000
# remote: Counting objects: 240103, done.
# remote: Compressing objects: 100% (77811/77811), done.
# remote: Total 240103 (delta 196215), reused 195977 (delta 157920), pack-reused 0
# Receiving objects: 100% (240103/240103), 117.71 MiB | 404.00 KiB/s, done.
# Resolving deltas: 100% (196215/196215), completed with 23725 local objects.
# commit 042b84af6020b1f2d8029a0dc36ac5955b7f325f [...]
git log --oneline | wc -l
# 50871
git rev-list HEAD ^042b84a --count
# 45283
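A slightly more defensive variant of the loop above might look like the following. This is only a sketch: deepen_until is a hypothetical helper name, the step and limit values are illustrative, and it assumes the remote is called origin. It checks for the commit with git cat-file -e instead of printing it with git show, and it stops once the depth would exceed a limit rather than looping forever.

```shell
# Hypothetical helper: deepen a shallow clone in fixed steps until the target
# commit exists locally, or give up once the depth would exceed a limit.
deepen_until() {
  target=$1 step=$2 max=$3
  depth=$step
  # cat-file -e exits non-zero while the commit object is still missing
  until git cat-file -e "${target}^{commit}" 2>/dev/null; do
    if [ "$depth" -gt "$max" ]; then
      echo "gave up: $target not found within depth $max" >&2
      return 1
    fi
    echo "depth $depth"
    git fetch --depth="$depth" origin || return 1
    depth=$((depth + step))
  done
}

# Example call, to be run inside the shallow clone (values are illustrative):
# deepen_until 042b84a 1000 60000
```

The fixed step keeps each request small, which seemed to matter here, since the single large --depth requests above failed with an RPC error.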

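(Judging by the increase in object counts, download sizes, etc., it does not seem to matter that --depth=1000 had already been fetched in this case; after issuing a fetch with --depth=2000, are all of the previous objects downloaded again?)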

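So the commit 042b84a finally shows up when we do git fetch --depth 4000, so clearly the depth of this commit is 3000 < depth <= 4000? At that depth we can count 50871 log entries (commits?), while git rev-list HEAD ^042b84a --count reports 45283 (also commits?)! What is "depth", then, if not a number of commits?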