gsutil appears to re-upload files when resuming

Question

MPB*_*all 6 google-cloud-storage google-compute-engine

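I'm trying to upload data from a disk to Google Cloud Storage; the disk holds about 1 TB in ~3000 files. I'm using gsutil cp -R <disk-top-directory> <bucket>. My understanding was that if gsutil resumes or restarts, it uses checksums to determine which files have already been uploaded and skips them.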

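It doesn't seem to be doing that: it appears to restart the upload from the top and replace the files again. When I run gsutil ls -Rl <bucket/disk-top-directory> twice, ten minutes apart, and compare the outputs with diff, I see the same files with the same sizes but changed (newer) dates, i.e. consistent with the same file having been re-uploaded.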

For example:

<  404104811  2014-04-08T14:13:44Z  gs://my-bucket/disk-top-directory/dir1/dir2/dir3/dir4/dir5/file-20.tsv.bz2
---
>  404104811  2014-04-08T14:43:48Z  gs://my-bucket/disk-top-directory/dir1/dir2/dir3/dir4/dir5/file-20.tsv.bz2
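
To spot re-uploads like this without reading a full diff by eye, the two listings can be compared object by object. Below is a minimal sh/bash sketch, assuming each saved listing contains gsutil ls -l style lines of the form "<size>  <timestamp>  gs://..."; the function name find_reuploads and the file names listing-1.txt and listing-2.txt are hypothetical. It prints every object whose size is unchanged but whose timestamp differs between the two snapshots.

```shell
# Hypothetical helper (not part of gsutil): compare two saved "gsutil ls -Rl"
# listings and print objects that kept the same size but got a new timestamp,
# i.e. objects that were most likely re-uploaded.
find_reuploads() {
  local old_listing="$1"
  local new_listing="$2"
  awk '
    # Keep only object lines: "<size>  <timestamp>  gs://...". This skips
    # directory headers ("gs://.../:") and the "TOTAL: ..." summary line.
    # Object names containing spaces are also skipped by this simple parser.
    NF == 3 && $1 ~ /^[0-9]+$/ && $3 ~ /^gs:\/\// {
      if (FILENAME == ARGV[1]) {
        # First snapshot: remember each object size and timestamp.
        size[$3] = $1
        stamp[$3] = $2
      } else if (($3 in size) && size[$3] == $1 && stamp[$3] != $2) {
        # Second snapshot: same size, different timestamp.
        print $3
      }
    }
  ' "$old_listing" "$new_listing"
}

# Example usage (requires gsutil and a real bucket; names are placeholders):
#   gsutil ls -Rl gs://my-bucket/disk-top-directory > listing-1.txt
#   ... about ten minutes later ...
#   gsutil ls -Rl gs://my-bucket/disk-top-directory > listing-2.txt
#   find_reuploads listing-1.txt listing-2.txt
```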

The machine I'm using to read the disk and transfer the files runs Ubuntu 13.10. I installed gsutil with pip, following the instructions for Debian and Ubuntu.

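Am I misunderstanding how gsutil's resumable transfers work? If not, is there any way to diagnose and fix this so that resuming behaves correctly? Thanks in advance!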

Answer 1

Ian*_*GSY 5

You need to use the -n (no-clobber) switch to prevent re-uploading objects that already exist at the destination:

gsutil cp -Rn <disk-top-directory> <bucket>

From the help (gsutil help cp):

-n            No-clobber. When specified, existing files or objects at the
              destination will not be overwritten. Any items that are skipped
              by this option will be reported as being skipped. This option
              will perform an additional HEAD request to check if an item
              exists before attempting to upload the data. This will save
              retransmitting data, but the additional HTTP requests may make
              small object transfers slower and more expensive.
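
The per-file check that -n adds can be pictured with the following sh/bash sketch. It is only an illustration, not gsutil's actual implementation: the function name upload_missing and the paths are hypothetical, and it relies on gsutil -q stat returning exit status 0 when an object exists. Each file costs one extra metadata request before any data is sent, which is the trade-off the help text describes; in practice, simply passing -n to gsutil cp is the easier option.

```shell
# Illustrative sketch only (hypothetical function, not gsutil's internal logic):
# upload each local file unless an object with the same name already exists.
# Usage: upload_missing /path/to/disk-top-directory gs://my-bucket/disk-top-directory
upload_missing() {
  local src_dir="${1%/}"   # strip a trailing slash so relative names come out clean
  local dest="${2%/}"
  local path rel
  # Note: file names containing newlines are not handled by this simple loop.
  find "$src_dir" -type f | while IFS= read -r path; do
    rel="${path#"$src_dir"/}"   # path relative to the source directory
    # One metadata request per file; exit status 0 means the object exists.
    # stdin is redirected so gsutil cannot consume the file list from find.
    if gsutil -q stat "$dest/$rel" < /dev/null; then
      echo "Skipping existing: $dest/$rel"
    else
      gsutil cp "$path" "$dest/$rel" < /dev/null
    fi
  done
}
```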

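Also, according to the documentation, gsutil automatically uses resumable transfer mode for files larger than 2 MB. Resumable mode lets an interrupted upload of an individual file continue where it left off; it does not skip files that were already uploaded completely, which is why -n is needed when rerunning the whole copy.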
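
To see how many files in a directory tree fall above that size cutoff (and would therefore be uploaded in resumable mode), a small sh/bash sketch like the following can help. It assumes the cutoff is 2 MiB (2097152 bytes), reading the "2 MB" above as binary megabytes; the function name count_resumable_candidates is hypothetical, and the actual threshold may differ between gsutil versions and configurations, so it can be overridden with a second argument.

```shell
# Hypothetical helper: count regular files under a directory that are strictly
# larger than a byte threshold. The default of 2097152 bytes (2 MiB) is an
# assumption based on the "2 MB" figure; the real cutoff depends on the
# gsutil version and configuration.
count_resumable_candidates() {
  local dir="$1"
  local threshold="${2:-2097152}"
  # -size +Nc matches files of more than N bytes.
  find "$dir" -type f -size +"${threshold}"c | wc -l | tr -d ' '
}

# Example usage: count_resumable_candidates /path/to/disk-top-directory
```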

Archived: 11 years, 9 months ago
Views: 1130
Last activity: 10 years, 4 months ago