How do I copy folders with "gsutil"?

Pra*_*tic 7 google-cloud-storage gsutil google-cloud-platform

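I've read the documentation for the gsutil cp command, but I still don't understand how to copy a folder so that it keeps the same permissions. I tried this command: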

gsutil cp gs://bucket-name/folder1/folder_to_copy gs://bucket-name/folder1/new_folder

But it resulted in an error:

CommandException: No URLs matched

However, when I tried it with a trailing slash on each name, it showed no error:

gsutil cp gs://bucket-name/folder1/folder_to_copy/ gs://bucket-name/folder1/new_folder/

But when I checked with gsutil ls, the new folder wasn't there. What am I doing wrong?

Max*_*xim 15

You should use the -r option to copy the folder and its contents recursively:

gsutil cp -r gs://bucket-name/folder1/folder_to_copy gs://bucket-name/folder1/new_folder

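Note that this only works if folder_to_copy contains files. Cloud Storage doesn't really have "folders" as you would expect from a typical GUI; instead it provides the illusion of a hierarchical file tree on top of a "flat" namespace, as the gsutil documentation explains. In other words, the files in a folder are simply objects whose names carry the folder prefix. So when you run gsutil cp, it expects actual objects to copy, not an empty directory, which the CLI doesn't understand.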
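To make the flat-namespace idea concrete, here is a minimal sketch that models object names as plain strings and treats a "folder" as nothing more than a shared name prefix. It does not contact a bucket or call gsutil; the object names, the empty_folder prefix, and the list_prefix and copy_prefix helpers are all hypothetical, and the renaming is a simplified model rather than gsutil's exact destination-naming rules.

```shell
#!/bin/sh
# Hypothetical object names; no real bucket is contacted.
objects='folder1/folder_to_copy/a.txt
folder1/folder_to_copy/sub/b.txt
folder1/other/c.txt'

# "Listing a folder" is only a filter on the name prefix.
list_prefix() {
  printf '%s\n' "$objects" | while IFS= read -r name; do
    case "$name" in
      "$1"*) printf '%s\n' "$name" ;;
    esac
  done
}

# "Copying a folder" amounts to writing each matching object under a new prefix.
copy_prefix() {
  list_prefix "$1" | while IFS= read -r name; do
    rest=${name#"$1"}
    printf '%s -> %s\n' "$name" "$2$rest"
  done
}

# Two objects share this prefix, so two copies are planned.
copy_prefix 'folder1/folder_to_copy/' 'folder1/new_folder/'

# No object carries this prefix, so there is nothing to copy.
copy_prefix 'folder1/empty_folder/' 'folder1/new_folder/'
```

The second call prints nothing, which is the situation the answer describes: a prefix with no objects under it gives a copy command nothing to match.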

Another option is simply to use rsync, which tolerates empty folders and synchronizes the contents of the source and destination folders:

gsutil rsync -r gs://bucket-name/folder1/folder_to_copy gs://bucket-name/folder1/new_folder

If you also want to preserve the objects' ACLs (permissions), use the -p option:

gsutil rsync -p -r gs://bucket-name/folder1/folder_to_copy gs://bucket-name/folder1/new_folder
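As a rough sketch of how the copy and a follow-up check could be scripted together, the snippet below wraps the commands in a dry-run helper so they can be reviewed before anything is executed. The run helper, the DRY_RUN variable, and the object name example.txt are assumptions introduced for illustration; the bucket URLs are the placeholders from the question.

```shell
#!/bin/sh
# Placeholder URLs taken from the question; replace them with your own.
SRC='gs://bucket-name/folder1/folder_to_copy'
DST='gs://bucket-name/folder1/new_folder'

# DRY_RUN=1 (the default here) only prints each command.
# Set DRY_RUN=0 to actually execute them with gsutil.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Sync the folder recursively (-r) and preserve ACLs (-p).
run gsutil rsync -p -r "$SRC" "$DST"

# List everything under the destination prefix to confirm the objects exist.
run gsutil ls -r "$DST"

# Inspect the ACL of one copied object (example.txt is a hypothetical name).
run gsutil acl get "$DST/example.txt"
```

Running it unchanged prints the three commands; running it with DRY_RUN=0 executes them, which assumes gsutil is installed and authenticated.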


Fra*_*ois 11

To add to @Maxim's answer, you might consider passing the -m argument to gsutil to allow parallel copying.

gsutil -m cp -r gs://bucket-name/folder1/folder_to_copy gs://bucket-name/folder1/new_folder

The -m argument enables parallelism.

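As the gsutil documentation points out, the -m argument may not improve performance on a slow network (at home, for example). But for bucket-to-bucket copies (between machines in a data center), performance may be "significantly improved", to quote the gsutil manual. See below: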

 -m          Causes supported operations (acl ch, acl set, cp, mv, rm, rsync,
              and setmeta) to run in parallel. This can significantly improve
              performance if you are performing operations on a large number of
              files over a reasonably fast network connection.

              gsutil performs the specified operation using a combination of
              multi-threading and multi-processing, using a number of threads
              and processors determined by the parallel_thread_count and
              parallel_process_count values set in the boto configuration
              file. You might want to experiment with these values, as the
              best values can vary based on a number of factors, including
              network speed, number of CPUs, and available memory.

              Using the -m option may make your performance worse if you
              are using a slower network, such as the typical network speeds
              offered by non-business home network plans. It can also make
              your performance worse for cases that perform all operations
              locally (e.g., gsutil rsync, where both source and destination
              URLs are on the local disk), because it can "thrash" your local
              disk.

              If a download or upload operation using parallel transfer fails
              before the entire transfer is complete (e.g. failing after 300 of
              1000 files have been transferred), you will need to restart the
              entire transfer.

              Also, although most commands will normally fail upon encountering
              an error when the -m flag is disabled, all commands will
              continue to try all operations when -m is enabled with multiple
              threads or processes, and the number of failed operations (if any)
              will be reported as an exception at the end of the command's
              execution.
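The manual text above suggests experimenting with parallel_thread_count and parallel_process_count. Besides editing the boto configuration file, gsutil's top-level -o option can override a configuration value for a single invocation, so one way to try different settings might look like the sketch below. The build_parallel_cp helper and the counts 4 and 2 are illustrative assumptions, not recommendations; the function only prints the command so that it can be reviewed first.

```shell
#!/bin/sh
# Build (but do not run) a parallel copy command with explicit
# thread and process counts passed through gsutil's -o option.
build_parallel_cp() {
  threads="$1"
  processes="$2"
  src="$3"
  dst="$4"
  printf 'gsutil -o GSUtil:parallel_thread_count=%s -o GSUtil:parallel_process_count=%s -m cp -r %s %s\n' \
    "$threads" "$processes" "$src" "$dst"
}

# Illustrative values only; the best settings depend on network and CPU.
build_parallel_cp 4 2 \
  'gs://bucket-name/folder1/folder_to_copy' \
  'gs://bucket-name/folder1/new_folder'
```

Timing the printed command with a few different values on a representative folder is probably the simplest way to see whether the settings make a difference in a given environment.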

Note: at the time of writing, Python 3.8 appears to cause problems with the -m flag. Use Python 3.7 instead. See this GitHub issue for more information.