What should I do about this gsutil "parallel composite upload" warning?

use*_*204 8 python gsutil

I am running a python script and using the os library to execute a gsutil command, which is typically executed in the command prompt on Windows. I have some file on my local computer and I want to put it into a Google Bucket so I do:

import os

command = 'gsutil -m cp myfile.csv  gs://my/bucket/myfile.csv'
os.system(command)

I get a message like:

==> NOTE: You are uploading one or more large file(s), which would run significantly faster if you enable parallel composite uploads. This feature can be enabled by editing the "parallel_composite_upload_threshold" value in your .boto configuration file. However, note that if you do this large files will be uploaded as 'composite objects https://cloud.google.com/storage/docs/composite-objects'_, which means that any user who downloads such objects will need to have a compiled crcmod installed (see "gsutil help crcmod"). This is because without a compiled crcmod, computing checksums on composite objects is so slow that gsutil disables downloads of composite objects.

I want to get rid of this message, either by hiding it if it's irrelevant or by actually doing what it suggests, but I can't find the .boto file. What should I do?

Cha*_*ffy 18

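The Parallel Composite Uploads section of the gsutil documentation describes how to resolve this (assuming, as the warning specifies, that this content will be used by clients with the crcmod module available):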

gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp bigfile gs://your-bucket

To do this safely from Python, that would look like:

import subprocess

filename='myfile.csv'
gs_bucket='my/bucket'
parallel_threshold='150M' # minimum size for parallel upload; 0 to disable

subprocess.check_call([
  'gsutil',
  '-o', 'GSUtil:parallel_composite_upload_threshold=%s' % (parallel_threshold,),
  'cp', filename, 'gs://%s/%s' % (gs_bucket, filename)
])

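Note that here you're explicitly providing the argument vector boundaries yourself, rather than relying on the shell to do this for you; this prevents malicious or buggy filenames from performing undesired operations.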


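If you don't know that the clients accessing content in this bucket will have the crcmod module, consider setting parallel_threshold='0' above, which will disable this support.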
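
As a minimal sketch of the same idea, the argument vector can be built by a small helper so the threshold is easy to switch between a size such as 150M and 0. The name `build_upload_command` and its parameters are hypothetical, chosen only for illustration; the gsutil options themselves are the ones from the command above.

```python
import subprocess


def build_upload_command(filename, bucket, threshold="150M"):
    """Return the gsutil argument vector for one upload (hypothetical helper).

    threshold is passed through unchanged; "0" disables composite uploads.
    """
    return [
        "gsutil",
        "-o", "GSUtil:parallel_composite_upload_threshold=%s" % threshold,
        "cp", filename, "gs://%s/%s" % (bucket, filename),
    ]


# Each list element reaches gsutil as exactly one argument, so spaces or
# shell metacharacters in the filename are never interpreted by a shell.
# To run it (requires gsutil to be installed and on PATH):
#     subprocess.check_call(build_upload_command("myfile.csv", "my/bucket", "0"))
```

On Windows, where gsutil is usually installed as a .cmd wrapper, the bare name `gsutil` may not resolve without a shell; replacing the first element with the result of `shutil.which("gsutil")` is one way to handle that.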


fab*_*ioM 7

Another option is to set it in the boto configuration file found via BOTO_PATH, usually $HOME/.boto:

[GSUtil]
parallel_composite_upload_threshold = 150M
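
If you can't find the file, `gsutil version -l` prints the config path(s) it actually loaded. As a rough sketch, the lookup can also be approximated from Python. The helper names `candidate_boto_paths` and `read_composite_threshold` are hypothetical, and the search order (BOTO_CONFIG, then the entries of BOTO_PATH, then the .boto file in your home directory) is an assumption about how gsutil resolves its config, so treat the result as a hint rather than the authoritative answer.

```python
import configparser
import os


def candidate_boto_paths(env=None):
    """List config files worth checking, in an assumed order of precedence."""
    env = os.environ if env is None else env
    paths = []
    if env.get("BOTO_CONFIG"):
        paths.append(env["BOTO_CONFIG"])
    if env.get("BOTO_PATH"):
        # BOTO_PATH may hold several files separated by the OS path separator
        paths.extend(p for p in env["BOTO_PATH"].split(os.pathsep) if p)
    # expanduser("~") resolves to the user profile directory on Windows
    paths.append(os.path.join(os.path.expanduser("~"), ".boto"))
    return paths


def read_composite_threshold(boto_path):
    """Return the configured threshold string, or None if it is not set."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(boto_path)  # a missing file is skipped without raising
    return parser.get(
        "GSUtil", "parallel_composite_upload_threshold", fallback=None
    )
```

This only reads the file; editing it by hand keeps any comments in it intact, which writing it back through `configparser` would not.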

For maximum speed, install the crcmod C library.
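
To see whether the compiled extension is actually in use, `gsutil version -l` reports a "compiled crcmod" line. From Python, a rough check could look like the sketch below. It relies on the private `_usingExtension` attribute of `crcmod.crcmod`, which is an implementation detail and may not hold for every crcmod version, and `crcmod_is_compiled` is a hypothetical helper name. It also only inspects the Python interpreter that runs it, which may not be the one gsutil uses.

```python
def crcmod_is_compiled():
    """Best-effort check for the compiled crcmod extension (hypothetical helper)."""
    try:
        import crcmod.crcmod as impl
    except ImportError:
        # crcmod is not installed for this interpreter at all
        return False
    # Assumption: crcmod sets this private flag to True when its C extension
    # loaded, and to False when it fell back to the pure-Python code.
    return bool(getattr(impl, "_usingExtension", False))
```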