Unable to upload > ~2GB to Google Cloud Storage

sev*_*ian 5 python google-cloud-storage

Trace below.

Relevant Python snippet:

bucket = _get_bucket(location['bucket'])
blob = bucket.blob(location['path'])
blob.upload_from_filename(source_path)

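This eventually triggers (from the ssl library):

OverflowError: string longer than 2147483647 bytes

I assume there is some special configuration option I'm missing?

This may be related to this ~1.5-year-old, apparently still unresolved issue: https://github.com/googledatalab/datalab/issues/784.

Help appreciated!

Full trace: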

File "/usr/src/app/gcloud/download_data.py", line 109, in *******
  blob.upload_from_filename(source_path)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 992, in upload_from_filename
  size=total_bytes)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 946, in upload_from_file
  client, file_obj, content_type, size, num_retries)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 867, in _do_upload
  client, stream, content_type, size, num_retries)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 700, in _do_multipart_upload
  transport, data, object_metadata, content_type)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/upload.py", line 97, in transmit
  retry_strategy=self._retry_strategy)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/_helpers.py", line 101, in http_request
  func, RequestsMixin._get_status_code, retry_strategy)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
  response = func()
File "/usr/local/lib/python3.5/dist-packages/google/auth/transport/requests.py", line 186, in request
  method, url, data=data, headers=request_headers, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 508, in request
  resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 618, in send
  r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 440, in send
  timeout=timeout
File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
  chunked=chunked)
File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 357, in _make_request
  conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python3.5/http/client.py", line 1106, in request
  self._send_request(method, url, body, headers)
File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
  self.endheaders(body)
File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
  self._send_output(message_body)
File "/usr/lib/python3.5/http/client.py", line 936, in _send_output
  self.send(message_body)
File "/usr/lib/python3.5/http/client.py", line 908, in send
  self.sock.sendall(data)
File "/usr/lib/python3.5/ssl.py", line 891, in sendall
  v = self.send(data[count:])
File "/usr/lib/python3.5/ssl.py", line 861, in send
  return self._sslobj.write(data)
File "/usr/lib/python3.5/ssl.py", line 586, in write
  return self._sslobj.write(data)

OverflowError: string longer than 2147483647 bytes

jko*_*ker 7

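The problem is that it tries to read the entire file into memory. Following the call chain from upload_from_filename shows that it stats the file and then passes that value in as the upload size, so the whole file goes out as a single upload part.

Instead, specifying chunk_size when you create the blob object will trigger an upload in multiple parts: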
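As a rough illustration of why roughly 2GB is the tipping point: the figure in the error, 2147483647, is 2**31 - 1, the largest signed 32-bit integer. Here is a minimal sketch that checks a file against that figure before uploading. The names SINGLE_WRITE_LIMIT, would_overflow and file_would_overflow are made up for this example and are not part of any library.

```python
import os

# The figure quoted in the OverflowError; it equals 2**31 - 1.
# (Hypothetical constant name, used only for this sketch.)
SINGLE_WRITE_LIMIT = 2**31 - 1


def would_overflow(size_in_bytes):
    """Return True if a payload of this size is larger than the figure in the error."""
    return size_in_bytes > SINGLE_WRITE_LIMIT


def file_would_overflow(path):
    """Stat the file at `path` and compare its size against the same figure."""
    return would_overflow(os.path.getsize(path))
```

If file_would_overflow(source_path) is true, sending the file as one part is likely to hit the same error. The real request body is also a little larger than the file itself, so treat this as an approximation.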

# Must be a multiple of 256KB per docstring    
CHUNK_SIZE = 10485760  # 10MB
blob = bucket.blob(location['path'], chunk_size=CHUNK_SIZE)
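If you would rather not hard-code the number, here is a small sketch for picking a valid chunk size and estimating how many chunks a file will take. It assumes only the 256KB-multiple rule from the docstring mentioned above. The helper names round_chunk_size and chunk_count are hypothetical, not part of google-cloud-storage.

```python
# 256KB; per the docstring, chunk_size must be a multiple of this.
CHUNK_MULTIPLE = 256 * 1024


def round_chunk_size(desired_bytes):
    """Round desired_bytes up to the nearest positive multiple of 256KB."""
    multiples = max(1, (desired_bytes + CHUNK_MULTIPLE - 1) // CHUNK_MULTIPLE)
    return multiples * CHUNK_MULTIPLE


def chunk_count(file_size, chunk_size):
    """Number of chunks needed to cover file_size bytes (ceiling division)."""
    return (file_size + chunk_size - 1) // chunk_size


chunk_size = round_chunk_size(10 * 1000 * 1000)  # rounds up to 39 * 256KB
# Then pass it in exactly as in the snippet above:
# blob = bucket.blob(location['path'], chunk_size=chunk_size)
```

With the 10MB chunk size from the snippet above, a 3GB file works out to chunk_count(3 * 1024**3, 10485760) chunks, each far below the size that overflowed.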

Happy hacking!