Bun*_*nyk · 3 · Tags: python, google-cloud-storage, google-cloud-platform
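I have seen this question: How to read first 2 rows of csv from Google Cloud Storage
But in my case I don't want to load the whole CSV blob into memory, because it can be very large. Is there a way to open it as an iterable (or a file-like object) and read only the bytes of the first few lines?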
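I'd like to extend simzes's answer with an example of how to build an iterable when the size of the CSV header is not known in advance. It is also useful for reading a CSV from storage line by line: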
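import csv
from typing import Generator, List

from google.cloud import storage


def get_csv_header(blob: storage.Blob) -> List[str]:
    # csv.reader pulls lines lazily, so only the first chunk(s) get downloaded.
    for row in csv.reader(blob_lines(blob)):
        return row
    return []  # empty blob

# How many bytes of the blob to download per request.
# Selected experimentally. If there is a better value for this, please update.
BLOB_CHUNK_SIZE = 2000

def blob_lines(blob: storage.Blob) -> Generator[str, None, None]:
    if blob.size is None:
        blob.reload()  # fetch metadata so that blob.size is known
    position = 0
    buff = b""
    while position < blob.size:
        # start and end are both inclusive (closed interval), so this requests
        # BLOB_CHUNK_SIZE bytes (fewer for the last chunk of the blob).
        end = position + BLOB_CHUNK_SIZE - 1
        buff += blob.download_as_bytes(start=position, end=end)
        position = end + 1
        # Split the raw bytes and decode only complete lines, so a multi-byte
        # UTF-8 character is never cut in half at a chunk boundary.
        *complete, buff = buff.split(b"\n")
        for line in complete:
            yield line.decode()
    if buff:
        yield buff.decode()  # last line when the blob has no trailing newline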
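Calling .decode() on each downloaded range separately fails whenever a range boundary falls inside a multi-byte UTF-8 character (for example é, which takes 2 bytes). The stdlib-only sketch below reproduces that failure on a tiny in-memory byte string and shows codecs.getincrementaldecoder as another way to carry a partial character over to the next chunk. The helper names naive and incremental are hypothetical and only illustrate the two behaviours.

```python
import codecs

data = "héllo\nwörld\n".encode("utf-8")
# A range covering bytes 0..1 (inclusive) stops in the middle of "é".
head, tail = data[:2], data[2:]


def naive(parts):
    """Decode every chunk on its own, as calling .decode() per chunk does."""
    try:
        return "".join(p.decode("utf-8") for p in parts)
    except UnicodeDecodeError:
        return None  # a chunk boundary split a multi-byte character


def incremental(parts):
    """Buffer incomplete byte sequences until the next chunk completes them."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = "".join(decoder.decode(p) for p in parts)
    return text + decoder.decode(b"", final=True)
```

Splitting on b"\n" before decoding avoids the problem for line-oriented reading, because the newline byte never occurs inside a multi-byte UTF-8 sequence.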
小智 · 5
The API for google.cloud.storage.blob.Blob specifies that the download_as_string method accepts start and end keywords that select a byte range.
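A minimal sketch of this approach, assuming the newer download_as_bytes name (it takes the same start and end keywords): fetch a fixed number of bytes from the beginning of the blob in one ranged request and keep only the complete lines. first_lines, max_bytes and InMemoryBlob are hypothetical names; InMemoryBlob only mimics the inclusive-end byte range so the function can be tried without a bucket.

```python
from typing import List


def first_lines(blob, n_lines: int, max_bytes: int = 64 * 1024) -> List[str]:
    """Return up to n_lines lines from the start of the blob, downloading
    only its first max_bytes bytes with a single ranged request."""
    # start and end are inclusive byte offsets, so this asks for max_bytes bytes.
    head = blob.download_as_bytes(start=0, end=max_bytes - 1)
    lines = head.split(b"\n")
    # The last piece is empty (head ended with a newline) or, if the range was
    # filled completely, possibly a line cut off by the range: drop it.
    if len(head) == max_bytes or not lines[-1]:
        lines.pop()
    return [line.decode("utf-8") for line in lines[:n_lines]]


class InMemoryBlob:
    """Hypothetical stand-in mimicking download_as_bytes with an inclusive end."""

    def __init__(self, data: bytes):
        self.data = data

    def download_as_bytes(self, start=0, end=None):
        stop = len(self.data) if end is None else end + 1
        return self.data[start:stop]
```

With a real bucket, blob would come from something like storage.Client().bucket(name).blob(path), and the returned lines can be passed straight to csv.reader. Getting back fewer than n_lines lines means max_bytes was too small for them (or the file is shorter).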
Edit: download_as_string is deprecated in favor of download_as_bytes.
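Newer releases of google-cloud-storage also provide Blob.open(), which returns the kind of file-like object the question asks for: in read mode it downloads the blob in chunks as the file is consumed. The sketch below assumes such a release. chunk_size and newline are Blob.open parameters; first_csv_rows and InMemoryBlob are hypothetical names, the latter an in-memory stand-in so the function can be exercised offline.

```python
import csv
import io
import itertools
from typing import List


def first_csv_rows(blob, n_rows: int) -> List[List[str]]:
    """Parse only the first n_rows CSV rows through a file-like reader."""
    # chunk_size sets how many bytes each underlying ranged download fetches;
    # the library default is much larger. newline="" is what the csv module
    # recommends for file objects it reads.
    with blob.open("r", chunk_size=256 * 1024, newline="") as f:
        return list(itertools.islice(csv.reader(f), n_rows))


class InMemoryBlob:
    """Hypothetical stand-in for a Blob, used only to try the function offline."""

    def __init__(self, text: str):
        self.text = text

    def open(self, mode="r", chunk_size=None, newline=None):
        return io.StringIO(self.text, newline=newline)
```

Because itertools.islice stops pulling rows after n_rows, the reader never requests chunks beyond the ones needed for those rows.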
Views: 4654