How do I load data from Google Cloud into a Jupyter Notebook VM?

Question

Mik*_*Sal 3 python-3.x google-cloud-storage jupyter-notebook

I am trying to load a bunch of CSV files stored in my Google Cloud Storage into my Jupyter notebook. I am using Python 3, and gsutil does not work.

Let's assume I have 6 .csv files in "\bucket1\1". Does anybody know what I should do?

Answer 1

May*_*eru 5

You are running a Jupyter Notebook on a Google Cloud VM instance, and you want to load 6 .csv files (currently in Cloud Storage) into it.

Install the dependencies:

pip install google-cloud-storage
pip install pandas

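To confirm from inside the notebook that the kernel can see both packages, a small check along these lines may help. It is only a sketch and not part of the original answer; `installed_version` is a hypothetical helper name, and it assumes Python 3.8 or newer for `importlib.metadata`.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report what the notebook kernel can actually import from.
for package in ("google-cloud-storage", "pandas"):
    print(package, installed_version(package) or "not installed")
```

If a package shows as not installed even though pip reported success, pip probably installed it into a different Python environment than the one the notebook kernel uses.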

Run the following script in your notebook:

from google.cloud import storage
import pandas as pd

bucket_name = "my-bucket-name"

storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name)

# Option 1: your files are in a subfolder of the bucket.
my_prefix = "csv/"  # the name of the subfolder, including the trailing slash
blobs = bucket.list_blobs(prefix=my_prefix, delimiter="/")

for blob in blobs:
    if blob.name != my_prefix:  # ignore the subfolder placeholder object itself
        file_name = blob.name[len(my_prefix):]  # strip only the leading prefix
        blob.download_to_filename(file_name)  # download the file to the machine
        df = pd.read_csv(file_name)  # load the data
        print(df)

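# Option 2: your files are on the first level of the bucket.
# delimiter="/" keeps objects inside subfolders out of the listing, so every
# blob.name is a plain file name that can be written to the current directory.
blobs = bucket.list_blobs(delimiter="/")

for blob in blobs:
    file_name = blob.name
    blob.download_to_filename(file_name)  # download the file to the machine
    df = pd.read_csv(file_name)  # load the data
    print(df)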

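If the files do not need to be kept on the VM's disk, the blobs can also be read straight into memory and combined into one DataFrame. The following is a sketch under a few assumptions rather than part of the original answer: `read_csv_blobs` is a hypothetical helper name, all CSV files are assumed to share the same columns, and `download_as_bytes` requires a reasonably recent google-cloud-storage release.

```python
import io

import pandas as pd


def read_csv_blobs(blobs):
    """Read every .csv blob into one DataFrame without writing files to disk."""
    frames = []
    for blob in blobs:
        if not blob.name.endswith(".csv"):
            continue  # skip folder placeholders and non-CSV objects
        data = blob.download_as_bytes()  # raw file contents as bytes
        frames.append(pd.read_csv(io.BytesIO(data)))
    if not frames:
        return pd.DataFrame()  # nothing matched, return an empty frame
    # Assumes the files share a schema; rows are stacked in listing order.
    return pd.concat(frames, ignore_index=True)
```

With the `bucket` and `my_prefix` variables from the script above, it could be called as `df = read_csv_blobs(bucket.list_blobs(prefix=my_prefix, delimiter="/"))`.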

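Notes:

Pandas is a good dependency to use for data analysis in Python, so it will make working with the CSV files easier.
The code contains 2 alternatives: one for objects in a subfolder and one for objects on the first level of the bucket. Use the one that applies to your case.
The code loops through all the objects, so you may get errors if the bucket also contains other kinds of objects.
If the files are already on the machine where the notebook runs, you can ignore the Google Cloud Storage lines and simply pass each file's absolute or relative path to the "read_csv" method.
For more information, see the Cloud Storage documentation on listing objects and on downloading objects.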
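
One way to guard against the "other kinds of objects" problem mentioned in the notes is to filter object names before downloading anything. The helper below is a minimal sketch, not part of the original answer: `csv_objects` is a hypothetical name, and it works on plain name strings so it can be tried without access to a bucket.

```python
def csv_objects(object_names, prefix=""):
    """Return (object_name, local_file_name) pairs for CSV files directly under prefix."""
    pairs = []
    for name in object_names:
        if not name.startswith(prefix):
            continue
        local_name = name[len(prefix):]  # strip only the leading prefix
        # Skip the folder placeholder, objects in nested folders, and non-CSV files.
        if not local_name or "/" in local_name or not local_name.lower().endswith(".csv"):
            continue
        pairs.append((name, local_name))
    return pairs


names = ["csv/", "csv/a.csv", "csv/notes.txt", "csv/old/b.csv", "c.csv"]
print(csv_objects(names, prefix="csv/"))  # [('csv/a.csv', 'a.csv')]
```

In the script from the answer, the names could come from `[blob.name for blob in blobs]`, and each returned pair gives the object to download and the local file name to pass to `read_csv`.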
