Rec*_*tan 6 python python-3.x pandas azure-blob-storage
I am trying to read multiple CSV files from blob storage using Python.
The code I am using is:
blob_service_client = BlobServiceClient.from_connection_string(connection_str)
container_client = blob_service_client.get_container_client(container)
blobs_list = container_client.list_blobs(folder_root)
for blob in blobs_list:
    # pass the blob's actual name, not the literal string "blob.name"
    blob_client = blob_service_client.get_blob_client(container=container, blob=blob.name)
    stream = blob_client.download_blob().content_as_text()
I'm not sure what the correct way is to store the CSV files I read in a pandas DataFrame.
I tried using:
df = df.append(pd.read_csv(StringIO(stream)))
But this gives me an error.
Any idea how I can do this?
小智 9
import pandas as pd
data = pd.read_csv('blob_sas_url')
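You can find the blob SAS URL in the Azure portal by right-clicking the blob file you want to import and selecting "Generate SAS". Then click the "Generate SAS token and URL" button and copy the SAS URL into the code above in place of blob_sas_url.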
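When several blobs share one SAS token (for example a container-level token), the per-file URL can also be assembled by hand. This is only a sketch: the helper name `blob_sas_url` and the sample account, container, and token values are hypothetical, and it assumes the public-cloud endpoint `blob.core.windows.net`.

```python
from urllib.parse import quote


def blob_sas_url(account_name, container, blob_name, sas_token):
    """Assemble the HTTPS URL for one blob from its parts (hypothetical helper)."""
    # Assumes the public Azure cloud endpoint; other clouds use a different suffix.
    base = f"https://{account_name}.blob.core.windows.net"
    # Percent-encode the blob path but keep "/" so virtual folders stay intact.
    path = f"{container}/{quote(blob_name, safe='/')}"
    # Accept the token with or without a leading "?".
    return f"{base}/{path}?{sas_token.lstrip('?')}"


# Placeholder values, not real credentials.
url = blob_sas_url("myaccount", "mycontainer", "data/2020 sales.csv", "sv=2020-08-04&sig=abc")
print(url)
```

The resulting string can be passed to `pd.read_csv` in the same way as the URL copied from the portal.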
You can download the file from blob storage and then read the data from the downloaded file into a pandas DataFrame.
import time
import pandas as pd
# BlockBlobService belongs to the legacy SDK (azure-storage-blob 2.1 and earlier); it was removed in v12
from azure.storage.blob import BlockBlobService
# fill in your own values
STORAGEACCOUNTNAME = "<storage_account_name>"
STORAGEACCOUNTKEY = "<storage_account_key>"
LOCALFILENAME = "<local_file_name>"
CONTAINERNAME = "<container_name>"
BLOBNAME = "<blob_name>"
# download from blob
t1 = time.time()
blob_service = BlockBlobService(account_name=STORAGEACCOUNTNAME, account_key=STORAGEACCOUNTKEY)
blob_service.get_blob_to_path(CONTAINERNAME, BLOBNAME, LOCALFILENAME)
t2 = time.time()
print("It takes %s seconds to download %s" % (t2 - t1, BLOBNAME))
# LOCALFILENAME is the file path
dataframe_blobdata = pd.read_csv(LOCALFILENAME)
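For more details, see here.
If you want to do the conversion directly, the following code will help. You need to get the content from the blob object; with get_blob_to_text no local file name is needed.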
from io import StringIO
blobstring = blob_service.get_blob_to_text(CONTAINERNAME,BLOBNAME).content
df = pd.read_csv(StringIO(blobstring))
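To combine several blobs into one DataFrame, it is usually simpler to collect one DataFrame per file in a list and join them once with `pd.concat`. `DataFrame.append`, used in the question, was deprecated in pandas 1.4 and removed in pandas 2.0. A minimal sketch, with literal strings standing in for the downloaded blob text and a hypothetical helper name `csv_texts_to_frame`:

```python
from io import StringIO

import pandas as pd


def csv_texts_to_frame(csv_texts):
    """Parse each CSV string and stack the results into one DataFrame."""
    frames = [pd.read_csv(StringIO(text)) for text in csv_texts]
    # ignore_index=True renumbers the rows consecutively across all files.
    return pd.concat(frames, ignore_index=True)


# Stand-ins for the text returned by each blob download.
texts = ["id,value\n1,a\n2,b\n", "id,value\n3,c\n"]
df = csv_texts_to_frame(texts)
print(df)
```

Concatenating once at the end also avoids copying the growing DataFrame on every iteration.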
Building on @sahaj-raj-malla's answer, here are 2 code snippets for loading (or saving) a file from blob storage:
pip install adlfs fsspec
import pandas as pd
account_name = "my_account_stage_name"
account_key = "loooooooooooooooooooooong_acccccooooooooount_keeeeeeeeeeeeeeeeey$$$$***$$$$$$$$$$$$$$22222222"
connection_string = f"DefaultEndpointsProtocol=https;AccountName={account_name};AccountKey={account_key};EndpointSuffix=core.windows.net"
pd.read_csv("abfs:///my_container_name/path/to/my/file/on/blob/file.csv", storage_options={"account_name": account_name, "connection_string": connection_string})
pip install azure-storage-blob
from azure.storage.blob import BlobServiceClient
import pandas as pd
account_name = "my_account_stage_name"
account_key = "loooooooooooooooooooooong_acccccooooooooount_keeeeeeeeeeeeeeeeey$$$$***$$$$$$$$$$$$$$22222222"
connection_string = f"DefaultEndpointsProtocol=https;AccountName={account_name};AccountKey={account_key};EndpointSuffix=core.windows.net"
# load file from blob
container_name = "my_container_name"
blob_name = "path/to/my/file/on/blob/file.csv"
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
container_client = blob_service_client.get_container_client(container_name)
blob_client = container_client.get_blob_client(blob_name)
# load into memory, e.g. in a Jupyter notebook; readall() returns the blob content as bytes
from io import BytesIO
df = pd.read_csv(BytesIO(blob_client.download_blob().readall()))
# save the file to disk, e.g. as a local file
local_file_name = "path/to/my/file/on/disk/file.csv"
with open(local_file_name, "wb") as my_blob_locally:
    download_stream = blob_client.download_blob()
    my_blob_locally.write(download_stream.readall())
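Applied to the original question (many CSV files under one folder), the same client objects can be used in a loop. The sketch below is an outline under stated assumptions, not a recipe verified against a real storage account: `read_csv_blobs` is a hypothetical helper, it expects a v12 `ContainerClient` such as `container_client` above, and it relies only on `list_blobs(name_starts_with=...)` and `download_blob(...).readall()`.

```python
from io import BytesIO

import pandas as pd


def read_csv_blobs(container_client, prefix):
    """Read every .csv blob whose name starts with `prefix` into one DataFrame."""
    frames = {}
    for blob in container_client.list_blobs(name_starts_with=prefix):
        if not blob.name.lower().endswith(".csv"):
            continue  # skip anything that is not a CSV file
        # readall() returns the blob content as bytes.
        data = container_client.download_blob(blob.name).readall()
        frames[blob.name] = pd.read_csv(BytesIO(data))
    if not frames:
        raise ValueError(f"no CSV blobs found under {prefix!r}")
    # The dict keys become the outer index level, so each row keeps its source blob.
    return pd.concat(frames, names=["blob", "row"])
```

A call might look like `df = read_csv_blobs(container_client, "path/to/my/file/on/blob/")`. Keeping the blob name in the index makes it easy to trace a row back to its file; `df.reset_index(drop=True)` discards it if it is not needed.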
How to get the connection string