How do I write a CSV back to Azure Blob Storage using Databricks?

Question


sy-*_*uss 5 scala pandas databricks azure-databricks

I'm struggling to write back to an Azure Blob Storage container. I can read from the container using the following:

storage_account_name = "expstorage"
storage_account_key = "1VP89J..."
container = "source"

spark.conf.set("fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name), storage_account_key)

dbutils.fs.ls("dbfs:/mnt/azurestorage")


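I've tried several ways to write back to my container that I found by searching, but I can't find a definitive method.

Here is a link to an alternative that uses a SAS key, but I don't want to mix and match key types.

Write dataframe to blob using azure databricks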

Answer 1

Axe*_* R. 8

To write to Blob storage, you only need to specify a path that starts with dbfs:/mnt/azurestorage:

# Wrapping the chain in parentheses balances the trailing ")" and lets it span lines
(df.write
 .mode("overwrite")
 .option("header", "true")
 .csv("dbfs:/mnt/azurestorage/filename.csv"))
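The path above relies on the container being mounted at /mnt/azurestorage. Since the question already sets the account key with spark.conf.set, a direct wasbs:// path may also work without a mount. Below is a minimal sketch, assuming the usual `wasbs://<container>@<account>.blob.core.windows.net/<path>` layout; the helper name `wasbs_url` and the `output/filename.csv` path are hypothetical and not part of the original post:

```python
def wasbs_url(container: str, account: str, path: str) -> str:
    """Build a wasbs:// URL for a blob path (hypothetical helper)."""
    return "wasbs://{0}@{1}.blob.core.windows.net/{2}".format(
        container, account, path.lstrip("/")
    )


# Container and account names come from the question; the output path is an assumption.
target = wasbs_url("source", "expstorage", "output/filename.csv")
print(target)

# On a Databricks cluster, after the spark.conf.set call from the question:
# df.write.mode("overwrite").option("header", "true").csv(target)
```

As with any Spark write, the target ends up as a folder of part files, not a single file.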


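A Spark write like this creates a folder containing the distributed data as part files, not a single file. If you are looking for a single CSV file, try the following: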

# pandas writes through the local file API, so the path starts with /dbfs/ rather than dbfs:/
df.toPandas().to_csv("/dbfs/mnt/azurestorage/filename.csv")
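Collecting to pandas pulls every row onto the driver, which may not suit large data. One possible alternative is to coalesce to a single partition, let Spark write its folder, then move the lone part file to the final name. The sketch below assumes part files are named `part-*.csv`; the helper `find_part_file` and the `tmp_out` folder are hypothetical:

```python
from typing import Iterable, Optional


def find_part_file(names: Iterable[str]) -> Optional[str]:
    """Return the first Spark part file among the given file names, if any."""
    for name in names:
        if name.startswith("part-") and name.endswith(".csv"):
            return name
    return None


# Example listing of a Spark output folder (file names are illustrative).
listing = ["_SUCCESS", "_committed_123", "part-00000-abc.csv"]
print(find_part_file(listing))

# On a Databricks cluster this could be used roughly as follows:
# tmp = "dbfs:/mnt/azurestorage/tmp_out"
# df.coalesce(1).write.mode("overwrite").option("header", "true").csv(tmp)
# part = find_part_file(f.name for f in dbutils.fs.ls(tmp))
# dbutils.fs.mv(tmp + "/" + part, "dbfs:/mnt/azurestorage/filename.csv")
```

Note that coalesce(1) still funnels all rows through a single task, so this trades driver memory for a slower write.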


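If you are using pandas only, you won't have access to the DBFS API, so you need to use the local file API. This means your path must start with /dbfs/ instead of dbfs:/, as follows: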

df.to_csv(r'/dbfs/mnt/azurestorage/filename.csv', index = False)
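The two prefixes refer to the same location, so one can be derived from the other. A small sketch of that mapping follows; `to_local_path` is a hypothetical helper, not a Databricks API:

```python
def to_local_path(dbfs_uri: str) -> str:
    """Map a dbfs:/ URI to its /dbfs/ local-file equivalent (hypothetical helper)."""
    prefix = "dbfs:/"
    if not dbfs_uri.startswith(prefix):
        raise ValueError("expected a path starting with dbfs:/")
    # Drop the scheme and any extra leading slashes, then prepend the local mount point.
    return "/dbfs/" + dbfs_uri[len(prefix):].lstrip("/")


local_path = to_local_path("dbfs:/mnt/azurestorage/filename.csv")
print(local_path)

# pandas could then write with:
# df.to_csv(local_path, index=False)
```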


Archived: 5 years, 4 months ago
Viewed: 6797 times
Last activity: 2 years, 2 months ago