Write data to Azure Data Lake Storage Gen2 using an Azure Synapse Analytics notebook

pao*_*one 6 apache-spark pyspark azure-synapse

I'm using an Azure Synapse Analytics notebook to connect to a RESTful API and write JSON files to Azure Data Lake Storage Gen2.

PySpark code:

import requests
from pyspark.sql import *
from pyspark.sql.types import *

# Call the REST API and load the raw JSON text into a Spark DataFrame
response = requests.get('https://api.web.com/v1/data.json')
df = spark.read.json(sc.parallelize([response.text]))

# abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>
account_name = "name of account"
container_name = "name of container"
relative_path = "name of file path"
adls_path = 'abfss://%s@%s.dfs.core.windows.net/%s' % (container_name, account_name, relative_path)

spark.conf.set('fs.azure.account.key.%s.dfs.core.windows.net' % account_name, "account_key")  # not sure I'm doing the configuration right

df.write.mode("overwrite").json(adls_path)

Error:

Py4JJavaError : An error occurred while calling o536.json.
: Operation failed: "This request is not authorized to perform this operation.", 403, HEAD, https://storageaccount.dfs.core.windows.net/container/?upn=false&action=getAccessControl&timeout=90

CHE*_*SFT 3

Note: the Storage Blob Data Contributor role grants read/write/delete access to Blob storage resources.

If the user accessing the storage account has not been assigned the Storage Blob Data Contributor role, they cannot reach the data in ADLS Gen2 because they lack permissions on the storage account. You can check which roles are already assigned, as shown below.
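As a minimal sketch, the Azure CLI can list the role assignments a user holds on the storage account; the user principal, subscription, resource group, and account names below are placeholders to replace with your own:

az role assignment list \
    --assignee "user@contoso.com" \
    --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>" \
    --output table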

If they try to access data in ADLS Gen2 without the Storage Blob Data Contributor role on the storage account, they will receive the error message: Operation failed: "This request is not authorized to perform this operation.", 403.

After creating the storage account, select Access control (IAM) from the left navigation. Then assign the following role, or verify that it is already assigned: assign yourself the Storage Blob Data Owner role on the storage account. The same assignment can also be made from the Azure CLI, as sketched below.
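A minimal sketch of the CLI equivalent, again with placeholder principal and scope values to substitute with your own:

az role assignment create \
    --assignee "user@contoso.com" \
    --role "Storage Blob Data Contributor" \
    --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"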

After the Storage Blob Data Contributor role has been granted on the storage account, wait 5-10 minutes, then retry the operation.
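Once the role has propagated, a quick way to confirm access from the Synapse notebook before retrying the write is to list the container through the same abfss path. A minimal sketch, assuming mssparkutils is available in the Synapse runtime and reusing the account_name and container_name variables from the question:

from notebookutils import mssparkutils

# A successful listing (rather than a 403) indicates the role assignment has taken effect
adls_root = 'abfss://%s@%s.dfs.core.windows.net/' % (container_name, account_name)
for entry in mssparkutils.fs.ls(adls_root):
    print(entry.name)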
