Posts by nsc*_*060

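Spark - read from and write back to the same S3 location

I am reading two datasets, dataset1 and dataset2, from S3 locations. I then transform them and write the result back to the same location that dataset2 was read from.

However, I get the following error message: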

An error occurred while calling o118.save. No such file or directory 's3://<myPrefix>/part-00001-a123a120-7d11-581a-b9df-bc53076d57894-c000.snappy.parquet


If I write to a new S3 location instead, e.g. s3://dataset_new_path.../, the code works fine.

my_df \
  .write.mode('overwrite') \
  .format('parquet') \
  .save(s3_target_location)


Note: I tried calling .cache() after reading the DataFrame, but I still get the same error.
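A likely cause, offered here as an assumption rather than a confirmed diagnosis: Spark evaluates lazily, so with mode('overwrite') the existing files at the target can be deleted before the job has finished reading them. .cache() is also lazy and does not guarantee that every partition stays materialized. One common workaround is to write the result to a separate staging location, read it back, and only then overwrite the original location. The sketch below illustrates that idea; the function name overwrite_via_staging and the staging_path parameter are hypothetical and not part of the original post.

```python
def overwrite_via_staging(spark, df, target_path, staging_path):
    """Overwrite target_path with df even if df was read from target_path.

    Hypothetical helper: `spark` is an existing SparkSession, `df` a DataFrame,
    and `staging_path` a separate location that none of the inputs are read from.
    """
    # 1. Materialize the transformed data somewhere other than the source.
    df.write.mode("overwrite").format("parquet").save(staging_path)

    # 2. Read the staged copy back; its lineage no longer depends on target_path.
    staged = spark.read.format("parquet").load(staging_path)

    # 3. Now it is safe to replace the original location.
    staged.write.mode("overwrite").format("parquet").save(target_path)
    return staged
```

Usage might look like overwrite_via_staging(spark, my_df, s3_target_location, staging_path). The cost is one extra write and read of the dataset, and the staging location has to be cleaned up separately.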

amazon-s3 apache-spark pyspark aws-glue

nsc*_*060

lucky-day

3 votes

1 answer

4165 views

Python - return a list of tuples

I am trying to return the list of tuples below:

def lambda_handler(event, context):

    t1 = [('a', 'string', 'a', 'string'), ('b', 'string', 'b', 'string')]

    print(t1)
    return t1


When I print the list, or convert it to str and then print it, I get the following:

[('a', 'string', 'a', 'string'), ('b', 'string', 'b', 'string')]


When I return it, I get:

[
  [
    "a",
    "string",
    "a",
    "string"
  ],
  [
    "b",
    "string",
    "b",
    "string"
  ]
]


My problem is that I need to pass a list of tuples to the AWS Glue job:

ApplyMapping.apply(frame = datasource0, mappings = [("a", "string", "a", "string"), ("b", "string", "b", "string")], transformation_ctx = "applymapping1")


Instead, I am passing MY_LIST_PARAM, which is the output of the return statement:

ApplyMapping.apply(frame = datasource0, mappings = MY_LIST_PARAM, …
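The difference between the printed and returned values is consistent with JSON serialization: JSON has no tuple type, so each tuple comes back as an array. A minimal sketch of converting the value back before using it as mappings follows. The helper name to_mappings is hypothetical, and the branch that handles a string input assumes the parameter may reach the job as JSON text.

```python
import json


def to_mappings(raw):
    """Turn a JSON-decoded list of lists (or its JSON text) into a list of tuples."""
    if isinstance(raw, str):
        # Assumption: the parameter may arrive as a JSON string.
        raw = json.loads(raw)
    return [tuple(item) for item in raw]


t1 = [('a', 'string', 'a', 'string'), ('b', 'string', 'b', 'string')]

# Simulate the JSON round trip that a returned value goes through.
decoded = json.loads(json.dumps(t1))
mappings = to_mappings(decoded)

print(decoded)   # inner items are lists
print(mappings)  # inner items are tuples again
```

The result of to_mappings(MY_LIST_PARAM) could then be passed as the mappings argument in place of the raw value.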


python python-3.x

nsc*_*060

2020-02-19

2 votes

1 answer

1348 views
