Posted by Ger*_*os

Setting Spark configuration in AWS Glue PySpark

I am using AWS Glue with PySpark and would like to add some configuration options to the SparkSession, such as:

- "spark.hadoop.fs.s3a.impl" = "org.apache.hadoop.fs.s3a.S3AFileSystem"
- "spark.hadoop.fs.s3a.multiobjectdelete.enable" = "false"
- "spark.serializer" = "org.apache.spark.serializer.KryoSerializer"
- "spark.hadoop.fs.s3a.fast.upload" = "true"

The code I use to initialize the context is the following:

glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session

As far as I understand from the documentation, I should add these configurations as job parameters when submitting the Glue job. Is that the case, or can they be added when initializing Spark?
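Both routes exist; one hedged sketch of the in-script route is to build a `SparkConf` before the `SparkContext` is created and pass it to `getOrCreate` (this only takes effect if no context exists yet, and some settings can still be overridden by Glue's own job parameters):

```python
from pyspark import SparkConf
from pyspark.context import SparkContext
from awsglue.context import GlueContext

# Sketch: collect the desired settings in a SparkConf ...
conf = (
    SparkConf()
    .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .set("spark.hadoop.fs.s3a.multiobjectdelete.enable", "false")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.hadoop.fs.s3a.fast.upload", "true")
)

# ... and hand them over before the context is first created.
sc = SparkContext.getOrCreate(conf=conf)
glueContext = GlueContext(sc)
spark = glueContext.spark_session
```

The alternative is to pass the same settings as the `--conf` job parameter when submitting the Glue job, which avoids touching the script at all.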

configuration hadoop amazon-web-services apache-spark

6 votes | 1 answer | 5943 views

Converting CSV to Avro with Python

I have the following CSV:

field1;field2;field3;field4;field5;field6;field7;field8;field9;field10;field11;field12;
eu;4523;35353;01/09/1999; 741 ; 386 ; 412 ; 86 ; 1.624 ; 1.038 ; 469 ; 117 ;

I want to convert it to Avro. I have created the following Avro schema:

{"namespace": "forecast.avro",
 "type": "record",
 "name": "forecast",
 "fields": [
     {"name": "field1", "type": "string"},
     {"name": "field2", "type": "string"},
     {"name": "field3", "type": "string"},
     {"name": "field4", "type": "string"},
     {"name": "field5", "type": "string"},
     {"name": "field6", "type": "string"},
     {"name": "field7", "type": "string"},
     {"name": "field8", "type": "string"},
     {"name": "field9", "type": "string"},
     {"name": "field10", "type": "string"},
     {"name": "field11", "type": "string"},
     {"name": "field12", "type": …
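The schema is truncated above, but its pattern (twelve string fields) is clear. A minimal sketch of the conversion, assuming the third-party `fastavro` package (the question names no library, so this choice is an assumption) and the sample data shown above:

```python
import csv
import io

# The twelve-string-field schema from the question, generated by pattern.
schema = {
    "namespace": "forecast.avro",
    "type": "record",
    "name": "forecast",
    "fields": [{"name": f"field{i}", "type": "string"} for i in range(1, 13)],
}

csv_text = """field1;field2;field3;field4;field5;field6;field7;field8;field9;field10;field11;field12;
eu;4523;35353;01/09/1999; 741 ; 386 ; 412 ; 86 ; 1.624 ; 1.038 ; 469 ; 117 ;
"""

# The trailing ';' on every row yields an empty extra column, so keep
# only the declared fields and strip the padding around each value.
field_names = [f["name"] for f in schema["fields"]]
reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
records = [
    {name: (row[name] or "").strip() for name in field_names}
    for row in reader
]

try:
    from fastavro import parse_schema, writer

    with open("forecast.avro", "wb") as out:
        writer(out, parse_schema(schema), records)
except ImportError:
    pass  # fastavro not installed; `records` are still ready to serialize
```

Reading the real file instead of the inline sample only changes the `io.StringIO(csv_text)` line to an `open(...)` call with the same `delimiter=";"`.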

python csv avro

3 votes | 1 answer | 5359 views