Posted by Moh*_*B C

Spark Error: "value foreach is not a member of Object"

The DataFrame consists of two columns (s3ObjectName, batchName) and contains tens of thousands of rows, for example:

s3ObjectName    batchName
a1.json         45
b2.json         45
c3.json         45
d4.json         46
e5.json         46
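
For reference, a toy construction of such a DataFrame (the values are illustrative and both columns are assumed to be strings):

import spark.implicits._

val df = Seq(
  ("a1.json", "45"),
  ("b2.json", "45"),
  ("c3.json", "45"),
  ("d4.json", "46"),
  ("e5.json", "46")
).toDF("s3ObjectName", "batchName")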

The objective is to use the foreachPartition() and foreach() functions to retrieve objects from an S3 bucket in parallel, using the details in each row of the DataFrame, and write them to the data lake (a sketch of this pattern follows the snippet below).

  // s3 connector details defined as an object so it can be serialized and available on all executors in the cluster

import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
import com.amazonaws.regions.Regions
import com.amazonaws.services.s3.{AmazonS3, AmazonS3ClientBuilder}

object container {

  def getDataSource(): AmazonS3 = {
    // credentials are read from a Databricks secret scope
    val AccessKey = dbutils.secrets.get(scope = "ADBTEL_Scope", key = "Telematics-TrueMotion-AccessKey-ID")
    val SecretKey = dbutils.secrets.get(scope = "ADBTEL_Scope", key = "Telematics-TrueMotion-AccessKey-Secret")
    val creds = new BasicAWSCredentials(AccessKey, SecretKey)
    val clientRegion: Regions = Regions.US_EAST_1
    AmazonS3ClientBuilder.standard()
      .withRegion(clientRegion)
      .withCredentials(new AWSStaticCredentialsProvider(creds))
      .build()
  }
}
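For illustration, here is a minimal sketch of that pattern, assuming a DataFrame df with the two columns above, a placeholder bucket name, and a hypothetical writeToDataLake helper. Going through df.rdd.foreachPartition gives the lambda an unambiguous Iterator[Row] parameter; the Scala/Java overloads of Dataset.foreachPartition can make the compiler infer Object for the partition argument, which is exactly the "value foreach is not a member of Object" error in the title.

import org.apache.spark.sql.Row

// one S3 client per partition, reused for every row in that partition
df.rdd.foreachPartition { rows: Iterator[Row] =>
  val s3 = container.getDataSource()   // built on the executor, so nothing non-serializable is shipped
  rows.foreach { row =>
    val objectName = row.getAs[String]("s3ObjectName")
    val batchName  = row.getAs[String]("batchName")        // assumes a string column; adjust if numeric
    val obj = s3.getObject("my-source-bucket", objectName) // "my-source-bucket" is a placeholder
    writeToDataLake(obj.getObjectContent, batchName)       // hypothetical data-lake sink
  }
}

Creating the client inside foreachPartition (rather than on the driver) keeps the AWS client out of the closure that Spark serializes, and amortizes its construction cost over all rows in the partition.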

Tags: foreach, scala, apache-spark, databricks

2 votes · 1 solution · 3268 views
