How do I pass a list of paths to spark.read.load?

Tak*_*shi 3 scala apache-spark apache-spark-sql

I can load several files at once by passing multiple paths to the load method, for example:

spark.read
  .format("com.databricks.spark.avro")
  .load(
    "/data/src/entity1/2018-01-01",
    "/data/src/entity1/2018-01-12",
    "/data/src/entity1/2018-01-14")

I would like to prepare a list of paths first and then pass it to the load method, but I get the following compilation error:

val paths = Seq(
  "/data/src/entity1/2018-01-01",
  "/data/src/entity1/2018-01-12",
  "/data/src/entity1/2018-01-14")
spark.read.format("com.databricks.spark.avro").load(paths)

<console>:29: error: overloaded method value load with alternatives:
  (paths: String*)org.apache.spark.sql.DataFrame <and>
  (path: String)org.apache.spark.sql.DataFrame
 cannot be applied to (List[String])
       spark.read.format("com.databricks.spark.avro").load(paths)

Why does this happen, and how can I pass a list of paths to the load method?

Ram*_*jan 8

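You just need the splat operator (_*) on the paths list. As the error message shows, load is declared with a varargs parameter (paths: String*), and Scala does not expand a collection into varargs automatically, so the sequence has to be marked with : _* explicitly: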

spark.read.format("com.databricks.spark.avro").load(paths: _*)
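
To see the same behaviour without a Spark session, here is a minimal sketch in plain Scala. The object VarargsDemo and its load method are hypothetical stand-ins that only mimic the (paths: String*) signature from the error message; they are not Spark's implementation. The date list is also made up for illustration.

```scala
object VarargsDemo {
  // Hypothetical stand-in for a method with the signature load(paths: String*).
  // It just joins the paths so the effect of the expansion is visible.
  def load(paths: String*): String = paths.mkString(",")

  def main(args: Array[String]): Unit = {
    // Build the path list programmatically (illustrative dates).
    val dates = Seq("2018-01-01", "2018-01-12", "2018-01-14")
    val paths = dates.map(d => s"/data/src/entity1/$d")

    // load(paths) would not compile here: a Seq[String] is a single
    // argument of the wrong type, not a sequence of String arguments.

    // ": _*" tells the compiler to pass each element as a separate argument.
    println(load(paths: _*))
  }
}
```

The same : _* ascription should work for any Seq[String], so the paths can be filtered or generated before the call to load.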