Reading with predicates in Spark JDBC

Question

ds_*_*ser 3 hadoop scala jdbc intellij-idea apache-spark

I am pulling data from SQL Server into HDFS. Here is my snippet:

val predicates = Array[String]("int_id < 500000", "int_id >= 500000 && int_id < 1000000")

  val jdbcDF = spark.read.format("jdbc")
      .option("url", dbUrl)
      .option("databaseName", "DatabaseName")
      .option("dbtable", table)
      .option("user", "***")
      .option("password", "***")
      .option("predicates", predicates)
      .load()


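My IntelliJ IDE keeps saying

"Type mismatch, expected: Boolean or Long or Double or String, actual: Array[String]"

on the predicates line. I'm not sure what is wrong here. Can anyone see the problem? Also, how do I use fetch size here?

Thanks.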

Answer 1

ste*_*ino 5

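The option method only accepts Booleans, Longs, Doubles, or Strings. To pass predicates as an Array[String], you have to call the jdbc method directly instead of specifying the source through the format method.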

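import java.util.Properties

// Each predicate becomes the WHERE clause of one partition, so it must be SQL (AND, not &&).
val predicates = Array[String]("int_id < 500000", "int_id >= 500000 AND int_id < 1000000")

// Settings previously passed via option(...) go into the connection properties.
val connectionProperties = new Properties()
connectionProperties.setProperty("user", "***")
connectionProperties.setProperty("password", "***")
connectionProperties.setProperty("databaseName", "DatabaseName")

val jdbcDF = spark.read.jdbc(
  url = dbUrl,
  table = table,
  predicates = predicates,
  connectionProperties = connectionProperties
)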


You can see an example here.
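The question also asks about fetch size, and hand-written range predicates become error-prone as the number of partitions grows. The following is a minimal sketch of one way to handle both, not part of the original answer. `rangePredicates` is a hypothetical helper introduced here for illustration; the boundaries and the value `10000` are placeholder assumptions. The `fetchsize` key relies on the `DataFrameReader.jdbc` documentation, which notes that it can be set in the connection properties to control the number of rows per fetch.

```scala
import java.util.Properties

// Hypothetical helper (not a Spark API): builds one SQL predicate per partition.
// Given sorted boundaries b1 < b2 < ... < bn it produces
//   col < b1, col >= b1 AND col < b2, ..., col >= bn
// so the ranges do not overlap and cover every non-NULL value.
def rangePredicates(column: String, boundaries: Seq[Long]): Array[String] = {
  require(boundaries.nonEmpty, "at least one boundary is required")
  val sorted = boundaries.distinct.sorted
  val first = s"$column < ${sorted.head}"
  // Pair up neighbouring boundaries; a single boundary yields no middle ranges.
  val middle = sorted.sliding(2).collect {
    case Seq(lo, hi) => s"$column >= $lo AND $column < $hi"
  }.toSeq
  val last = s"$column >= ${sorted.last}"
  (Seq(first) ++ middle ++ Seq(last)).toArray
}

// Same split points as in the question, plus an open-ended final range.
val predicates = rangePredicates("int_id", Seq(500000L, 1000000L))

val connectionProperties = new Properties()
connectionProperties.setProperty("user", "***")
connectionProperties.setProperty("password", "***")
// Assumed value; tune it for the driver and the row width.
connectionProperties.setProperty("fetchsize", "10000")

// With the SparkSession, dbUrl, and table from the question in scope:
// val jdbcDF = spark.read.jdbc(dbUrl, table, predicates, connectionProperties)
```

Spark creates one partition per predicate, so the array length sets the read parallelism. Rows where `int_id` is NULL match none of these ranges; if the column is nullable, an extra `"int_id IS NULL"` predicate would be needed to include them.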
