I am new to Python and PySpark. How do I write the following Spark DataFrame code in PySpark?
val df = spark.read.format("jdbc").options(
  Map(
    "url" -> "jdbc:someDB",
    "user" -> "root",
    "password" -> "password",
    "dbtable" -> "tableName",
    "driver" -> "someDriver")).load()
I tried writing it in PySpark as follows, but I get a syntax error:
df = spark.read.format("jdbc").options(
map(lambda : ("url","jdbc:someDB"), ("user","root"), ("password","password"), ("dbtable","tableName"), ("driver","someDriver"))).load()
Thanks in advance.
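PySpark's `DataFrameReader.options` takes keyword arguments rather than a map object, so a plain dict unpacked with `**` is the usual counterpart of the Scala `Map`. A minimal sketch, assuming a SparkSession already exists; the helper name `load_jdbc_table` is hypothetical and the option values are the placeholders from the question:

```python
# Connection settings copied from the Scala snippet; the values are placeholders.
jdbc_options = {
    "url": "jdbc:someDB",
    "user": "root",
    "password": "password",
    "dbtable": "tableName",
    "driver": "someDriver",
}


def load_jdbc_table(spark, options):
    """Read a table over JDBC; `spark` is an existing SparkSession."""
    # DataFrameReader.options accepts keyword arguments, so the dict is unpacked with **.
    return spark.read.format("jdbc").options(**options).load()


# Usage, assuming a SparkSession named `spark` already exists:
# df = load_jdbc_table(spark, jdbc_options)
```

Passing each setting directly as a keyword argument, for example `.options(url="jdbc:someDB", user="root")`, should work the same way.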
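I have a DataFrame with decimal and string columns. I want to cast all the decimal columns to double without naming each one. I tried the following without success. I am fairly new to Spark.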
>df.printSchema
root
|-- var1: decimal(38,10) (nullable = true)
|-- var2: decimal(38,10) (nullable = true)
|-- var3: decimal(38,10) (nullable = true)
…
150 more decimal and string columns
I tried:
import org.apache.spark.sql.types._
val cols = df.columns.map(x => {
  if (x.dataType == DecimalType(38,0)) col(x).cast(DoubleType)
  else col(x)
})
I get:
<console>:30: error: value dataType is not a member of String
if (x.dataType == DecimalType(38,0)) col(x).cast(DoubleType)
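The error arises because `df.columns` yields column names as plain strings, which carry no type information; the types live in the schema. Below is a minimal sketch of the same idea in PySpark (not Scala), assuming an existing DataFrame `df`. The helper names `decimal_column_names` and `decimals_to_double` are hypothetical. It matches on the `decimal` prefix of the type string from `df.dtypes`, so it does not depend on an exact precision and scale such as `(38,0)` versus `(38,10)`:

```python
def decimal_column_names(dtypes):
    """Return names of columns whose type string is a decimal, e.g. 'decimal(38,10)'."""
    return [name for name, type_name in dtypes if type_name.startswith("decimal")]


def decimals_to_double(df):
    """Cast every decimal column of a PySpark DataFrame to double, keeping column order."""
    decimal_names = set(decimal_column_names(df.dtypes))
    return df.select(
        *[
            # alias() keeps the original column name after the cast
            df[name].cast("double").alias(name) if name in decimal_names else df[name]
            for name in df.columns
        ]
    )


# Usage, assuming a DataFrame named `df` already exists:
# df2 = decimals_to_double(df)
```

In Scala, the corresponding approach would be to iterate over `df.schema.fields` and test each field's `dataType` (for instance with `isInstanceOf[DecimalType]`) instead of mapping over `df.columns`.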