Rah*_*hul 5 json scala apache-spark apache-spark-sql
I have a DataFrame that I read from a CSV file.
CSV:
name,age,pets
Alice,23,dog
Bob,30,dog
Charlie,35,
Reading this into a DataFrame called myData:
+-------+---+----+
|   name|age|pets|
+-------+---+----+
|  Alice| 23| dog|
|    Bob| 30| dog|
|Charlie| 35|null|
+-------+---+----+
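To make the question reproducible without a CSV file, the same three rows can be built in memory. This is only a sketch: the object name `ReproduceMyData` and the `build` helper are made up for illustration, and `age` is kept as a string because the JSON in the question shows it quoted.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReproduceMyData {
  // Hypothetical helper: builds the three-row DataFrame from the question.
  // The missing pet for Charlie is modelled as None, which Spark stores as null.
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(String, String, Option[String])](
      ("Alice", "23", Some("dog")),
      ("Bob", "30", Some("dog")),
      ("Charlie", "35", None)
    ).toDF("name", "age", "pets")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("toJSON-nulls")
      .getOrCreate()
    val myData = build(spark)
    myData.show()             // prints the table with null in the pets column
    myData.toJSON.show(false) // prints one JSON string per row
    spark.stop()
  }
}
```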
Now I want to convert each row of this DataFrame to JSON with myData.toJSON. What I get is the following JSON:
{"name":"Alice","age":"23","pets":"dog"}
{"name":"Bob","age":"30","pets":"dog"}
{"name":"Charlie","age":"35"}
I would like the JSON for the third row to include the null value. For example:
{"name":"Charlie","age":"35", "pets":null}
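However, this does not seem to be possible. I stepped through the code in a debugger and saw that Spark's org.apache.spark.sql.catalyst.json.JacksonGenerator class has the following implementation: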
private def writeFields(
    row: InternalRow, schema: StructType, fieldWriters: Seq[ValueWriter]): Unit = {
  var i = 0
  while (i < row.numFields) {
    val field = schema(i)
    if (!row.isNullAt(i)) {
      gen.writeFieldName(field.name)
      fieldWriters(i).apply(row, i)
    }
    i += 1
  }
}
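This appears to skip any column whose value is null. I am not sure why this is the default behavior, but is there a way to emit null values in the JSON produced by Spark's toJSON?
I am using Spark 2.1.0.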
To make null columns show up in the JSON produced by Spark's toJSON method, you can fill the nulls first:
myData.na.fill("null").toJSON
It gives the following result. Note that pets then holds the string "null", not a JSON null literal:
+-------------------------------------------+
|value                                      |
+-------------------------------------------+
|{"name":"Alice","age":"23","pets":"dog"}   |
|{"name":"Bob","age":"30","pets":"dog"}     |
|{"name":"Charlie","age":"35","pets":"null"}|
+-------------------------------------------+
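If a real JSON null is needed instead of the string "null", one possible workaround that does not depend on toJSON is to serialize each row yourself with Jackson, which Spark already ships with. The sketch below is an assumption-laden illustration, not Spark's own API: `JsonWithNulls` and `toJsonKeepingNulls` are hypothetical names, and it only handles flat rows whose values are strings, numbers, booleans, or null.

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.spark.sql.{DataFrame, Dataset, Encoders, Row}

object JsonWithNulls {
  // Hypothetical helper (not part of Spark): serializes each row with Jackson,
  // writing null columns as JSON null instead of dropping the key.
  // Assumes flat rows; nested structs, arrays and maps are not handled.
  def toJsonKeepingNulls(df: DataFrame): Dataset[String] = {
    val names = df.columns
    val serialize: Iterator[Row] => Iterator[String] = rows => {
      val mapper = new ObjectMapper() // one mapper per partition
      rows.map { row =>
        // LinkedHashMap keeps the columns in schema order.
        val fields = new java.util.LinkedHashMap[String, Any]()
        names.indices.foreach(i => fields.put(names(i), row.get(i)))
        mapper.writeValueAsString(fields)
      }
    }
    // Pass the String encoder explicitly so no implicits import is needed.
    df.mapPartitions(serialize)(Encoders.STRING)
  }
}
```

With the DataFrame from the question, `JsonWithNulls.toJsonKeepingNulls(myData).show(false)` should list the Charlie row with an explicit `"pets":null` entry, because Jackson writes null map values by default.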
I hope this helps!
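For readers on a newer release than the 2.1.0 mentioned in the question: Spark 3.0 and later document an `ignoreNullFields` option for JSON generation. Assuming such a version, a sketch along these lines should keep null fields in the output; `ToJsonKeepNulls` and `toJsonWithNulls` are illustrative names, and columns are looked up with `col`, so names containing dots would need extra care.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, struct, to_json}

object ToJsonKeepNulls {
  // Hypothetical helper: packs all columns into a struct and serializes it
  // with to_json, asking the JSON generator not to drop null fields.
  // Assumes Spark 3.0 or later, where the ignoreNullFields option exists.
  def toJsonWithNulls(df: DataFrame): DataFrame = {
    val allColumns = df.columns.map(col)
    df.select(
      to_json(struct(allColumns: _*), Map("ignoreNullFields" -> "false")).as("value")
    )
  }
}
```

Calling `ToJsonKeepNulls.toJsonWithNulls(myData).show(false)` returns a single `value` column of JSON strings, much like toJSON does, and leaves the original data untouched rather than replacing nulls with a placeholder string.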