Spark df.write: quote all fields but not null values

Question

dre*_*ddy 5 csv apache-spark spark-dataframe

I am trying to create a CSV from values stored in a table:

 | col1   | col2   | col3  |
 | "one"  | null   | "one" |
 | "two"  | "two"  | "two" |

hive > select * from table where col2 is null;
 one   null    one

I use the following code to produce the CSV:

df.repartition(1)
  .write.option("header",true)
  .option("delimiter", ",")
  .option("quoteAll", true)
  .option("nullValue", "")
  .csv(S3Destination)

The CSV I get:

"col1","col2","col3"
"one","","one"
"two","two","two"

Expected CSV: no double quotes for NULL values

"col1","col2","col3"
"one",,"one"
"two","two","two"

I would appreciate knowing whether the DataFrame writer has an option to do this.

Answer 1

Ram*_*ram 1

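You can take a UDF approach and apply it (using withColumn on the repartitioned DataFrame above) to the columns where a double-quoted empty string may occur; see the sample code below: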

 sqlContext.udf().register("convertToEmptyWithOutQuotes",(String abc) -> (abc.trim().length() > 0 ? abc : abc.replace("\"", " ")),DataTypes.StringType);
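
As a rough illustration of the per-field logic such a UDF would need, here is a minimal plain-Java sketch (no Spark dependency) that quotes every non-null value and leaves nulls as empty, unquoted fields. The names `QuoteNonNull`, `quoteField`, and `formatRow` are hypothetical, and doubling embedded quotes is an assumed escaping convention that the question does not specify. Unlike the lambda above, it checks for null explicitly before touching the value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Spark: builds one CSV line by hand.
final class QuoteNonNull {

    // Quote a single field; a null becomes an empty, unquoted field.
    static String quoteField(String value) {
        if (value == null) {
            return "";
        }
        // Assumed convention: escape an embedded quote by doubling it.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Join the quoted fields of one row with commas.
    static String formatRow(List<String> fields) {
        return fields.stream()
                .map(QuoteNonNull::quoteField)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Rows taken from the question's table.
        System.out.println(formatRow(Arrays.asList("one", null, "one")));
        System.out.println(formatRow(Arrays.asList("two", "two", "two")));
    }
}
```

A line assembled this way would presumably have to be written as plain text rather than through the CSV writer with `quoteAll`, since that writer would quote the already-quoted values a second time.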

String has a replace method that does this job.

val a =  Array("'x'","","z")
println(a.mkString(",").replace("\"", " "))

This produces 'x',,z
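
The same replace idea can be applied to a line of the CSV that was already written with `quoteAll`. The sketch below is plain Java with hypothetical names (`UnquoteEmpty`, `stripEmptyQuotes`) and removes a `""` only when it makes up an entire field. It is a naive approach under two assumptions: no quoted value itself contains the sequence `,"",`, and empty strings and nulls may be treated alike, since the written output no longer distinguishes them.

```java
import java.util.regex.Pattern;

// Hypothetical post-processing helper, not part of Spark.
final class UnquoteEmpty {

    // Matches "" only when it sits between field boundaries
    // (start of line or comma on the left, comma or end of line on the right).
    private static final Pattern EMPTY_QUOTED_FIELD =
            Pattern.compile("(?<=^|,)\"\"(?=,|$)");

    // Turn every empty quoted field in one CSV line into an empty unquoted field.
    static String stripEmptyQuotes(String line) {
        return EMPTY_QUOTED_FIELD.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        // Lines taken from the CSV shown in the question.
        System.out.println(stripEmptyQuotes("\"one\",\"\",\"one\""));
        System.out.println(stripEmptyQuotes("\"two\",\"two\",\"two\""));
    }
}
```

For the first line of the question's output this yields `"one",,"one"`, and the second line is left unchanged.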
