Spark DataFrame na.fill for Boolean column type

Sud*_*yam 0 apache-spark

I can fill numeric and String type columns using the following:

masterDF = masterDF.na.fill(-1)
masterDF = masterDF.na.fill("")
masterDF = masterDF.na.fill(-1.0)

But I haven't found an API for filling Boolean type columns. I tried masterDF = masterDF.na.fill(false), but it is not supported.

Any ideas?

Ram*_*jan 6

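You can pass a Map to fill, where the key is the column name and the value is the replacement value, which must be of type Int, Long, Float, Double, String, or Boolean.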

masterDF.na.fill(masterDF.columns.map(_ -> false).toMap)
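The map above assigns false to every column. Since the Scaladoc for this overload states that replacement values are cast to the column data type, that map may also replace nulls in non-Boolean columns (for example, a null string could become "false"), or fail for column types that cannot be cast from Boolean. A minimal sketch, assuming a local SparkSession and a hypothetical three-column schema (id, name, active), that builds the map only from BooleanType columns; the helper fillBooleans is hypothetical, not part of Spark:

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object FillBooleanColumns {
  // Hypothetical helper: build a fill map only for BooleanType columns,
  // so string and numeric columns are left untouched.
  def fillBooleans(df: DataFrame, value: Boolean = false): DataFrame = {
    val boolCols: Map[String, Any] = df.schema.fields
      .filter(_.dataType == BooleanType)
      .map(f => f.name -> value)
      .toMap
    df.na.fill(boolCols)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("fill-booleans").getOrCreate()
    // Hypothetical schema used only for this demo.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true),
      StructField("active", BooleanType, nullable = true)
    ))
    val rows = Seq(Row(1, "a", true), Row(null, null, null))
    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

    // Only "active" is filled; the null "id" and "name" stay null.
    fillBooleans(df).show()
    spark.stop()
  }
}
```

If you are on Spark 2.3.0 or later, it is worth checking DataFrameNaFunctions for a fill(value: Boolean) overload, which should make masterDF.na.fill(false) work directly; the Map approach also works on older versions.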

The API documentation says:

/**
* (Scala-specific) Returns a new `DataFrame` that replaces null values.
*
* The key of the map is the column name, and the value of the map is the replacement value.
* The value must be of the following type: `Int`, `Long`, `Float`, `Double`, `String`, `Boolean`.
* Replacement values are cast to the column data type.
*
* For example, the following replaces null values in column "A" with string "unknown", and
* null values in column "B" with numeric value 1.0.
* {{{
*   df.na.fill(Map(
*     "A" -> "unknown",
*     "B" -> 1.0
*   ))
* }}}
*
* @since 1.3.1
*/
def fill(valueMap: Map[String, Any]): DataFrame = fillMap(valueMap.toSeq)

You can even use the Map inside the fill function to set different values for different columns.
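For example, a single fill call can give each column its own replacement value. A sketch, assuming a local SparkSession and a hypothetical DataFrame with columns id (IntegerType), name (StringType) and active (BooleanType); the object and method names are illustrative only:

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object FillPerColumn {
  // Hypothetical column names; each key gets its own replacement value,
  // which Spark casts to that column's data type.
  def fillDefaults(df: DataFrame): DataFrame =
    df.na.fill(Map(
      "id"     -> -1,    // IntegerType column
      "name"   -> "",    // StringType column
      "active" -> false  // BooleanType column
    ))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("fill-per-column").getOrCreate()
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true),
      StructField("active", BooleanType, nullable = true)
    ))
    // One fully-null row, so every replacement is visible.
    val df = spark.createDataFrame(spark.sparkContext.parallelize(Seq(Row(null, null, null))), schema)
    fillDefaults(df).show()
    spark.stop()
  }
}
```

This combines the question's three separate fill calls and the Boolean case into one pass over the DataFrame.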

I hope this answer helps.