Geo*_*ler 0 apache-spark apache-spark-sql spark-dataframe
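I want to fill NaN values in Spark conditionally (to make sure I have considered every corner case of my data, rather than simply filling everything with some replacement value).
The sample looks like this: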
case class FooBar(foo:String, bar:String)
val myDf = Seq(("a","first"),("b","second"),("c",null), ("third","fooBar"), ("someMore","null"))
.toDF("foo","bar")
.as[FooBar]
+--------+------+
| foo| bar|
+--------+------+
| a| first|
| b|second|
| c| null|
| third|fooBar|
|someMore| null|
+--------+------+
Unfortunately,
myDf
.withColumn(
"bar",
when(
(($"foo" === "c") and ($"bar" isNull)) , "someReplacement"
)
).show
resets all the other, regular values in the column to null:
+--------+---------------+
| foo| bar|
+--------+---------------+
| a| null|
| b| null|
| c|someReplacement|
| third| null|
|someMore| null|
+--------+---------------+
and
myDf
.withColumn(
"bar",
when(
(($"foo" === "c") and ($"bar" isNull)) or
(($"foo" === "someMore") and ($"bar" isNull)), "someReplacement"
)
).show
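I would really like to use this to fill in values for different classes/categories of foo, but it does not work either.
I am curious how to fix this.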
小智 5
Use otherwise:
when(
(($"foo" === "c") and ($"bar" isNull)) or
(($"foo" === "someMore") and ($"bar" isNull)), "someReplacement"
).otherwise($"bar")
Or coalesce:
coalesce(
$"bar",
when(($"foo" === "c") or ($"foo" === "someMore"), "someReplacement")
)
The reason for coalesce is ... less typing (so you don't repeat $"bar" isNull).
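For anyone who wants to try both variants end to end, here is a minimal sketch. It assumes a local SparkSession and that the rows for c and someMore hold real nulls; the object name ConditionalFill, the helper names fillWithOtherwise and fillWithCoalesce, and the replacement strings are made up for illustration. It also chains a second when so that each category of foo can get its own replacement, which seems to be what the question is after.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, when}

object ConditionalFill {

  // when/otherwise: rows that match no branch keep their original bar value.
  // Without otherwise, unmatched rows would become null.
  def fillWithOtherwise(df: DataFrame): DataFrame =
    df.withColumn(
      "bar",
      when(col("foo") === "c" && col("bar").isNull, "replacementForC")
        .when(col("foo") === "someMore" && col("bar").isNull, "replacementForSomeMore")
        .otherwise(col("bar"))
    )

  // coalesce: takes bar when it is non-null, else falls back to the when chain.
  // The null check on bar is implicit, so it is not repeated per branch.
  def fillWithCoalesce(df: DataFrame): DataFrame =
    df.withColumn(
      "bar",
      coalesce(
        col("bar"),
        when(col("foo") === "c", "replacementForC")
          .when(col("foo") === "someMore", "replacementForSomeMore")
      )
    )

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("conditional-fill")
      .getOrCreate()
    import spark.implicits._

    // Same shape as the question's data, with real nulls for "c" and "someMore".
    val df = Seq(("a", "first"), ("b", "second"), ("c", null),
                 ("third", "fooBar"), ("someMore", null))
      .toDF("foo", "bar")

    fillWithOtherwise(df).show()
    fillWithCoalesce(df).show()

    spark.stop()
  }
}
```

In both variants a row whose bar is null but whose foo matches no branch stays null, so unexpected categories remain visible instead of being silently filled.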