How to update/transform/replace Spark DataFrame column values using a HashMap

Use*_*d82 1 scala transform hashmap dataframe apache-spark

I want to replace the values in a given DataFrame column using a HashMap, but I am struggling with the syntax. Can someone point me in the right direction or to an existing example? I have searched but could not find anything that covers this exact case.

Edit:

Imagine a DataFrame like the one shown below:

+-----------+--------+-----------+
|       Noun| Pronoun|  Adjective|
+-----------+--------+-----------+
|      Homer| Simpson|BeerDrinker|
|      Marge| Simpson|  Housewife|
|       Bart| Simpson|        Son|
|       Lisa| Simpson|   Daughter|
|TheSimpsons|Simpsons|     Family|
+-----------+--------+-----------+

And I have a map of key-value pairs like this:

  type ValueMap = scala.collection.mutable.HashMap [String,String]
  var mymap = new ValueMap ()
  mymap += ("Simpson" -> "Surname")

I want to perform an operation (which I have not been able to figure out yet) that produces the result shown below. In the Pronoun column, every value equal to Simpson is replaced by its corresponding value from the map mymap, which is Surname:

+-----------+--------+-----------+
|       Noun| Pronoun|  Adjective|
+-----------+--------+-----------+
|      Homer| Surname|BeerDrinker|
|      Marge| Surname|  Housewife|
|       Bart| Surname|        Son|
|       Lisa| Surname|   Daughter|
|TheSimpsons|Simpsons|     Family|
+-----------+--------+-----------+

Sat*_*n S 6

Try this approach using a UDF:

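import org.apache.spark.sql.functions.udf
import spark.implicits._  // for toDF and the $"..." syntax; already in scope in spark-shell

val myMap = Map("Simpson" -> "Surname")
val df = Seq(
  ("Homer", "Simpson", "BeerDrinker"),
  ("Marge", "Simpson", "Housewife"),
  ("Bart", "Simpson", "Son"),
  ("Lisa", "Simpson", "Daughter"),
  ("TheSimpsons", "Simpsons", "Family")
).toDF("Noun", "Pronoun", "Adjective")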

df.show(false)

+-----------+--------+-----------+
|Noun       |Pronoun |Adjective  |
+-----------+--------+-----------+
|Homer      |Simpson |BeerDrinker|
|Marge      |Simpson |Housewife  |
|Bart       |Simpson |Son        |
|Lisa       |Simpson |Daughter   |
|TheSimpsons|Simpsons|Family     |
+-----------+--------+-----------+

val getVal = udf((x: String) => myMap.getOrElse(x, x))
val resDF = df.withColumn("Pronoun", getVal($"Pronoun"))

resDF.show(false)

+-----------+--------+-----------+
|Noun       |Pronoun |Adjective  |
+-----------+--------+-----------+
|Homer      |Surname |BeerDrinker|
|Marge      |Surname |Housewife  |
|Bart       |Surname |Son        |
|Lisa       |Surname |Daughter   |
|TheSimpsons|Simpsons|Family     |
+-----------+--------+-----------+

Let me know if this helps.
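
The question builds its map as a `scala.collection.mutable.HashMap`, while the snippet above uses an immutable `Map`. A minimal sketch of how the two might be connected, assuming a local `SparkSession` created only for illustration (in spark-shell, `spark` already exists); the names `lookup` and `replaceFromMap` are made up for this example:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
import scala.collection.mutable

// Local session for illustration only (assumption); skip this in spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("map-udf-sketch").getOrCreate()
import spark.implicits._

// The mutable map from the question.
val mymap = mutable.HashMap("Simpson" -> "Surname")

// Take an immutable snapshot so the lookup does not depend on
// later changes made to `mymap` on the driver.
val lookup: Map[String, String] = mymap.toMap

// Return the mapped value if the key exists, otherwise keep the original value.
val replaceFromMap = udf((x: String) => lookup.getOrElse(x, x))

val sample = Seq(("Homer", "Simpson"), ("TheSimpsons", "Simpsons")).toDF("Noun", "Pronoun")
val replaced = sample.withColumn("Pronoun", replaceFromMap($"Pronoun"))
replaced.show(false)
```

Values without an entry in the map, such as `Simpsons`, pass through unchanged because `getOrElse` falls back to its second argument.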

Update:

Without a UDF:

Add the map to the DataFrame as another column:

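import org.apache.spark.sql.functions.{typedLit, when}

// Attach the map as a literal column, look up each Pronoun in it, then drop the helper column.
val df1 = df.withColumn("map", typedLit(myMap))
val df2 = df1.withColumn("Pronoun", when($"map"($"Pronoun").isNotNull, $"map"($"Pronoun")).otherwise($"Pronoun") ).drop("map")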
df2.show(false)

+-----------+--------+-----------+
|Noun       |Pronoun |Adjective  |
+-----------+--------+-----------+
|Homer      |Surname |BeerDrinker|
|Marge      |Surname |Housewife  |
|Bart       |Surname |Son        |
|Lisa       |Surname |Daughter   |
|TheSimpsons|Simpsons|Family     |
+-----------+--------+-----------+
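
The `when(...).isNotNull ... otherwise(...)` pattern takes the looked-up value when it is non-null and the original value otherwise, which is what `coalesce` does as well. A possible shorter variant, sketched under the assumption that a missing key yields null in the map lookup (the same behavior the code above relies on); the session setup and the name `viaCoalesce` are illustrative only:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, typedLit}

// Local session for illustration only (assumption); skip this in spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("coalesce-sketch").getOrCreate()
import spark.implicits._

val myMap = Map("Simpson" -> "Surname")
val sample = Seq(("Homer", "Simpson"), ("TheSimpsons", "Simpsons")).toDF("Noun", "Pronoun")

// typedLit turns the Scala Map into a literal map-typed Column.
val colMap = typedLit(myMap)

// coalesce returns its first non-null argument: the mapped value if the key
// exists, otherwise the original Pronoun value.
val viaCoalesce = sample.withColumn("Pronoun", coalesce(colMap($"Pronoun"), $"Pronoun"))
viaCoalesce.show(false)
```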

Another simple approach that avoids adding a new column:

val colMap = typedLit(myMap)
val df3 = df.withColumn("Pronoun", when(colMap($"Pronoun").isNotNull, colMap($"Pronoun")).otherwise($"Pronoun") )
df3.show(false)
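
For exact whole-value replacements like the one in the question, `DataFrameNaFunctions.replace` (reached through `df.na`) may also be worth a look, since it accepts a column name and a Scala `Map` directly. This is a different technique from the ones shown above, sketched here with an illustrative local session and the made-up name `viaNaReplace`:

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only (assumption); skip this in spark-shell.
val spark = SparkSession.builder().master("local[*]").appName("na-replace-sketch").getOrCreate()
import spark.implicits._

val myMap = Map("Simpson" -> "Surname")
val sample = Seq(("Homer", "Simpson"), ("TheSimpsons", "Simpsons")).toDF("Noun", "Pronoun")

// Replace values in the Pronoun column that exactly match a key of myMap;
// values without a matching key are left unchanged.
val viaNaReplace = sample.na.replace("Pronoun", myMap)
viaNaReplace.show(false)
```

Because this needs neither a UDF nor a map column, it is probably the least code when the map is small and the match is on the full value.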

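  • Hi @Sathiyan, the approach without a UDF is perfect. I believe this is what I was looking for. Unfortunately, I could not get it to work on my own, and I (probably) know why: I did not know about `typedLit` or how to use it in this situation. Before this example, I did not know that a map could be turned into a column and used in the way shown. I appreciate the time and effort you spent helping me solve this. Cheers! (2 upvotes)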