PRI*_*A M 1 replace scala apache-spark-sql
I have an input Spark DataFrame named df:
+---------------+----------------+-----------------------+
|Main_CustomerID|126+ Concentrate|2.5 Ethylhexyl_Acrylate|
+---------------+----------------+-----------------------+
| 725153| 3.0| 2.0|
| 873008| 4.0| 1.0|
| 625109| 1.0| 0.0|
+---------------+----------------+-----------------------+
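I need to remove the special characters from the column names of df, as follows:
Remove +
Replace space with underscore
Replace dot with underscore
So my df should look like: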
+---------------+---------------+-----------------------+
|Main_CustomerID|126_Concentrate|2_5_Ethylhexyl_Acrylate|
+---------------+---------------+-----------------------+
| 725153| 3.0| 2.0|
| 873008| 4.0| 1.0|
| 625109| 1.0| 0.0|
+---------------+---------------+-----------------------+
Using Scala, I have achieved this with:
var tableWithColumnsRenamed = df
for (field <- tableWithColumnsRenamed.columns) {
  tableWithColumnsRenamed = tableWithColumnsRenamed
    .withColumnRenamed(field, field.replaceAll("\\.", "_"))
}
for (field <- tableWithColumnsRenamed.columns) {
  tableWithColumnsRenamed = tableWithColumnsRenamed
    .withColumnRenamed(field, field.replaceAll("\\+", ""))
}
for (field <- tableWithColumnsRenamed.columns) {
  tableWithColumnsRenamed = tableWithColumnsRenamed
    .withColumnRenamed(field, field.replaceAll(" ", "_"))
}
df = tableWithColumnsRenamed
But when I use a single loop with chained calls:
for (field <- tableWithColumnsRenamed.columns) {
  tableWithColumnsRenamed = tableWithColumnsRenamed
    .withColumnRenamed(field, field.replaceAll("\\.", "_"))
    .withColumnRenamed(field, field.replaceAll("\\+", ""))
    .withColumnRenamed(field, field.replaceAll(" ", "_"))
}
I get the first column name as 126 Concentrate instead of 126_Concentrate.
I would rather not use three for loops for this replacement. Is there a better solution?
You can use withColumnRenamed together with a regex replaceAllIn and foldLeft, as follows:
val columns = df.columns
val regex = """[+._, ]+"""
val replacingColumns = columns.map(regex.r.replaceAllIn(_, "_"))
val resultDF = replacingColumns.zip(columns).foldLeft(df) {
  case (tempdf, (newName, oldName)) => tempdf.withColumnRenamed(oldName, newName)
}
resultDF.show(false)
This should give you:
+---------------+---------------+-----------------------+
|Main_CustomerID|126_Concentrate|2_5_Ethylhexyl_Acrylate|
+---------------+---------------+-----------------------+
|725153 |3.0 |2.0 |
|873008 |4.0 |1.0 |
|625109 |1.0 |0.0 |
+---------------+---------------+-----------------------+
I hope the answer is helpful.
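As a side note, the name-cleaning step can be pulled out into a plain function and checked without a Spark session. The following is only a minimal sketch: the object ColumnNames and the function sanitize are hypothetical names, not part of the answer above. It applies the three rules from the question literally, whereas the regex in the answer collapses a whole run of matched characters into a single underscore. Both give the same result for the column names shown here.

```scala
object ColumnNames {
  // Hypothetical helper (not from the original answer): applies the three
  // rules from the question to a single column name.
  def sanitize(name: String): String =
    name
      .replace("+", "")        // remove plus signs
      .replaceAll("[ .]", "_") // each space or dot becomes an underscore

  def main(args: Array[String]): Unit = {
    // Column names taken from the question's input DataFrame
    val columns = Seq("Main_CustomerID", "126+ Concentrate", "2.5 Ethylhexyl_Acrylate")
    columns.map(sanitize).foreach(println)
  }
}
```

Assuming df is the DataFrame from the question, the cleaned names could then be applied in one call with df.toDF(df.columns.map(ColumnNames.sanitize): _*), which avoids one withColumnRenamed call per column.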
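df
  .columns
  .foldLeft(df) { (newdf, colname) =>
    // chain the replacements on the name itself, then rename once;
    // the "+" removal is needed to get 126_Concentrate as requested
    newdf.withColumnRenamed(colname, colname.replace("+", "").replace(" ", "_").replace(".", "_"))
  }
  .show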
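The snippet above chains the replacements on the column name and renames once. That is the key difference from the single-loop attempt in the question, where each chained withColumnRenamed call looks up the original name. withColumnRenamed is a no-op when the existing name is not in the schema, so once an earlier call has renamed the column, the later calls silently do nothing. The sketch below only models this on a list of names, with no Spark involved; rename is a hypothetical stand-in for withColumnRenamed, and RenameDemo is an illustrative name.

```scala
object RenameDemo {
  // Simplified stand-in for withColumnRenamed, acting on names only:
  // if `from` is not present, the list is returned unchanged.
  def rename(cols: Seq[String], from: String, to: String): Seq[String] =
    cols.map(c => if (c == from) to else c)

  val original: Seq[String] =
    Seq("Main_CustomerID", "126+ Concentrate", "2.5 Ethylhexyl_Acrylate")

  // Chained renames that all look up the ORIGINAL name, as in the question.
  // Only the first call that actually changes the name takes effect.
  val chainedRenames: Seq[String] = original.foldLeft(original) { (cols, field) =>
    val step1 = rename(cols, field, field.replaceAll("\\.", "_"))
    val step2 = rename(step1, field, field.replaceAll("\\+", ""))
    rename(step2, field, field.replaceAll(" ", "_"))
  }

  // Chained replacements on the string, followed by a single rename.
  val chainedStrings: Seq[String] = original.foldLeft(original) { (cols, field) =>
    rename(cols, field, field.replace("+", "").replace(" ", "_").replace(".", "_"))
  }

  def main(args: Array[String]): Unit = {
    println(chainedRenames.mkString(", "))
    println(chainedStrings.mkString(", "))
  }
}
```

In this model the chained renames leave the second column as 126 Concentrate, matching what the question reports, while chaining on the string yields 126_Concentrate.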