Initially I have a matrix:
0.0 0.4 0.4 0.0
0.1 0.0 0.0 0.7
0.0 0.2 0.0 0.3
0.3 0.0 0.0 0.0
This matrix, named matrix, is converted into normal_array with
`val normal_array = matrix.toArray`
I have an array of strings:
inputCols : Array[String] = Array(p1, p2, p3, p4)
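I need to convert this matrix into the following DataFrame. (Note: the number of rows and the number of columns in the matrix will both equal the length of inputCols.)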
index p1 p2 p3 p4
p1 0.0 0.4 0.4 0.0
p2 0.1 0.0 0.0 0.7
p3 0.0 0.2 0.0 0.3
p4 0.3 0.0 0.0 0.0
In Python, this is easy to do with the pandas library:
arrayToDataframe = pandas.DataFrame(normal_array,columns = inputCols, index = inputCols)
But how can I do this in Scala?
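One possible approach, sketched under assumptions: matrix is taken to be a Spark ML/MLlib Matrix, whose toArray returns the values in column-major order, so element (i, j) of an n x n matrix sits at index i + j * n of normal_array. The helper MatrixToTable.labelledRows below is hypothetical (not part of Spark); it rebuilds each row and pairs it with its label from inputCols. The commented lines at the end show how those rows might be handed to Spark with an explicit schema; they need spark-sql on the classpath and are not a definitive implementation.

```scala
object MatrixToTable {
  // Hypothetical helper (not part of Spark): rebuilds labelled rows from a
  // square matrix flattened in column-major order, the layout assumed for
  // Spark's Matrix.toArray. Element (i, j) sits at index i + j * n.
  def labelledRows(values: Array[Double], labels: Array[String]): Seq[(String, Seq[Double])] = {
    val n = labels.length
    require(values.length == n * n, s"expected ${n * n} values, got ${values.length}")
    labels.indices.map { i =>
      // Row i: walk across the columns j, picking element (i, j).
      labels(i) -> (0 until n).map(j => values(i + j * n))
    }
  }

  // Assumed Spark hand-off (requires spark-sql, so only sketched here):
  //   import org.apache.spark.sql.Row
  //   import org.apache.spark.sql.types._
  //   val schema = StructType(
  //     StructField("index", StringType) +: inputCols.map(StructField(_, DoubleType)))
  //   val rows = labelledRows(normal_array, inputCols).map { case (k, vs) => Row.fromSeq(k +: vs) }
  //   val result = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
}
```

If the Matrix object itself is still at hand, matrix(i, j) returns the same element directly, which avoids having to reason about the flattened layout at all.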
I have an input Spark DataFrame named df:
+---------------+----------------+-----------------------+
|Main_CustomerID|126+ Concentrate|2.5 Ethylhexyl_Acrylate|
+---------------+----------------+-----------------------+
| 725153| 3.0| 2.0|
| 873008| 4.0| 1.0|
| 625109| 1.0| 0.0|
+---------------+----------------+-----------------------+
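I need to remove special characters from the column names of df as follows:
Remove +
Replace spaces with underscores
Replace dots with underscores
So my df should look like this: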
+---------------+---------------+-----------------------+
|Main_CustomerID|126_Concentrate|2_5_Ethylhexyl_Acrylate|
+---------------+---------------+-----------------------+
| 725153| 3.0| 2.0|
| 873008| 4.0| 1.0|
| 625109| 1.0| 0.0|
+---------------+---------------+-----------------------+
Using Scala, I have done the following:
var tableWithColumnsRenamed = df
for (field <- tableWithColumnsRenamed.columns) {
  tableWithColumnsRenamed = tableWithColumnsRenamed
    .withColumnRenamed(field, field.replaceAll("\\.", "_"))
}
for (field <- tableWithColumnsRenamed.columns) {
  tableWithColumnsRenamed = tableWithColumnsRenamed
    .withColumnRenamed(field, field.replaceAll("\\+", …
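The repeated withColumnRenamed loops in the question could likely be collapsed into a single pass. A minimal sketch, assuming the three rules are exactly those listed (drop +, turn spaces and dots into underscores): the hypothetical helper ColumnNames.clean uses String.replace, which matches literal text, so no regex escaping such as "\\." is needed. The renamed DataFrame could then be produced with Dataset.toDF, which takes the complete list of new column names in their original order.

```scala
object ColumnNames {
  // Hypothetical helper: applies the renaming rules from the question in order.
  // String.replace works on literal text, so '+' and '.' need no escaping.
  def clean(name: String): String =
    name
      .replace("+", "") // drop '+'
      .replace(" ", "_") // spaces -> underscores
      .replace(".", "_") // dots -> underscores

  // Assumed Spark usage (requires spark-sql, so only sketched here):
  //   val renamed = df.toDF(df.columns.map(clean): _*)
}
```

With this helper, "126+ Concentrate" becomes "126_Concentrate" and "2.5 Ethylhexyl_Acrylate" becomes "2_5_Ethylhexyl_Acrylate". If two original names happen to clean to the same string, the result would contain duplicate column names, so it may be worth comparing df.columns.map(ColumnNames.clean).distinct.length with df.columns.length first.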