Question by eif*_*cht (score 4). Tags: sql, scala, list, dataframe, apache-spark
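I'm trying to create "n" DataFrames from the data of a single DataFrame. I read the integer values in one column of that DataFrame and run a SQL statement in a loop, so that I end up with one DataFrame per integer in the column.
Here is my code: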
val maxvalue = spark.sql("SELECT MAX(column4) as maxval FROM mydata").collect()(0).getInt(0)
for (i <- 0 to maxvalue) {
  val query = "SELECT column1, column2, column3 FROM mydata WHERE column4 = " + i
  val newdataframe = spark.sql(query)
  // add newdataframe to a List
}
I need to create "n" DataFrames, but I don't know how to declare the type of the List before the loop and then fill it inside the for loop.
Schema of the existing DataFrame:
// +------------+------------+------------+------------+
// |     column1|     column2|     column3|     column4|
// +------------+------------+------------+------------+
// |      String|      Double|         Int|         Int|
// +------------+------------+------------+------------+
Schema of each new DataFrame:
// +------------+------------+------------+
// |     column1|     column2|     column3|
// +------------+------------+------------+
// |      String|      Double|         Int|
// +------------+------------+------------+
Answer by Tza*_*har (score 12)
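You can create a mutable list (an ArrayBuffer) whose element type is declared up front, and append to it inside the loop: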
import scala.collection.mutable
import org.apache.spark.sql.DataFrame

val dfs = mutable.ArrayBuffer[DataFrame]()
for (i <- 0 to maxvalue) {
  val query = "SELECT column1, column2, column3 FROM mydata WHERE column4 = " + i
  val newdataframe = spark.sql(query)
  dfs += newdataframe
}
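The pattern itself does not depend on Spark. As a minimal sketch, assuming a hypothetical Record case class standing in for a row of mydata and plain Scala collections standing in for DataFrames, declaring the buffer's element type before the loop and appending inside it looks like this:

```scala
import scala.collection.mutable

// Hypothetical stand-in for one row of mydata (not a Spark type).
final case class Record(column1: String, column2: Double, column3: Int, column4: Int)

object MutableSplit {
  // Projected row type, mirroring "SELECT column1, column2, column3".
  type Projected = (String, Double, Int)

  def split(rows: Seq[Record], maxValue: Int): Seq[Seq[Projected]] = {
    // Element type declared up front, like ArrayBuffer[DataFrame]() in the answer.
    val parts = mutable.ArrayBuffer[Seq[Projected]]()
    for (i <- 0 to maxValue) {
      // Equivalent of "WHERE column4 = i" followed by the projection.
      parts += rows.filter(_.column4 == i).map(r => (r.column1, r.column2, r.column3))
    }
    // Return an immutable copy rather than exposing the mutable buffer.
    parts.toList
  }
}
```

Note that looping over 0 to maxvalue yields one result per integer in the range, so a value that never occurs in column4 still produces an entry; in the Spark version that entry is simply an empty DataFrame.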
But a better approach, which avoids the mutable data structure entirely, is to map the range of integers to a sequence of DataFrames:
val dfs: Seq[DataFrame] = (0 to maxvalue).map { i =>
  spark.sql("SELECT column1, column2, column3 FROM mydata WHERE column4 = " + i)
}
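If the caller needs to know which value each subset belongs to, or if column4 has gaps, one variant is to iterate over the values that actually occur and keep the results in a Map keyed by value. Below is a minimal pure-Scala sketch, assuming a hypothetical Record case class rather than Spark; in Spark, the keys could come from a query such as SELECT DISTINCT column4 FROM mydata, and each map value would be the corresponding spark.sql(...) DataFrame.

```scala
// Hypothetical stand-in for one row of mydata (not a Spark type).
final case class Record(column1: String, column2: Double, column3: Int, column4: Int)

object ImmutableSplit {
  type Projected = (String, Double, Int)

  // One subset per integer in 0..maxValue, built with map instead of a mutable buffer.
  def byRange(rows: Seq[Record], maxValue: Int): Seq[Seq[Projected]] =
    (0 to maxValue).map { i =>
      rows.filter(_.column4 == i).map(r => (r.column1, r.column2, r.column3))
    }

  // One subset per value that actually occurs in column4, keyed by that value.
  def byDistinctValue(rows: Seq[Record]): Map[Int, Seq[Projected]] =
    rows.map(_.column4).distinct.map { v =>
      v -> rows.filter(_.column4 == v).map(r => (r.column1, r.column2, r.column3))
    }.toMap
}
```

In plain Scala, groupBy would do the keyed split in a single pass; the distinct-then-filter shape is kept here only because it mirrors issuing one query per value in Spark.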
Views: 1851