Tor*_*ren 2 scala dataframe apache-spark apache-spark-sql
Given a DataFrame:
val df = sc.parallelize(List(("Mike","1986","1976"), ("Andre","1980","1966"), ("Pedro","1989","2000")))
.toDF("info", "year1", "year2")
df.show
+-----+-----+-----+
| info|year1|year2|
+-----+-----+-----+
| Mike| 1986| 1976|
|Andre| 1980| 1966|
|Pedro| 1989| 2000|
+-----+-----+-----+
I am trying to filter the df for all values that end with 6, but I get an exception. I tried:
val filtered = df.filter(df.col("*").endsWith("6"))
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: ResolvedStar(info#20, year1#21, year2#22)
I also tried this:
val filtered = df.select(df.col("*")).filter(_ endsWith("6"))
error: missing parameter type for expanded function ((x$1) => x$1.endsWith("6"))
How can I solve this? Thanks.
I'm not entirely sure what you are trying to do, but based on my understanding:
val df = sc.parallelize(List(("Mike","1986","1976"), ("Andre","1980","1966"), ("Pedro","1989","2000"))).toDF("info", "year1", "year2")
df.show
// +-----+-----+-----+
// | info|year1|year2|
// +-----+-----+-----+
// | Mike| 1986| 1976|
// |Andre| 1980| 1966|
// |Pedro| 1989| 2000|
// +-----+-----+-----+
// Build one Boolean Column per column name, then OR them all together.
val conditions = df.columns.map(df(_).endsWith("6")).reduce(_ or _)
// Filter directly on the combined condition; no temporary column is needed.
df.filter(conditions).show
// +-----+-----+-----+
// | info|year1|year2|
// +-----+-----+-----+
// |Andre| 1980| 1966|
// | Mike| 1986| 1976|
// +-----+-----+-----+
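The approach in the answer can also be wrapped in a small helper, which makes it easy to restrict the check to particular columns. The following is only a sketch: `anyEndsWith` is a hypothetical name, not part of Spark, and the local `SparkSession` setup is an assumption (in spark-shell, `spark` already exists and those lines can be dropped). It is written script-style, as for spark-shell or a `.sc` file.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

// Assumption: a local session created just for this example.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("endsWithFilter")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  ("Mike", "1986", "1976"),
  ("Andre", "1980", "1966"),
  ("Pedro", "1989", "2000")
).toDF("info", "year1", "year2")

// Hypothetical helper: keeps the rows in which at least one of the
// given columns ends with `suffix`. Starting the fold from lit(false)
// means an empty column list yields an empty result instead of the
// exception that reduce would throw.
def anyEndsWith(df: DataFrame, suffix: String, columns: Seq[String]): DataFrame = {
  val condition: Column =
    columns.map(c => col(c).endsWith(suffix)).foldLeft(lit(false))(_ || _)
  df.filter(condition)
}

// Check every column, as in the answer.
val allColumns = anyEndsWith(df, "6", df.columns.toSeq)
// Check only year1.
val year1Only = anyEndsWith(df, "6", Seq("year1"))

allColumns.show()
year1Only.show()
```

With the sample data, the first call should keep the Mike and Andre rows, and the second only the Mike row. Row order in the output is not guaranteed.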