小编Rox*_*ana的帖子

dataframe.show()是spark中的一个动作吗?

我有以下代码:

val df_in = sqlcontext.read.json(jsonFile) // the file resides in hdfs

//some operations in here to create df as df_in with two more columns "terms1" and "terms2" 

val intersectUDF = udf( (seq1:Seq[String], seq2:Seq[String] ) => {     seq1 intersect seq2 } ) //intersects two sequences
val symmDiffUDF = udf( (seq1:Seq[String], seq2:Seq[String] ) => { (seq1 diff seq2) ++ (seq2 diff seq1) } ) //compute the difference of two sequences

val df1 = (df.withColumn("termsInt", intersectUDF(df("terms1"), df1("terms2") ) )
             .withColumn("termsDiff", symmDiffUDF(df("terms1"),     df1("terms2") ) …
Run Code Online (Sandbox Code Playgroud)

apache-spark

6
推荐指数
2
解决办法
3845
查看次数

标签 统计

apache-spark ×1