Post by Kor*_*a K

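Scala: converting a WrappedArray or Array[Any] to Array[String]

I have been trying to convert an RDD to a DataFrame. To do that, the element types need to be defined explicitly rather than left as Any. I am using Spark MLlib's PrefixSpan, which is where freqSequence.sequence comes from. I start with a DataFrame that contains a session_id, plus views and purchases as string arrays: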

viewsPurchasesGrouped: org.apache.spark.sql.DataFrame =
  [session_id: decimal(29,0), view_product_ids: array[string], purchase_product_ids: array[string]]

I then compute the frequent patterns and need them in a DataFrame so that I can write them to a Hive table.

import org.apache.spark.mllib.fpm.PrefixSpan

// row(1) and row(2) return Any, so the sequences are typed Array[Array[Any]]
val viewsPurchasesRddString = viewsPurchasesGrouped.map( row => Array(Array(row(1)), Array(row(2)) ))

val prefixSpan = new PrefixSpan()
  .setMinSupport(0.001)
  .setMaxPatternLength(2)

val model = prefixSpan.run(viewsPurchasesRddString)

val freqSequencesRdd = sc.parallelize(model.freqSequences.collect())

case class FreqSequences(views: Array[String], purchases: Array[String], support: Long)

val viewsPurchasesDf = freqSequencesRdd.map { fs =>
  // Both values are typed Any here, not Array[String], which is the problem
  val views = fs.sequence(0)(0)
  val purchases = fs.sequence(1)(0)
  val freq = fs.freq
  FreqSequences(views, purchases, freq)
}
viewsPurchasesDf.toDF() // …
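For illustration, here is a minimal sketch of one way a value typed as Any could be normalised into Array[String] in plain Scala. The object name `ArrayConversions` and the method `toStringArray` are hypothetical names introduced for this example; they are not part of Spark or of the code above. The sketch assumes the runtime value is either a JVM array or a `scala.collection.Seq` (which a WrappedArray is), and it calls `toString` on each element.

```scala
object ArrayConversions {
  // Hypothetical helper (an assumption for this sketch, not a Spark API):
  // turns a value statically typed as Any into Array[String].
  def toStringArray(value: Any): Array[String] = value match {
    case null => Array.empty[String]
    // Plain JVM arrays, e.g. Array[Any] or Array[String]
    case arr: Array[_] => arr.map(x => elementToString(x))
    // WrappedArray, ArraySeq, List and similar all extend scala.collection.Seq
    case seq: scala.collection.Seq[_] => seq.map(x => elementToString(x)).toArray
    // Any other single value becomes a one-element array
    case other => Array(other.toString)
  }

  // Null elements are passed through as null instead of throwing
  private def elementToString(x: Any): String =
    if (x == null) null else x.toString
}
```

Under these assumptions, the values in the mapping step could presumably be converted with `ArrayConversions.toStringArray(fs.sequence(0)(0))` before being passed to `FreqSequences`. If the elements are known to be strings already, a cast such as `asInstanceOf[Seq[String]].toArray` may be enough, at the cost of failing at runtime if that assumption is wrong.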

arrays scala
