Posts by Ror*_*ory

Get a correlation matrix for the arrays in a column

I have a DataFrame:

data = [
    ['t1', ['u1', 'u2', 'u3', 'u4', 'u5'], 1],
    ['t2', ['u1', 'u7', 'u8', 'u5'], 1],
    ['t3', ['u1', 'u2', 'u7', 'u11'], 2],
    ['t4', ['u8', 'u9'], 3],
    ['t5', ['u9', 'u22', 'u11'], 3],
    ['t6', ['u5', 'u11', 'u22', 'u4'], 3],
]
sdf = spark.createDataFrame(data, schema=['label', 'id', 'day'])
sdf.show()
+-----+--------------------+---+
|label|                  id|day|
+-----+--------------------+---+
|   t1|[u1, u2, u3, u4, u5]|  1|
|   t2|    [u1, u7, u8, u5]|  1|
|   t3|   [u1, u2, u7, u11]|  2|
|   t4|            [u8, u9]|  3|
|   t5|      [u9, u22, u11]|  3|
|   t6|  [u5, u11, u22, u4]|  3| …

apache-spark apache-spark-sql pyspark

8 votes · 1 answer · 260 views
