I have a DataFrame:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # reuses the active session if one already exists

data = [
    ['t1', ['u1', 'u2', 'u3', 'u4', 'u5'], 1],
    ['t2', ['u1', 'u7', 'u8', 'u5'], 1],
    ['t3', ['u1', 'u2', 'u7', 'u11'], 2],
    ['t4', ['u8', 'u9'], 3],
    ['t5', ['u9', 'u22', 'u11'], 3],
    ['t6', ['u5', 'u11', 'u22', 'u4'], 3],
]
sdf = spark.createDataFrame(data, schema=['label', 'id', 'day'])
sdf.show()
+-----+--------------------+---+
|label|                  id|day|
+-----+--------------------+---+
|   t1|[u1, u2, u3, u4, u5]|  1|
|   t2|    [u1, u7, u8, u5]|  1|
|   t3|   [u1, u2, u7, u11]|  2|
|   t4|            [u8, u9]|  3|
|   t5|      [u9, u22, u11]|  3|
|   t6|  [u5, u11, u22, u4]|  3|
+-----+--------------------+---+