Posts by pon*_*thu

Sparse vectors in PySpark

I would like to find an efficient way to create sparse vectors in PySpark using DataFrames.

Let's say we are given the following transactional input:

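from pyspark.sql import SparkSession

# Reuse the active session if one exists (e.g. in the pyspark shell), else create one.
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([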
    (0, "a"),
    (1, "a"),
    (1, "b"),
    (1, "c"),
    (2, "a"),
    (2, "b"),
    (2, "b"),
    (2, "b"),
    (2, "c"),
    (0, "a"),
    (1, "b"),
    (1, "b"),
    (2, "cc"),
    (3, "a"),
    (4, "a"),
    (5, "c")
], ["id", "category"])
+---+--------+
| id|category|
+---+--------+
|  0|       a|
|  1|       a|
|  1|       b|
|  1|       c|
|  2|       a|
|  2|       b|
|  2|       b|
|  2|       b|
|  2|       c|
|  0|       a|
|  1|       b|
|  1|       b| …
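The expected output is cut off in this copy of the question, so the following is only a sketch under one possible reading: one sparse vector per `id`, holding how often each `category` occurs for that `id`. The names `sparse_parts`, `add_sparse_vectors`, `vocabulary`, and `features` are hypothetical, and the alphabetical column order is an arbitrary choice. It also assumes the set of distinct categories is small enough to collect to the driver.

```python
from collections import Counter


def sparse_parts(categories, vocabulary):
    """Return (size, indices, values) for one id's list of category labels.

    `vocabulary` maps each category label to a fixed column index.
    """
    counts = Counter(vocabulary[c] for c in categories)
    indices = sorted(counts)  # SparseVector expects increasing indices
    values = [float(counts[i]) for i in indices]
    return len(vocabulary), indices, values


def add_sparse_vectors(df):
    """Group `df` (columns: id, category) by id and attach a vector of counts."""
    # Imported here so that sparse_parts stays usable without a Spark installation.
    from pyspark.ml.linalg import SparseVector, VectorUDT
    from pyspark.sql import functions as F

    # Assumption: the distinct categories fit comfortably on the driver.
    labels = sorted(
        row["category"] for row in df.select("category").distinct().collect()
    )
    vocabulary = {label: i for i, label in enumerate(labels)}

    @F.udf(returnType=VectorUDT())
    def to_vector(categories):
        size, indices, values = sparse_parts(categories, vocabulary)
        return SparseVector(size, indices, values)

    return (
        df.groupBy("id")
        .agg(F.collect_list("category").alias("categories"))
        .withColumn("features", to_vector("categories"))
    )
```

With the input above, `add_sparse_vectors(df).show(truncate=False)` would display the result. A Python UDF is used here for readability; whether it is efficient enough depends on the data size, and a `StringIndexer` or pivot-based approach may be worth comparing.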

python sparse-matrix apache-spark pyspark

5 votes, 1 answer, 7029 views
