How do I count occurrences of a column's value in a column of lists?

Question

Consider the following DataFrame:

    column_of_lists   scalar_col
0   [100, 200, 300]       100
1   [100, 200, 200]       200
2   [300, 500]            300
3   [100, 100]            200

The desired output is a Series giving, for each row, the number of times the value in scalar_col appears in that row's list in column_of_lists.

So, in our example:

1 # 100 appears once in its respective list
2 # 200 appears twice in its respective list
1 # ...
0

I have tried the following:

df['column_of_lists'].apply(lambda x: x.count(df['scalar_col']))

I know it doesn't work, because I am asking it to count an entire Series rather than a single value.

Any help is welcome!

Answer 1

jez*_*ael 6

Use a list comprehension:

df['new'] = [x.count(y) for x, y in zip(df['column_of_lists'], df['scalar_col'])]
print(df)
   column_of_lists  scalar_col  new
0  [100, 200, 300]         100    1
1  [100, 200, 200]         200    2
2       [300, 500]         300    1
3       [100, 100]         200    0
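
For anyone who wants to run the list comprehension without setting up the data first, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample in the question, and the names `lst`, `val`, and `counts` are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "column_of_lists": [[100, 200, 300], [100, 200, 200], [300, 500], [100, 100]],
    "scalar_col": [100, 200, 300, 200],
})

# Pair each list with the scalar from the same row, then count
# how often that scalar occurs in the list.
counts = [lst.count(val) for lst, val in zip(df["column_of_lists"], df["scalar_col"])]
df["new"] = counts

print(df["new"].tolist())  # [1, 2, 1, 0]
```

Because `zip` walks both columns in row order, each `list.count` call receives a single value rather than the whole Series, which is what the attempt in the question was missing.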

If performance is not important, use DataFrame.apply with axis=1:

df["new"] = df.apply(lambda x: x["column_of_lists"].count(x["scalar_col"]), axis=1)

# 40k rows
df = pd.concat([df] * 10000, ignore_index=True)

In [145]: %timeit df["new1"] = df.apply(lambda x: x["column_of_lists"].count(x["scalar_col"]), axis=1)
572 ms ± 99.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [146]: %timeit df['new2'] = [x.count(y) for x,y in zip(df['column_of_lists'], df['scalar_col'])]
22.7 ms ± 840 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [147]: %%timeit
     ...: x = df.explode('column_of_lists')
     ...: df['counts'] = x.column_of_lists.eq(x.scalar_col).groupby(x.index).sum()
     ...: 
61.2 ms ± 306 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
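
The third timing uses an explode-based approach that is not otherwise shown in this answer. A self-contained sketch of it, assuming the same sample data as in the question and a pandas version that provides `DataFrame.explode` (0.25 or later), could look like this. The intermediate name `matches` is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "column_of_lists": [[100, 200, 300], [100, 200, 200], [300, 500], [100, 100]],
    "scalar_col": [100, 200, 300, 200],
})

# One row per list element; the original index label is repeated
# for every element that came from the same row.
x = df.explode("column_of_lists")

# True wherever the exploded element equals the scalar of its row.
matches = x["column_of_lists"].eq(x["scalar_col"])

# Sum the matches per original row; assignment aligns on the index.
df["counts"] = matches.groupby(x.index).sum()

print(df["counts"].tolist())  # [1, 2, 1, 0]
```

In the timings above this variant is faster than `apply` with `axis=1` but slower than the list comprehension.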
