Get the top rows from a column's value counts with pandas

Question

Suppose I have data like this. It is a set of reviews for some products.

prod_id text    rating
AB123   some text   5
AB123   some text   2
AB123   some text   4
AC456   some text   3
AC456   some text   2
AD777   some text   2
AD777   some text   5
AD777   some text   5
AD777   some text   4
AE999   some text   4
AF000   some text   5
AG222   some text   5
AG222   some text   3
AG222   some text   3

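I want to know which products have the most reviews (the most rows), so I use the following code to get the top 3 products (I only need the 3 most-reviewed products).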

s = df['prod_id'].value_counts().sort_values(ascending=False).head(3)

This gives me the following result.

AD777   4
AB123   3
AG222   3

But what I really need is the rows with those ids. I need the full rows for all of AD777, AB123 and AG222, like this.

prod_id text    rating
AD777   some text   2
AD777   some text   5
AD777   some text   5
AD777   some text   4
AB123   some text   5
AB123   some text   2
AB123   some text   4
AG222   some text   5
AG222   some text   3
AG222   some text   3

How can I do that? I tried print(df.iloc[s]), but of course it doesn't work. As I read in the documentation, value_counts returns a Series, not a DataFrame. Any ideas? Thanks.

Answer 1

jez*_*ael 5

I think you need merge with a left join, using a DataFrame created from the index of s:

# Build a one-column frame from the top ids, then left-join the original rows onto it
df = pd.DataFrame({'prod_id': s.index}).merge(df, how='left')
print(df)
  prod_id       text  rating
0   AD777  some text       2
1   AD777  some text       5
2   AD777  some text       5
3   AD777  some text       4
4   AB123  some text       5
5   AB123  some text       2
6   AB123  some text       4
7   AG222  some text       5
8   AG222  some text       3
9   AG222  some text       3
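
As an alternative sketch (not part of the answer above), the same rows can also be selected with a boolean mask via Series.isin, which avoids building a second DataFrame. The snippet rebuilds the sample data from the question so that it runs on its own; the names top_ids, top_rows and top_rows_sorted are only illustrative. Note that isin keeps the original row order, so the optional sort step at the end (which assumes pandas 1.1 or later for the key argument) puts the groups back in count order.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'prod_id': ['AB123', 'AB123', 'AB123', 'AC456', 'AC456',
                'AD777', 'AD777', 'AD777', 'AD777', 'AE999',
                'AF000', 'AG222', 'AG222', 'AG222'],
    'text': ['some text'] * 14,
    'rating': [5, 2, 4, 3, 2, 2, 5, 5, 4, 4, 5, 5, 3, 3],
})

# value_counts() already sorts by count in descending order,
# so head(3) yields the three most-reviewed product ids.
top_ids = df['prod_id'].value_counts().head(3).index

# Boolean mask: keep every row whose prod_id is one of the top ids.
top_rows = df[df['prod_id'].isin(top_ids)]
print(top_rows)

# Optional: order the groups by review count, like the merge result.
# rank maps each top id to its position in top_ids (0 = most reviews).
rank = {pid: pos for pos, pid in enumerate(top_ids)}
top_rows_sorted = top_rows.sort_values(
    by='prod_id',
    key=lambda col: col.map(rank),
    kind='mergesort',  # stable sort keeps rows within a product in place
)
print(top_rows_sorted)
```

Unlike the merge, this keeps the original index labels; reset_index(drop=True) can be chained on if a fresh 0..n index is wanted.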
