Python: Remove rows based on a count condition

Question

Dev*_*Lee 13 python indexing counter dataframe pandas

I'm having trouble filtering a pandas DataFrame.

city 
NYC 
NYC 
NYC 
NYC 
SYD 
SYD 
SEL 
SEL
...

df.city.value_counts()

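I want to drop the rows for cities whose frequency is less than 4, for example SYD and SEL.

Is there a way to do this without dropping them manually, city by city?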

Answer 1

WeN*_*Ben 15

Here you go, using filter:

df.groupby('city').filter(lambda x : len(x)>3)
Out[1743]: 
  city
0  NYC
1  NYC
2  NYC
3  NYC

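This is a fantastic one-liner! I really should use `groupby` more; for now it still feels like black magic to me. (2 upvotes)
Nice one. Unfortunately, `lambda` tends to make me sick :(. It's only good in small doses! (2 upvotes)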
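For readers who would rather avoid `lambda`, the same filter can be sketched with `transform('size')`, which gives every row the size of its group. This is an alternative to the answer's code, not the answerer's own method; the sample frame and the names `group_sizes` and `filtered` are assumptions made for illustration.

```python
import pandas as pd

# Sample data modeled on the question (assumed: exactly these eight rows)
df = pd.DataFrame({'city': ['NYC'] * 4 + ['SYD'] * 2 + ['SEL'] * 2})

# For each row, the number of rows that share its city
group_sizes = df.groupby('city')['city'].transform('size')

# Keep only rows whose city appears at least 4 times
filtered = df[group_sizes >= 4]
print(filtered)
```

Because `group_sizes` is aligned with the original index, it can be used directly as a boolean mask source, and on large frames it usually avoids the per-group Python call that `filter` with a `lambda` makes.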

Answer 2

jpp*_*jpp 8

Here is one way, using pd.Series.value_counts.

counts = df['city'].value_counts()

res = df[~df['city'].isin(counts[counts < 4].index)]
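A closely related variant, shown here only as a sketch, maps each row's city to its count and keeps the rows that meet the threshold directly. The sample frame and the name `per_row_count` are assumptions; the threshold of 4 comes from the question.

```python
import pandas as pd

# Sample data modeled on the question (assumed: exactly these eight rows)
df = pd.DataFrame({'city': ['NYC', 'NYC', 'NYC', 'NYC', 'SYD', 'SYD', 'SEL', 'SEL']})

# Frequency of each city, indexed by city name
counts = df['city'].value_counts()

# Look up the frequency of each row's city
per_row_count = df['city'].map(counts)

# Keep rows whose city occurs at least 4 times
res = df[per_row_count >= 4]
print(res)
```

Compared with `isin`, this states the condition positively (what to keep) instead of negating a list of values to remove.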

Answer 3

Aar*_*ock 8

I think you're looking for value_counts()

# Import the great and powerful pandas
import pandas as pd

# Create some example data
df = pd.DataFrame({
    'city': ['NYC', 'NYC', 'SYD', 'NYC', 'SEL', 'NYC', 'NYC']
})

# Get the count of each value
value_counts = df['city'].value_counts()

# Select the values whose count is less than 4 (i.e. at most 3)
to_remove = value_counts[value_counts <= 3].index

# Keep rows where the city column is not in to_remove
df = df[~df.city.isin(to_remove)]
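If the threshold or column is likely to change, the steps above can be wrapped in a small helper. The function name `drop_rare_values` and its parameters are hypothetical, introduced only for this sketch; they are not part of pandas.

```python
import pandas as pd

def drop_rare_values(df, column, min_count):
    """Return the rows of df whose value in `column` occurs at least min_count times.

    Hypothetical helper for illustration; not a pandas API.
    """
    counts = df[column].value_counts()
    to_remove = counts[counts < min_count].index
    return df[~df[column].isin(to_remove)]

# Same example data as in the answer
df = pd.DataFrame({
    'city': ['NYC', 'NYC', 'SYD', 'NYC', 'SEL', 'NYC', 'NYC']
})

kept = drop_rare_values(df, 'city', min_count=4)
print(kept)
```

Note that the surviving rows keep their original index labels; call `reset_index(drop=True)` on the result if a fresh 0-based index is wanted.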
