Dev*_*Lee 13 python indexing counter dataframe pandas
I'm having trouble filtering a pandas DataFrame.
city
NYC
NYC
NYC
NYC
SYD
SYD
SEL
SEL
...
df.city.value_counts()
I want to remove the rows for cities whose frequency is less than 4, for example SYD and SEL.
Is there a way to do this without manually dropping them city by city?
WeN*_*Ben 15
Here you go, using filter:
df.groupby('city').filter(lambda x : len(x)>3)
Out[1743]:
city
0 NYC
1 NYC
2 NYC
3 NYC
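The same result can also be sketched without a per-group lambda, by computing each row's group size with `transform('size')` and using it as a boolean mask. This is an alternative to `filter`, not part of the answer above, and the sample data below is an assumption made up to match the question's shape:

```python
import pandas as pd

# Hypothetical sample data shaped like the question's frame
df = pd.DataFrame({'city': ['NYC'] * 4 + ['SYD'] * 2 + ['SEL'] * 2})

# For every row, the number of rows that share its city
group_sizes = df.groupby('city')['city'].transform('size')

# Keep only rows whose city appears at least 4 times
filtered = df[group_sizes >= 4]
print(filtered)
```

Because the mask is aligned with the original index, the surviving rows keep their original order and index labels.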
Here is one way using pd.Series.value_counts.
# Count how many times each city appears
counts = df['city'].value_counts()
# Drop rows whose city appears fewer than 4 times
res = df[~df['city'].isin(counts[counts < 4].index)]
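A closely related variant, shown here only as a sketch, maps each row to its city's count and filters on that directly instead of building a list of cities to exclude. The sample frame is hypothetical, and the threshold of 4 is taken from the question:

```python
import pandas as pd

# Hypothetical sample data shaped like the question's frame
df = pd.DataFrame({'city': ['NYC'] * 4 + ['SYD'] * 2 + ['SEL'] * 2})

# Frequency of each city, indexed by city name
counts = df['city'].value_counts()

# Look up each row's city in counts to get a per-row frequency
row_counts = df['city'].map(counts)

# Keep rows whose city appears at least 4 times
res = df[row_counts >= 4]
print(res)
```

Both forms select the same rows; the `isin` version reads as "remove these cities", while the `map` version reads as "keep rows above this count".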
I think you're looking for value_counts()
# Import the great and powerful pandas
import pandas as pd
# Create some example data
df = pd.DataFrame({
'city': ['NYC', 'NYC', 'SYD', 'NYC', 'SEL', 'NYC', 'NYC']
})
# Get the count of each value
value_counts = df['city'].value_counts()
# Select the values whose count is 3 or fewer, i.e. less than 4 (adjust the threshold if you like)
to_remove = value_counts[value_counts <= 3].index
# Keep rows where the city column is not in to_remove
df = df[~df.city.isin(to_remove)]
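If the same filter is needed for several columns or thresholds, the steps above could be wrapped in a small helper. The function name `drop_rare_values` and its parameters are made up for illustration and are not a pandas API:

```python
import pandas as pd


def drop_rare_values(frame, column, min_count):
    """Return the rows of frame whose value in column occurs at least min_count times.

    Hypothetical helper, not part of pandas.
    """
    counts = frame[column].value_counts()
    # Values that occur fewer than min_count times
    to_remove = counts[counts < min_count].index
    return frame[~frame[column].isin(to_remove)]


# Same example data as above
df = pd.DataFrame({
    'city': ['NYC', 'NYC', 'SYD', 'NYC', 'SEL', 'NYC', 'NYC']
})

result = drop_rare_values(df, 'city', min_count=4)
print(result)
```

The helper returns a new frame and leaves the input untouched, so the original `df` is still available for comparison.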