How to compare data within the same column of a DataFrame (Pandas)

Question

Abr*_*ola 5 python numpy dataframe pandas

I have a pandas DataFrame that looks like this:

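I want to get the countries whose PIB in 2007 is lower than in 2002, but I can't work out how to write this using only pandas built-in methods, without Python iteration or something similar. The closest I have gotten is the following line: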

df[df[df.year == 2007].PIB < df[df.year == 2002].PIB].country


But I get the following error:

ValueError: Can only compare identically-labeled Series objects


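Until now I have only used pandas to filter on data from different columns, and I don't know how to compare data within the same column, split by year in this case. Any help is welcome.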

Answer 1

jez*_*ael 2

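I suggest creating Series indexed by the country column. Note that the 2007 and 2002 Series must contain the same countries, so that both have identical index values when they are compared: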

import pandas as pd

df = pd.DataFrame({'country': ['Afganistan', 'Zimbabwe', 'Afganistan', 'Zimbabwe'],
                  'PIB': [200, 200, 100, 300], 
                  'year': [2002, 2002, 2007, 2007]})
print (df)
      country  PIB  year
0  Afganistan  200  2002
1    Zimbabwe  200  2002
2  Afganistan  100  2007
3    Zimbabwe  300  2007


df = df.set_index('country')
print (df)
            PIB  year
country              
Afganistan  200  2002
Zimbabwe    200  2002
Afganistan  100  2007
Zimbabwe    300  2007

s1 = df.loc[df.year == 2007, 'PIB'] 
s2 = df.loc[df.year == 2002, 'PIB']
print (s1)
country
Afganistan    100
Zimbabwe      300
Name: PIB, dtype: int64

print (s2)
country
Afganistan    200
Zimbabwe      200
Name: PIB, dtype: int64

countries = s1.index[s1 < s2]
print (countries)
Index(['Afganistan'], dtype='object', name='country')
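
The reason the indexing matters can be reproduced directly. The sketch below reuses the sample data above and assumes the default integer row labels: filtering by year keeps the original labels (2 and 3 for 2007, 0 and 1 for 2002), so the two Series are not identically labeled and the comparison raises the ValueError from the question. Once both are indexed by country, the labels match and the comparison works. The names `a`, `b`, and `raised` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'country': ['Afganistan', 'Zimbabwe', 'Afganistan', 'Zimbabwe'],
                   'PIB': [200, 200, 100, 300],
                   'year': [2002, 2002, 2007, 2007]})

# Filtering keeps the original row labels.
a = df[df.year == 2007].PIB   # labels 2, 3
b = df[df.year == 2002].PIB   # labels 0, 1

# Comparing Series whose labels differ raises ValueError.
try:
    a < b
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Indexing both Series by country gives them the same labels.
a = df[df.year == 2007].set_index('country').PIB
b = df[df.year == 2002].set_index('country').PIB
countries = a.index[a < b].tolist()
print(countries)
```

This is the same idea as the `set_index` approach above, just applied to each year's subset separately.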


Another idea is to first pivot by year with DataFrame.pivot, then select the columns by year, compare them, and filter the index with boolean indexing:

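# Assumes df is the original frame, with 'country' still a regular column
# (i.e. before the set_index call above). Keyword arguments are used because
# recent pandas versions no longer accept these positionally.
df1 = df.pivot(index='country', columns='year', values='PIB')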
print (df1)
year        2002  2007
country               
Afganistan   200   100
Zimbabwe     200   300

countries = df1.index[df1[2007] < df1[2002]]
print (countries)
Index(['Afganistan'], dtype='object', name='country')
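
One practical difference from the Series approach is that the pivoted frame does not need both years to cover the same countries. The sketch below assumes hypothetical data in which 'Chile' has no 2002 row: the missing cell becomes NaN after the pivot, and since any comparison with NaN is False, that country simply drops out of the result. The name `wide` is illustrative only.

```python
import pandas as pd

# Hypothetical data: 'Chile' only has a 2007 row.
df = pd.DataFrame({'country': ['Afganistan', 'Zimbabwe', 'Afganistan', 'Zimbabwe', 'Chile'],
                   'PIB': [200, 200, 100, 300, 150],
                   'year': [2002, 2002, 2007, 2007, 2007]})

# One row per country, one column per year; missing pairs become NaN.
wide = df.pivot(index='country', columns='year', values='PIB')

# Chile's 2002 value is NaN, so its comparison evaluates to False.
mask = wide[2007] < wide[2002]
countries = wide.index[mask].tolist()
print(countries)
```

Whether silently excluding such countries is desirable depends on the data; checking `wide.isna()` first would make the gaps visible.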

