Pandas warning when using map: A value is trying to be set on a copy of a slice from a DataFrame

Question

I have the following code, and it works. It basically renames values in a column so that I can merge them later.

import pandas as pd

pop = pd.read_csv('population.csv')
pop_recent = pop[pop['Year'] == 2014]

mapping = {
        'Korea, Rep.': 'South Korea',
        'Taiwan, China': 'Taiwan'
}
f = lambda x: mapping.get(x, x)
# This assignment is the line that triggers the warning
pop_recent['Country Name'] = pop_recent['Country Name'].map(f)

Warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  pop_recent['Country Name'] = pop_recent['Country Name'].map(f)

I did Google it! But none of the examples seem to use map, so I'm at a loss...

Answer 1

Ana*_*mar 10

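The problem is chained indexing. What you are effectively doing is setting values on pop[pop['Year'] == 2014]['Country Name'], which in most cases will not work (as explained in the linked documentation): these are two separate indexing calls, and the first one can return a copy of the DataFrame (I believe boolean indexing returns a copy).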

So when you try to set values on that copy, the change is not reflected in the original DataFrame. Example:

In [6]: df
Out[6]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

In [7]: df[df['A']==1]['B'] = 10
/path/to/ipython-script.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':

In [8]: df
Out[8]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9
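
The question's code is a two-step version of the same pattern: pop_recent is built by boolean indexing and then assigned to. The sketch below assumes a small made-up pop table in place of population.csv. It shows that only pop_recent (the copy) gets renamed while pop stays unchanged. If only the filtered subset is needed, taking an explicit .copy() (an alternative the answer does not mention) marks the copy as intentional, which should avoid the warning.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for population.csv (the real file is not available).
pop = pd.DataFrame({
    "Country Name": ["Korea, Rep.", "Taiwan, China", "France", "Korea, Rep."],
    "Year": [2014, 2014, 2014, 2013],
})

mapping = {"Korea, Rep.": "South Korea", "Taiwan, China": "Taiwan"}
f = lambda x: mapping.get(x, x)  # names not in mapping pass through unchanged

# 1) The question's pattern: filter first, then assign to the filtered frame.
with warnings.catch_warnings():
    # Silence the SettingWithCopyWarning that older pandas versions emit here.
    warnings.simplefilter("ignore")
    pop_recent = pop[pop["Year"] == 2014]
    pop_recent["Country Name"] = pop_recent["Country Name"].map(f)

# The filtered frame is a copy, so only it was renamed.
print(pop_recent["Country Name"].tolist())  # ['South Korea', 'Taiwan', 'France']
print(pop["Country Name"].tolist())  # ['Korea, Rep.', 'Taiwan, China', 'France', 'Korea, Rep.']

# 2) If only the subset is needed, an explicit .copy() states that intent.
pop_recent2 = pop[pop["Year"] == 2014].copy()
pop_recent2["Country Name"] = pop_recent2["Country Name"].map(f)
```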

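As the warning suggests, instead of chained indexing you should use DataFrame.loc to select the rows and the column you want to update in a single call, which avoids this warning. Example: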

pop.loc[(pop['Year'] == 2014), 'Country Name'] = pop.loc[(pop['Year'] == 2014), 'Country Name'].map(f)
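
A self-contained sketch of the single-call .loc fix, assuming a small made-up pop table instead of population.csv. Because the row filter and the column go into one .loc call on the left-hand side, pandas writes straight into pop rather than into an intermediate copy.

```python
import pandas as pd

# Hypothetical sample data standing in for population.csv.
pop = pd.DataFrame({
    "Country Name": ["Korea, Rep.", "Taiwan, China", "France", "Korea, Rep."],
    "Year": [2014, 2014, 2014, 2013],
})

mapping = {"Korea, Rep.": "South Korea", "Taiwan, China": "Taiwan"}
f = lambda x: mapping.get(x, x)

# Rows and column are selected in a single .loc call, so the assignment
# modifies pop itself. The right-hand side aligns on the row index.
pop.loc[pop["Year"] == 2014, "Country Name"] = (
    pop.loc[pop["Year"] == 2014, "Country Name"].map(f)
)

# The 2013 row is outside the filter, so it keeps its original name.
print(pop["Country Name"].tolist())
# ['South Korea', 'Taiwan', 'France', 'Korea, Rep.']
```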

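Alternatively, if that looks too long, you can create the mask (a boolean Series) beforehand, assign it to a variable, and use it in the statement above. Example: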

mask = pop['Year'] == 2014
pop.loc[mask, 'Country Name'] = pop.loc[mask, 'Country Name'].map(f)
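
The mask version can be checked the same way. This sketch, again assuming hypothetical sample data, records every warning raised by the assignment and confirms that none of them is a SettingWithCopyWarning. It compares by class name so the check does not depend on that class still existing in the installed pandas version.

```python
import warnings

import pandas as pd

# Hypothetical sample data standing in for population.csv.
pop = pd.DataFrame({
    "Country Name": ["Korea, Rep.", "Taiwan, China", "France", "Korea, Rep."],
    "Year": [2014, 2014, 2014, 2013],
})

mapping = {"Korea, Rep.": "South Korea", "Taiwan, China": "Taiwan"}
f = lambda x: mapping.get(x, x)

# Build the boolean mask once and reuse it on both sides of the assignment.
mask = pop["Year"] == 2014

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning raised below
    pop.loc[mask, "Country Name"] = pop.loc[mask, "Country Name"].map(f)

setting_with_copy = [
    w for w in caught if w.category.__name__ == "SettingWithCopyWarning"
]
print(len(setting_with_copy))  # 0
print(pop["Country Name"].tolist())
```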

Demo:

In [9]: df
Out[9]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

In [10]: mapping = { 1:2 , 3:4}

In [11]: f= lambda x: mapping.get(x, x)

In [12]: df.loc[(df['B']==2),'A'] = df.loc[(df['B']==2),'A'].map(f)

In [13]: df
Out[13]:
   A  B
0  2  2
1  3  4
2  4  5
3  6  7
4  8  9

Demo using the mask approach:

In [18]: df
Out[18]:
   A  B
0  1  2
1  3  4
2  4  5
3  6  7
4  8  9

In [19]: mask = df['B']==2

In [20]: df.loc[mask,'A'] = df.loc[mask,'A'].map(f)

In [21]: df
Out[21]:
   A  B
0  2  2
1  3  4
2  4  5
3  6  7
4  8  9

I hate pandas. My impression is that it's a hack with no consistency. As a more experienced user, what do you think? (3 upvotes)

Archived: 10 years ago
Views: 26312
Last activity: 10 years ago