Correct way to set a new column in a pandas DataFrame to avoid SettingWithCopyWarning

djj*_*djj 18 python pandas

I am trying to create a new column in the netc df, but I get a warning:

netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

C:\Anaconda\lib\site-packages\ipykernel\__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

What is the correct way to create a field in newer versions of Pandas so that the warning does not appear?

pd.__version__
Out[45]:
u'0.19.2+0.g825876c.dirty'

Fil*_*rda 18

As the error message says, try using .loc[row_indexer,col_indexer] to create the new column.

netc.loc[:,"DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

Notes

According to the Pandas Indexing Docs, your code should work.

netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

is translated to

netc.__setitem__('DeltaAMPP', netc.LOAD_AM - netc.VPP12_AM)

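which should have predictable behavior. SettingWithCopyWarning exists only to warn users about unexpected behavior during chained assignment (which is not what you are doing). However, as the docs state: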

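Sometimes a SettingWithCopy warning will arise at times when there's no obvious chained indexing going on. These are the bugs that SettingWithCopy is designed to catch! pandas is probably trying to warn you that you've done this: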

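The docs then go on to give an example of when the warning can appear even though it is not expected. So without more context I cannot tell why this is happening in your case.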
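The point about `__setitem__` can be checked with a minimal sketch. The frame below is a hypothetical stand-in for netc (the question does not show how netc was built), constructed directly instead of being sliced from another frame, and the column names ending in `_setitem` and `_loc` are invented for the comparison. On such a frame, all three spellings should produce the same values without a SettingWithCopyWarning.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for netc, built directly (not sliced from another frame).
netc = pd.DataFrame({"LOAD_AM": [10.0, 12.5, 9.0], "VPP12_AM": [4.0, 2.5, 9.0]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Plain column assignment, as in the question.
    netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM
    # The same operation written as the explicit method call.
    netc.__setitem__("DeltaAMPP_setitem", netc.LOAD_AM - netc.VPP12_AM)
    # The .loc form suggested by the warning text.
    netc.loc[:, "DeltaAMPP_loc"] = netc.LOAD_AM - netc.VPP12_AM

# Match on the class name so the check does not depend on where a given
# pandas version exposes the warning class.
copy_warnings = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]

print(netc)
print("SettingWithCopyWarning count:", len(copy_warnings))
```

If the same assignments do warn on your real netc, that suggests netc was derived from another frame upstream, which is worth checking first.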

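  • I did `consistent_cnr.loc[:, 'num_weights'] =consistent_cnr.loc[:, 'name'].apply(apply_get_num_weights_biases).values` but kept getting this "warning". I had to suppress it with `pd.options.mode.chained_assignment = None` right after the imports. (11 upvotes)
  • Now it gives me 2 warnings instead of 1. Code: myframe.loc[:,'mynewcol'] = 1 (4 upvotes)
  • If your dataframe has been filtered or sliced, you need to reset the index before using the answer below: "netc.reset_index(drop=True, inplace=True)". Otherwise the solution will not work, and you will get the two warnings described in the other comment. (4 upvotes)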

Mar*_*hke 18

I ran into problems with SettingWithCopyWarning when assigning data to a DataFrame df that had been constructed by indexing. Both commands

  • df['new_column'] = something
  • df.loc[:, 'new_column'] = something

would not run without a warning. As soon as I made a copy of df (DataFrame.copy()), everything was fine.

In the code below, compare df0 = df_test[df_test['a']>3] and df1 = df_test[df_test['a']>3].copy(). For df0, both assignments throw the warning. For df1, both work fine.

>>> df_test
      a     b     c     d  e
0   0.0   1.0   2.0   3.0  0
1   4.0   5.0   6.0   7.0  1
2   8.0   9.0  10.0  11.0  2
3  12.0  13.0  14.0  15.0  3
4  16.0  17.0  18.0  19.0  4
>>> df0 = df_test[df_test['a']>3]
>>> df1 = df_test[df_test['a']>3].copy()
>>> df0['e'] = np.arange(4)
__main__:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
>>> df1['e'] = np.arange(4)
>>> df0.loc[2, 'a'] = 77
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py:1719: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value, pi)
>>> df1.loc[2, 'a'] = 77
>>> df0
      a     b     c     d  e
1   4.0   5.0   6.0   7.0  0
2  77.0   9.0  10.0  11.0  1
3  12.0  13.0  14.0  15.0  2
4  16.0  17.0  18.0  19.0  3
>>> df1
      a     b     c     d  e
1   4.0   5.0   6.0   7.0  0
2  77.0   9.0  10.0  11.0  1
3  12.0  13.0  14.0  15.0  2
4  16.0  17.0  18.0  19.0  3

By the way: reading the documentation on this issue (linked in the warning) is recommended.
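The session above can be reproduced as a standalone script. This is a sketch under assumptions: df_test is rebuilt with np.arange to match the printed table, and copy_warning_count and assign are hypothetical helpers added here to count warnings instead of reading them off the console. How many warnings the sliced frame triggers depends on the installed pandas version, so the script only prints that number.

```python
import warnings

import numpy as np
import pandas as pd


def copy_warning_count(action):
    """Run action() and count SettingWithCopyWarning emissions (matched by class name)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        action()
    return sum(w.category.__name__ == "SettingWithCopyWarning" for w in caught)


def assign(frame):
    """Apply the two assignments from the session above to frame."""
    frame["e"] = np.arange(4)
    frame.loc[2, "a"] = 77


# Rebuild df_test so that it matches the table printed in the session.
df_test = pd.DataFrame(np.arange(20, dtype=float).reshape(5, 4), columns=list("abcd"))
df_test["e"] = np.arange(5)

df0 = df_test[df_test["a"] > 3]          # result of boolean indexing
df1 = df_test[df_test["a"] > 3].copy()   # explicit, independent copy

n_sliced = copy_warning_count(lambda: assign(df0))
n_copied = copy_warning_count(lambda: assign(df1))

print("warnings for df0:", n_sliced)
print("warnings for df1:", n_copied)
print("df_test.loc[2, 'a'] is still:", df_test.loc[2, "a"])
```

In both cases df_test itself is left unchanged, because boolean indexing already returns new data; the explicit .copy() mainly tells pandas that the independence is intentional.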


Ron*_*xão 5

Your example is incomplete, as it doesn't show where netc comes from. It is likely that netc itself is the product of slicing, and as such Pandas cannot tell for certain whether it is a view or a copy.

For example, if you're doing this:

netc = netb[netb["DeltaAMPP"] == 0]
netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

then Pandas wouldn't know if netc is a view or a copy. If it were a one-liner, it would effectively be like this:

netb[netb["DeltaAMPP"] == 0]["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

where you can see the double indexing more clearly.

If you want to make netc separate from netb, one possible remedy might be to force a copy in the first line (the loc is to make sure we're not copying two times), like:

netc = netb.loc[netb["DeltaAMPP"] == 0].copy()

If, on the other hand, you want to have netb modified with the new column, you may do:

netb.loc[netb["DeltaAMPP"] == 0, "DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM
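Both remedies can be tried side by side in a small sketch. The contents of netb below are made up for illustration, since the question never shows the real data. One deliberate difference: the second remedy computes the right-hand side from netb rather than netc, which gives the same rows here because `.loc` aligns the assigned Series on the index.

```python
import pandas as pd

# Hypothetical netb; the values are invented for this example.
netb = pd.DataFrame(
    {
        "LOAD_AM": [5.0, 7.0, 9.0],
        "VPP12_AM": [1.0, 2.0, 3.0],
        "DeltaAMPP": [0.0, 1.5, 0.0],
    }
)
mask = netb["DeltaAMPP"] == 0

# Remedy 1: make netc an explicit, independent copy, then assign freely.
netc = netb.loc[mask].copy()
netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM
delta_in_netb_after_remedy_1 = netb["DeltaAMPP"].tolist()  # netb is untouched

# Remedy 2: write into netb directly with one .loc call (rows and column together).
# Only the rows selected by mask receive values from the right-hand side.
netb.loc[mask, "DeltaAMPP"] = netb.LOAD_AM - netb.VPP12_AM

print(netc)
print(delta_in_netb_after_remedy_1)
print(netb)
```

Which remedy fits depends on intent: use the copy when netc should live on its own, and the single `.loc` call when the change is meant to land in netb.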

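  • Having to check upstream to see how the df was created is one of those frustrating subtleties I run into often. The advice in this post has helped me solve the problem more often than any other. (4 upvotes)
  • It also helped me understand what is going on. I kept wondering why I was seeing all these warnings when nothing looked obviously wrong at first glance; these implicit side effects do matter. Thanks. (2 upvotes)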

小智 5

You need to reset the index when creating the column, especially if you have filtered on specific values... then you don't need to use .loc[row_indexer,col_indexer]

netc.reset_index(drop=True, inplace=True)
netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

Then it should work :)
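A variant of this idea, sketched under assumptions: instead of `inplace=True`, the example below reassigns the result of `reset_index(drop=True)`, which returns a new DataFrame object. That is a substitution of mine, not the answer's exact code, and netb with its values is hypothetical. Note that resetting also renumbers the rows, so labels from the original frame no longer apply.

```python
import pandas as pd

# Hypothetical parent frame; the values are invented for this example.
netb = pd.DataFrame(
    {
        "LOAD_AM": [5.0, 7.0, 9.0],
        "VPP12_AM": [1.0, 2.0, 3.0],
        "flag": [1, 0, 1],
    }
)

# Filter, then reset the index without inplace=True: the result is a new object.
netc = netb[netb["flag"] == 1].reset_index(drop=True)

# The filtered rows were labelled 0 and 2; after the reset they are 0 and 1.
netc["DeltaAMPP"] = netc.LOAD_AM - netc.VPP12_AM

print(netc)
print(list(netb.columns))  # the parent frame has no DeltaAMPP column
```

If the original row labels are needed later (for example to write results back into netb), an explicit `.copy()` keeps them, whereas `reset_index(drop=True)` discards them.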