Pandas: InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Question

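I have two DataFrames that store data about products bought at stores. df1 stores the store name, product ID, product name, and date of purchase. df2 stores the product ID, product name, and type. I am trying to update df2 with the date-received value from df1, but only for products of type P.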

Below is a view of the DataFrames and what I have tried.

df1:

StoreName,ProdId,ProdName,DateReceived
Store A,P1,Prod1,2018-05-01
Store A,P2,Prod2,2018-05-02
Store B,P1,Prod1,2018-05-04

df2:

DateRecived,ProdId,ProdName,Type
,P1,Prod1,P
,P2,Prod2,P
,P3,Prod3,S

Script:

df2['DateRecived'] = df2['ProdId'].map(df1.set_index('ProdId')['StoreName']).df2['Type'] == 'P'

Running this raises the following error:

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Could anyone help me modify the script so that I can filter the values by Store Name and Prod Name and fill the DateReceived value in df2? Thanks.

Answer 1

jez*_*ael 8

The problem is duplicates: product P1 appears twice in df1, so the ProdId index is not unique:

s = df1.set_index('ProdId')['StoreName']
print (s)

ProdId
P1    Store A
P2    Store A
P1    Store B
Name: StoreName, dtype: object
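
As a quick check before calling map, the duplicated keys can be listed directly. This is only a sketch that rebuilds df1 from the sample data in the question; the variable names is_unique and dup_ids are illustrative and not part of the original answer.

```python
import pandas as pd

# df1 rebuilt from the sample data in the question
df1 = pd.DataFrame({
    "StoreName": ["Store A", "Store A", "Store B"],
    "ProdId": ["P1", "P2", "P1"],
    "ProdName": ["Prod1", "Prod2", "Prod1"],
    "DateReceived": ["2018-05-01", "2018-05-02", "2018-05-04"],
})

s = df1.set_index("ProdId")["StoreName"]

# map() with a Series needs a unique index on that Series
is_unique = s.index.is_unique

# keep=False flags every occurrence of a repeated key, not just the later ones
dup_ids = s.index[s.index.duplicated(keep=False)].unique().tolist()

print(is_unique)  # False
print(dup_ids)    # ['P1']
```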

So unique values are needed; drop_duplicates keeps only the first value for each ProdId:

s = df1.drop_duplicates('ProdId').set_index('ProdId')['StoreName']
print (s)
ProdId
P1    Store A
P2    Store A
Name: StoreName, dtype: object
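
Which row survives drop_duplicates depends on the row order. If the most recent purchase should win instead of the first one (an assumption, the question does not say which store to prefer), one possible sketch is to sort by date and keep the last row per ProdId. The names latest and s_latest are illustrative.

```python
import pandas as pd

# df1 rebuilt from the sample data in the question
df1 = pd.DataFrame({
    "StoreName": ["Store A", "Store A", "Store B"],
    "ProdId": ["P1", "P2", "P1"],
    "ProdName": ["Prod1", "Prod2", "Prod1"],
    "DateReceived": ["2018-05-01", "2018-05-02", "2018-05-04"],
})
df1["DateReceived"] = pd.to_datetime(df1["DateReceived"])

# Assumption: the latest DateReceived should win for each ProdId
latest = df1.sort_values("DateReceived").drop_duplicates("ProdId", keep="last")
s_latest = latest.set_index("ProdId")["StoreName"]

print(s_latest["P1"])  # Store B
print(s_latest["P2"])  # Store A
```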

Then you can replace the values using a boolean mask:

mask = df2['Type'] == 'P'
df2['DateRecived'] = df2['DateRecived'].mask(mask, df2['ProdId'].map(s))
print (df2)
  DateRecived ProdId ProdName Type
0     Store A     P1    Prod1    P
1     Store A     P2    Prod2    P
2         NaN     P3    Prod3    S

Or assign through loc with the same mask:

df2.loc[mask, 'DateRecived'] = df2.loc[mask, 'ProdId'].map(s)
print (df2)
  DateRecived ProdId ProdName Type
0     Store A     P1    Prod1    P
1     Store A     P2    Prod2    P
2         NaN     P3    Prod3    S
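
The snippets above map StoreName, mirroring the script in the question. If the goal is to fill the date itself, as the question text describes, the same pattern can be pointed at the DateReceived column of df1. The following is a self-contained sketch under that assumption: both frames are rebuilt from the sample data, the dates are left as plain strings, and the DateRecived column of df2 is created with object dtype so that strings can be assigned into it.

```python
import pandas as pd

# Both frames rebuilt from the sample data in the question
df1 = pd.DataFrame({
    "StoreName": ["Store A", "Store A", "Store B"],
    "ProdId": ["P1", "P2", "P1"],
    "ProdName": ["Prod1", "Prod2", "Prod1"],
    "DateReceived": ["2018-05-01", "2018-05-02", "2018-05-04"],
})
df2 = pd.DataFrame({
    # object dtype so string dates can be written without upcasting
    "DateRecived": pd.Series([None, None, None], dtype="object"),
    "ProdId": ["P1", "P2", "P3"],
    "ProdName": ["Prod1", "Prod2", "Prod3"],
    "Type": ["P", "P", "S"],
})

# Unique lookup ProdId -> DateReceived (first occurrence wins)
dates = df1.drop_duplicates("ProdId").set_index("ProdId")["DateReceived"]

# Fill only the rows whose Type is P
mask = df2["Type"] == "P"
df2.loc[mask, "DateRecived"] = df2.loc[mask, "ProdId"].map(dates)

print(df2)
```

With the sample data, P1 receives 2018-05-01 (its first occurrence in df1), P2 receives 2018-05-02, and the row of type S is left empty.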
