Pandas: InvalidIndexError: Reindexing only valid with uniquely valued Index objects

dar*_*rse 6 python dataframe pandas

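I have two dataframes that store data about products purchased in stores. df1 stores the store name, product ID, product name and the date received. df2 stores the product ID, product name and type. I am trying to update df2 with the date-received value from df1, but only for products of type P.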

Below is a view of the dataframes and what I have tried.

df1

StoreName,ProdId,ProdName,DateReceived
Store A,P1,Prod1,2018-05-01
Store A,P2,Prod2,2018-05-02
Store B,P1,Prod1,2018-05-04

df2

DateRecived,ProdId,ProdName,Type
,P1,Prod1,P
,P2,Prod2,P
,P3,Prod3,S

Script:

df2['DateRecived'] = df2['ProdId'].map(df1.set_index('ProdId')['StoreName']).df2['Type'] == 'P'

Running this raises the following error:

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Can anyone help me modify the script so that I can filter the values by Store Name and Prod Name and fill in the DateReceived value in df2? Thanks.

jez*_*ael 8

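The problem is duplicated values: product P1 appears twice in df1, so the index is not unique after set_index('ProdId'):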

s = df1.set_index('ProdId')['StoreName']
print(s)

ProdId
P1    Store A
P2    Store A
P1    Store B
Name: StoreName, dtype: object
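A quick way to confirm this diagnosis before mapping is to check whether the lookup index is unique and to list the rows that share a ProdId. This is only a sketch: the dataframe is rebuilt by hand from the sample in the question, and the variable names `index_is_unique` and `dupes` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question's df1.
df1 = pd.DataFrame({
    "StoreName": ["Store A", "Store A", "Store B"],
    "ProdId": ["P1", "P2", "P1"],
    "ProdName": ["Prod1", "Prod2", "Prod1"],
    "DateReceived": ["2018-05-01", "2018-05-02", "2018-05-04"],
})

s = df1.set_index("ProdId")["StoreName"]

# If this is False, Series.map(s) cannot look keys up unambiguously.
index_is_unique = s.index.is_unique

# keep=False marks every occurrence of a repeated ProdId, not only the later ones.
dupes = df1[df1["ProdId"].duplicated(keep=False)]

print(index_is_unique)
print(dupes)
```

If `dupes` is empty, the mapping Series can be used as it is; otherwise the duplicates have to be resolved first.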

So the index needs unique values; drop_duplicates keeps only the first row for each ProdId:

s = df1.drop_duplicates('ProdId').set_index('ProdId')['StoreName']
print(s)
ProdId
P1    Store A
P2    Store A
Name: StoreName, dtype: object
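Keeping the first row is one possible rule. Since the question asks for the date received, a variant could build the lookup from DateReceived and keep the most recent row per product instead. The "latest date wins" rule is an assumption, not something stated in the question, and the name `latest` is illustrative.

```python
import pandas as pd

# Sample data copied from the question's df1.
df1 = pd.DataFrame({
    "StoreName": ["Store A", "Store A", "Store B"],
    "ProdId": ["P1", "P2", "P1"],
    "ProdName": ["Prod1", "Prod2", "Prod1"],
    "DateReceived": ["2018-05-01", "2018-05-02", "2018-05-04"],
})
df1["DateReceived"] = pd.to_datetime(df1["DateReceived"])

# Assumption: when a product appears several times, the latest date is wanted.
latest = (
    df1.sort_values("DateReceived")
       .drop_duplicates("ProdId", keep="last")   # last row per ProdId after sorting
       .set_index("ProdId")["DateReceived"]
)
print(latest)
```

The resulting Series has a unique index, so it can be passed to `map` in the same way as `s`.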

Then the values can be replaced using a boolean mask:

mask = df2['Type'] == 'P'
df2['DateRecived'] = df2['DateRecived'].mask(mask, df2['ProdId'].map(s))
print(df2)
  DateRecived ProdId ProdName Type
0     Store A     P1    Prod1    P
1     Store A     P2    Prod2    P
2         NaN     P3    Prod3    S

Alternatively, assign through loc with the same mask:

df2.loc[mask, 'DateRecived'] = df2.loc[mask, 'ProdId'].map(s)
print(df2)
  DateRecived ProdId ProdName Type
0     Store A     P1    Prod1    P
1     Store A     P2    Prod2    P
2         NaN     P3    Prod3    S
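For reference, the steps can be combined into one self-contained script. It is a sketch under a few assumptions: both dataframes are typed in by hand from the question's samples, and the empty DateRecived column is created with object dtype so that assigning strings into it does not trigger dtype warnings or errors in newer pandas versions.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "StoreName": ["Store A", "Store A", "Store B"],
    "ProdId": ["P1", "P2", "P1"],
    "ProdName": ["Prod1", "Prod2", "Prod1"],
    "DateReceived": ["2018-05-01", "2018-05-02", "2018-05-04"],
})
df2 = pd.DataFrame({
    # Assumption: an empty object column stands in for the blank CSV field.
    "DateRecived": pd.Series([np.nan, np.nan, np.nan], dtype="object"),
    "ProdId": ["P1", "P2", "P3"],
    "ProdName": ["Prod1", "Prod2", "Prod3"],
    "Type": ["P", "P", "S"],
})

# Unique lookup: first StoreName seen for each ProdId.
s = df1.drop_duplicates("ProdId").set_index("ProdId")["StoreName"]

# Fill only the rows whose Type is 'P'; other rows are left untouched.
mask = df2["Type"] == "P"
df2.loc[mask, "DateRecived"] = df2.loc[mask, "ProdId"].map(s)

print(df2)
```

Swapping `'StoreName'` for `'DateReceived'` when building `s` would fill the column with dates rather than store names.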