Update column values in a DataFrame from a matching column in another DataFrame in Pandas

Question

I have two DataFrames:

 df
 city   mail
  a    satya
  b    def
  c    akash
  d    satya
  e    abc
  f    xyz
# Another DataFrame, d:
 city   mail
 x      satya
 y      def
 z      akash
 u      ash

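Now I need to update city in df with the values from d, matching on mail. If a mail ID is not found in d, the city should stay unchanged. The result should look like this: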

 df ### expected output
 city   mail
  x    satya
  y    def
  z    akash
  x    satya  # repeated mail, so the same value should be placed here
  e    abc    # not found, so left as it was
  f    xyz

I tried:

import pandas as pd

s = {'mail': ['satya', 'def', 'akash', 'satya', 'abc', 'xyz'], 'city': ['a', 'b', 'c', 'd', 'e', 'f']}
s1 = {'mail': ['satya', 'def', 'akash', 'ash'], 'city': ['x', 'y', 'z', 'u']}
df = pd.DataFrame(s)
d = pd.DataFrame(s1)
# Attempt found via Google
df.loc[df.mail.isin(d.mail),['city']] = d['city']

This gives the wrong result:

 city   mail
 x  satya
 y  def
 z  akash
 u  satya  ###this value should be for city 'x'
 e    abc
 f    xyz

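I can't do a merge with on='mail', how='left' because one DataFrame has fewer customers. So after merging, how would I map the city values for the non-matching mails in the merged result?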

Please suggest.

Answer 1

Ale*_*der 7

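It looks like you want to update the city values in df from the city values in d. The update function is index-based, so the index needs to be set first.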
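
To illustrate what "index-based" means here, a minimal sketch with two made-up frames (`left` and `right` are hypothetical names, not from the question). It suggests that `DataFrame.update` pairs rows by index label rather than by position, and leaves labels missing from the other frame untouched:

```python
import pandas as pd

# Hypothetical toy frames, indexed by mail.
left = pd.DataFrame({"city": ["a", "b", "c"]}, index=["satya", "def", "abc"])
right = pd.DataFrame({"city": ["x", "y"]}, index=["def", "satya"])

# update() modifies `left` in place, matching rows on index labels.
left.update(right)

print(left["city"].tolist())  # ['y', 'x', 'c']
```

Note that `right` lists the labels in a different order and still lands on the correct rows, while "abc" keeps its original value. The code that follows applies the same idea to the question's df and d.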

# Add extra columns to dataframe.
df['mobile_no'] = ['212-555-1111'] * len(df)
df['age'] = [20] * len(df)

# Update city values keyed on `mail`.
new_city = df[['mail', 'city']].set_index('mail')
new_city.update(d.set_index('mail'))
df['city'] = new_city['city'].values  # 1-D array, assigned back by position

>>> df
  city   mail     mobile_no  age
0    x  satya  212-555-1111   20
1    y    def  212-555-1111   20
2    z  akash  212-555-1111   20
3    x  satya  212-555-1111   20
4    e    abc  212-555-1111   20
5    f    xyz  212-555-1111   20
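
As a possible alternative to the approach above, here is a sketch using `Series.map` with `fillna`. It is not part of the original answer, and it assumes each mail appears only once in d (a duplicated mail in d would make the lookup ambiguous). The name `lookup` is introduced purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "mail": ["satya", "def", "akash", "satya", "abc", "xyz"],
    "city": ["a", "b", "c", "d", "e", "f"],
})
d = pd.DataFrame({
    "mail": ["satya", "def", "akash", "ash"],
    "city": ["x", "y", "z", "u"],
})

# Lookup Series: mail -> new city (assumes mails in `d` are unique).
lookup = d.set_index("mail")["city"]

# map() yields NaN for mails absent from the lookup;
# fillna() then restores the original city for those rows.
df["city"] = df["mail"].map(lookup).fillna(df["city"])

print(df["city"].tolist())  # ['x', 'y', 'z', 'x', 'e', 'f']
```

Because the lookup is keyed on mail, the repeated "satya" row receives "x" both times, and "abc" and "xyz" keep their original cities.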

