How to do a left join with Pandas

Question

I have two DataFrames that look like this. DF1:

Product, Region, ProductScore
AAA, R1,100
AAA, R2,100
BBB, R2,200
BBB, R3,200

DF2:

Region, RegionScore
R1,1
R2,2

How can I join these two into a single DataFrame? The result should look like this:

Product, Region, ProductScore, RegionScore
AAA, R1,100,1
AAA, R2,100,2
BBB, R2,200,2

Thanks a lot!

EDIT1:

I used df.merge(df_new) and got this error message:

  File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 4071, in merge
    suffixes=suffixes, copy=copy)
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 37, in merge
    copy=copy)
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 183, in __init__
    self.join_names) = self._get_merge_keys()
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 318, in _get_merge_keys
    self._validate_specification()
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 409, in _validate_specification
    if not self.right.columns.is_unique:
AttributeError: 'list' object has no attribute 'is_unique'

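EDIT2: I realized that my df_new is a Series (created by using groupby) rather than a DataFrame. I have now converted it to a DataFrame, and this is the info:

print(df.info())

Int64Index: 1111 entries, 0 to 1110
Data columns (total 8 columns):
product              1111 non-null object
reviewuserId         1111 non-null object
reviewprofileName    1111 non-null object
reviewelpfulness     881 non-null float64
reviewscore          1111 non-null float64
reviewtime           1111 non-null int64
reviewsummary        1111 non-null object
reviewtext           1111 non-null object
dtypes: float64(2), int64(1), object(5)
memory usage: 56.4+ KB
None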

print(df_new_2.info())

<class 'pandas.core.frame.DataFrame'>
Index: 1089 entries, A100Y8WSLFJN7Q to AZWBQPQN96SS6
Data columns (total 1 columns):
reviewelpfulnessbyuserid    864 non-null float64
dtypes: float64(1)
memory usage: 12.8+ KB
None

print(df.head())

      product    reviewuserId                         reviewprofileName  \
0  B003AI2VGA  A141HP4LYPWMSR          Brian E. Erland "Rainbow Sphinx"   
1  B003AI2VGA  A328S9RN3U5M68                                Grady Harp   
2  B003AI2VGA  A1I7QGUDP043DG                 Chrissy K. McVay "Writer"   
3  B003AI2VGA  A1M5405JH9THP9                              golgotha.gov   
4  B003AI2VGA   ATXL536YX71TR  KerrLines "&#34;MoviesMusicTheatre&#34;"   

   reviewelpfulness  reviewscore  reviewtime  \
0               1.0            3  1182729600   
1               1.0            3  1181952000   
2               0.8            5  1164844800   
3               1.0            3  1197158400   
4               1.0            3  1188345600   

                                       reviewsummary  \
0  There Is So Much Darkness Now ~ Come For The M...   
1  Worthwhile and Important Story Hampered by Poo...   
2                      This movie needed to be made.   
3                  distantly based on a real tragedy   
4  What's going on down in Juarez and shining a l...   

                                          reviewtext  
0  Synopsis: On the daily trek from Juarez Mexico...  
1  THE VIRGIN OF JUAREZ is based on true events s...  
2  The scenes in this film can be very disquietin...  
3  THE VIRGIN OF JUAREZ (2006)<br />directed by K...  
4  Informationally this SHOWTIME original is esse...

print(df_new_2.head())

                reviewelpfulnessbyuserid
reviewuserId                            
A100Y8WSLFJN7Q                       NaN
A103VZ3KDF2RT5                  0.555556
A1041HQGJDKFG5                  0.000000
A10FBJXMQPI0LL                  0.333333
A10LIHFA4SSK3F                  0.000000

Now the error message looks like this:

  File "pandas\hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12245)
KeyError: 'reviewuserId'

After printing this information, I solved the problem by simply adding: df_new_2 = df_new.to_frame().reset_index()
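
The fix in EDIT2 can be sketched roughly as follows. The question does not show the actual groupby call, so the per-user mean used here, the toy data, and the `merged` / `merged_alt` names are assumptions made only for illustration. The point is that a groupby result keeps `reviewuserId` in its index, so a merge that looks for it among the right-hand columns cannot find it until `reset_index()` turns it back into a regular column.

```python
import pandas as pd

# Toy stand-in for the reviews frame in the question (values are made up).
df = pd.DataFrame({
    "reviewuserId": ["A1", "A1", "A2", "A3"],
    "reviewelpfulness": [1.0, 0.5, 0.0, None],
})

# Assumed aggregation: selecting one column after groupby yields a Series
# whose index is reviewuserId, not a DataFrame.
df_new = (
    df.groupby("reviewuserId")["reviewelpfulness"]
    .mean()
    .rename("reviewelpfulnessbyuserid")
)

# to_frame() makes it a DataFrame; reset_index() moves reviewuserId
# from the index into an ordinary column that merge can use as a key.
df_new_2 = df_new.to_frame().reset_index()
merged = df.merge(df_new_2, on="reviewuserId", how="left")

# Alternative sketch: leave the key in the index and tell merge to use it.
merged_alt = df.merge(
    df_new.to_frame(), left_on="reviewuserId", right_index=True, how="left"
)
```

Both variants should attach the per-user value to every review row; which one reads better is largely a matter of taste.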

Answer 1

EdC*_*ica 8

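Since your expected output skips the R3 row, what you're asking for is not a left merge; you just want to perform an inner merge (the default for merge):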

In [120]:
df.merge(df1)

Out[120]:
  Product Region  ProductScore  RegionScore
0     AAA     R1           100            1
1     AAA     R2           100            2
2     BBB     R2           200            2

A left merge would result in:

In [121]:
df.merge(df1, how='left')

Out[121]:
  Product Region  ProductScore  RegionScore
0     AAA     R1           100            1
1     AAA     R2           100            2
2     BBB     R2           200            2
3     BBB     R3           200          NaN
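
For anyone who wants to reproduce the two results, here is a minimal self-contained sketch. It assumes `df` is DF1 and `df1` is DF2 from the question, and it passes `on="Region"` explicitly, although the answer relies on merge picking the shared column by default. The `inner`, `left`, and `unmatched` names are only illustrative.

```python
import pandas as pd

# DF1 and DF2 from the question, rebuilt by hand.
df = pd.DataFrame({
    "Product": ["AAA", "AAA", "BBB", "BBB"],
    "Region": ["R1", "R2", "R2", "R3"],
    "ProductScore": [100, 100, 200, 200],
})
df1 = pd.DataFrame({"Region": ["R1", "R2"], "RegionScore": [1, 2]})

# Inner merge (the default): only regions present in both frames survive.
inner = df.merge(df1, on="Region")

# Left merge: every row of df is kept; regions missing from df1 get NaN.
left = df.merge(df1, on="Region", how="left")

# Rows of df that found no match in df1 (here, the R3 row).
unmatched = left[left["RegionScore"].isna()]

print(inner)
print(left)
```

Note that the NaN in the left merge forces RegionScore to a float column, which is why the left result shows a NaN where the inner result has plain integers.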
