Bir*_*Bud 4 python merge left-join dataframe pandas
My left join produces more rows than there are in the left DataFrame.
# Import pandas under the alias pd
import numpy as np
import pandas as pd
SalesDF = pd.read_csv(r"C:\Users\USER\Documents\Reports\SalesForAnalysis.csv")
print("This is the Sales shape")
print(SalesDF.shape)
CustInfoDF = pd.read_csv(r"C:\Users\USER\Documents\Cust.csv")
# Keep only the rows whose Account Number is not NaN
CustInfoDF = CustInfoDF[CustInfoDF['Account Number'].notna()]
# Left-merge CustInfoDF onto SalesDF, matching "Cust Number" to "Account Number"
MergeDF = pd.merge(SalesDF, CustInfoDF, how="left", left_on="Cust Number", right_on="Account Number")
print("This is the Merge Shape ")
print(MergeDF.shape)
# Keep only the selected columns
CutDF = MergeDF[["Customer", "Invoice #", "E-mail Address", "Phone", "Clerk", "Total", "Date"]]
CutDF.drop_duplicates()
print("This is the Cut shape ")
print(CutDF.shape)
This is the output after running the program:
This is the Sales shape
(5347, 61)
This is the Merge Shape 
(6428, 83)
This is the Cut shape 
(6428, 7)
Process finished with exit code 0
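CutDF should have at most 5347 rows. I have a drop_duplicates call in there, but I still get the same result.
I have seen "pandas left join - why more results?" and "Inner join/merge in pandas dataframe give more rows than left dataframe", but I don't really see a solution in either of them.
Any help would be appreciated.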
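One likely reason the drop_duplicates call has no visible effect: DataFrame.drop_duplicates returns a new DataFrame by default (inplace=False), so a bare CutDF.drop_duplicates() discards its result. A minimal sketch with made-up data (the frame below is hypothetical, not the question's):

```python
import pandas as pd

# Hypothetical toy data containing one fully duplicated row
df = pd.DataFrame({"Invoice #": [1, 1, 2], "Total": [10.0, 10.0, 5.0]})

df.drop_duplicates()            # returns a new DataFrame; the result is thrown away
print(df.shape)                 # (3, 2): df itself is unchanged

deduped = df.drop_duplicates()  # keep the returned DataFrame instead
print(deduped.shape)            # (2, 2)
```

Even when the result is assigned, drop_duplicates() with no subset only removes rows that are identical across all of the selected columns, so merged rows that differ in, say, Phone or E-mail Address would still remain.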
Before running:
MergeDF = pd.merge(SalesDF, CustInfoDF, how="left", left_on="Cust Number", right_on="Account Number")
you can do:
CustInfoDF = CustInfoDF.drop_duplicates(subset=["Account Number"])
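A sketch of this fix on small stand-in frames that reuse the question's column names (the data is hypothetical). Deduplicating on Account Number makes the right-hand keys unique; additionally passing validate="many_to_one" to pd.merge makes pandas raise pandas.errors.MergeError when the right-hand keys are not unique, instead of silently adding rows. Note that drop_duplicates(subset=...) keeps the first row per key by default (keep="first"), so which customer record survives depends on row order.

```python
import pandas as pd

# Hypothetical stand-ins for SalesDF and CustInfoDF
sales = pd.DataFrame({"Cust Number": [101, 102, 103], "Total": [50, 20, 30]})
cust = pd.DataFrame({
    "Account Number": [101, 101, 102],
    "E-mail Address": ["a@example.com", "a2@example.com", "b@example.com"],
})

# Without deduplication, validate="many_to_one" rejects the duplicated key 101
try:
    pd.merge(sales, cust, how="left", left_on="Cust Number",
             right_on="Account Number", validate="many_to_one")
except pd.errors.MergeError as err:
    print("MergeError:", err)

# Keep one row per Account Number (the first, by default), then merge
cust_unique = cust.drop_duplicates(subset=["Account Number"])
merged = pd.merge(sales, cust_unique, how="left", left_on="Cust Number",
                  right_on="Account Number", validate="many_to_one")
print(merged.shape)  # (3, 4): exactly one row per sales row
```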
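I suspect your CustInfoDF has multiple entries per Account Number.
If that doesn't work, could you post sample dataframes? Feel free to add or substitute dummy values, as long as the code is reproducible.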
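To check this suspicion directly, you can count how often each key occurs in the right-hand frame. In a left merge every left row appears once per matching right row, or once if nothing matches, so the merged length can be predicted before merging. A sketch on hypothetical data, assuming the right-hand key column contains no NaN (the question already filters those out with notna(); otherwise pandas would match NaN keys to each other and value_counts() would not count them):

```python
import pandas as pd

# Hypothetical data: Account Number 101 appears twice on the right
sales = pd.DataFrame({"Cust Number": [101, 101, 102, 104]})
cust = pd.DataFrame({"Account Number": [101, 101, 102, 103]})

# All rows in cust whose key occurs more than once
dupes = cust[cust["Account Number"].duplicated(keep=False)]
print(dupes)

# Each left row yields max(1, number of matches) rows in a left merge
matches = sales["Cust Number"].map(cust["Account Number"].value_counts())
expected_rows = int(matches.fillna(0).clip(lower=1).sum())

merged = pd.merge(sales, cust, how="left",
                  left_on="Cust Number", right_on="Account Number")
print(expected_rows, len(merged))  # 6 6
```

Applied to the question, the 1,081 extra rows (6428 − 5347) can only come from Cust Number values that match more than one row in CustInfoDF.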