Pandas - Adding NaN for missing values in pd.merge

EA0*_*A00 5 python merge pandas

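I am joining all the files in a directory together, but some files have a different number of entries. How can I put NaN where a file has no value for a given key?

For example: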

File1.cs

 NUM, NAME, ORG, DATA
 1,AAA,10,123.4
 1,AAB,20,176.5
 1,AAC,30,133.5

File2.cs

 NUM, NAME, ORG, DATA
 1,AAA,10,111.4
 1,AAC,30,122.5
 2,BBA,12,156.7

Expected output

 NUM, NAME, ORG, File1, File2 ....
 1, AAA, 10, 123.4, 111.4
 1, AAB, 20, 176.5, NaN
 1, AAC, 30, 133.5, 122.5
 2, BBA, 12, NaN,   156.7
 .....

Here is what I tried:

import pandas as pd
import glob

writer = pd.ExcelWriter('analysis.xlsx', engine='xlsxwriter')
data = []
df1 = pd.read_csv("file1.cs", sep = ',', header = 'infer')    

for infile in glob.glob("*.cs"):
    df = pd.read_csv(infile, sep = ',', header = 'infer')
    name = infile[13:-7]
    df['filename'] = name
    data.append(df)
result = pd.merge(df1, data.to_frame(), on= 'NAME')
result.to_excel(writer, sheet_name=sheetname)
writer.save()

I also tried pd.concat(data, axis=1, ignore_index=False), but that does not add NaN, because it just joins the files based on their column names.

Sco*_*ton 5

Use merge with the how parameter set to 'outer':

df1.merge(df2, on=['NUM','NAME','ORG'], how='outer')

Output:

   NUM NAME  ORG  DATA_x  DATA_y
0    1  AAA   10   123.4   111.4
1    1  AAB   20   176.5     NaN
2    1  AAC   30   133.5   122.5
3    2  BBA   12     NaN   156.7
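For anyone who wants to reproduce the outer merge without creating files, here is a minimal self-contained sketch. It loads the two sample tables from the question through io.StringIO (an assumption made only so the snippet runs on its own) and adds indicator=True, which is not part of the answer above but makes it easy to see which side each row came from.

```python
import io

import pandas as pd

# Sample data copied from the question, held in memory instead of on disk.
FILE1 = """NUM,NAME,ORG,DATA
1,AAA,10,123.4
1,AAB,20,176.5
1,AAC,30,133.5
"""
FILE2 = """NUM,NAME,ORG,DATA
1,AAA,10,111.4
1,AAC,30,122.5
2,BBA,12,156.7
"""

df1 = pd.read_csv(io.StringIO(FILE1))
df2 = pd.read_csv(io.StringIO(FILE2))

# how="outer" keeps every key found in either frame and fills the gaps with NaN.
# indicator=True adds a _merge column: "both", "left_only" or "right_only".
merged = df1.merge(df2, on=["NUM", "NAME", "ORG"], how="outer", indicator=True)
print(merged)
```

Rows marked left_only or right_only are exactly the ones that receive NaN in the other file's DATA column.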

To get the exact output, use:

df1.rename(columns={'DATA':'FILE'})\
   .merge(df2.rename(columns={'DATA':'FILE'}), 
         on=['NUM','NAME','ORG'],
         how='outer', 
         suffixes=('1','2'))

Output:

   NUM NAME  ORG  FILE1  FILE2
0    1  AAA   10  123.4  111.4
1    1  AAB   20  176.5    NaN
2    1  AAC   30  133.5  122.5
3    2  BBA   12    NaN  156.7
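Since the question is about every file in a directory rather than just two, the same outer merge can be folded over a whole collection of frames. The sketch below is one possible way to do that with functools.reduce. The helper name merge_data_columns, the KEYS constant, and the File3 rows are made up for illustration and do not come from the question or the answer.

```python
from functools import reduce

import pandas as pd

KEYS = ["NUM", "NAME", "ORG"]  # columns that identify a row across files


def merge_data_columns(frames):
    """Outer-merge the DATA column of each frame on KEYS.

    frames maps a label (for example a file name) to a DataFrame that has
    the KEYS columns plus a DATA column. Hypothetical helper, not a pandas API.
    """
    # Rename DATA to the label so each file ends up in its own column.
    renamed = [
        df[KEYS + ["DATA"]].rename(columns={"DATA": label})
        for label, df in frames.items()
    ]
    # Fold the list pairwise; keys missing from a frame get NaN in its column.
    return reduce(
        lambda left, right: left.merge(right, on=KEYS, how="outer"), renamed
    )


frames = {
    "File1": pd.DataFrame({"NUM": [1, 1, 1], "NAME": ["AAA", "AAB", "AAC"],
                           "ORG": [10, 20, 30], "DATA": [123.4, 176.5, 133.5]}),
    "File2": pd.DataFrame({"NUM": [1, 1, 2], "NAME": ["AAA", "AAC", "BBA"],
                           "ORG": [10, 30, 12], "DATA": [111.4, 122.5, 156.7]}),
    # File3 is invented here only to show that more than two frames work.
    "File3": pd.DataFrame({"NUM": [1, 2], "NAME": ["AAA", "BBA"],
                           "ORG": [10, 12], "DATA": [100.0, 99.0]}),
}

result = merge_data_columns(frames)
print(result)
```

To feed it real files, the dictionary could be built with something like {path: pd.read_csv(path) for path in glob.glob("*.cs")}. If the headers really contain a space after each comma, as in the samples in the question, passing skipinitialspace=True to read_csv may be needed so that the column names match KEYS.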