python merge pandas
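I am joining all the files in a directory together, but some files have a different number of entries. How do I put NaN where a file has no value for a given key?
For example:
file1.cs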
NUM, NAME, ORG, DATA
1,AAA,10,123.4
1,AAB,20,176.5
1,AAC,30,133.5
file2.cs
NUM, NAME, ORG, DATA
1,AAA,10,111.4
1,AAC,30,122.5
2,BBA,12,156.7
Expected output:
NUM, NAME, ORG, File1, File2 ....
1, AAA, 10, 123.4, 111.4
1, AAB, 20, 176.5, NaN
1, AAC, 30, 133.5, 122.5
2, BBA, 12, NaN, 156.7
.....
Here is what I tried:
import pandas as pd
import glob
writer = pd.ExcelWriter('analysis.xlsx', engine='xlsxwriter')
data = []
df1 = pd.read_csv("file1.cs", sep = ',', header = 'infer')
for infile in glob.glob("*.cs"):
    df = pd.read_csv(infile, sep = ',', header = 'infer')
    name = infile[13:-7]
    df['filename'] = name
    data.append(df)
result = pd.merge(df1, data.to_frame(), on= 'NAME')
result.to_excel(writer, sheet_name=sheetname)
writer.save()
I also tried pd.concat(data, axis=1, ignore_index=False), but that does not add NaN, because it just joins the files by column name.
Use merge with the how parameter set to 'outer':
df1.merge(df2, on=['NUM','NAME','ORG'], how='outer')
Output:
NUM NAME ORG DATA_x DATA_y
0 1 AAA 10 123.4 111.4
1 1 AAB 20 176.5 NaN
2 1 AAC 30 133.5 122.5
3 2 BBA 12 NaN 156.7
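
To reproduce the result above without any files on disk, here is a minimal sketch. It assumes the two samples from the question are held in strings (`FILE1` and `FILE2` are names made up for this illustration) and that the headers really contain a space after each comma, which is why `skipinitialspace=True` is passed; without it the columns would be named ' NAME', ' ORG' and ' DATA', and the `on=` list would not match.

```python
import io
import pandas as pd

# Sample data copied from the question; FILE1/FILE2 are illustrative names.
FILE1 = """NUM, NAME, ORG, DATA
1,AAA,10,123.4
1,AAB,20,176.5
1,AAC,30,133.5
"""
FILE2 = """NUM, NAME, ORG, DATA
1,AAA,10,111.4
1,AAC,30,122.5
2,BBA,12,156.7
"""

# skipinitialspace=True drops the blank that follows each comma in the header.
df1 = pd.read_csv(io.StringIO(FILE1), skipinitialspace=True)
df2 = pd.read_csv(io.StringIO(FILE2), skipinitialspace=True)

# An outer merge keeps every key combination found in either frame and
# fills the side that lacks it with NaN.
merged = df1.merge(df2, on=['NUM', 'NAME', 'ORG'], how='outer')
print(merged)
```

The non-key column DATA exists in both frames, so pandas appends its default suffixes `_x` and `_y`.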
To get the exact output, use:
df1.rename(columns={'DATA':'FILE'})\
   .merge(df2.rename(columns={'DATA':'FILE'}),
          on=['NUM','NAME','ORG'],
          how='outer',
          suffixes=('1','2'))
Output:
NUM NAME ORG FILE1 FILE2
0 1 AAA 10 123.4 111.4
1 1 AAB 20 176.5 NaN
2 1 AAC 30 133.5 122.5
3 2 BBA 12 NaN 156.7
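
The question asks about every file in a directory, not just two. One possible way to extend the idea is sketched below; it is not part of the answer above. It swaps the pairwise merge for `pd.concat(..., axis=1)` on Series indexed by the key columns, which aligns on that index and leaves NaN where a file has no row. The helper name `combine`, the `KEYS` constant, and the choice to label each column with the file's base name are assumptions made for illustration. The sketch also assumes each (NUM, NAME, ORG) combination appears at most once per file and that the headers have a space after each comma.

```python
import os
import pandas as pd

KEYS = ['NUM', 'NAME', 'ORG']  # columns that identify a row (assumed unique per file)

def combine(paths):
    """Return one frame with the KEYS columns plus one DATA column per file."""
    columns = []
    for path in paths:
        df = pd.read_csv(path, skipinitialspace=True)
        # Label the column with the file name minus directory and extension.
        label = os.path.splitext(os.path.basename(path))[0]
        columns.append(df.set_index(KEYS)['DATA'].rename(label))
    # axis=1 aligns the Series on their shared (NUM, NAME, ORG) index;
    # keys missing from a file become NaN in that file's column.
    return pd.concat(columns, axis=1).sort_index().reset_index()
```

It could be called as `combine(sorted(glob.glob('*.cs')))` after `import glob`, and the result written out with `to_excel('analysis.xlsx', index=False)`. Note that `pd.concat` raises a ValueError when the list of paths is empty.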