Import multiple Excel files into Python pandas and concatenate them into one DataFrame

jon*_*nas 16 python excel concatenation pandas

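I want to read several Excel files from a directory into pandas and concatenate them into one big DataFrame. I haven't been able to figure it out, though. I need some help with the for loop and with building the concatenated DataFrame. Here is what I have so far: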

import sys
import csv
import glob
import pandas as pd

# get data file names
path =r'C:\DRO\DCL_rawdata_files\excelfiles'
filenames = glob.glob(path + "/*.xlsx")

dfs = []

for df in dfs: 
    xl_file = pd.ExcelFile(filenames)
    df=xl_file.parse('Sheet1')
    dfs.concat(df, ignore_index=True)

eri*_*mjl 43

As mentioned in the comments, one mistake you are making is that you are looping over an empty list.
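To illustrate the empty-list point, here is a minimal sketch. It assumes a throwaway temporary directory with three placeholder file names (`a.xlsx`, `b.xlsx`, `c.xlsx`); they are empty files used only to show what `glob` returns, not real workbooks. A loop over `dfs` never runs its body, while a loop over `filenames` visits each file.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical placeholder files; empty, created only so glob has matches.
    for name in ("a.xlsx", "b.xlsx", "c.xlsx"):
        open(os.path.join(tmp, name), "w").close()

    filenames = sorted(glob.glob(os.path.join(tmp, "*.xlsx")))

    dfs = []
    empty_loop_iterations = 0
    for df in dfs:               # dfs is empty, so this body never runs
        empty_loop_iterations += 1

    visited = []
    for filename in filenames:   # iterate over the file names instead
        visited.append(os.path.basename(filename))

print(empty_loop_iterations, visited)
```

In the question's code, the loop would therefore need to iterate over `filenames` and pass a single file name to pandas on each pass.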

Here is how I would do it, using an example that appends 5 identical Excel files one after another.

(1) Imports:

import os
import pandas as pd

(2) List the files:

path = os.getcwd()
files = os.listdir(path)
files

Output:

['.DS_Store',
 '.ipynb_checkpoints',
 '.localized',
 'Screen Shot 2013-12-28 at 7.15.45 PM.png',
 'test1 2.xls',
 'test1 3.xls',
 'test1 4.xls',
 'test1 5.xls',
 'test1.xls',
 'Untitled0.ipynb',
 'Werewolf Modelling',
 '~$Random Numbers.xlsx']

(3) Select the 'xls' files:

files_xls = [f for f in files if f[-3:] == 'xls']
files_xls

Output:

['test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls']
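As a possible variation on the filtering step, the extension check can be made explicit with `os.path.splitext`, which also makes it easy to skip the `~$...` lock files that Excel leaves behind while a workbook is open. The helper name `is_excel` and its `extensions` parameter are made up for this sketch; the file list is copied from the directory listing above.

```python
import os

# Directory listing copied from the output shown earlier.
files = ['.DS_Store', '.ipynb_checkpoints', '.localized',
         'Screen Shot 2013-12-28 at 7.15.45 PM.png',
         'test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls',
         'test1.xls', 'Untitled0.ipynb', 'Werewolf Modelling',
         '~$Random Numbers.xlsx']

def is_excel(name, extensions=(".xls", ".xlsx")):
    """Hypothetical helper: True for names with an Excel extension, except ~$ lock files."""
    ext = os.path.splitext(name)[1].lower()
    return ext in extensions and not name.startswith("~$")

files_xls = [f for f in files if is_excel(f, extensions=(".xls",))]
files_any_excel = [f for f in files if is_excel(f)]
print(files_xls)
```

With the default `extensions`, `.xlsx` workbooks would be included as well, while the lock file is still skipped.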

(4) Initialize an empty DataFrame:

df = pd.DataFrame()

(5) Loop over the list of files, appending each one to the empty DataFrame:

for f in files_xls:
    data = pd.read_excel(f, sheet_name='Sheet1')
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, data])

(6) Enjoy your new DataFrame. :-)

df

Output:

  Result  Sample
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10

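  • This certainly works, but I think the approach in the nearly identical question http://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-python-pandas-and-concatenate-into-one-dataframe, which appends to a list and then calls `pd.concat(the_list)`, is cleaner. (2 upvotes)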
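A minimal sketch of the list-then-concat approach mentioned in this comment might look as follows. The function name `read_and_concat` and its `reader` parameter are assumptions made for illustration; the demo passes a stub reader (`fake_reader`, also hypothetical) so the sketch runs without any Excel files on disk.

```python
import pandas as pd

def read_and_concat(paths, reader=pd.read_excel):
    """Read every path with `reader`, collect the frames in a list, concatenate once."""
    the_list = []
    for path in paths:
        the_list.append(reader(path))
    return pd.concat(the_list, ignore_index=True)

# Stub reader standing in for pd.read_excel; it returns a small in-memory frame.
def fake_reader(path):
    return pd.DataFrame({"source": [path, path], "Sample": [1, 2]})

combined = read_and_concat(["one.xlsx", "two.xlsx"], reader=fake_reader)
print(combined)
```

With real workbooks, calling `read_and_concat(filenames)` would fall back to `pd.read_excel`. Concatenating once at the end avoids copying the accumulated frame on every pass through the loop.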

zou*_*ump 6

There is an even neater way to do that.

# import libraries
import glob
import pandas as pd

# get the absolute path of all Excel files 
all_excel_files = glob.glob("/path/to/Excel/files/*.xlsx")

# read all Excel files at once
df = pd.concat(pd.read_excel(excel_file) for excel_file in all_excel_files)


小智 5

This works with Python 2.x.

Run it in the directory that contains the Excel files.

See http://pbpython.com/excel-file-combine.html

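import glob
import pandas as pd

all_data = pd.DataFrame()
for f in glob.glob("*.xlsx"):
    df = pd.read_excel(f)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    all_data = pd.concat([all_data, df], ignore_index=True)

# now save the data frame; the with block saves and closes the file
# (ExcelWriter.save() was removed in pandas 2.0)
with pd.ExcelWriter('output.xlsx') as writer:
    all_data.to_excel(writer, sheet_name='sheet1')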


rac*_*hwa 5

You can use a list comprehension inside concat:

import os
import pandas as pd

path = '/path/to/directory/'
filenames = [file for file in os.listdir(path) if file.endswith('.xlsx')]

df = pd.concat([pd.read_excel(path + file) for file in filenames], ignore_index=True)

With ignore_index=True, the index of df is labeled 0, …, n - 1.
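To make the effect of `ignore_index` concrete, here is a small sketch using an in-memory frame named `part` (a made-up stand-in for the contents of one Excel file):

```python
import pandas as pd

# Stand-in for the rows read from a single workbook.
part = pd.DataFrame({"Result": ["a", "b"], "Sample": [1, 2]})

# Default behaviour: each piece keeps its own index, so labels repeat.
kept = pd.concat([part, part])

# ignore_index=True discards the old labels and renumbers the rows.
renumbered = pd.concat([part, part], ignore_index=True)

print(list(kept.index))
print(list(renumbered.index))
```

Repeated labels are not an error, but label-based lookups such as `kept.loc[0]` then return several rows, which is often not what is wanted after stacking files.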
