PANDAS & glob - Excel file format cannot be determined, you must specify an engine manually

Question

Mta*_*aly 39 python dataframe python-3.x pandas

I'm not sure why I'm getting this error, even though my code sometimes works fine!

Excel file format cannot be determined, you must specify an engine manually.

Below are my code and the steps:

1- List of customer ID columns:

customer_id = ["ID","customer_id","consumer_number","cus_id","client_ID"]

2- Code to find all xlsx files in the folder and read them:

import glob
import pandas as pd

l = []  # use a list and concat later, faster than append in the loop
for f in glob.glob("./*.xlsx"):
    df = pd.read_excel(f).reindex(columns=customer_id).dropna(how='all', axis=1)
    df.columns = ["ID"]  # to have only one column once concat
    l.append(df)
all_data = pd.concat(l, ignore_index=True)  # concat all data

I added the openpyxl engine:

df = pd.read_excel(f, engine="openpyxl").reindex(columns = customer_id).dropna(how='all', axis=1)

Now I get a different error:

BadZipFile: File is not a zip file

pandas version: 1.3.0, Python version: python3.9, OS: MacOS

Is there a better way to read all xlsx files from a folder?

Answer 1

Mta*_*aly 39

Found it. When an Excel file is open (for example, in MS Excel), a hidden temporary file is created in the same directory:

~$datasheet.xlsx

So when I run the code to read all the files in the folder, I get the error:

Excel file format cannot be determined, you must specify an engine manually.

When all the files are closed and there are no hidden temporary files (~$filename.xlsx) in the same directory, the code works perfectly.
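If closing every workbook first is not practical, one option is to skip the lock files while globbing. The following is a minimal sketch, not part of the original answer; the helper name list_excel_files is hypothetical, and it assumes the temporary files always start with the ~$ prefix described above.

```python
import glob
import os


def list_excel_files(folder="."):
    """Return the .xlsx paths in `folder`, skipping Excel lock files.

    Assumption: lock files are named "~$<original name>.xlsx", as in the
    ~$datasheet.xlsx example above.
    """
    paths = glob.glob(os.path.join(folder, "*.xlsx"))
    # Compare against the base name so the folder part of the path is ignored.
    return sorted(p for p in paths if not os.path.basename(p).startswith("~$"))
```

The loop in the question could then iterate over list_excel_files(".") instead of glob.glob("./*.xlsx"), so the ~$ files never reach pd.read_excel.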

Answer 2

pir*_*bay 31

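Also make sure you are using the correct pd.read_* method. I got this error when trying to open a .csv file with read_excel() instead of read_csv(). I found this handy snippet that automatically picks the right method based on the file type.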

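# `file` is assumed to be an open binary file-like object and
# `file_extension` its extension without the leading dot.
if file_extension == 'xlsx':
    df = pd.read_excel(file, engine='openpyxl')
elif file_extension == 'xls':
    df = pd.read_excel(file)
elif file_extension == 'csv':
    df = pd.read_csv(file)  # read_csv needs a path or file-like object, not raw bytes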

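For files on disk, the same idea can be written against paths rather than open file objects. This is a rough sketch under assumptions: the function name read_table is hypothetical, the extension is trusted to reflect the real file format, and openpyxl (for .xlsx) or xlrd (for .xls) must be installed for the Excel branches to work.

```python
from pathlib import Path

import pandas as pd


def read_table(path):
    """Read a spreadsheet-like file, choosing the pandas reader by extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".xlsx":
        return pd.read_excel(path, engine="openpyxl")
    if suffix == ".xls":
        # Legacy format: let pandas pick its default engine for .xls files.
        return pd.read_excel(path)
    if suffix == ".csv":
        return pd.read_csv(path)
    # Fail loudly instead of guessing a reader for unknown file types.
    raise ValueError(f"Unsupported file type: {path.name}")
```

Raising on unknown extensions keeps a mislabeled or unexpected file from being handed to the wrong reader, which is the situation that produced the error in this answer.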

Archived:	4 years, 1 month ago
Views:	218,123
Last recorded:	1 year, 10 months ago
