I am trying to merge two Excel files with the code below and get the error ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.
import pandas as pd
file1 = pd.read_excel("file1.xlsx")
file2 = pd.read_excel("file2.xlsx")
file3 = file1.merge(file2, on="Input E-mail", how="outer")
file3.to_excel("merged1.xlsx")
The files are about 100 MB each, and 9 GB of memory is available (16 GB total).
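The dataframe you produce can be much larger than your two input dataframes. When a key value is repeated in both inputs, the merge emits every pairing of the matching rows. A simple example: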
import pandas as pd
values = pd.DataFrame({"id": [1,1,1,1], "value": ["a", "b", "c", "d"]})
users = pd.DataFrame({"id": [1,1,1], "users": ["Amy", "Bob", "Dan"]})
big_table = pd.merge(users, values, how="outer")
print(big_table)
Result:
    id  users  value
0    1    Amy      a
1    1    Amy      b
2    1    Amy      c
3    1    Amy      d
4    1    Bob      a
5    1    Bob      b
6    1    Bob      c
7    1    Bob      d
8    1    Dan      a
9    1    Dan      b
10   1    Dan      c
11   1    Dan      d
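To check whether this is what happens with your files, you could count how often each key occurs on each side before merging. The sketch below is only an illustration: the helper name estimate_outer_merge_rows is made up here, and it assumes the key column has no missing values. A key that appears m times on the left and n times on the right contributes m * n rows; a key present on only one side contributes its own count.

```python
import pandas as pd


def estimate_outer_merge_rows(left, right, key):
    """Estimate the row count of left.merge(right, on=key, how="outer").

    Hypothetical helper; assumes `key` contains no missing values.
    """
    left_counts = left[key].value_counts()
    right_counts = right[key].value_counts()
    # Line up the per-key counts; a key absent on one side gets a count of 0.
    l, r = left_counts.align(right_counts, fill_value=0)
    # Keys on both sides yield l * r rows; one-sided keys keep their own rows.
    rows = (l * r).where((l > 0) & (r > 0), l + r)
    return int(rows.sum())


# Same toy data as in the example above.
values = pd.DataFrame({"id": [1, 1, 1, 1], "value": ["a", "b", "c", "d"]})
users = pd.DataFrame({"id": [1, 1, 1], "users": ["Amy", "Bob", "Dan"]})
estimated = estimate_outer_merge_rows(users, values, "id")
print(estimated)
```

For the files in the question, the call would be estimate_outer_merge_rows(file1, file2, "Input E-mail"). If the estimate is far larger than either input, repeated e-mail values are the likely cause, and de-duplicating one side first (for example with drop_duplicates on the key column) may keep the result manageable. Passing validate="one_to_one" to merge is another option: pandas then raises an error instead of building the oversized result.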