Best way to iteratively build a Pandas DataFrame

Question

Suppose I have an algorithm that runs in a loop. It returns an unknown number of results, and I want to store all of them in a DataFrame. For example:

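import pandas as pd

# Note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0,
# so this snippet runs as written only on older pandas versions.
df_results = pd.DataFrame(columns=['x', 'x_squared'])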

x = 0
x_squared = 1

while x_squared < 100:
    x_squared = x ** 2

    df_iteration = pd.DataFrame(data=[[x,x_squared]], columns=['x', 'x_squared'])
    df_results = df_results.append(df_iteration, ignore_index=True)

    x += 1

print(df_results)

Output:

     x  x_squared
0    0          0
1    1          1
2    2          4
3    3          9
4    4         16
5    5         25
6    6         36
7    7         49
8    8         64
9    9         81
10  10        100

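The problem arises when I want to run a large number of iterations. The math itself is very fast. However, in a large loop, creating and appending DataFrames becomes very slow.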
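
A rough way to observe the slowdown is to time the growing pattern at a few sizes. The sketch below is only an illustration and is not part of the original post. It uses `pd.concat` in place of `df.append` (an assumption made so that it runs on current pandas), and the helper names `build_by_growing` and `time_call` are made up for this example. Because each step copies all existing rows into a new frame, the total time would be expected to grow faster than linearly with the number of rows, though actual timings depend on the machine and pandas version.

```python
import time

import pandas as pd


def build_by_growing(n):
    """Grow a DataFrame one row at a time (the slow pattern described above)."""
    df = pd.DataFrame({'x': [0], 'x_squared': [0]})
    for x in range(1, n):
        row = pd.DataFrame({'x': [x], 'x_squared': [x ** 2]})
        # Each concat allocates a new frame and copies every existing row.
        df = pd.concat([df, row], ignore_index=True)
    return df


def time_call(func, n):
    """Return the wall-clock seconds taken by func(n)."""
    start = time.perf_counter()
    func(n)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Doubling n each time makes super-linear growth easy to spot.
    for n in (250, 500, 1000):
        print(n, round(time_call(build_by_growing, n), 3))
```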

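I know this particular example could easily be solved without using a DataFrame in each iteration. But imagine a complex algorithm that also performs operations on DataFrames along the way. For me, it is sometimes easier to build the result DataFrame step by step. Which approach is best?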

Answer 1

OD1*_*995 5

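It is more efficient to build a list of dictionaries and create the DataFrame from that list once, after the loop. Something like this: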

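import pandas as pd

dictList = []  # one dict per result row

x = 0
x_squared = 1

while x_squared < 100:
    x_squared = x ** 2

    # Appending to a Python list is cheap; no DataFrame is touched in the loop.
    dict1 = {'x': x, 'x_squared': x_squared}
    dictList.append(dict1)
    x += 1

# Build the DataFrame once, after the loop has finished.
df = pd.DataFrame(dictList)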
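
For the case raised in the question, where each iteration naturally produces a small DataFrame, a related pattern is to collect those frames in a plain Python list and call `pd.concat` once after the loop. The following is a minimal sketch under that assumption; `run_iteration` is a hypothetical stand-in for the real per-step computation and does not come from the original post.

```python
import pandas as pd


def run_iteration(x):
    """Hypothetical per-step computation that returns a one-row DataFrame."""
    return pd.DataFrame({'x': [x], 'x_squared': [x ** 2]})


frames = []  # cheap to append to; no DataFrame copying inside the loop
x = 0
x_squared = 1

while x_squared < 100:
    df_iteration = run_iteration(x)
    # Read the value back so the loop condition matches the question's example.
    x_squared = df_iteration['x_squared'].iloc[0]
    frames.append(df_iteration)
    x += 1

# Combine all pieces in a single pass after the loop.
df_results = pd.concat(frames, ignore_index=True)
print(df_results)
```

The intermediate frames can still be inspected or transformed inside the loop; the only change is that the growing result is assembled once at the end instead of being copied on every iteration.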

