Using polyfit on a Pandas DataFrame and adding the results to new columns

Question

J.C*_*alc 1 python numpy linear-regression dataframe pandas

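I have a DataFrame like the one below. For each Id, I have two points, (x1, y1) and (x2, y2). I want to feed these to polyfit(), get the slope and the x-intercept, and add them as new columns.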

    Id        x         y
    1     0.79978   0.018255
    1     1.19983   0.020963
    2     2.39998   0.029006
    2     2.79995   0.033004
    3     1.79965   0.021489
    3     2.19969   0.024194
    4     1.19981   0.019338
    4     1.59981   0.022200
    5     1.79971   0.025629
    5     2.19974   0.028681

I really need help grouping the right rows and feeding them to polyfit. I have been struggling with this for a while. Any help would be most welcome.

Answer 1

ALo*_*llz 6

You can use groupby and apply the fit within each group. First, set Id as the index so that you can avoid a merge later.

import pandas as pd
import numpy as np

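# Use Id as the index so the per-group result aligns on assignment
df = df.set_index('Id')
# Fit a degree-1 polynomial per Id; each result is [slope, intercept]
df['fit'] = df.groupby('Id').apply(lambda x: np.polyfit(x.x, x.y, 1))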

df is now:

          x         y                                           fit
Id                                                                 
1   0.79978  0.018255  [0.0067691538557680215, 0.01284116612923385]
1   1.19983  0.020963  [0.0067691538557680215, 0.01284116612923385]
2   2.39998  0.029006   [0.00999574968122608, 0.005016400680051043]
2   2.79995  0.033004   [0.00999574968122608, 0.005016400680051043]
3   1.79965  0.021489  [0.006761823817618233, 0.009320083766623343]
3   2.19969  0.024194  [0.006761823817618233, 0.009320083766623343]
...

If you want a separate column for each coefficient, you can apply pd.Series.

df[['slope', 'intercept']] = df.fit.apply(pd.Series)
df = df.drop(columns='fit')

Alternatively, starting from the initial DataFrame, use a single apply and concatenate the results.

# From initial DataFrame
df = df.set_index('Id')
res = df.groupby('Id').apply(lambda x: pd.Series(np.polyfit(x.x, x.y, 1), 
                                                 index=['slope', 'intercept']))
df = pd.concat([df, res], axis=1)

df is now:

          x         y     slope  intercept
Id                                        
1   0.79978  0.018255  0.006769   0.012841
1   1.19983  0.020963  0.006769   0.012841
2   2.39998  0.029006  0.009996   0.005016
2   2.79995  0.033004  0.009996   0.005016
3   1.79965  0.021489  0.006762   0.009320
3   2.19969  0.024194  0.006762   0.009320
4   1.19981  0.019338  0.007155   0.010753
4   1.59981  0.022200  0.007155   0.010753
5   1.79971  0.025629  0.007629   0.011898
5   2.19974  0.028681  0.007629   0.011898
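
The question asks for the x-intercept, whereas `np.polyfit` with degree 1 returns the slope and the y-intercept. Assuming a nonzero slope, the x-intercept of y = slope * x + intercept is -intercept / slope. The sketch below adds it and attaches the per-Id coefficients with `DataFrame.join` on the `Id` column rather than `pd.concat`, since concatenating along columns with repeated index labels may behave differently across pandas versions. The names `fit_line`, `coeffs`, `result` and `x_intercept` are hypothetical.

```python
import numpy as np
import pandas as pd

# Data copied from the question; Id stays a regular column here.
df = pd.DataFrame({
    "Id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "x": [0.79978, 1.19983, 2.39998, 2.79995, 1.79965,
          2.19969, 1.19981, 1.59981, 1.79971, 2.19974],
    "y": [0.018255, 0.020963, 0.029006, 0.033004, 0.021489,
          0.024194, 0.019338, 0.022200, 0.025629, 0.028681],
})

def fit_line(group):
    """Return the degree-1 fit of y on x for one Id as a labelled Series."""
    slope, intercept = np.polyfit(group["x"], group["y"], 1)
    return pd.Series({"slope": slope, "intercept": intercept})

# One row per Id with columns 'slope' and 'intercept' (the y-intercept).
coeffs = df.groupby("Id")[["x", "y"]].apply(fit_line)

# x-intercept: where the fitted line crosses y = 0 (assumes slope != 0).
coeffs["x_intercept"] = -coeffs["intercept"] / coeffs["slope"]

# Broadcast the per-Id coefficients back onto every row of that Id.
result = df.join(coeffs, on="Id")
print(result)
```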

Asked:	7 years, 10 months ago
Viewed:	7017 times
Active:	5 years, 2 months ago