Sklearn or Pandas: impute missing values with simple linear regression

Question

Avi*_*vic 2 python data-mining pandas scikit-learn

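I have time series data and I want to impute the missing values. I can't use the column mean, because I don't think it is appropriate for time series data, so I would like to use a simple linear regression to impute them.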

Day, Price
1, NaN
2, NaN
3, 1800
4, 1900
5, NaN
6, NaN
7, 2000
8, 2200

How can I do this?

I would prefer to do this with Pandas, but if there is no other way I can use sklearn :)

Answer 1

rje*_*rje 5

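You can do this with the interpolate method:

# Assign the result back; inplace=True on a column selection may not update df in recent pandas versions
df['Price'] = df['Price'].interpolate(method='linear')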

Result:

         Price  Date
0          NaN     1
1          NaN     2
2  1800.000000     3
3  1900.000000     4
4  1933.333333     5
5  1966.666667     6
6  2000.000000     7
7  2200.000000     8
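
For reference, a minimal self-contained sketch that reproduces this result might look like the following. The DataFrame construction is an assumption based on the sample data in the question (where the column is named Day), and the interpolated column is assigned back to the frame rather than modified in place.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the construction itself is an assumption.
df = pd.DataFrame({
    "Day": [1, 2, 3, 4, 5, 6, 7, 8],
    "Price": [np.nan, np.nan, 1800, 1900, np.nan, np.nan, 2000, 2200],
})

# Linear interpolation between the nearest valid neighbours.
# With the default direction, leading NaNs are left untouched.
df["Price"] = df["Price"].interpolate(method="linear")

print(df)
```

Note that method="linear" treats the rows as equally spaced and ignores the index values, which is fine here because the days are consecutive.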

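As you can see, this fills missing values only in the forward direction, so the first two values remain NaN. If you also want to fill those, use the parameter limit_direction="both":

df['Price'] = df['Price'].interpolate(method='linear', limit_direction="both")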
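
One thing worth checking before relying on this: with method='linear', values outside the range of known points are not extrapolated along a trend. They are filled with the nearest valid value. A small sketch, assuming the sample prices from the question held in a Series named price (a name chosen here for illustration), makes this visible:

```python
import numpy as np
import pandas as pd

# Prices from the question, indexed by day; the Series name and index are assumptions.
price = pd.Series(
    [np.nan, np.nan, 1800, 1900, np.nan, np.nan, 2000, 2200],
    index=range(1, 9),
    name="Price",
)

# Fill in both directions; leading NaNs take the first valid value.
filled = price.interpolate(method="linear", limit_direction="both")

print(filled)
```

Here days 1 and 2 both come out as 1800, so this behaves like a back-fill at the start of the series rather than a regression estimate.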

There are other interpolation methods, such as quadratic or spline. For more information, see the documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.interpolate.html
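
Strictly speaking, interpolate connects neighbouring known points and does not fit a regression. If a single fitted trend line is really what is wanted, one possible sketch uses numpy.polyfit on the observed rows only. The column names Day and Price follow the question, Price_filled is a hypothetical name introduced here, and whether one global linear trend suits the data is an assumption.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Day": [1, 2, 3, 4, 5, 6, 7, 8],
    "Price": [np.nan, np.nan, 1800, 1900, np.nan, np.nan, 2000, 2200],
})

# Fit Price = slope * Day + intercept using only the rows where Price is known.
known = df["Price"].notna()
slope, intercept = np.polyfit(df.loc[known, "Day"], df.loc[known, "Price"], deg=1)

# Predict a price for every day, then use the prediction only where Price is missing.
predicted = slope * df["Day"] + intercept
df["Price_filled"] = df["Price"].fillna(predicted)  # hypothetical column name

print(df)
```

Unlike interpolation, this also extrapolates the first two days along the fitted line, but the filled values are not guaranteed to lie between their observed neighbours.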
