Predicting future values with OLS regression (Python, StatsModels, Pandas)

use*_*157 2 python pandas statsmodels

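I'm currently trying to implement multiple linear regression (MLR) in Python, and I'm not sure how to apply the coefficients I found to future values.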

import pandas as pd
import statsmodels.formula.api as sm
import statsmodels.api as sm2

TV = [230.1, 44.5, 17.2, 151.5, 180.8]
Radio = [37.8,39.3,45.9,41.3,10.8]
Newspaper = [69.2,45.1,69.3,58.5,58.4]
Sales = [22.1, 10.4, 9.3, 18.5,12.9]
df = pd.DataFrame({'TV': TV, 
                   'Radio': Radio, 
                   'Newspaper': Newspaper, 
                   'Sales': Sales})

Y = df.Sales
X = df[['TV','Radio','Newspaper']]
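X = sm2.add_constant(X)  # add the intercept column
# OLS is exposed by statsmodels.api (sm2 here); recent statsmodels
# releases no longer expose it through statsmodels.formula.api.
model = sm2.OLS(Y, X).fit()

>>> model.params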
const       -0.141990
TV           0.070544
Radio        0.239617
Newspaper   -0.040178
dtype: float64

So, suppose I want to predict "Sales" for the following DataFrame:

EDIT

TV      Radio   Newspaper   Sales
230.1   37.8    69.2        22.4
44.5    39.3    45.1        10.1
...     ...     ...         ...
25      15      15
30      20      22
35      22      36

I've been trying a method I found here, but I can't seem to get it to work: Predicting with Pandas OLS

Thanks!

Ale*_*der 7

Assuming df2 is a DataFrame holding your new samples:

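model = sm2.OLS(Y, X).fit()  # sm2 = statsmodels.api
# Select the predictor columns from df2 itself (not with a mask built from df).
new_x = df2[['TV', 'Radio', 'Newspaper']].values
# has_constant='add' forces the intercept column even if df2 has a single row.
new_x = sm2.add_constant(new_x, has_constant='add')
y_predict = model.predict(new_x)

>>> y_predict
array([ 4.61319034,  5.88274588,  6.15220225])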
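For intuition, predicting from a plain OLS fit amounts to taking the dot product of each new row (with a leading 1 for the intercept) and the coefficient vector. The following is a minimal sketch, assuming the rounded coefficients printed in the question and the three new rows from the question's table; the names `params`, `new_rows`, `design`, and `y_manual` are illustrative and not part of the original code.

```python
import numpy as np

# Coefficients as printed by model.params in the question (rounded),
# in the order: const, TV, Radio, Newspaper.
params = np.array([-0.141990, 0.070544, 0.239617, -0.040178])

# New observations (TV, Radio, Newspaper) taken from the question's table.
new_rows = np.array([
    [25.0, 15.0, 15.0],
    [30.0, 20.0, 22.0],
    [35.0, 22.0, 36.0],
])

# Prepend a column of ones so the intercept is multiplied by 1.
design = np.column_stack([np.ones(len(new_rows)), new_rows])

# Prediction = design matrix times coefficient vector.
y_manual = design @ params
print(y_manual)
```

Up to rounding of the coefficients, this should agree with the `y_predict` values shown above.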

You can assign the result directly to df2, like this:

df2.loc[:, 'Sales'] = model.predict(new_x)

To fill in the missing Sales values in the original DataFrame with predictions from the regression, try:

# Fit only on the rows where Sales is known.
X = df.loc[df.Sales.notnull(), ['TV', 'Radio', 'Newspaper']]
X = sm2.add_constant(X)
Y = df[df.Sales.notnull()].Sales

model = sm2.OLS(Y, X).fit()  # sm2 = statsmodels.api
# Predict for the rows where Sales is missing.
new_x = df.loc[df.Sales.isnull(), ['TV', 'Radio', 'Newspaper']]
new_x = sm2.add_constant(new_x, has_constant='add')

df.loc[df.Sales.isnull(), 'Sales'] = model.predict(new_x)
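As an alternative, the same fill-in can be sketched with the formula interface, which adds the intercept itself so no `add_constant` call is needed. This is not the approach used above, just a possible variant; it assumes the five training rows from the question plus the three rows with missing Sales, and the names `smf` and `missing` are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Known rows from the question plus three rows whose Sales is missing.
df = pd.DataFrame({
    'TV': [230.1, 44.5, 17.2, 151.5, 180.8, 25.0, 30.0, 35.0],
    'Radio': [37.8, 39.3, 45.9, 41.3, 10.8, 15.0, 20.0, 22.0],
    'Newspaper': [69.2, 45.1, 69.3, 58.5, 58.4, 15.0, 22.0, 36.0],
    'Sales': [22.1, 10.4, 9.3, 18.5, 12.9, np.nan, np.nan, np.nan],
})

missing = df['Sales'].isnull()

# Fit only on rows where Sales is known; the formula adds the intercept.
model = smf.ols('Sales ~ TV + Radio + Newspaper', data=df[~missing]).fit()

# predict() takes a DataFrame with the predictor columns; the result has one
# value per input row, so it can be assigned straight into the masked rows.
df.loc[missing, 'Sales'] = model.predict(
    df.loc[missing, ['TV', 'Radio', 'Newspaper']]
)
print(df)
```

With the formula interface the intercept appears in `model.params` as `Intercept` rather than `const`.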