use*_*157 2 python pandas statsmodels
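I'm currently trying to implement an MLR (multiple linear regression) in Python, and I'm not sure how to apply the coefficients I've found to future values.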
import pandas as pd
import statsmodels.formula.api as sm
import statsmodels.api as sm2
TV = [230.1, 44.5, 17.2, 151.5, 180.8]
Radio = [37.8,39.3,45.9,41.3,10.8]
Newspaper = [69.2,45.1,69.3,58.5,58.4]
Sales = [22.1, 10.4, 9.3, 18.5,12.9]
df = pd.DataFrame({'TV': TV,
                   'Radio': Radio,
                   'Newspaper': Newspaper,
                   'Sales': Sales})
Y = df.Sales
X = df[['TV','Radio','Newspaper']]
X = sm2.add_constant(X)
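model = sm2.OLS(Y, X).fit()  # OLS taken from statsmodels.api (sm2); newer statsmodels releases may not expose it under statsmodels.formula.api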
>>> model.params
const -0.141990
TV 0.070544
Radio 0.239617
Newspaper -0.040178
dtype: float64
So, suppose I want to predict "Sales" for the following DataFrame:
EDIT
TV     Radio  Newspaper  Sales
230.1  37.8   69.2       22.4
44.5   39.3   45.1       10.1
...    ...    ...        ...
25     15     15
30     20     22
35     22     36
I've been trying a method I found here, but I can't seem to get it working: Forecasting using Pandas OLS
Thanks!
Assuming df2 is your new sample DataFrame:
model = sm2.OLS(Y, X).fit()  # use OLS from statsmodels.api
new_x = df2[['TV', 'Radio', 'Newspaper']].values  # df2 holds only the new rows, so no mask taken from df is needed
new_x = sm2.add_constant(new_x) # sm2 = statsmodels.api
y_predict = model.predict(new_x)
>>> y_predict
array([ 4.61319034, 5.88274588, 6.15220225])
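As a sanity check, each prediction is just the intercept plus every coefficient multiplied by its column. Here is a minimal sketch using only NumPy, with the coefficients copied from the model.params output in the question and the three new rows from the question's table. The names params, new_rows, design and manual_pred are illustrative and do not come from the original code.

```python
import numpy as np

# Coefficients as printed by model.params in the question,
# in the order const, TV, Radio, Newspaper.
params = np.array([-0.141990, 0.070544, 0.239617, -0.040178])

# The three new rows from the question: TV, Radio, Newspaper.
new_rows = np.array([[25, 15, 15],
                     [30, 20, 22],
                     [35, 22, 36]], dtype=float)

# Prepend a column of ones so the intercept lines up with `const`,
# which mirrors what add_constant does for this data.
design = np.column_stack([np.ones(len(new_rows)), new_rows])

# Matrix-vector product: one prediction per row.
manual_pred = design @ params
print(manual_pred)
```

The values agree with y_predict above up to the rounding of the printed coefficients.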
You can assign the result directly to df2, like this:
df2.loc[:, 'Sales'] = model.predict(new_x)
To fill in the missing Sales values in the original DataFrame with predictions from the regression, try:
X = df.loc[df.Sales.notnull(), ['TV', 'Radio', 'Newspaper']]
X = sm2.add_constant(X)
Y = df[df.Sales.notnull()].Sales
model = sm2.OLS(Y, X).fit()  # use OLS from statsmodels.api
new_x = df.loc[df.Sales.isnull(), ['TV', 'Radio', 'Newspaper']]
new_x = sm2.add_constant(new_x) # sm2 = statsmodels.api
df.loc[df.Sales.isnull(), 'Sales'] = model.predict(new_x)
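As an alternative to the approach above (not a replacement for it), the formula interface in statsmodels.formula.api adds the intercept for you, so the add_constant step can be skipped for both fitting and predicting. A minimal sketch, assuming the original DataFrame holds the five known rows from the question's code followed by the three rows with missing Sales; the alias smf and the variable missing are illustrative names:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Known rows from the question plus three rows whose Sales is missing.
df = pd.DataFrame({
    'TV': [230.1, 44.5, 17.2, 151.5, 180.8, 25, 30, 35],
    'Radio': [37.8, 39.3, 45.9, 41.3, 10.8, 15, 20, 22],
    'Newspaper': [69.2, 45.1, 69.3, 58.5, 58.4, 15, 22, 36],
    'Sales': [22.1, 10.4, 9.3, 18.5, 12.9, np.nan, np.nan, np.nan],
})

# Boolean mask of the rows that need a prediction.
missing = df.Sales.isnull()

# Fit on the rows with known Sales; the formula adds the intercept itself.
model = smf.ols('Sales ~ TV + Radio + Newspaper', data=df[~missing]).fit()

# predict() applies the same formula to the new rows and returns a Series
# indexed like its input, so the assignment aligns by index.
df.loc[missing, 'Sales'] = model.predict(df.loc[missing, ['TV', 'Radio', 'Newspaper']])
print(df)
```

Because the design matrix is built from the column names in the formula, the column order of the new data does not matter here, which removes one common source of mistakes when passing raw arrays.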