Summary not working for OLS estimation

Nik*_*eke 6 python finance linear-regression statsmodels

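I have a problem with my statsmodels OLS estimation. The model runs without any issues, but when I call summary() to see the actual results, I get a TypeError saying that the axis must be specified when the shapes of a and weights differ.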

My code looks like this:

from __future__ import print_function, division 
import xlrd as xl
import numpy as np
import scipy as sp
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm

file_loc = "/Users/NiklasLindeke/Python/dataset_3.xlsx"
workbook = xl.open_workbook(file_loc)
sheet = workbook.sheet_by_index(0)
tot = sheet.nrows

data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]

rv1 = []
rv5 = []
rv22 = []
rv1fcast = []
T = []
price = []
time = []
retnor = []

model = []

for i in range(1, tot):        
    t = data[i][0]
    ret = data[i][1]
    ret5 = data[i][2]
    ret22 = data[i][3]
    ret1_1 = data[i][4]
    retn = data[i][5]
    t = xl.xldate_as_tuple(t, 0)
    rv1.append(ret)
    rv5.append(ret5)
    rv22.append(ret22)
    rv1fcast.append(ret1_1)
    retnor.append(retn)
    T.append(t)


df = pd.DataFrame({'RVFCAST':rv1fcast, 'RV1':rv1, 'RV5':rv5, 'RV22':rv22,})
df = df[df.RVFCAST != ""]

Model = smf.ols(formula='RVFCAST ~ RV1 + RV5 + RV22', data = df).fit()
print Model.summary()

In other words, this does not work.

The traceback is as follows:

print Model.summary()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-394-ea8ea5139fd4> in <module>()
----> 1 print Model.summary()

/Users/NiklasLindeke/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/statsmodels-0.6.1-py2.7-macosx-10.6-x86_64.egg/statsmodels/regression/linear_model.pyc in summary(self, yname, xname, title, alpha)
   1948             top_left.append(('Covariance Type:', [self.cov_type]))
   1949 
-> 1950         top_right = [('R-squared:', ["%#8.3f" % self.rsquared]),
   1951                      ('Adj. R-squared:', ["%#8.3f" % self.rsquared_adj]),
   1952                      ('F-statistic:', ["%#8.4g" % self.fvalue] ),

/Users/NiklasLindeke/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/statsmodels-0.6.1-py2.7-macosx-10.6-x86_64.egg/statsmodels/tools/decorators.pyc in __get__(self, obj, type)
     92         if _cachedval is None:
     93             # Call the "fget" function
---> 94             _cachedval = self.fget(obj)
     95             # Set the attribute in obj
     96 #            print("Setting %s in cache to %s" % (name, _cachedval))

/Users/NiklasLindeke/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/statsmodels-0.6.1-py2.7-macosx-10.6-x86_64.egg/statsmodels/regression/linear_model.pyc in rsquared(self)
   1179     def rsquared(self):
   1180         if self.k_constant:
-> 1181             return 1 - self.ssr/self.centered_tss
   1182         else:
   1183             return 1 - self.ssr/self.uncentered_tss

/Users/NiklasLindeke/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/statsmodels-0.6.1-py2.7-macosx-10.6-x86_64.egg/statsmodels/tools/decorators.pyc in __get__(self, obj, type)
     92         if _cachedval is None:
     93             # Call the "fget" function
---> 94             _cachedval = self.fget(obj)
     95             # Set the attribute in obj
     96 #            print("Setting %s in cache to %s" % (name, _cachedval))

/Users/NiklasLindeke/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/statsmodels-0.6.1-py2.7-macosx-10.6-x86_64.egg/statsmodels/regression/linear_model.pyc in centered_tss(self)
   1159         if weights is not None:
   1160             return np.sum(weights*(model.endog - np.average(model.endog,
-> 1161                                                         weights=weights))**2)
   1162         else:  # this is probably broken for GLS
   1163             centered_endog = model.wendog - model.wendog.mean()

/Users/NiklasLindeke/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/numpy/lib/function_base.pyc in average(a, axis, weights, returned)
    522             if axis is None:
    523                 raise TypeError(
--> 524                     "Axis must be specified when shapes of a and weights "
    525                     "differ.")
    526             if wgt.ndim != 1:

TypeError: Axis must be specified when shapes of a and weights differ.

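I'm sorry, but I have no idea what to do from here. After this I would also like to correct for autocorrelation using a Newey-West method, and I found that this can be done with the following line of code: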

mdl = Model.get_robustcov_results(cov_type='HAC',maxlags=1)

However, when I try to run it on my model, it returns this error:

ValueError: operands could not be broadcast together with shapes (256,766) (256,1,256) 

I gather that statsmodels.formula may not be compatible with the get_robustcov function. If that is the case, how do I test for autocorrelation?

My most pressing problem, though, is that I cannot produce a summary for my OLS.

As requested, here are the first thirty rows of my dataset in df.

print df
             RV1          RV22           RV5      RVFCAST
0     0.01553801    0.01309511    0.01081393  0.008421236
1    0.008881671    0.01301336    0.01134905   0.01553801
2     0.01042178    0.01326669    0.01189979  0.008881671
3    0.009809431    0.01334593    0.01170942   0.01042178
4    0.009418737    0.01358808    0.01152253  0.009809431
5     0.01821364    0.01362502    0.01269661  0.009418737
6     0.01163536    0.01331585    0.01147541   0.01821364
7    0.009469907    0.01329509    0.01172988   0.01163536
8    0.008875018    0.01361841    0.01202432  0.009469907
9     0.01528914    0.01430873    0.01233219  0.008875018
10    0.01210761    0.01412724    0.01238776   0.01528914
11    0.01290773     0.0144439    0.01432174   0.01210761
12    0.01094212    0.01425895    0.01493865   0.01290773
13    0.01041433    0.01430177     0.0156763   0.01094212
14    0.01556703     0.0142857    0.01986616   0.01041433
15     0.0217775    0.01430253    0.01864532   0.01556703
16    0.01599228    0.01390088    0.01579069    0.0217775
17    0.01463037    0.01384096    0.01416622   0.01599228
18    0.03136361    0.01395866    0.01398807   0.01463037
19   0.009462822    0.01295695     0.0106063   0.03136361
20   0.007504367    0.01295204    0.01114677  0.009462822
21   0.007869922    0.01300863    0.01267322  0.007504367
22    0.01373964     0.0129547    0.01314553  0.007869922
23    0.01445476    0.01271198       0.01268   0.01373964
24    0.01216517    0.01249902    0.01202476   0.01445476
25     0.0151366    0.01266783     0.0129083   0.01216517
26    0.01023149    0.01258627     0.0146934    0.0151366
27    0.01141199    0.01284094    0.01490637   0.01023149
28    0.01117856    0.01321258    0.01643881   0.01141199
29    0.01658287    0.01340074    0.01597086   0.01117856

Nik*_*eke 7

I want to thank user333800 for all the help!

For future reference, in case anyone runs into the same problem.

The following code:

df = pd.DataFrame({'RVFCAST':rv1fcast, 'RV1':rv1, 'RV5':rv5, 'RV22':rv22,})
df = df[df.RVFCAST != ""]
df = df.astype(float)

Model = smf.ols(formula='RVFCAST ~ RV1 + RV5 + RV22', data = df).fit()
mdl = Model.get_robustcov_results(cov_type='HAC',maxlags=1)

gives me:

print mdl.summary()
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                RVFCAST   R-squared:                       0.681
Model:                            OLS   Adj. R-squared:                  0.677
Method:                 Least Squares   F-statistic:                     120.9
Date:                Wed, 22 Apr 2015   Prob (F-statistic):           1.60e-48
Time:                        17:19:19   Log-Likelihood:                 1159.8
No. Observations:                 256   AIC:                            -2312.
Df Residuals:                     252   BIC:                            -2297.
Df Model:                           3                                         
Covariance Type:                  HAC                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      0.0005      0.000      2.285      0.023      7.24e-05     0.001
RV1            0.2823      0.104      2.710      0.007         0.077     0.487
RV5           -0.0486      0.193     -0.252      0.802        -0.429     0.332
RV22           0.7450      0.232      3.212      0.001         0.288     1.202
==============================================================================
Omnibus:                      174.186   Durbin-Watson:                   2.045
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             2152.634
Skew:                           2.546   Prob(JB):                         0.00
Kurtosis:                      16.262   Cond. No.                     1.19e+03
==============================================================================
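On the autocorrelation part of the question, here is a rough sketch of how the HAC (Newey-West) results can be combined with two residual autocorrelation checks. It runs on synthetic data, not my dataset: the frame name `demo`, the random seed, and the coefficients used to generate the data are made up for illustration, and the Breusch-Godfrey test is an extra check I am assuming is suitable here, not something taken from the output above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Synthetic stand-in for the real data (assumption: purely illustrative values).
rng = np.random.default_rng(0)
n = 256
demo = pd.DataFrame({
    "RV1": rng.normal(size=n),
    "RV5": rng.normal(size=n),
    "RV22": rng.normal(size=n),
})
demo["RVFCAST"] = (0.3 * demo["RV1"] + 0.7 * demo["RV22"]
                   + rng.normal(scale=0.5, size=n))

# Plain OLS fit through the formula interface.
ols_res = smf.ols("RVFCAST ~ RV1 + RV5 + RV22", data=demo).fit()

# Same coefficients, but with HAC (Newey-West) standard errors, one lag.
hac_res = ols_res.get_robustcov_results(cov_type="HAC", maxlags=1)

# Durbin-Watson statistic on the residuals (values near 2 suggest little
# first-order autocorrelation).
dw = durbin_watson(ols_res.resid)

# Breusch-Godfrey LM test for serial correlation up to one lag.
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(ols_res, nlags=1)

print(hac_res.summary())
print("Durbin-Watson:", dw)
print("Breusch-Godfrey LM p-value:", lm_pvalue)
```

The HAC step only changes the standard errors and the statistics derived from them; the coefficient estimates are the same as in the plain OLS fit.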

I can now continue with my thesis :)
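For anyone debugging the same thing, here is a small sketch of what I believe was going on, assuming the cause was that the empty cells read from Excel as "" left the columns with object dtype instead of float. The miniature frame `raw` and its values are invented for illustration; only the filter and the `astype(float)` step come from my code above.

```python
import pandas as pd

# Hypothetical miniature of the spreadsheet data: one empty cell arrives as "".
raw = pd.DataFrame({
    "RVFCAST": [0.008421236, 0.01553801, ""],
    "RV1": [0.01553801, 0.008881671, 0.01042178],
})
print(raw.dtypes)       # inspect the dtype of each column before cleaning

# Dropping the rows with "" removes the strings, but does not change the
# dtype of the column that contained them.
filtered = raw[raw.RVFCAST != ""]
print(filtered.dtypes)

# Converting explicitly gives numeric columns to work with.
clean = filtered.astype(float)
print(clean.dtypes)
```

Checking `df.dtypes` before fitting is a quick way to spot this kind of problem.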