Kyl*_*ndt 11 python statistics
If I have two different time series data sets, is there a simple way to find the correlation between the two sets in Python?
For example:
# [ (dateTimeObject, y, z) ... ]
x = [ (8:00am, 12, 8), (8:10am, 15, 10) .... ]
How can I get the correlation of y and z in Python?
Wes*_*ney 34
A little slow on the uptake here. pandas (http://github.com/wesm/pandas and pandas.sourceforge.net) is probably your best bet. I'm biased because I wrote it, but:
In [7]: ts1
Out[7]:
2000-01-03 00:00:00 -0.945653010936
2000-01-04 00:00:00 0.759529904445
2000-01-05 00:00:00 0.177646448683
2000-01-06 00:00:00 0.579750822716
2000-01-07 00:00:00 -0.0752734982291
2000-01-10 00:00:00 0.138730447557
2000-01-11 00:00:00 -0.506961851495
In [8]: ts2
Out[8]:
2000-01-03 00:00:00 1.10436688823
2000-01-04 00:00:00 0.110075215713
2000-01-05 00:00:00 -0.372818939799
2000-01-06 00:00:00 -0.520443811368
2000-01-07 00:00:00 -0.455928700936
2000-01-10 00:00:00 1.49624355051
2000-01-11 00:00:00 -0.204383054598
In [9]: ts1.corr(ts2)
Out[9]: -0.34768587480980645
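Notably, if your data are over different sets of dates, it will compute the correlation pairwise, over the dates the two series have in common. It will also automatically exclude NaN values!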
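To make the alignment behaviour concrete, here is a minimal sketch. The dates and values are invented for illustration and are not the ts1 and ts2 shown above. It compares Series.corr with a manual computation restricted to the shared, non-NaN dates.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the two series only partly overlap in time,
# and ts1 has a missing value inside the overlap.
dates1 = pd.date_range("2000-01-03", periods=6, freq="D")
dates2 = pd.date_range("2000-01-05", periods=6, freq="D")
ts1 = pd.Series([1.0, 2.0, 3.0, 4.0, np.nan, 6.0], index=dates1)
ts2 = pd.Series([2.0, 4.1, 5.9, 8.2, 10.0, 12.1], index=dates2)

# Series.corr aligns on the index and skips NaN pairs.
overlap_corr = ts1.corr(ts2)

# Manual equivalent: keep only dates present in both series, drop NaN rows.
aligned = pd.concat([ts1, ts2], axis=1, join="inner", keys=["ts1", "ts2"]).dropna()
manual_corr = np.corrcoef(aligned["ts1"], aligned["ts2"])[0, 1]

print(len(aligned), overlap_corr, manual_corr)
```

Both numbers should agree, since only the rows that survive the inner join and the NaN drop contribute to either calculation.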
from scipy import stats
# Y and Z are numpy arrays or lists of variables
stats.pearsonr(Y, Z)
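A sketch of how this might be applied to the list of tuples from the question, assuming each entry is (datetime, y, z). The concrete timestamps and the third row are placeholders added for illustration; the question only gives times of day and elides the rest of the data.

```python
from datetime import datetime
from scipy import stats

# Placeholder data in the question's (dateTimeObject, y, z) layout.
x = [
    (datetime(2000, 1, 3, 8, 0), 12, 8),
    (datetime(2000, 1, 3, 8, 10), 15, 10),
    (datetime(2000, 1, 3, 8, 20), 10, 6),  # assumed extra row
]

# Split the tuples into columns; the timestamps are not needed for pearsonr.
_, Y, Z = zip(*x)

# pearsonr returns the correlation coefficient and a two-sided p-value.
r, p_value = stats.pearsonr(Y, Z)
print(r, p_value)
```

Note that pearsonr pairs the values by position, so this only makes sense when y and z were sampled at the same timestamps.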
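You can do it via the covariance matrix or the correlation coefficients. http://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html and http://docs.scipy.org/doc/numpy/reference/generated/numpy.corrcoef.html are the documentation for these; the former also comes with an example of how to use it (corrcoef usage is very similar).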
>>> import numpy
>>> # None stands in for the datetime objects; only y and z are used
>>> x = [ (None, 12, 8), (None, 15, 10), (None, 10, 6) ]
>>> data = numpy.array([[e[1] for e in x], [e[2] for e in x]])
>>> numpy.corrcoef(data)
array([[ 1.        ,  0.99339927],
       [ 0.99339927,  1.        ]])