如何在滚动窗口中应用 Python 中的 Hurst 指数

Mar*_*ale 4 python trading algorithmic-trading quantitative-finance

我试图在滚动窗口上对 SPY 收盘价应用赫斯特指数。如果我将以下代码(我从这里获得:https : //www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing)应用于收盘价列,则效果很好。然而,这给了我一个静态值。考虑到最近 200 个收盘价,我想在滚动窗口上应用赫斯特指数。我的目标是获得一列,其中考虑到最近 200 个收盘价,在每一行中更新 Hurst 指数。

from numpy import cumsum, log, polyfit, sqrt, std, subtract
from numpy.random import randn
import pandas_datareader as dr
from datetime import date

df = dr.data.get_data_yahoo('SPY',start='23-01-1991',end=date.today())

def hurst(ts):
    """Returns the Hurst Exponent of the time series vector ts"""
    # Create the range of lag values
    lags = range(2, 100)

    # Calculate the array of the variances of the lagged differences
    tau = [sqrt(std(subtract(ts[lag:], ts[:-lag]))) for lag in lags]

    # Use a linear fit to estimate the Hurst Exponent
    poly = polyfit(log(lags), log(tau), 1)

    # Return the Hurst exponent from the polyfit output
    return poly[0]*2.0

print ("Hurst(SPY): %s" % hurst(df['Close']))

## I've tried the next lines of code but unfortunately they are not working:
df['Hurst_Column']= [0]
for aRowINDEX in range( 1, 200 ):
    df['Hurst_Column'][-aRowINDEX] = hurst (df[u'Close'][:-aRowINDEX]) 
Run Code Online (Sandbox Code Playgroud)

我对 Python 非常陌生,我尝试过不同的东西,但都没有运气。任何人都可以帮助我吗?任何帮助都会非常受欢迎。谢谢!

use*_*197 5

让我为您提供两步前进的方法:

第 1 步:使用测试数据实现更健壮的 Hurst 指数

第 2 步:生成“滑动窗口”类似计算的简单方法

第 3 步:有点复杂的方式 - 如果滚动窗口是必须的.​​.....

奖励:我应该在问题的代码下写什么来完成它?


第 1 步:使用测试数据进行更健壮的 Hurst 指数实现:

在这里,我将按QuantFX原样发布一个从模块中获取的函数实现(Py2.7 在大多数地方不会出问题,但在 Py3.x 中xrange()应该用 替换range())。

此代码包含一些改进和某种自我修复,如果测试表明,数据段存在问题(QuantFX使用时间自然流的约定,其中data[0]“最旧”的时间序列单元data[-1]是“最近的”一个)。

HurstEXP()不带任何参数调用将产生一个演示运行,显示主题的一些测试和解释。

也是print( HurstEXP.__doc__ )不言自明的:

def HurstEXP( ts = [ None, ] ):                                         # TESTED: HurstEXP()                Hurst exponent ( Browninan Motion & other observations measure ) 100+ BARs back(!)
            """                                                         __doc__
            USAGE:
                        HurstEXP( ts = [ None, ] )

                        Returns the Hurst Exponent of the time series vector ts[]

            PARAMETERS:
                        ts[,]   a time-series, with 100+ elements
                                ( or [ None, ] that produces a demo run )

            RETURNS:
                        float - a Hurst Exponent approximation,
                                as a real value
                                or
                                an explanatory string on an empty call
            THROWS:
                        n/a
            EXAMPLE:
                        >>> HurstEXP()                                        # actual numbers will vary, as per np.random.randn() generator used
                        HurstEXP( Geometric Browian Motion ):    0.49447454
                        HurstEXP(    Mean-Reverting Series ):   -0.00016013
                        HurstEXP(          Trending Series ):    0.95748937
                        'SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator'

                        >>> HurstEXP( rolling_window( aDSEG[:,idxC], 100 ) )
            REF.s:
                        >>> www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing
            """
            #---------------------------------------------------------------------------------------------------------------------------<self-reflective>
            if ( ts[0] == None ):                                       # DEMO: Create a SYNTH Geometric Brownian Motion, Mean-Reverting and Trending Series:

                 gbm = np.log( 1000 + np.cumsum(     np.random.randn( 100000 ) ) )  # a Geometric Brownian Motion[log(1000 + rand), log(1000 + rand + rand ), log(1000 + rand + rand + rand ),... log(  1000 + rand + ... )]
                 mr  = np.log( 1000 +                np.random.randn( 100000 )   )  # a Mean-Reverting Series    [log(1000 + rand), log(1000 + rand        ), log(1000 + rand               ),... log(  1000 + rand       )]
                 tr  = np.log( 1000 + np.cumsum( 1 + np.random.randn( 100000 ) ) )  # a Trending Series          [log(1001 + rand), log(1002 + rand + rand ), log(1003 + rand + rand + rand ),... log(101000 + rand + ... )]

                                                                        # Output the Hurst Exponent for each of the above SYNTH series
                 print ( "HurstEXP( Geometric Browian Motion ):   {0: > 12.8f}".format( HurstEXP( gbm ) ) )
                 print ( "HurstEXP(    Mean-Reverting Series ):   {0: > 12.8f}".format( HurstEXP( mr  ) ) )
                 print ( "HurstEXP(          Trending Series ):   {0: > 12.8f}".format( HurstEXP( tr  ) ) )

                 return ( "SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator" )
            """                                                         # FIX:
            ===================================================================================================================
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
            0.47537688039105963
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
            -0.31081076640420308
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
            nan
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )

            Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
            C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
            warnings.warn(msg, RankWarning)
            0.026867491053098096
            """
            pass;     too_short_list = 101 - len( ts )                  # MUST HAVE 101+ ELEMENTS
            if ( 0 <  too_short_list ):                                 # IF NOT:
                 ts = too_short_list * ts[:1] + ts                      #    PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD
            #---------------------------------------------------------------------------------------------------------------------------
            lags = range( 2, 100 )                                                              # Create the range of lag values
            tau  = [ np.sqrt( np.std( np.subtract( ts[lag:], ts[:-lag] ) ) ) for lag in lags ]  # Calculate the array of the variances of the lagged differences
            #oly = np.polyfit( np.log( lags ), np.log( tau ), 1 )                               # Use a linear fit to estimate the Hurst Exponent
            #eturn ( 2.0 * poly[0] )                                                            # Return the Hurst exponent from the polyfit output
            """ ********************************************************************************************************************************************************************* DONE:[MS]:ISSUE / FIXED ABOVE
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH] )
            C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
              warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
            C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:94: RuntimeWarning: invalid value encountered in true_divide
              arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
            C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:114: RuntimeWarning: invalid value encountered in true_divide
              ret, rcount, out=ret, casting='unsafe', subok=False)
            QuantFX.py:23034: RuntimeWarning: divide by zero encountered in log
              return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )                  # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )

            Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
            C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
              warnings.warn(msg, RankWarning)
            0.028471879418359915
            |
            |
            |# DATA:
            |
            |>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH]
            memmap([ 1763.31005859,  1765.01000977,  1765.44995117,  1764.80004883,
                     1765.83996582,  1768.91003418,  1771.04003906,  1769.43994141,
                     1771.4699707 ,  1771.61999512,  1774.76000977,  1769.55004883,
                     1773.4699707 ,  1773.32995605,  1770.08996582,  1770.20996094,
                     1768.34997559,  1768.02001953,  1767.59997559,  1767.23999023,
                     1768.41003418,  1769.06994629,  1769.56994629,  1770.7800293 ,
                     1770.56994629,  1769.7800293 ,  1769.90002441,  1770.44995117,
                     1770.9699707 ,  1771.04003906,  1771.16003418,  1769.81005859,
                     1768.76000977,  1769.39001465,  1773.23999023,  1771.91003418,
                     1766.92004395,  1765.56994629,  1762.65002441,  1760.18005371,
                     1755.        ,  1756.67004395,  1753.48999023,  1753.7199707 ,
                     1751.92004395,  1745.44995117,  1745.44995117,  1744.54003906,
                     1744.54003906,  1744.84997559,  1744.84997559,  1744.34997559,
                     1744.34997559,  1743.75      ,  1743.75      ,  1745.23999023,
                     1745.23999023,  1745.15002441,  1745.31005859,  1745.47998047,
                     1745.47998047,  1749.06994629,  1749.06994629,  1748.29003906,
                     1748.29003906,  1747.42004395,  1747.42004395,  1746.98999023,
                     1747.61999512,  1748.79003906,  1748.79003906,  1748.38000488,
                     1748.38000488,  1744.81005859,  1744.81005859,  1736.80004883,
                     1736.80004883,  1735.43005371,  1735.43005371,  1737.9699707
                     ], dtype=float32
                    )
            |
            |
            | # CONVERTED .tolist() to avoid .memmap-type artifacts:
            |
            |>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist()
            [1763.31005859375, 1765.010009765625, 1765.449951171875, 1764.800048828125, 1765.8399658203125, 1768.9100341796875, 1771.0400390625, 1769.43994140625, 1771.469970703125, 1771.6199951171875, 1774.760
            859375, 1743.75, 1743.75, 1745.239990234375, 1745.239990234375, 1745.1500244140625, 1745.31005859375, 1745.47998046875, 1745.47998046875, 1749.0699462890625, 1749.0699462890625, 1748.2900390625, 174
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist() )
            C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
              ret = ret.dtype.type(ret / rcount)

            Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
            C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
              warnings.warn(msg, RankWarning)
            0.028471876494884543
            ===================================================================================================================
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
            0.47537688039105963
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
            -0.31081076640420308
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
            nan
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )

            Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
            C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
            warnings.warn(msg, RankWarning)
            0.026867491053098096
            """
            return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )                  # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )
Run Code Online (Sandbox Code Playgroud)

第 2 步:生成“滑动窗口”计算的简单方法:

 [ ( -i, HurstEXP( ts = df['Close'][:-i] ) ) for i in range( 1, 200 ) ] # should call the HurstEXP for the last 200 days
Run Code Online (Sandbox Code Playgroud)

考验我:

>>> df[u'Close']
Date
1993-01-29     43.937500
1993-02-01     44.250000
...
2019-07-17    297.739990
2019-07-18    297.429993
Name: Close, Length: 6665, dtype: float64
>>> 

>>> [ (                          -i,
         HurstEXP( df[u'Close'][:-i] )
         )                   for  i in range( 1, 10 )
         ]
[ ( -1, 0.4489364467179827  ),
  ( -2, 0.4489306967683502  ),
  ( -3, 0.44892205577752986 ),
  ( -4, 0.448931424819551   ),
  ( -5, 0.44895272101162326 ),
  ( -6, 0.44896713741862954 ),
  ( -7, 0.44898211557287204 ),
  ( -8, 0.4489941656580211  ),
  ( -9, 0.4490116318052649  )
  ]
Run Code Online (Sandbox Code Playgroud)

第 3 步:有点复杂的方式 - 如果滚动窗口是必须的.​​.....:

虽然内存/处理效率不高,但“滚动窗口”技巧可能会被注入到游戏中,而没有内存,处理效率的好处越少(你在语法上合理的代码上花费了很多,但处理效率这样做HurstEXP()没有任何好处,因为 的卷积性质无济于事,如果不尝试重新矢量化其内部代码(为什么以及永远是什么?)老板还是要你这么做……):

def rolling_window( aMatrix, aRollingWindowLENGTH ):                    #
            """                                                                 __doc__
            USAGE:   rolling_window( aMatrix, aRollingWindowLENGTH )

            PARAMS:  aMatrix                a numpy array
                     aRollingWindowLENGTH   a LENGTH of a rolling window

            RETURNS: a stride_trick'ed numpy array with rolling windows

            THROWS:  n/a

            EXAMPLE: >>> x = np.arange( 10 ).reshape( ( 2, 5 ) )

                     >>> rolling_window( x, 3 )
                     array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]],
                            [[5, 6, 7], [6, 7, 8], [7, 8, 9]]])

                     >>> np.mean( rolling_window( x, 3 ), -1 )
                     array([[ 1.,  2.,  3.],
                            [ 6.,  7.,  8.]])
            """
            new_shape   = aMatrix.shape[:-1] + ( aMatrix.shape[-1] - aRollingWindowLENGTH + 1, aRollingWindowLENGTH )
            new_strides = aMatrix.strides    + ( aMatrix.strides[-1], )
            return np.lib.stride_tricks.as_strided( aMatrix,
                                                    shape   = new_shape,
                                                    strides = new_strides
                                                    )
Run Code Online (Sandbox Code Playgroud)
>>> rolling_window( df[u'Close'], 100 ).shape
(6566, 100)

>>> rolling_window( df[u'Close'], 100 ).flags
    C_CONTIGUOUS    : False
    F_CONTIGUOUS    : False
    OWNDATA         : False <---------------- a VIEW, not a replica
    WRITEABLE       : True
    ALIGNED         : True
    WRITEBACKIFCOPY : False
    UPDATEIFCOPY    : False
Run Code Online (Sandbox Code Playgroud)

您将获得 6566 个向量的数组,其中包含“rolling_window”-ed 100 天的 SPY[Close]-s 块

>>> rolling_window( df[u'Close'], 100 )
array([[ 43.9375    ,  44.25      ,  44.34375   , ...,  44.5       ,  44.59375   ,  44.625     ],
       [ 44.25      ,  44.34375   ,  44.8125    , ...,  44.59375   ,  44.625     ,  44.21875   ],
       [ 44.34375   ,  44.8125    ,  45.        , ...,  44.625     ,  44.21875   ,  44.8125    ],
       ...,
       [279.14001465, 279.51998901, 279.32000732, ..., 300.6499939 , 300.75      , 299.77999878],
       [279.51998901, 279.32000732, 279.20001221, ..., 300.75      , 299.77999878, 297.73999023],
       [279.32000732, 279.20001221, 278.67999268, ..., 299.77999878, 297.73999023, 297.42999268]])
Run Code Online (Sandbox Code Playgroud)

问:我应该在我的问题代码下写什么来完成它?

for                         aRowINDEX in range( 1, 200 ):
    df[u'HurstEXP_COLUMN'][-aRowINDEX] = HurstEXP( df[u'Close'][:-aRowINDEX] )
    print( "[{0:>4d}]: DIFF( hurst() - HurstEXP() ) == {1:}".format( aRowINDEX,
                           ( hurst(    df[u'Close'][:-aRowINDEX] )
                           - HurstEXP( df[u'Close'][:-aRowINDEX] )
                             )
            )
Run Code Online (Sandbox Code Playgroud)