Resample with Pandas, then fill the original dataframe

see*_*emo 5 python pandas

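I am investigating market statistics based on one week's close and the following week's open. To do this I use resample in pandas. To give an example, I use the pandas DataReader below.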

import numpy as np
from pandas_datareader.data import DataReader  # pandas.io.data has moved to the separate pandas-datareader package

First, get the daily market data:

SP = DataReader("^GSPC", "yahoo") 
del SP['Adj Close'] 
del SP['Volume'] 

SP.head()

              Open       High         Low       Close
Date                
2010-01-04  1116.560059 1133.869995 1116.560059 1132.989990
2010-01-05  1132.660034 1136.630005 1129.660034 1136.520020

Now resample to a weekly timeframe:

ohlc_dict = {
    'Open': 'first',
    'Close': 'last'}
# how= is no longer accepted by resample(); pass the mapping to .agg() instead
w1_resamp = SP.resample('W', closed='left', label='left').agg(ohlc_dict)
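To make the closed='left', label='left' choice concrete, here is a small sketch on made-up data (ten business days starting Monday 2010-01-04, not real S&P prices). It assumes a pandas version where resample() takes the aggregation through .agg() rather than how=.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices: 10 consecutive business days starting Monday 2010-01-04
idx = pd.bdate_range("2010-01-04", periods=10)
daily = pd.DataFrame(
    {"Open": np.arange(10, dtype=float), "Close": np.arange(10, dtype=float) + 0.5},
    index=idx,
)

# 'W' bins are anchored on Sundays (W-SUN). closed='left' makes each bin
# [Sunday, next Sunday), and label='left' stamps it with the Sunday that starts it.
weekly = daily.resample("W", closed="left", label="left").agg(
    {"Open": "first", "Close": "last"}
)
print(weekly)
```

With pandas' default for 'W' (closed='right', label='right') the same two weeks would instead be labelled 2010-01-10 and 2010-01-17, the Sundays that end them.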

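This gives me the weekly open and close data. I now use an np.where statement to highlight the distance between the previous week's close and the current week's open.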

# use a real np.nan, not the string 'np.nan', so the column stays numeric
w1_resamp['distance'] = np.where(w1_resamp['Open'] < w1_resamp['Close'].shift(), w1_resamp['Close'].shift() - w1_resamp['Open'], np.nan)



               Close    Open        distance
Date            
2010-01-03  1144.979980 1116.560059 
2010-01-10  1136.030029 1145.959961 
2010-01-17  1091.760010 1136.030029 
2010-01-24  1073.869995 1092.400024 
2010-01-31  1066.189941 1073.890015 
2010-02-07  1075.510010 1065.510010 0.6799310000001242
2010-02-14  1109.170044 1079.130005 
2010-02-21  1104.489990 1110.000000 
2010-02-28  1138.699951 1105.359985 
2010-03-07  1149.989990 1138.400024 0.29992700000002515
2010-03-14  1159.900024 1148.530029 1.4599610000000212

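I now want to add a new column to the original dataframe SP that shows the date and time at which the gap (as highlighted in w1_resamp['distance']) was closed, but I don't know how to do this... can anyone help?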

As requested in the comments, here is an image showing the desired output in the SP dataframe:

[Image: desired output]
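One possible way to get the date a gap is closed is sketched below. It assumes "closed" means the first daily bar, on or after the start of the gap week, whose High reaches the previous week's Close. That definition, the helper name find_gap_closes and the column names gap_close_date and gap_closed are hypothetical choices, and the prices are synthetic.

```python
import pandas as pd


def find_gap_closes(daily):
    """Hypothetical helper: return (weekly, daily) frames annotated with gap closes.

    Assumes a gap down is closed on the first day, at or after the gap week's
    start, whose High reaches the previous week's Close.
    """
    weekly = daily.resample("W", closed="left", label="left").agg(
        {"Open": "first", "Close": "last"}
    )
    prev_close = weekly["Close"].shift()
    # Same rule as the np.where in the question: weeks that open below last week's close
    weekly["distance"] = (prev_close - weekly["Open"]).where(weekly["Open"] < prev_close)

    weekly["gap_close_date"] = pd.NaT
    daily = daily.copy()
    daily["gap_closed"] = False
    for week_start, target in prev_close[weekly["distance"].notna()].items():
        later = daily.loc[week_start:]  # daily bars from the gap week onwards
        hits = later.index[later["High"] >= target]
        if len(hits):  # gaps that never close keep NaT / False
            weekly.loc[week_start, "gap_close_date"] = hits[0]
            daily.loc[hits[0], "gap_closed"] = True
    return weekly, daily


# Synthetic example: week 2 opens at 98, below week 1's close of 100
idx = pd.bdate_range("2010-01-04", periods=10)
sp = pd.DataFrame(
    {
        "Open": [95, 96, 97, 98, 99, 98, 98.5, 99, 100, 101],
        "High": [96, 97, 98, 99, 100.5, 99, 99.5, 100.5, 101, 102],
        "Close": [95.5, 96.5, 97.5, 98.5, 100, 98.5, 99, 100, 100.5, 101.5],
    },
    index=idx,
    dtype=float,
)
weekly, sp_out = find_gap_closes(sp)
print(weekly)
print(sp_out[sp_out["gap_closed"]])
```

To only count gaps that close within the gap week itself, the search could be limited to daily.loc[week_start:week_start + pd.Timedelta(days=6)] instead.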

Jon*_*han 0

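I don't follow your request for the "gap closed" field, but you could try this and see whether you can apply it to the index to get the date calculation.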

FYI, it looks like the "how" argument is being deprecated; using it prints a warning suggesting .apply() instead.
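For reference, a minimal sketch of what replaces how=: a single .agg() call with a column-to-aggregation mapping gives the same weekly frame as assembling separately resampled Series column by column, which is the pattern the code below uses. The random data is only a stand-in.

```python
import numpy as np
import pandas as pd

# Stand-in daily data (random values, not market prices)
rng = np.random.default_rng(0)
idx = pd.date_range("2018-01-01", "2018-03-31")
df = pd.DataFrame(rng.random((len(idx), 3)), columns=["open", "close", "high"], index=idx)

# One resample call, one aggregation per column (the modern replacement for how=)
weekly_agg = df.resample("W").agg({"open": "first", "close": "last", "high": "max"})

# The same weekly frame assembled column by column
weekly_cols = pd.DataFrame({
    "open": df["open"].resample("W").first(),
    "close": df["close"].resample("W").last(),
    "high": df["high"].resample("W").max(),
})

print(weekly_agg.equals(weekly_cols))
```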

import pandas as pd
import numpy as np

# Random stand-in data: one row per day of 2018
idx = pd.date_range("2018-01-01", "2018-12-31")
columns = ['open', 'close']
df = pd.DataFrame(np.random.random((len(idx), len(columns))), columns=columns, index=idx)
df['high'] = df['open'] * (1 + np.random.uniform(.05, .20))  # bull market...

# Resample each column with its own aggregation instead of how=
func = {
    'open': df['open'].resample('W').first(),
    'close': df['close'].resample('W').last(),
    'high': df['high'].resample('W').max()
}

df_w = pd.DataFrame(func)
# Difference between this week's open and the previous week's close
df_w['oc_diff'] = df_w['open'] - df_w['close'].shift()
df_w.head(10)

                open     close      high   oc_diff
2018-01-07  0.268054  0.352703  1.186531       NaN
2018-01-14  0.340011  0.907513  1.127548 -0.012693
2018-01-21  0.764949  0.907459  0.915084 -0.142564
2018-01-28  0.346734  0.703151  1.027472 -0.560725
2018-02-04  0.231348  0.960882  0.911420 -0.471803
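Applying the weekly result back to the daily index can be sketched as follows, assuming pandas' default 'W' binning (closed='right', label='right'), where each weekly row is stamped with the Sunday that ends its week. Every daily row then belongs to the first weekly label on or after it, which is a backward fill. The column names week_oc_diff and week_open are hypothetical, and the data is random.

```python
import numpy as np
import pandas as pd

# Stand-in daily data (random values, not market prices)
rng = np.random.default_rng(1)
idx = pd.date_range("2018-01-01", "2018-03-31")
df = pd.DataFrame(rng.random((len(idx), 2)), columns=["open", "close"], index=idx)

# Weekly frame as in the answer: this week's open minus last week's close
df_w = df.resample("W").agg({"open": "first", "close": "last"})
df_w["oc_diff"] = df_w["open"] - df_w["close"].shift()

# Default 'W' labels are week-ending Sundays, so each day maps to the
# next weekly label on or after it: a backward fill onto the daily index
df["week_oc_diff"] = df_w["oc_diff"].reindex(df.index, method="bfill")

# A per-week statistic can also be broadcast directly, without building df_w
df["week_open"] = df.groupby(pd.Grouper(freq="W"))["open"].transform("first")

print(df.head(10))
```

With the question's closed='left', label='left' bins, each label is the Sunday that starts the week, so the matching fill there would be method='ffill'.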