Tags: python numpy time-series pandas
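I'm a beginner with Python and its related tools, and I'm having trouble working with time-series data.
Below is my 1-minute OHLC data (date, time, open, high, low, close):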
2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,9:01:00,248.70,249.00,248.65,248.85
2011-11-01,9:02:00,248.90,249.25,248.70,249.15
...
2011-11-01,15:03:00,250.25,250.30,250.05,250.15
2011-11-01,15:04:00,250.15,250.60,250.10,250.60
2011-11-01,15:15:00,250.55,250.55,250.55,250.55
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
2011-11-02,9:02:00,246.30,246.45,245.75,245.80
2011-11-02,9:03:00,245.75,245.85,245.30,245.35
...
I'd like to take the close value (the last field of each row) and reshape the data so that each day becomes one line, like this:
2011-11-01, 248.70, 248.85, 249.15, ... 250.15, 250.60, 250.55
2011-11-02, 245.80, 246.35, 245.80, ...
...
I'd also like to find, for EACH DAY, the highest close value and the time (minute) at which it occurred, like this:
2011-11-01, 10:23:03, 250.55
2011-11-02, 11:02:36, 251.00
....
Any help would be greatly appreciated.
Thanks in advance,
You can use the pandas library. For your data, you can get the daily maximum like this:
import pandas as pd
# Read the data (no header row) and name the six columns
df = pd.read_csv('your_file', header=None,
                 names=['date', 'time', 'open', 'high', 'low', 'close'])
# Combine the date and time columns into a DatetimeIndex
df.index = pd.to_datetime(df['date']) + pd.to_timedelta(df['time'])
# Keep only the close column
df = df[['close']]
# Resample to daily frequency and get the max value for each day
df.resample('D').max()
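To try the daily maximum without a file, the sample rows from the question can be fed in through `io.StringIO`. This is a sketch that assumes the six unnamed columns are date, time, open, high, low, close; `RAW` and the column names are my own labels, not part of the original answer:

```python
import io

import pandas as pd

# Sample rows from the question (the "..." gaps are left out)
RAW = """\
2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,9:01:00,248.70,249.00,248.65,248.85
2011-11-01,9:02:00,248.90,249.25,248.70,249.15
2011-11-01,15:03:00,250.25,250.30,250.05,250.15
2011-11-01,15:04:00,250.15,250.60,250.10,250.60
2011-11-01,15:15:00,250.55,250.55,250.55,250.55
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
2011-11-02,9:02:00,246.30,246.45,245.75,245.80
2011-11-02,9:03:00,245.75,245.85,245.30,245.35
"""

# Assumed column order: date, time, open, high, low, close
df = pd.read_csv(io.StringIO(RAW), header=None,
                 names=["date", "time", "open", "high", "low", "close"])

# Build a DatetimeIndex: the parsed date plus the time of day as a Timedelta
df.index = pd.to_datetime(df["date"]) + pd.to_timedelta(df["time"])

# One bin per calendar day; max() keeps the highest close in each bin
daily_max = df["close"].resample("D").max()
print(daily_max)
```

Older pandas releases accepted `resample('D', how='max')`; current releases no longer take `how=`, so the aggregation is chained as a method on the resampler instead.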
If you also want to show the time, keep it as a column in the DataFrame and, for each day, select the row with the highest close (`idxmax` returns the timestamp of that row):
>>> df = pd.read_csv('your_file', header=None, usecols=[0, 1, 5],
...                  names=['d', 't', 'close'])
>>> df.index = pd.to_datetime(df['d']) + pd.to_timedelta(df['t'])
>>> df['time'] = df.index
>>> best = df.loc[df.groupby(df.index.date)['close'].idxmax(), ['close', 'time']]
>>> best.index = best.index.date
>>> best
             close                time
2011-11-01  250.60 2011-11-01 15:04:00
2011-11-02  246.35 2011-11-02 09:01:00
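To print the result in the `date, time, value` layout asked for in the question, one option (a sketch, not the only way) is to take the per-day `idxmax` timestamps and format each one. The rows below are a subset of the question's sample data; `RAW`, `best_ts` and `lines` are illustrative names:

```python
import io

import pandas as pd

# A few of the sample rows from the question
RAW = """\
2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,15:04:00,250.15,250.60,250.10,250.60
2011-11-01,15:15:00,250.55,250.55,250.55,250.55
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
2011-11-02,9:03:00,245.75,245.85,245.30,245.35
"""

# Assumed column order: date, time, open, high, low, close
df = pd.read_csv(io.StringIO(RAW), header=None,
                 names=["date", "time", "open", "high", "low", "close"])
df.index = pd.to_datetime(df["date"]) + pd.to_timedelta(df["time"])

# Timestamp of the highest close within each calendar day
best_ts = df.groupby(df.index.date)["close"].idxmax()

# "date, time, close" with the time taken from the winning row's timestamp
lines = [f"{ts:%Y-%m-%d}, {ts:%H:%M:%S}, {df.loc[ts, 'close']:.2f}"
         for ts in best_ts]
print("\n".join(lines))
```

Grouping by `df.index.date` only creates groups for days that have rows, so no empty-day handling is needed in this variant.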
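If you want the list of prices for each day, just group by day and use `apply` on the grouped object to return a list of all closing prices in each group: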
>>> df.groupby(lambda dt: dt.date()).apply(lambda group: list(group['close']))
2011-11-01 [248.7, 248.85, 249.15, 250.15, 250.6, 250.55]
2011-11-02 [245.8, 246.35, 245.8, 245.35]
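For more information, see the documentation: Time Series.
Update for your specific dataset:
The problem with your dataset is that some days have no data at all. Resampling to daily frequency still creates a bin for each of those days, so the function applied per day has to handle empty groups, and those days are dropped afterwards: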
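The per-day lists can also be joined into the wide text layout from the question (one line per day, the closes in time order after the date). This is a sketch using the question's sample rows; `RAW`, `per_day` and `wide_lines` are illustrative names:

```python
import io

import pandas as pd

# Sample rows from the question (the "..." gaps are left out)
RAW = """\
2011-11-01,9:00:00,248.50,248.95,248.20,248.70
2011-11-01,9:01:00,248.70,249.00,248.65,248.85
2011-11-01,9:02:00,248.90,249.25,248.70,249.15
2011-11-01,15:03:00,250.25,250.30,250.05,250.15
2011-11-01,15:04:00,250.15,250.60,250.10,250.60
2011-11-01,15:15:00,250.55,250.55,250.55,250.55
2011-11-02,9:00:00,245.55,246.25,245.40,245.80
2011-11-02,9:01:00,245.85,246.40,245.75,246.35
2011-11-02,9:02:00,246.30,246.45,245.75,245.80
2011-11-02,9:03:00,245.75,245.85,245.30,245.35
"""

# Assumed column order: date, time, open, high, low, close
df = pd.read_csv(io.StringIO(RAW), header=None,
                 names=["date", "time", "open", "high", "low", "close"])
df.index = pd.to_datetime(df["date"]) + pd.to_timedelta(df["time"])
df = df.sort_index()  # keep the closes in time order within each day

# One list of closes per calendar day
per_day = df.groupby(df.index.date)["close"].apply(list)

# "date, close1, close2, ..." with two decimals, as in the question
wide_lines = [f"{day}, " + ", ".join(f"{c:.2f}" for c in closes)
              for day, closes in per_day.items()]
print("\n".join(wide_lines))
```

Days can have different numbers of minutes, which is why a list per day (rather than fixed columns) is used here.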
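def func(group):
    # Days without any data arrive as empty groups
    if len(group) == 0:
        return None
    # Otherwise return the row with the highest close
    return group.loc[group['close'].idxmax()]

# Apply func to each daily bin and drop the days that returned None
rows = [func(group) for _, group in df.resample('D')]
pd.DataFrame([row for row in rows if row is not None])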
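The empty-day effect can be reproduced with a tiny hypothetical series that skips one calendar day (the prices below are made-up toy values; only the gap matters). Resampling to `'D'` creates a bin for the missing day, which comes back as NaN and is removed by `dropna()`, while grouping by `index.date` never creates that group at all:

```python
import pandas as pd

# Hypothetical toy data: rows on 2011-11-01 and 2011-11-03, none on 2011-11-02
idx = pd.to_datetime(["2011-11-01 09:00", "2011-11-01 09:01",
                      "2011-11-03 09:00", "2011-11-03 09:01"])
close = pd.Series([1.0, 3.0, 2.0, 5.0], index=idx, name="close")

# resample('D') has a bin for every calendar day, including the empty one
resampled = close.resample("D").max()
print(resampled)  # 2011-11-02 shows up as NaN

# Dropping the empty bin leaves only the days that actually had data
print(resampled.dropna())

# Grouping by calendar date only creates groups for days with rows
by_date = close.groupby(close.index.date).max()
print(by_date)
```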