Finding minima and maxima in OHLC data

Ale*_*Ale 6 python time time-series python-3.x ohlc

I want to find (in Python) the local minima and maxima in OHLC data, with the condition that consecutive extrema are at least ±5% apart.

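Time condition

Note:

  • For an up move (close > open), the low price occurs before the high price
  • For a down move (close < open), the low price occurs after the high price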
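
The timing rule above can be sketched as a small helper that returns the four prices of one candle in the order they are assumed to occur. The name `candle_path` is hypothetical, and the rule does not say what to do when close equals open; this sketch treats that case like a down move, which is an assumption.

```python
def candle_path(open_: float, high: float, low: float, close: float) -> list[float]:
    """Return one candle's prices in their assumed chronological order."""
    if close > open_:
        # Up move: the low is assumed to occur before the high.
        return [open_, low, high, close]
    # Down move (and, as an assumption, close == open): high before low.
    return [open_, high, low, close]

print(candle_path(0.1280, 0.1280, 0.1209, 0.1239))  # down move: open, high, low, close
print(candle_path(0.1236, 0.1305, 0.1222, 0.1283))  # up move: open, low, high, close
```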

The best way to explain what I want to achieve is with a graphical example:

[image]

The OHLC data is in the following format:

open_time      open        high        low         close
2023-07-02  0.12800000  0.12800000  0.12090000  0.12390000
2023-07-03  0.12360000  0.13050000  0.12220000  0.12830000
2023-07-04  0.12830000  0.12830000  0.12320000  0.12410000
2023-07-05  0.12410000  0.12530000  0.11800000  0.11980000
2023-07-06  0.11990000  0.12270000  0.11470000  0.11500000

The result should look like this:

date1 val1 date2 val2 <---up
date2 val2 date3 val3 <---down
date3 val3 date4 val4 <---up
date4 val4 date5 val5 <---down
.
.
.

For the data in the example, the result should be:

2023-07-02  0.1280  2023-07-02  0.1209  -5.55%
2023-07-02  0.1209  2023-07-03  0.1305  7.94%
2023-07-03  0.1305  2023-07-06  0.1147  -12.11%

Is there a name for this task?


Addendum

I have added a new example with a different condition (±3%).

Here is the data:

2022-02-25  38340.4200  39699.0000  38038.4600  39237.0600
2022-02-26  39237.0700  40300.0000  38600.4600  39138.1100
2022-02-27  39138.1100  39881.7700  37027.5500  37714.4300
2022-02-28  37714.4200  44200.0000  37468.2800  43181.2700
2022-03-01  43176.4100  44968.1300  42838.6800  44434.0900

The final result should be:

2022-02-25  38038   2022-02-26  40300   5.95%
2022-02-26  40300   2022-02-26  38600   -4.22%
2022-02-26  38600   2022-02-27  39881   3.32%
2022-02-27  39881   2022-02-27  37027   -7.16%
2022-02-27  37027   2022-02-28  44200   19.37%
2022-02-28  44200   2022-03-01  42838   -3.08%


Bop*_*reH 3

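Here is a simple solution that splits each daily OHLC row into four (day, value) entries. Each entry is then processed in turn (the order within a day depends on the candle's direction) while keeping track of local minima/maxima ("peaks"), merging consecutive runs in the same direction and skipping insignificant movements.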

There are two NamedTuples: Entry (for a (day, value) pair) and Movement (for each row of the result). I could have used plain tuples, but NamedTuple gives each field a clear name.
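
One detail worth spelling out: NamedTuple instances compare like ordinary tuples, field by field. Because `value` is declared before `date`, comparing an entry with the previous peak compares prices first and only falls back to the date string on a tie. A minimal illustration, using a class-style definition of `Entry` with the same two fields (the definition style is just an assumption for this sketch):

```python
from typing import NamedTuple

class Entry(NamedTuple):
    value: float
    date: str

earlier_high = Entry(0.1305, '2023-07-03')
later_low = Entry(0.1147, '2023-07-06')

# Compared by value first, so the later date does not make it "greater".
print(later_low > earlier_high)  # False
# Equal values fall back to comparing the date strings.
print(Entry(0.1280, '2023-07-03') > Entry(0.1280, '2023-07-02'))  # True
```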

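It also doesn't depend on numpy, pandas or any other library, and when used with a static checker such as mypy, the type hints help catch errors before the code runs. It should also be fairly fast for a pure-Python solution, since it computes all movements in a single pass.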

from typing import Iterator, NamedTuple

# `value` comes first so that comparing two entries compares prices first.
Entry = NamedTuple('Entry', [('value', float), ('date', str)])
Movement = NamedTuple('Movement', [('start', Entry), ('end', Entry), ('percentage', float)])

def get_change(a: Entry, b: Entry) -> float:
    """ Return the relative change from `a` to `b` as a fraction. """
    return (b.value - a.value) / a.value

def get_movements(data_str: str, min_change_percent: float = 0.05) -> Iterator[Movement]:
    """ Return all movements with changes above a threshold (a fraction, e.g. 0.05 for 5%). """
    peaks: list[Entry] = []
    for line in data_str.strip().split('\n'):
        # Read lines from input and split into date and values.
        date, *prices = line.split()
        # Convert to float first: comparing the raw strings would be lexicographic.
        open_, high, low, close = map(float, prices)
        # Order values according to movement direction.
        values = [open_, low, high, close] if close > open_ else [open_, high, low, close]
        for value in values:
            entry = Entry(value, date)
            if len(peaks) >= 2 and (entry > peaks[-1]) == (peaks[-1] > peaks[-2]):
                # Continue movement of same direction by replacing last peak.
                peaks[-1] = entry
            elif not peaks or abs(get_change(peaks[-1], entry)) >= min_change_percent:
                # First entry, or a reversal that meets the minimum threshold.
                peaks.append(entry)

    # Convert every pair of remaining peaks to a `Movement`.
    for start, end in zip(peaks, peaks[1:]):
        yield Movement(start, end, percentage=get_change(start, end))

Usage for the first example:

data_str = """
2023-07-02  0.12800000  0.12800000  0.12090000  0.12390000
2023-07-03  0.12360000  0.13050000  0.12220000  0.12830000
2023-07-04  0.12830000  0.12830000  0.12320000  0.12410000
2023-07-05  0.12410000  0.12530000  0.11800000  0.11980000
2023-07-06  0.11990000  0.12270000  0.11470000  0.11500000
"""

for mov in get_movements(data_str, 0.05):
    print(f'{mov.start.date}  {mov.start.value:.4f}  {mov.end.date}  {mov.end.value:.4f}  {mov.percentage:.2%}')
# 2023-07-02  0.1280  2023-07-02  0.1209  -5.55%
# 2023-07-02  0.1209  2023-07-03  0.1305  7.94%
# 2023-07-03  0.1305  2023-07-06  0.1147  -12.11%

Usage for the second example:

data_str = """
2022-02-25  38340.4200  39699.0000  38038.4600  39237.0600
2022-02-26  39237.0700  40300.0000  38600.4600  39138.1100
2022-02-27  39138.1100  39881.7700  37027.5500  37714.4300
2022-02-28  37714.4200  44200.0000  37468.2800  43181.2700
2022-03-01  43176.4100  44968.1300  42838.6800  44434.0900
"""

for mov in get_movements(data_str, 0.03):
    print(f'{mov.start.date}  {int(mov.start.value)}  {mov.end.date}  {int(mov.end.value)}  {mov.percentage:.2%}')
# 2022-02-25  38340  2022-02-26  40300  5.11%
# 2022-02-26  40300  2022-02-26  38600  -4.22%
# 2022-02-26  38600  2022-02-27  39881  3.32%
# 2022-02-27  39881  2022-02-27  37027  -7.16%
# 2022-02-27  37027  2022-02-28  44200  19.37%
# 2022-02-28  44200  2022-03-01  42838  -3.08%
# 2022-03-01  42838  2022-03-01  44968  4.97%

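The first result of the second example does not match the value you provided, but it is not clear to me why it should start at 38038 instead of 38340. The output also contains one additional final move (42838 to 44968, 4.97%) that is not in your list. All other values match exactly.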
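
One possible reading is that the expected output measures the first move from the low of 2022-02-25 (38038.46) rather than from its open (38340.42). A quick check of both percentages against the high of 2022-02-26, with prices taken from the data above (`pct_change` is a hypothetical helper that mirrors `get_change` on plain floats):

```python
def pct_change(start: float, end: float) -> float:
    """Relative change from start to end, as a fraction."""
    return (end - start) / start

high_0226 = 40300.00
print(f'{pct_change(38340.42, high_0226):.2%}')  # measured from the open: 5.11%
print(f'{pct_change(38038.46, high_0226):.2%}')  # measured from the low: 5.95%
```

The function records the first open as its first peak, and the drop from 38340 to 38038 (about 0.79%) is below the 3% threshold, so that low is never recorded as a peak.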