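I need to process a sequence of historical tick data at millisecond resolution. I need to be able to select the opening ticks of certain timespans (hour, minute, etc.). The sequence may contain gaps larger than the timespan, so the first tick after such a gap must be picked as the opening one; otherwise, the opening tick is the one closest to the calendar start of the corresponding timespan.
The first thing that came to mind is the following stateful filtering function opensTimespan:Timespan->(Timestamp->bool), which captures the timespan id of each gap-opening or interval-opening tick in a closure so that it is carried between invocations: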
let opensTimespan (interval: Timespan) =
    let lastTakenId = ref -1L // Timestamps are positive
    fun (tickAt: Timestamp) ->
        let tickId = tickAt / interval
        if tickId <> !lastTakenId then
            lastTakenId := tickId
            true
        else
            false
It can be applied like this:
let hourlyTicks =
    readTicks @"EURUSD-history.zip" "EURUSD-2012-04.csv"
    |> Seq.filter (opensTimespan HOUR)
    |> Seq.toList
This works fine, but the side effect inside opensTimespan is definitely not idiomatic.
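To make the concern about the side effect concrete, here is a minimal sketch of how the hidden state can leak when the same closure is reused. The type abbreviations, the value of HOUR, and the tick values are assumptions made up for illustration; the question does not show its own definitions.

```fsharp
// Assumed definitions (not shown in the question): timestamps in milliseconds.
type Timestamp = int64
type Timespan = int64
let HOUR : Timespan = 3600000L

// Same logic as opensTimespan above, written with .Value instead of ! and :=
let opensTimespan (interval: Timespan) =
    let lastTakenId = ref (-1L)
    fun (tickAt: Timestamp) ->
        let tickId = tickAt / interval
        if tickId <> lastTakenId.Value then
            lastTakenId.Value <- tickId
            true
        else
            false

// Made-up ticks, all inside hour 1
let ticks : Timestamp list = [ 3600000L; 3600500L; 3601000L ]

let isOpening = opensTimespan HOUR

// First pass starts from the initial state (-1), so the first tick is taken.
let firstPass = ticks |> Seq.filter isOpening |> Seq.toList    // [3600000L]

// Second pass reuses the closure, whose state still says "hour 1 already taken".
let secondPass = ticks |> Seq.filter isOpening |> Seq.toList   // []

// A fresh closure gives the expected answer again.
let freshPass = ticks |> Seq.filter (opensTimespan HOUR) |> Seq.toList  // [3600000L]
```

The filter is only safe when each call to opensTimespan is used for exactly one pass over one sequence, which is the kind of implicit contract a pure version avoids.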
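One alternative is to use the fact that deciding whether a tick is an opening one requires only a pair of timestamps, its own and the previous one. That leads to the following stateless filtering function opensTimespanF:Timespan->Timestamp*Timestamp->bool: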
let opensTimespanF interval (ticksPair: Timestamp * Timestamp) =
    fst ticksPair / interval <> snd ticksPair / interval
which can be applied as:
let hourlyTicks =
    seq {
        yield 0L
        yield! readTicks @"EURUSD-history.zip" "EURUSD-2012-04.csv"
    }
    |> Seq.pairwise
    |> Seq.filter (opensTimespanF HOUR)
    |> Seq.map snd
    |> Seq.toList
This purely functional approach produces the same results, with only a slight (~11%) performance penalty.
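The equality of the results can be checked on a small hand-made series. The sketch below assumes Timestamp and Timespan are int64 milliseconds and that HOUR is 3600000L; neither definition is shown above, and the tick values are invented. The two versions agree because lastTakenId in the stateful filter always equals the timespan id of the immediately preceding tick.

```fsharp
// Assumed definitions (not shown in the question): timestamps in milliseconds.
type Timestamp = int64
type Timespan = int64
let HOUR : Timespan = 3600000L

// Stateful version, same logic as opensTimespan above
let opensTimespan (interval: Timespan) =
    let lastTakenId = ref (-1L)
    fun (tickAt: Timestamp) ->
        let tickId = tickAt / interval
        if tickId <> lastTakenId.Value then
            lastTakenId.Value <- tickId
            true
        else
            false

// Stateless version, same logic as opensTimespanF above
let opensTimespanF (interval: Timespan) (ticksPair: Timestamp * Timestamp) =
    fst ticksPair / interval <> snd ticksPair / interval

// Made-up ticks: two in hour 1, one in hour 2, then a gap, then two in hour 5
let sampleTicks : Timestamp list =
    [ 3600010L; 3600500L; 7200300L; 18000700L; 18000900L ]

let statefulResult =
    sampleTicks
    |> Seq.filter (opensTimespan HOUR)
    |> Seq.toList

let pairwiseResult =
    seq { yield 0L; yield! sampleTicks }
    |> Seq.pairwise
    |> Seq.filter (opensTimespanF HOUR)
    |> Seq.map snd
    |> Seq.toList

// Both are [3600010L; 7200300L; 18000700L]
let sameResult = (statefulResult = pairwiseResult)
```

One difference remains at the very start: with the 0L sentinel, a first tick whose timestamp is smaller than the interval would be dropped by the pairwise version but kept by the stateful one, which starts from -1L. For real-world timestamps this should not matter.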
What other ways of approaching this task in a purely functional manner might I be missing?
Thank you.
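One purely functional solution is to use the fold function. The fold function is used to process a sequence (or a list) while accumulating some state. In your example, the state consists of lastTakenId plus the list of elements that you want to return, so you can use a state of type Timestamp * (Timestamp list):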
let hourlyTicks =
    readTicks @"EURUSD-history.zip" "EURUSD-2012-04.csv"
    |> Seq.fold (fun (lastTakenId, res) tickAt ->
        // Similar to the body of your stateful function - 'lastTakenId' is the last
        // state and 'tickAt' is the current value. The 'res' list stores
        // all returned elements
        let tickId = tickAt / HOUR
        if tickId <> lastTakenId then
            // We return new state for 'lastTakenId' and prepend current element to result
            (tickId, tickAt :: res)
        else
            // Here, we skip element, so we return the original state and original list
            (lastTakenId, res)) (-1L, []) // Initial state: -1 and empty list of results
    // Take the second part of the state (the result list) and
    // reverse it, because it was accumulated in the opposite order
    |> snd
    |> List.rev
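Aside from that, I'm not entirely sure about your other pure solution. I don't think it does exactly the same thing as the first one (but I don't have data to test it with), because you are only comparing two adjacent elements (so perhaps, in the first one, you could skip multiple items?).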
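This is just like Tomas's solution (in fact, I used his as my starting point, comments and all), except that it uses Seq.scan, which lets you avoid the List.rev and yields results on demand (so we can handle an infinite stream of ticks, for example).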
let hourlyTicks =
    readTicks @"EURUSD-history.zip" "EURUSD-2012-04.csv"
    |> Seq.scan (fun (lastTakenId, _) tickAt ->
        // Similar to the body of your stateful function - 'lastTakenId' is the last state
        // and 'tickAt' is the current value.
        let tickId = tickAt / HOUR
        if tickId <> lastTakenId then
            // We return new state for 'lastTakenId' and yield current
            // element to the "scan stream"
            (tickId, Some(tickAt))
        else
            // Here, we skip element, so we return the original tick id and
            // yield None to the "scan stream"
            (lastTakenId, None)) (-1L, None) // Initial state: -1 and None
    // Yield all the snd elements of the "scan stream" where Option.isSome
    |> Seq.choose snd
(Disclaimer: I haven't tested this, since I don't have all the dependencies assumed in your question.)
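The on-demand behaviour can be tried without the original data. The following is only a sketch: endlessTicks is a hypothetical, made-up tick source (one tick every 20 minutes), HOUR is assumed to be 3600000L milliseconds, and the scan step is pulled out into a named function purely for readability.

```fsharp
// Assumed definitions (not shown in the question): timestamps in milliseconds.
type Timestamp = int64
let HOUR = 3600000L

// Hypothetical endless tick source: starts at hour 1, one tick every 20 minutes.
let endlessTicks : seq<Timestamp> =
    Seq.initInfinite (fun i -> HOUR + int64 i * 1200000L)

// Scan step: carries the last taken id and marks the current tick
// as Some (opening tick) or None (skipped).
let step (lastTakenId: int64, _: Timestamp option) (tickAt: Timestamp) =
    let tickId = tickAt / HOUR
    if tickId <> lastTakenId then (tickId, Some tickAt)
    else (lastTakenId, None)

let openingTicks =
    endlessTicks
    |> Seq.scan step (-1L, None)
    |> Seq.choose snd

// Only as many source ticks are consumed as are needed for three results.
let firstThree = openingTicks |> Seq.take 3 |> Seq.toList
// firstThree = [3600000L; 7200000L; 10800000L]
```

The same pipeline written with Seq.fold would never return on this input, because fold has to consume the whole sequence before producing its result.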
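Update in response to a comment
I wonder whether the performance penalty you're seeing is due to boxing/unboxing of the values in the accumulator. I'd be curious to find out whether the following shows an improvement: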
open System
open System.Collections.Generic

let hourlyTicks3 =
    readTicks @"EURUSD-history.zip" "EURUSD-2012-04.csv"
    |> Seq.scan (fun (kvp: KeyValuePair<_,_>) tickAt ->
        let lastTakenId = kvp.Key
        // Similar to the body of your stateful function - 'lastTakenId' is the last state
        // and 'tickAt' is the current value.
        let tickId = tickAt / HOUR
        if tickId <> lastTakenId then
            // We return new state for 'lastTakenId' and yield current
            // element to the "scan stream"
            KeyValuePair<_,_>(tickId, Nullable<_>(tickAt))
        else
            // Here, we skip element, so we return the original tick id and
            // yield "null" to the "scan stream"
            KeyValuePair<_,_>(lastTakenId, Nullable<_>()))
        (KeyValuePair<_,_>(-1L, Nullable<_>())) // Initial state: -1 and "null"
    // Yield the KeyValuePair.Value of every "scan stream" element where Nullable.HasValue
    |> Seq.filter (fun kvp -> kvp.Value.HasValue)
    |> Seq.map (fun kvp -> kvp.Value.Value)