计算 TimeSeries 滑动窗口的最大值

Nic*_*rsi 6 c# linq algorithm sliding-window

输入:

    public class MyObject
    {
        public double Value { get; set; }
        public DateTime Date { get; set; }
    }
Run Code Online (Sandbox Code Playgroud)

生成测试对象的方法:

public static MyObject[] GetTestObjects()
{
    var rnd = new Random();
    var date = new DateTime(2021, 1, 1, 0, 0, 0);
    var result = new List<MyObject>();
    for (int i = 0; i < 50000; i++)
    {
        //this is to simulate real data having gaps
        if (rnd.Next(100) < 25)
        {
            continue;
        }
        var myObject = new MyObject()
        {
            Value = rnd.NextDouble(),
            Date = date.AddMinutes(15 * i)
        };
        result.Add(myObject);
    }
    return result.ToArray();
}
Run Code Online (Sandbox Code Playgroud)

鉴于此,我需要计算每个 myObject 过去 12 个月的最大值。我可以考虑这样做 InParallel,但也许有一个优化的解决方案?

抱歉不清楚,这就是我现在用来获得我想要的东西:

        public MyObject[] BruteForceBackward(MyObject[] testData)
        {
            return testData.AsParallel().Select(point =>
            {
                var max = testData.Where(x => x.Date <= point.Date && x.Date >= point.Date.AddYears(-1)).Max(x => x.Value);
                return new MyObject() { Date = point.Date, Value = point.Value / max };
            }).OrderBy(r => r.Date).ToArray();
        }
Run Code Online (Sandbox Code Playgroud)

这可行,但速度很慢并且会占用处理器资源(想象一下,你有 100k 个对象),我相信一定有更好的东西

jul*_*old 5

我有一个类似的项目,我必须根据大量传感器数据计算这些内容。

现在,您可以在我的 Github 存储库中找到更精致的版本,该版本应该可以使用 (.Net): https ://github.com/forReason/Statistics-Helper-Library

一般来说,您希望减少遍历所有数据的循环量。最好的情况是,您只想触摸每个元素一次。

处理数组(相当于BruteForceBackwards

public static MyObject[] FlowThroughForward(ref MyObject[] testData)
{
    // generate return array
    MyObject[] returnData = new MyObject[testData.Length];
    // keep track to minimize processing
    double currentMaximum = 0;
    List<MyObject> maximumValues = new List<MyObject>();
    // go through the elements
    for (int i = 0; i < testData.Length; i++)
    {
        // calculate the oldest date to keep in tracking list
        DateTime targetDate = testData[i].Date.AddYears(-1);
        // maximum logic
        if (testData[i].Value >= currentMaximum)
        {
            // new maximum found, clear tracking list
            // this is the best case scenario
            maximumValues.Clear();
            currentMaximum = testData[i].Value;
        }
        else
        {
            // unfortunately, no new maximum was found
            // go backwards the maximum tracking list and check for smaller values
            // clear the list of all smaller values. The list should therefore always
            // be in descending order
            for (int b = maximumValues.Count - 1; b >= 0; b--)
            {
                if (maximumValues[b].Value <= testData[i].Value)
                {
                    // a lower value has been found. We have a newer, higher value
                    // clear this waste value from the tracking list
                    maximumValues.RemoveAt(b);
                }
                else
                {
                    // there are no more lower values. 
                    // stop looking for smaller values to save time
                    break;
                }
            }
        }
        // append new value to tracking list, no matter if higher or lower
        // all future values might be lower
        maximumValues.Add(testData[i]);
        // check if the oldest value is too old to be kept in the tracking list
        while (maximumValues[0].Date < targetDate)
        {
            // oldest value is to be removed
            maximumValues.RemoveAt(0);
            // update maximum
            currentMaximum = maximumValues[0].Value;
        }
        // add object to result list
        returnData[i] = new MyObject() { Date = testData[i].Date, Value = testData[i].Value / currentMaximum }; ;
    }
    return returnData;
}
Run Code Online (Sandbox Code Playgroud)

实时数据或流数据

注意:如果您的列表非常大,则传递完整数组的方法可能会遇到内存问题。在这种情况下:一次传递一个值,将它们从最旧的值传递到最新的值。一次存储一个值。该函数也可用于实时数据。
测试方法包含在代码中。

static void Main(string[] args)
{
    int length = 50000;
    
    Stopwatch stopWatch1 = new Stopwatch();
    stopWatch1.Start();
    var myObject = new MyObject();
    var result = new List<MyObject>();
    var date = new DateTime(2021, 1, 1, 0, 0, 0);
    for (int i = 0; i < length; i++)
    {
        //this is to simulate real data having gaps
        if (rnd.Next(100) < 25)
        {
            continue;
        }
        myObject.Value = rnd.NextDouble();
        myObject.Date = date.AddMinutes(15 * i);
        result.Add(CalculateNextObject(ref myObject));
    }
    stopWatch1.Stop();
    Console.WriteLine("test code executed in " + stopWatch1.ElapsedMilliseconds + " ms");
    Thread.Sleep(1000000);
}

private static Random rnd = new Random();
private static double currentMaximum = 0;
private static List<MyObject> maximumValues = new List<MyObject>();
public static MyObject CalculateNextObject(ref MyObject input)
{
        // calculate the oldest date to keep in tracking list
        DateTime targetDate = input.Date.AddYears(-1);
        // maximum logic
        if (input.Value >= currentMaximum)
        {
            // new maximum found, clear tracking list
            // this is the best case scenario
            maximumValues.Clear();
            currentMaximum = input.Value;
        }
        else
        {
            // unfortunately, no new maximum was found
            // go backwards the maximum tracking list and check for smaller values
            // clear the list of all smaller values. The list should therefore always
            // be in descending order
            for (int b = maximumValues.Count - 1; b >= 0; b--)
            {
                if (maximumValues[b].Value <= input.Value)
                {
                    // a lower value has been found. We have a newer, higher value
                    // clear this waste value from the tracking list
                    maximumValues.RemoveAt(b);
                }
                else
                {
                    // there are no more lower values. 
                    // stop looking for smaller values to save time
                    break;
                }
            }
        }
        // append new value to tracking list, no matter if higher or lower
        // all future values might be lower
        maximumValues.Add(input);
        // check if the oldest value is too old to be kept in the tracking list
        while (maximumValues[0].Date < targetDate)
        {
            // oldest value is to be removed
            maximumValues.RemoveAt(0);
            // update maximum
            currentMaximum = maximumValues[0].Value;
        }
    // add object to result list
    MyObject returnData = new MyObject() { Date = input.Date, Value = input.Value / currentMaximum };
    return returnData;
}
Run Code Online (Sandbox Code Playgroud)

测试方法

static void Main(string[] args)
{
    MyObject[] testData = GetTestObjects();
    Stopwatch stopWatch1 = new Stopwatch();
    Stopwatch stopWatch2 = new Stopwatch();
    stopWatch1.Start();
    MyObject[] testresults1 = BruteForceBackward(testData);
    stopWatch1.Stop();
    Console.WriteLine("BruteForceBackward executed in " + stopWatch1.ElapsedMilliseconds + " ms");
    stopWatch2.Start();
    MyObject[] testresults2 = FlowThroughForward(ref testData);
    stopWatch2.Stop();
    Console.WriteLine("FlowThroughForward executed in " + stopWatch2.ElapsedMilliseconds + " ms");
    Console.WriteLine();
    Console.WriteLine("Comparing some random test results: ");
    var rnd = new Random();
    for (int i = 0; i < 10; i++)
    {
        int index = rnd.Next(0, testData.Length);
        Console.WriteLine("Index: " + index + " brute: " + testresults1[index].Value + " flow: " + testresults2[index].Value);
    }
    Thread.Sleep(1000000);
}
Run Code Online (Sandbox Code Playgroud)

测试结果

测试是在 32 个核心的机器上进行的,因此理论上多线程方法应该具有优势,但您会看到;)

功能 功能时间 时间 %
暴力破解 5334 毫秒 99.9%
前向流 5毫秒 0.094%

性能改进系数:~time/1000

带数据验证的控制台输出:

BruteForceBackward executed in 5264 ms
FlowThroughForward executed in 5 ms

Comparing some random test results:
Index: 25291 brute: 0.989688139105413 flow: 0.989688139105413
Index: 11945 brute: 0.59670821976193 flow: 0.59670821976193
Index: 30282 brute: 0.413238225210297 flow: 0.413238225210297
Index: 33898 brute: 0.38258761939139 flow: 0.38258761939139
Index: 8824 brute: 0.833512217105447 flow: 0.833512217105447
Index: 22092 brute: 0.648052464067263 flow: 0.648052464067263
Index: 24633 brute: 0.35859417692481 flow: 0.35859417692481
Index: 24061 brute: 0.540642018793402 flow: 0.540642018793402
Index: 34219 brute: 0.498785766613022 flow: 0.498785766613022
Index: 2396 brute: 0.151471808392111 flow: 0.151471808392111
Run Code Online (Sandbox Code Playgroud)

由于并行化,Bruteforce 向后的 CPU 使用率要高得多。 在此输入图像描述

最坏的情况是值长期下降。代码仍然可以大大优化,但我想这应该足够了。为了进一步优化,在删除/添加元素时,人们可能会考虑减少列表洗牌maximumValues

  • 需要考虑的一件事是,如果输入数据中存在间隙(偏离时间间隔),就像 OP 的情况一样,最后一个“if”应替换为“while”,以确保删除所有过时的值。 (2认同)