过滤列表的算法

Gre*_*edo 5 algorithm vba arraylist filter time-complexity

我已经实现了我认为在 VBA 中过滤 a 的相当垃圾的方法System.Collections.ArrayList。该代码采用一个列表和一个项目/比较值来过滤掉。它循环遍历列表并删除匹配的项目。然后它重新启动循环(因为你不能For Each同时.Remove

Public Sub Filter(ByVal testValue As Object, ByVal dataSet As ArrayList)
'testValue and the items in `dataSet` all Implement IComparable from mscorlib.dll
'This allows comparing objects for equality
'i.e. obj1.CompareTo(obj2) = 0 is equivalent to obj1 = obj2
    Dim item As IComparable
    Dim repeat As Boolean
    repeat = False
    For Each item In dataSet
        If item.CompareTo(testValue) = 0 Then   'or equiv; If item = testValue
            dataSet.Remove item
            repeat = True
            Exit For
        End If
    Next item
    If repeat Then Filter testValue, dataSet 
End Sub
Run Code Online (Sandbox Code Playgroud)

为什么是垃圾

假设列表的X元素很长,并且包含Y与过滤条件匹配的项目,带有X>Y. 据我所知,最好的情况性能是O(X),当所有的Ys 在开始时都聚集在一起。最糟糕的情况是所有Ys 在末尾都聚集在一起。在这种情况下,算法需要(X-Y)*Y查找操作,max 当 时Y=X/2,所以O(X^2)

O(X)与当你到达匹配项时逐步前进并删除但不打破循环的简单算法相比,这很糟糕。但我不知道如何实施它。有没有办法提高这个过滤器的性能?

QHa*_*arr 2

你能不能做如下的事情,我相信这是 O(n) :

Option Explicit

Public Sub RemItems()

    Const TARGET_VALUE As String = "dd"
    Dim myList As Object
    Set myList = CreateObject("System.Collections.ArrayList")

    myList.Add "a"
    myList.Add "dd"
    myList.Add "a"
    myList.Add "a"
    myList.Add "a"
    myList.Add "dd"
    myList.Add "a"
    myList.Add "a"
    myList.Add "dd"
    myList.Add "a"
    myList.Add "a"

    Dim i As Long
    For i = myList.Count - 1 To 0 Step -1
        If myList(i) = TARGET_VALUE Then myList.Remove myList(i)
    Next i

End Sub
Run Code Online (Sandbox Code Playgroud)

有关复杂性信息,请参阅此讨论:

.NET 集合类的渐近复杂度

如果是可信的(.NET-Big-O-Algorithm-Complexity-Cheat-Sheet):

在此输入图像描述

注意:我使用https://htmledit.squarefree.com/渲染了 HTML

编辑:

警告 - 我不是计算机科学毕业生。这就是闹着玩的。我确信对于正在处理的数据类型、分布等存在争议......欢迎改进

上面的 .Net 表显示,与 ArrayList 相比, HashTable中的删除是平均O(1)删除操作O(n),因此我从 value 中随机生成了 100,000 行{"a","b","c"}。然后我用它作为我的固定测试集​​以获得以下结果。

跑步

测试集比例

测试运行代码(请温柔!)

Option Explicit

Private Declare PtrSafe Function getFrequency Lib "kernel32" _
Alias "QueryPerformanceFrequency" (cyFrequency As Currency) As Long
Private Declare PtrSafe Function getTickCount Lib "kernel32" _
Alias "QueryPerformanceCounter" (cyTickCount As Currency) As Long

Public Sub TestingArrayList()
    Const TARGET_VALUE = "a"
    Dim aList As Object
    Set aList = CreateObject("System.Collections.ArrayList")

    Dim arr()
    arr = ThisWorkbook.Worksheets("Sheet1").Range("A1").CurrentRegion.Value '<== Reads in 100000 value

    Dim i As Long
    For i = 1 To UBound(arr, 1) '50000
        aList.Add arr(i, 2)
    Next i

    Debug.Print aList.Contains(TARGET_VALUE)

    Dim StartTime As Double

    StartTime = MicroTimer()

    For i = aList.Count - 1 To 0 Step -1
       If aList(i) = TARGET_VALUE Then aList.Remove aList(i)
    Next i

    Debug.Print "Removal from array list took: " & Round(MicroTimer - StartTime, 3) & " seconds"
    Debug.Print aList.Contains(TARGET_VALUE)

End Sub

Public Sub TestingHashTable()
    Const TARGET_VALUE = "a"
    Dim hTable As Object
    Set hTable = CreateObject("System.Collections.HashTable")

    Dim arr()
    arr = ThisWorkbook.Worksheets("Sheet1").Range("A1").CurrentRegion.Value '<== Reads in 100000 value

    Dim i As Long
    For i = 1 To UBound(arr, 1) '50000
        hTable.Add i, arr(i, 2)
    Next i

    Debug.Print hTable.ContainsValue(TARGET_VALUE)

    Dim StartTime As Double

    StartTime = MicroTimer()

    For i = hTable.Count To 1 Step -1
       If hTable(i) = TARGET_VALUE Then hTable.Remove i
    Next i

    Debug.Print "Removal from hash table took: " & Round(MicroTimer - StartTime, 3) & " seconds"
    Debug.Print hTable.ContainsValue(TARGET_VALUE)

End Sub

Public Function MicroTimer() As Double

    Dim cyTicks1 As Currency
    Static cyFrequency As Currency

    MicroTimer = 0

    If cyFrequency = 0 Then getFrequency cyFrequency

    getTickCount cyTicks1

    If cyFrequency Then MicroTimer = cyTicks1 / cyFrequency
End Function
Run Code Online (Sandbox Code Playgroud)

上面的结果似乎是 0(1)。

简单地查看删除过程(删除其他因素),结果不太确定,但同样,我的编码可能是一个因素!

删除运行

修改后的代码(删除其他因素):

Option Explicit

Public Sub TestingComparison()

    Const RUN_COUNT As Long = 4

    Dim hTable As Object
    Dim aList As Object
    Dim i As Long, j As Long, k As Long, rowCount As Long
    Dim results() As Double

    Set hTable = CreateObject("System.Collections.HashTable")
    Set aList = CreateObject("System.Collections.ArrayList")

    Dim testSizes()
    testSizes = Array(100, 1000, 10000, 100000)  ', 1000000)
    ReDim results(0 To RUN_COUNT * (UBound(testSizes) + 1) - 1, 0 To 4)

    Application.ScreenUpdating = False

    With ThisWorkbook.Worksheets("Sheet5")

        For i = LBound(testSizes) To UBound(testSizes)

            For k = 1 To RUN_COUNT

                For j = 1 To testSizes(i)
                    hTable.Add j, 1
                    aList.Add 1
                Next j

                Dim StartTime As Double, completionTime As Double

                StartTime = MicroTimer()

                For j = hTable.Count To 1 Step -1
                    hTable.Remove j
                Next j

                results(rowCount, 3) = Round(MicroTimer - StartTime, 3)
                results(rowCount, 0) = testSizes(i)
                results(rowCount, 1) = k

                StartTime = MicroTimer()

                For j = aList.Count - 1 To 0 Step -1
                    aList.Remove aList(j)
                Next j

                results(rowCount, 2) = Round(MicroTimer - StartTime, 3)

                hTable.Clear
                aList.Clear
                rowCount = rowCount + 1
            Next k

        Next i

        .Range("A2").Resize(UBound(results, 1) + 1, UBound(results, 2)) = results

    End With

    Application.ScreenUpdating = True
End Sub
Run Code Online (Sandbox Code Playgroud)