Gre*_*edo 5 algorithm vba arraylist filter time-complexity
我已经实现了我认为在 VBA 中过滤 a 的相当垃圾的方法System.Collections.ArrayList。该代码采用一个列表和一个项目/比较值来过滤掉。它循环遍历列表并删除匹配的项目。然后它重新启动循环(因为你不能For Each同时.Remove)
Public Sub Filter(ByVal testValue As Object, ByVal dataSet As ArrayList)
'testValue and the items in `dataSet` all Implement IComparable from mscorlib.dll
'This allows comparing objects for equality
'i.e. obj1.CompareTo(obj2) = 0 is equivalent to obj1 = obj2
Dim item As IComparable
Dim repeat As Boolean
repeat = False
For Each item In dataSet
If item.CompareTo(testValue) = 0 Then 'or equiv; If item = testValue
dataSet.Remove item
repeat = True
Exit For
End If
Next item
If repeat Then Filter testValue, dataSet
End Sub
Run Code Online (Sandbox Code Playgroud)
为什么是垃圾?
假设列表的X元素很长,并且包含Y与过滤条件匹配的项目,带有X>Y. 据我所知,最好的情况性能是O(X),当所有的Ys 在开始时都聚集在一起。最糟糕的情况是所有Ys 在末尾都聚集在一起。在这种情况下,算法需要(X-Y)*Y查找操作,max 当 时Y=X/2,所以O(X^2)
O(X)与当你到达匹配项时逐步前进并删除但不打破循环的简单算法相比,这很糟糕。但我不知道如何实施它。有没有办法提高这个过滤器的性能?
你能不能做如下的事情,我相信这是 O(n) :
Option Explicit
Public Sub RemItems()
Const TARGET_VALUE As String = "dd"
Dim myList As Object
Set myList = CreateObject("System.Collections.ArrayList")
myList.Add "a"
myList.Add "dd"
myList.Add "a"
myList.Add "a"
myList.Add "a"
myList.Add "dd"
myList.Add "a"
myList.Add "a"
myList.Add "dd"
myList.Add "a"
myList.Add "a"
Dim i As Long
For i = myList.Count - 1 To 0 Step -1
If myList(i) = TARGET_VALUE Then myList.Remove myList(i)
Next i
End Sub
Run Code Online (Sandbox Code Playgroud)
有关复杂性信息,请参阅此讨论:
如果这是可信的(.NET-Big-O-Algorithm-Complexity-Cheat-Sheet):
注意:我使用https://htmledit.squarefree.com/渲染了 HTML
警告 - 我不是计算机科学毕业生。这就是闹着玩的。我确信对于正在处理的数据类型、分布等存在争议......欢迎改进
上面的 .Net 表显示,与 ArrayList 相比, HashTable中的删除是平均O(1)删除操作O(n),因此我从 value 中随机生成了 100,000 行{"a","b","c"}。然后我用它作为我的固定测试集以获得以下结果。
测试运行代码(请温柔!)
Option Explicit
Private Declare PtrSafe Function getFrequency Lib "kernel32" _
Alias "QueryPerformanceFrequency" (cyFrequency As Currency) As Long
Private Declare PtrSafe Function getTickCount Lib "kernel32" _
Alias "QueryPerformanceCounter" (cyTickCount As Currency) As Long
Public Sub TestingArrayList()
Const TARGET_VALUE = "a"
Dim aList As Object
Set aList = CreateObject("System.Collections.ArrayList")
Dim arr()
arr = ThisWorkbook.Worksheets("Sheet1").Range("A1").CurrentRegion.Value '<== Reads in 100000 value
Dim i As Long
For i = 1 To UBound(arr, 1) '50000
aList.Add arr(i, 2)
Next i
Debug.Print aList.Contains(TARGET_VALUE)
Dim StartTime As Double
StartTime = MicroTimer()
For i = aList.Count - 1 To 0 Step -1
If aList(i) = TARGET_VALUE Then aList.Remove aList(i)
Next i
Debug.Print "Removal from array list took: " & Round(MicroTimer - StartTime, 3) & " seconds"
Debug.Print aList.Contains(TARGET_VALUE)
End Sub
Public Sub TestingHashTable()
Const TARGET_VALUE = "a"
Dim hTable As Object
Set hTable = CreateObject("System.Collections.HashTable")
Dim arr()
arr = ThisWorkbook.Worksheets("Sheet1").Range("A1").CurrentRegion.Value '<== Reads in 100000 value
Dim i As Long
For i = 1 To UBound(arr, 1) '50000
hTable.Add i, arr(i, 2)
Next i
Debug.Print hTable.ContainsValue(TARGET_VALUE)
Dim StartTime As Double
StartTime = MicroTimer()
For i = hTable.Count To 1 Step -1
If hTable(i) = TARGET_VALUE Then hTable.Remove i
Next i
Debug.Print "Removal from hash table took: " & Round(MicroTimer - StartTime, 3) & " seconds"
Debug.Print hTable.ContainsValue(TARGET_VALUE)
End Sub
Public Function MicroTimer() As Double
Dim cyTicks1 As Currency
Static cyFrequency As Currency
MicroTimer = 0
If cyFrequency = 0 Then getFrequency cyFrequency
getTickCount cyTicks1
If cyFrequency Then MicroTimer = cyTicks1 / cyFrequency
End Function
Run Code Online (Sandbox Code Playgroud)
上面的结果似乎是 0(1)。
简单地查看删除过程(删除其他因素),结果不太确定,但同样,我的编码可能是一个因素!
修改后的代码(删除其他因素):
Option Explicit
Public Sub TestingComparison()
Const RUN_COUNT As Long = 4
Dim hTable As Object
Dim aList As Object
Dim i As Long, j As Long, k As Long, rowCount As Long
Dim results() As Double
Set hTable = CreateObject("System.Collections.HashTable")
Set aList = CreateObject("System.Collections.ArrayList")
Dim testSizes()
testSizes = Array(100, 1000, 10000, 100000) ', 1000000)
ReDim results(0 To RUN_COUNT * (UBound(testSizes) + 1) - 1, 0 To 4)
Application.ScreenUpdating = False
With ThisWorkbook.Worksheets("Sheet5")
For i = LBound(testSizes) To UBound(testSizes)
For k = 1 To RUN_COUNT
For j = 1 To testSizes(i)
hTable.Add j, 1
aList.Add 1
Next j
Dim StartTime As Double, completionTime As Double
StartTime = MicroTimer()
For j = hTable.Count To 1 Step -1
hTable.Remove j
Next j
results(rowCount, 3) = Round(MicroTimer - StartTime, 3)
results(rowCount, 0) = testSizes(i)
results(rowCount, 1) = k
StartTime = MicroTimer()
For j = aList.Count - 1 To 0 Step -1
aList.Remove aList(j)
Next j
results(rowCount, 2) = Round(MicroTimer - StartTime, 3)
hTable.Clear
aList.Clear
rowCount = rowCount + 1
Next k
Next i
.Range("A2").Resize(UBound(results, 1) + 1, UBound(results, 2)) = results
End With
Application.ScreenUpdating = True
End Sub
Run Code Online (Sandbox Code Playgroud)