Code optimization: arrays vs. collections

Gre*_*edo 4 arrays collections optimization performance vba

Which method of adding an item to a group is more expensive in terms of memory consumption/execution time:

Redim Preserve myArray(1 To Ubound(myArray) + 1)
myArray(Ubound(myArray)) = myVal

or

myCollection.Add myVal

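Does the fastest one depend on myVal, or vary with the size of the group? Is there a faster method still?

I have an array/collection declared Private in the declarations section of a class, in case that makes a difference, but I'd like to know what is going on behind the scenes and which approach is generally faster (not in terms of readability or maintainability, just execution time).

Tests

OK, having run some tests that repeatedly add the value 1 to arrays and collections, my results are:

  • The collection method is 10 times faster than a Variant array
  • The collection method is 5 times faster than a Long array
  • The collection method is 1.5 times faster than a Byte array

All results are for roughly 5 seconds of looping with this code: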

Sub testtime()
Dim sttime As Double
Dim endtime As Double
Dim i As Long
Dim u As Long
i = 0
ReDim a(1 To 1) 'or Set c = New Collection
sttime = Timer
endtime = Timer + 5
Do Until Timer > endtime
    u = UBound(a) + 1
    ReDim Preserve a(1 To u)
    a(u) = 1
    'or c.Add 1
    i = i + 1
Loop
endtime = Timer
Debug.Print (endtime - sttime) / i; "s", i; "iterations", Round(endtime - sttime, 3); "(ish) s"
End Sub

So it looks like, for adding that item to relatively large groups, adding to a collection is faster, but I'd like to know why.

Mat*_*don 10

ReDim Preserve is skewing it all.

ReDim Preserve myArray(1 To UBound(myArray) + 1)

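This is inherently inefficient, and an unfair comparison. Every time you add an item, the entire array is copied internally. I would hope a Collection is more efficient than that. If not, use the .NET ArrayList, which has been deprecated in .NET since v2.0 introduced the generic List<T>, but is usable and useful in VBA (.NET generics cannot be used from VBA).

An ArrayList does not resize its internal _items array every time an item is added! Note the comments: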

// Adds the given object to the end of this list. The size of the list is
// increased by one. If required, the capacity of the list is doubled
// before adding the new element.
//
public virtual int Add(Object value) {
    Contract.Ensures(Contract.Result<int>() >= 0);
    if (_size == _items.Length) EnsureCapacity(_size + 1);
    _items[_size] = value;
    _version++;
    return _size++;
}

...

// Ensures that the capacity of this list is at least the given minimum
// value. If the currect capacity of the list is less than min, the
// capacity is increased to twice the current capacity or to min,
// whichever is larger.
private void EnsureCapacity(int min) {
    if (_items.Length < min) {
        int newCapacity = _items.Length == 0? _defaultCapacity: _items.Length * 2;
        // Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow.
        // Note that this check works even when _items.Length overflowed thanks to the (uint) cast
        if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength;
        if (newCapacity < min) newCapacity = min;
        Capacity = newCapacity;
    }
}

https://referencesource.microsoft.com/#mscorlib/system/collections/arraylist.cs

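I don't know the internals of a VBA.Collection, but if I had to guess, I'd say it likely has a similar mechanism that avoids re-dimensioning the internal array every time an item is added. But that's all moot, since nobody except Microsoft knows how VBA.Collection is implemented.

What we can do is run benchmarks and compare. Let's add one million values to an array, a collection, and, heck, an ArrayList: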

Public Sub TestReDimArray()
    Dim sut() As Variant
    ReDim sut(0 To 0)
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While i < 1000000 ' stop after exactly 1M items (UBound reaches 999999)
        ReDim Preserve sut(0 To i)
        sut(i) = i
        i = i + 1
    Loop
    Debug.Print "ReDimArray added 1M items in " & Timer - t & " seconds."
End Sub

Public Sub TestCollection()
    Dim sut As VBA.Collection
    Set sut = New VBA.Collection
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While sut.Count < 1000000
        sut.Add i
        i = i + 1
    Loop
    Debug.Print "Collection added 1M items in " & Timer - t & " seconds."
End Sub

Public Sub TestArrayList()
    Dim sut As Object
    Set sut = CreateObject("System.Collections.ArrayList")
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While sut.Count < 1000000
        sut.Add i
        i = i + 1
    Loop
    Debug.Print "ArrayList added 1M items in " & Timer - t & " seconds."
End Sub

Here's the output:

ReDimArray added 1M items in 14.90234 seconds.
Collection added 1M items in 0.1875 seconds.
ArrayList added 1M items in 15.64453 seconds.

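Note that referencing the 32-bit mscorlib.tlb and early-binding the ArrayList doesn't make much of a difference. On top of that there is managed/COM interop overhead, and VBA cannot pass constructor arguments, so the ArrayList is initialized with a capacity of 4 that doubles every time the capacity is reached. In other words, by the time we have inserted a million items, we have resized the internal array 19 times and ended up with an internal capacity of 1,048,576 items.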
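The resize count above can be checked with a small simulation. This is only a sketch of the growth policy described in the EnsureCapacity comments (start empty, jump to a default capacity of 4, then double); the procedure name is hypothetical, it never touches .NET, and it counts the initial allocation from 0 to 4 as the first resize.

```vba
' Hypothetical helper: simulates the "start at 4, then double" growth policy
' from the ArrayList reference source, without creating an ArrayList.
Public Sub SimulateArrayListGrowth()
    Const DEFAULT_CAPACITY As Long = 4   ' _defaultCapacity in the reference source
    Const ITEM_COUNT As Long = 1000000

    Dim capacity As Long    ' 0 to begin with, like an empty _items array
    Dim size As Long
    Dim resizes As Long

    Do While size < ITEM_COUNT
        If size = capacity Then
            ' The list is full: grow before adding the next item.
            If capacity = 0 Then
                capacity = DEFAULT_CAPACITY
            Else
                capacity = capacity * 2
            End If
            resizes = resizes + 1
        End If
        size = size + 1
    Loop

    ' Prints: Resizes: 19, final capacity: 1048576
    Debug.Print "Resizes: " & resizes & ", final capacity: " & capacity
End Sub
```

Nineteen internal copies for a million inserts is a very different workload from the one million copies that a grow-by-one ReDim Preserve loop performs.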

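So how is Collection winning by that much?

Because the array is being abused: resizing is not what arrays do best, and resizing before every single insert cannot possibly go well.

When to use arrays?

Use arrays when you know the number of elements ahead of time: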

Public Sub TestPopulateArray()
    Dim sut(0 To 999999) As Variant
    Dim i As Long
    Dim t As Single
    t = Timer
    Do While i < 1000000
        sut(i) = i
        i = i + 1
    Loop
    Debug.Print "PopulateArray added 1M items in " & Timer - t & " seconds."
End Sub

Output:

PopulateArray added 1M items in 0.0234375 seconds.

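That is roughly 8 times faster than adding the same number of items to a VBA.Collection (0.0234 s versus 0.1875 s). Well-used arrays are blazingly fast.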


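TL;DR

Keep array resizing to a minimum, and avoid it whenever possible. If you don't know how many items you will end up with, use a Collection. If you do, use an explicitly sized Array.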
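The two recommendations can also be combined: collect items while the count is unknown, then copy them into an array that is sized exactly once. The function below is a hypothetical helper, not something from the benchmarks above, and it assumes the Collection holds plain values rather than object references (objects would need Set).

```vba
' Hypothetical helper: copies a Collection of non-object values
' into a zero-based Variant array that is dimensioned only once.
Public Function CollectionToArray(ByVal source As VBA.Collection) As Variant
    Dim result() As Variant
    Dim item As Variant
    Dim i As Long

    If source.Count = 0 Then
        CollectionToArray = Array()         ' empty Variant array
        Exit Function
    End If

    ReDim result(0 To source.Count - 1)     ' the final size is known here
    For Each item In source
        result(i) = item                    ' assumes non-object items
        i = i + 1
    Next

    CollectionToArray = result
End Function
```

Usage would look like `values = CollectionToArray(myCollection)`, where `values` is declared As Variant.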

  • Or, if you _must_ use "ReDim Preserve", don't grow the array by 1. Every time you reach the maximum capacity, add 10 or 50 or 100, or just double it. (3 upvotes)
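The doubling suggestion from the comment could look like the sketch below. The procedure name and the starting capacity of 16 are arbitrary assumptions, and no timing is claimed for it; it is meant to be run next to the benchmarks above for comparison.

```vba
' Sketch: grow the array geometrically instead of by one element per insert.
Public Sub TestDoublingArray()
    Const ITEM_COUNT As Long = 1000000

    Dim sut() As Variant
    Dim capacity As Long
    Dim count As Long
    Dim t As Single

    capacity = 16                           ' arbitrary starting capacity (assumption)
    ReDim sut(0 To capacity - 1)
    t = Timer

    Do While count < ITEM_COUNT
        If count = capacity Then
            capacity = capacity * 2         ' one copy per doubling, not per insert
            ReDim Preserve sut(0 To capacity - 1)
        End If
        sut(count) = count
        count = count + 1
    Loop

    ' Trim the unused tail once, at the end.
    ReDim Preserve sut(0 To count - 1)

    Debug.Print "DoublingArray added " & count & " items in " & Timer - t & " seconds."
End Sub
```

The array is only ever resized when it is full, so the number of ReDim Preserve calls grows with the logarithm of the item count rather than with the item count itself.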