vba:从数组中获取唯一值

l--*_*''' 46 excel vba excel-vba

在vba中有内置的功能来从一维数组中获取唯一值吗?怎么样才能摆脱重复?

如果没有,那么我如何从数组中获取唯一值?

Doc*_*own 51

这篇文章包含2个例子.我喜欢第二个:

Sub unique() 
  Dim arr As New Collection, a 
  Dim aFirstArray() As Variant 
  Dim i As Long 

  aFirstArray() = Array("Banana", "Apple", "Orange", "Tomato", "Apple", _ 
  "Lemon", "Lime", "Lime", "Apple") 

  On Error Resume Next 
  For Each a In aFirstArray 
     arr.Add a, a 
  Next 

  For i = 1 To arr.Count 
     Cells(i, 1) = arr(i) 
  Next 

End Sub 
Run Code Online (Sandbox Code Playgroud)

  • 值得添加(即使在这个较晚的日期)Collections*可以是*唯一的,只要你在添加项目时使用第二个`Key`参数.`Key`值必须始终是唯一的,并且添加带有现有Key的项会引发错误:因此`On Error Resume Next` (15认同)
  • 使用Collection的这个答案比Joseph Wood的字典方法[如指出](http://stackoverflow.com/a/33481391/2529619)更快. (2认同)
  • 如果你在同一个子程序中有更多的代码,那么记得在第一个循环之后添加 On Error GoTo 0 (2认同)

eks*_*tso 39

没有内置功能可以从阵列中删除重复项.Raj的回答似乎很优雅,但我更喜欢使用词典.

Dim d As Object
Set d = CreateObject("Scripting.Dictionary")
'Set d = New Scripting.Dictionary

Dim i As Long
For i = LBound(myArray) To UBound(myArray)
    d(myArray(i)) = 1
Next i

Dim v As Variant
For Each v In d.Keys()
    'd.Keys() is a Variant array of the unique values in myArray.
    'v will iterate through each of them.
Next v
Run Code Online (Sandbox Code Playgroud)

编辑:我改变了循环使用LBoundUBound每托默勒格建议的答案.编辑:d.Keys()是一个Variant数组,而不是一个集合.

  • 那就是使用`New`语法.只要安装了Microsoft Scripting Runtime并且可以从References窗口获得,`CreateObject`就可以在没有引用的情况下工作. (6认同)
  • 不,`d.Keys()`是一个Variant数组.(我会解决我的问题.)如果你不相信我,试试这个:set d = CreateObject("Scripting.Dictionary")d(45)= 1 d(33.33)= 1 d("45")= d.keys()中的每个i为1:?i,typename(i):next你将得到一个`Integer`,一个`Double`和一个`String`. (3认同)
  • 请注意,需要引用“Microsoft Scripting Runtime”才能访问 Dictionary 对象。 (2认同)
  • 最好使用强类型的词典声明("Dim d As Dictionary").在声明为对象和后期绑定的大型数据阵列上可能会导致性能问题. (2认同)
  • @Whitebeard13 非常感谢!我很高兴它对你有帮助。字典是令人着迷的东西,值得进一步研究。字典保存键/值对。当您为某个键分配值时,即“d(myArray(i)) = 1”所做的事情,字典“d”会检查该键是否存在。如果密钥不存在,它会创建它;如果存在,它会重用该密钥。所以字典的键保证是唯一的。我们分配给每个键的值“1”只是一个虚拟值;我们仅使用字典的键来获取原始集合元素的不同集合。 (2认同)

Jos*_*ood 17

更新(6/15/16)

我创建了更全面的基准测试.首先,正如@ChaimG指出的那样,早期绑定会产生很大的不同(我最初使用@ eksortso上面的代码,使用后期绑定).其次,我原来的基准测试只包括创建唯一对象的时间,但是,它没有测试使用对象的效率.我这样做的意思是,如果我创建的对象非常快,如果我创建的对象笨拙并且让我向前移动,那么这并不重要.

旧备注:事实证明,循环收集对象的效率非常低

事实证明,如果你知道怎么做(我没有),循环收集可以非常有效.正如@ChaimG(又一次),在评论中指出,使用一个For Each结构比简单地使用一个For循环要好得多.为了给你一个想法,改变循环结构之前,对于时间Collection2Test Case Size = 10^6超过15世纪(即〜23分钟).它现在只有0.195秒(超过7000倍).

对于该Collection方法有两次.第一个(我的原始基准Collection1)显示创建唯一对象的时间.第二部分(Collection2)显示了循环对象的时间(这是非常自然的)以创建可返回数组,就像其他函数一样.

在下面的图表中,黄色背景表示它是该测试用例最快的,红色表示最慢("未测试"算法被排除在外).对于总的时间Collection的方法是的总和Collection1Collection2.绿松石表示无论原始订单如何都是最快的.

Benchmarks5

下面是我创建的原始算法(我稍微修改了它,例如我不再实例化我自己的数据类型).它在非常可观的时间内返回具有原始顺序的数组的唯一值,并且可以对其进行修改以采用任何数据类型.除此之外IndexMethod,它是非常大的阵列的最快算法.

以下是此算法背后的主要思想:

  1. 索引数组
  2. 按值排序
  3. 在数组的末尾放置相同的值,然后将其"切断".
  4. 最后,按索引排序.

以下是一个例子:

Let myArray = (86, 100, 33, 19, 33, 703, 19, 100, 703, 19)

    1.  (86, 100, 33, 19, 33, 703, 19, 100, 703, 19)
        (1 ,   2,  3,  4,  5,   6,  7,   8,   9, 10)   <<-- Indexing

    2.  (19, 19, 19, 33, 33, 86, 100, 100, 703, 703)   <<-- sort by values     
        (4,   7, 10,  3,  5,  1,   2,   8,   6,   9)

    3.  (19, 33,  86, 100, 703)   <<-- remove duplicates    
        (4,   3,   1,   2,   6)

    4.  (86, 100,  33, 19, 703)   
        ( 1,   2,   3,  4,   6)   <<-- sort by index
Run Code Online (Sandbox Code Playgroud)

这是代码:

Function SortingUniqueTest(ByRef myArray() As Long, bOrigIndex As Boolean) As Variant
    Dim MyUniqueArr() As Long, i As Long, intInd As Integer
    Dim StrtTime As Double, Endtime As Double, HighB As Long, LowB As Long

    LowB = LBound(myArray): HighB = UBound(myArray)

    ReDim MyUniqueArr(1 To 2, LowB To HighB)
    intInd = 1 - LowB  'Guarantees the indices span 1 to Lim

    For i = LowB To HighB
        MyUniqueArr(1, i) = myArray(i)
        MyUniqueArr(2, i) = i + intInd
    Next i

    QSLong2D MyUniqueArr, 1, LBound(MyUniqueArr, 2), UBound(MyUniqueArr, 2), 2
    Call UniqueArray2D(MyUniqueArr)
    If bOrigIndex Then QSLong2D MyUniqueArr, 2, LBound(MyUniqueArr, 2), UBound(MyUniqueArr, 2), 2

    SortingUniqueTest = MyUniqueArr()
End Function

Public Sub UniqueArray2D(ByRef myArray() As Long)
    Dim i As Long, j As Long, Count As Long, Count1 As Long, DuplicateArr() As Long
    Dim lngTemp As Long, HighB As Long, LowB As Long
    LowB = LBound(myArray, 2): Count = LowB: i = LowB: HighB = UBound(myArray, 2)

    Do While i < HighB
        j = i + 1
        If myArray(1, i) = myArray(1, j) Then
            Do While myArray(1, i) = myArray(1, j)
                ReDim Preserve DuplicateArr(1 To Count)
                DuplicateArr(Count) = j
                Count = Count + 1
                j = j + 1
                If j > HighB Then Exit Do
            Loop

            QSLong2D myArray, 2, i, j - 1, 2
        End If
        i = j
    Loop

    Count1 = HighB

    If Count > 1 Then
        For i = UBound(DuplicateArr) To LBound(DuplicateArr) Step -1
            myArray(1, DuplicateArr(i)) = myArray(1, Count1)
            myArray(2, DuplicateArr(i)) = myArray(2, Count1)
            Count1 = Count1 - 1
            ReDim Preserve myArray(1 To 2, LowB To Count1)
        Next i
    End If
End Sub
Run Code Online (Sandbox Code Playgroud)

这里是排序算法我使用(更多有关此算法中点击这里).

Sub QSLong2D(ByRef saArray() As Long, bytDim As Byte, lLow1 As Long, lHigh1 As Long, bytNum As Byte)
    Dim lLow2 As Long, lHigh2 As Long
    Dim sKey As Long, sSwap As Long, i As Byte

On Error GoTo ErrorExit

    If IsMissing(lLow1) Then lLow1 = LBound(saArray, bytDim)
    If IsMissing(lHigh1) Then lHigh1 = UBound(saArray, bytDim)
    lLow2 = lLow1
    lHigh2 = lHigh1

    sKey = saArray(bytDim, (lLow1 + lHigh1) \ 2)

    Do While lLow2 < lHigh2
        Do While saArray(bytDim, lLow2) < sKey And lLow2 < lHigh1: lLow2 = lLow2 + 1: Loop
        Do While saArray(bytDim, lHigh2) > sKey And lHigh2 > lLow1: lHigh2 = lHigh2 - 1: Loop

        If lLow2 < lHigh2 Then
            For i = 1 To bytNum
                sSwap = saArray(i, lLow2)
                saArray(i, lLow2) = saArray(i, lHigh2)
                saArray(i, lHigh2) = sSwap
            Next i
        End If

        If lLow2 <= lHigh2 Then
            lLow2 = lLow2 + 1
            lHigh2 = lHigh2 - 1
        End If
    Loop

    If lHigh2 > lLow1 Then QSLong2D saArray(), bytDim, lLow1, lHigh2, bytNum
    If lLow2 < lHigh1 Then QSLong2D saArray(), bytDim, lLow2, lHigh1, bytNum

ErrorExit:

End Sub
Run Code Online (Sandbox Code Playgroud)

下面是一个特殊的算法,如果您的数据包含整数,则该算法非常快.它使用索引和布尔数据类型.

Function IndexSort(ByRef myArray() As Long, bOrigIndex As Boolean) As Variant
'' Modified to take both positive and negative integers
    Dim arrVals() As Long, arrSort() As Long, arrBool() As Boolean
    Dim i As Long, HighB As Long, myMax As Long, myMin As Long, OffSet As Long
    Dim LowB As Long, myIndex As Long, count As Long, myRange As Long

    HighB = UBound(myArray)
    LowB = LBound(myArray)

    For i = LowB To HighB
        If myArray(i) > myMax Then myMax = myArray(i)
        If myArray(i) < myMin Then myMin = myArray(i)
    Next i

    OffSet = Abs(myMin)  '' Number that will be added to every element
                         '' to guarantee every index is non-negative

    If myMax > 0 Then
        myRange = myMax + OffSet  '' E.g. if myMax = 10 & myMin = -2, then myRange = 12
    Else
        myRange = OffSet
    End If

    If bOrigIndex Then
        ReDim arrSort(1 To 2, 1 To HighB)
        ReDim arrVals(1 To 2, 0 To myRange)
        ReDim arrBool(0 To myRange)

        For i = LowB To HighB
            myIndex = myArray(i) + OffSet
            arrBool(myIndex) = True
            arrVals(1, myIndex) = myArray(i)
            If arrVals(2, myIndex) = 0 Then arrVals(2, myIndex) = i
        Next i

        For i = 0 To myRange
            If arrBool(i) Then
                count = count + 1
                arrSort(1, count) = arrVals(1, i)
                arrSort(2, count) = arrVals(2, i)
            End If
        Next i

        QSLong2D arrSort, 2, 1, count, 2
        ReDim Preserve arrSort(1 To 2, 1 To count)
    Else
        ReDim arrSort(1 To HighB)
        ReDim arrVals(0 To myRange)
        ReDim arrBool(0 To myRange)

        For i = LowB To HighB
            myIndex = myArray(i) + OffSet
            arrBool(myIndex) = True
            arrVals(myIndex) = myArray(i)
        Next i

        For i = 0 To myRange
            If arrBool(i) Then
                count = count + 1
                arrSort(count) = arrVals(i)
            End If
        Next i

        ReDim Preserve arrSort(1 To count)
    End If

    ReDim arrVals(0)
    ReDim arrBool(0)

    IndexSort = arrSort
End Function
Run Code Online (Sandbox Code Playgroud)

这是Collection(由@DocBrown)和Dictionary(由@eksortso)函数.

Function CollectionTest(ByRef arrIn() As Long, Lim As Long) As Variant
    Dim arr As New Collection, a, i As Long, arrOut() As Variant, aFirstArray As Variant
    Dim StrtTime As Double, EndTime1 As Double, EndTime2 As Double, count As Long
On Error Resume Next

    ReDim arrOut(1 To UBound(arrIn))
    ReDim aFirstArray(1 To UBound(arrIn))

    StrtTime = Timer
    For i = 1 To UBound(arrIn): aFirstArray(i) = CStr(arrIn(i)): Next i '' Convert to string
    For Each a In aFirstArray               ''' This part is actually creating the unique set
        arr.Add a, a
    Next
    EndTime1 = Timer - StrtTime

    StrtTime = Timer         ''' This part is writing back to an array for return
    For Each a In arr: count = count + 1: arrOut(count) = a: Next a
    EndTime2 = Timer - StrtTime
    CollectionTest = Array(arrOut, EndTime1, EndTime2)
End Function

Function DictionaryTest(ByRef myArray() As Long, Lim As Long) As Variant
    Dim StrtTime As Double, Endtime As Double
    Dim d As Scripting.Dictionary, i As Long  '' Early Binding
    Set d = New Scripting.Dictionary
    For i = LBound(myArray) To UBound(myArray): d(myArray(i)) = 1: Next i
    DictionaryTest = d.Keys()
End Function
Run Code Online (Sandbox Code Playgroud)

这是@IsraelHoletz提供的直接方法.

Function ArrayUnique(ByRef aArrayIn() As Long) As Variant
    Dim aArrayOut() As Variant, bFlag As Boolean, vIn As Variant, vOut As Variant
    Dim i As Long, j As Long, k As Long
    ReDim aArrayOut(LBound(aArrayIn) To UBound(aArrayIn))
    i = LBound(aArrayIn)
    j = i

    For Each vIn In aArrayIn
        For k = j To i - 1
            If vIn = aArrayOut(k) Then bFlag = True: Exit For
        Next
        If Not bFlag Then aArrayOut(i) = vIn: i = i + 1
        bFlag = False
    Next

    If i <> UBound(aArrayIn) Then ReDim Preserve aArrayOut(LBound(aArrayIn) To i - 1)
    ArrayUnique = aArrayOut
End Function

Function DirectTest(ByRef aArray() As Long, Lim As Long) As Variant
    Dim aReturn() As Variant
    Dim StrtTime As Long, Endtime As Long, i As Long
    aReturn = ArrayUnique(aArray)
    DirectTest = aReturn
End Function
Run Code Online (Sandbox Code Playgroud)

以下是比较所有函数的基准函数.您应该注意,由于内存问题,最后两种情况的处理方式略有不同.还要注意,我没有测试Collection方法Test Case Size = 10,000,000.出于某种原因,它返回了不正确的结果并且行为异常(我猜测集合对象对你可以放入多少东西有限制.我搜索过,我找不到任何关于此的文献).

Function UltimateTest(Lim As Long, bTestDirect As Boolean, bTestDictionary, bytCase As Byte) As Variant

    Dim dictionTest, collectTest, sortingTest1, indexTest1, directT '' all variants
    Dim arrTest() As Long, i As Long, bEquality As Boolean, SizeUnique As Long
    Dim myArray() As Long, StrtTime As Double, EndTime1 As Variant
    Dim EndTime2 As Double, EndTime3 As Variant, EndTime4 As Double
    Dim EndTime5 As Double, EndTime6 As Double, sortingTest2, indexTest2

    ReDim myArray(1 To Lim): Rnd (-2)   '' If you want to test negative numbers, 
    '' insert this to the left of CLng(Int(Lim... : (-1) ^ (Int(2 * Rnd())) *
    For i = LBound(myArray) To UBound(myArray): myArray(i) = CLng(Int(Lim * Rnd() + 1)): Next i
    arrTest = myArray

    If bytCase = 1 Then
        If bTestDictionary Then
            StrtTime = Timer: dictionTest = DictionaryTest(arrTest, Lim): EndTime1 = Timer - StrtTime
        Else
            EndTime1 = "Not Tested"
        End If

        arrTest = myArray
        collectTest = CollectionTest(arrTest, Lim)

        arrTest = myArray
        StrtTime = Timer: sortingTest1 = SortingUniqueTest(arrTest, True): EndTime2 = Timer - StrtTime
        SizeUnique = UBound(sortingTest1, 2)

        If bTestDirect Then
            arrTest = myArray: StrtTime = Timer: directT = DirectTest(arrTest, Lim): EndTime3 = Timer - StrtTime
        Else
            EndTime3 = "Not Tested"
        End If

        arrTest = myArray
        StrtTime = Timer: indexTest1 = IndexSort(arrTest, True): EndTime4 = Timer - StrtTime

        arrTest = myArray
        StrtTime = Timer: sortingTest2 = SortingUniqueTest(arrTest, False): EndTime5 = Timer - StrtTime

        arrTest = myArray
        StrtTime = Timer: indexTest2 = IndexSort(arrTest, False): EndTime6 = Timer - StrtTime

        bEquality = True
        For i = LBound(sortingTest1, 2) To UBound(sortingTest1, 2)
            If Not CLng(collectTest(0)(i)) = sortingTest1(1, i) Then
                bEquality = False
                Exit For
            End If
        Next i

        For i = LBound(dictionTest) To UBound(dictionTest)
            If Not dictionTest(i) = sortingTest1(1, i + 1) Then
                bEquality = False
                Exit For
            End If
        Next i

        For i = LBound(dictionTest) To UBound(dictionTest)
            If Not dictionTest(i) = indexTest1(1, i + 1) Then
                bEquality = False
                Exit For
            End If
        Next i

        If bTestDirect Then
            For i = LBound(dictionTest) To UBound(dictionTest)
                If Not dictionTest(i) = directT(i + 1) Then
                    bEquality = False
                    Exit For
                End If
            Next i
        End If

        UltimateTest = Array(bEquality, EndTime1, EndTime2, EndTime3, EndTime4, _
                        EndTime5, EndTime6, collectTest(1), collectTest(2), SizeUnique)
    ElseIf bytCase = 2 Then
        arrTest = myArray
        collectTest = CollectionTest(arrTest, Lim)
        UltimateTest = Array(collectTest(1), collectTest(2))
    ElseIf bytCase = 3 Then
        arrTest = myArray
        StrtTime = Timer: sortingTest1 = SortingUniqueTest(arrTest, True): EndTime2 = Timer - StrtTime
        SizeUnique = UBound(sortingTest1, 2)
        UltimateTest = Array(EndTime2, SizeUnique)
    ElseIf bytCase = 4 Then
        arrTest = myArray
        StrtTime = Timer: indexTest1 = IndexSort(arrTest, True): EndTime4 = Timer - StrtTime
        UltimateTest = EndTime4
    ElseIf bytCase = 5 Then
        arrTest = myArray
        StrtTime = Timer: sortingTest2 = SortingUniqueTest(arrTest, False): EndTime5 = Timer - StrtTime
        UltimateTest = EndTime5
    ElseIf bytCase = 6 Then
        arrTest = myArray
        StrtTime = Timer: indexTest2 = IndexSort(arrTest, False): EndTime6 = Timer - StrtTime
        UltimateTest = EndTime6
    End If

End Function
Run Code Online (Sandbox Code Playgroud)

最后,这是产生上表的子.

Sub GetBenchmarks()
    Dim myVar, i As Long, TestCases As Variant, j As Long, temp

    TestCases = Array(1000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000, 2000000, 5000000, 10000000)

    For j = 0 To 11
        If j < 6 Then
            myVar = UltimateTest(CLng(TestCases(j)), True, True, 1)
        ElseIf j < 10 Then
            myVar = UltimateTest(CLng(TestCases(j)), False, True, 1)
        ElseIf j < 11 Then
            myVar = Array("Not Tested", "Not Tested", 0.1, "Not Tested", 0.1, 0.1, 0.1, 0, 0, 0)
            temp = UltimateTest(CLng(TestCases(j)), False, False, 2)
            myVar(7) = temp(0): myVar(8) = temp(1)
            temp = UltimateTest(CLng(TestCases(j)), False, False, 3)
            myVar(2) = temp(0): myVar(9) = temp(1)
            myVar(4) = UltimateTest(CLng(TestCases(j)), False, False, 4)
            myVar(5) = UltimateTest(CLng(TestCases(j)), False, False, 5)
            myVar(6) = UltimateTest(CLng(TestCases(j)), False, False, 6)
        Else
            myVar = Array("Not Tested", "Not Tested", 0.1, "Not Tested", 0.1, 0.1, 0.1, "Not Tested", "Not Tested", 0)
            temp = UltimateTest(CLng(TestCases(j)), False, False, 3)
            myVar(2) = temp(0): myVar(9) = temp(1)
            myVar(4) = UltimateTest(CLng(TestCases(j)), False, False, 4)
            myVar(5) = UltimateTest(CLng(TestCases(j)), False, False, 5)
            myVar(6) = UltimateTest(CLng(TestCases(j)), False, False, 6)
        End If

        Cells(4 + j, 6) = TestCases(j)
        For i = 1 To 9: Cells(4 + j, 6 + i) = myVar(i - 1): Next i
        Cells(4 + j, 17) = myVar(9)
    Next j
End Sub
Run Code Online (Sandbox Code Playgroud)

总结
从结果表中,我们可以看到该Dictionary方法对于小于约500,000的情况非常有效,但在此之后,IndexMethod真正开始占主导地位.您会注意到,当顺序无关紧要并且您的数据由正整数组成时,不会与IndexMethod算法进行比较(它会在不到1秒的时间内从包含1000万个元素的数组中返回唯一值!令人难以置信! ).下面我列出了在各种情况下首选哪种算法.

案例1
您的数据包含整数(即整数,正数和负数):IndexMethod

案例2
您的数据包含少于200000个元素的非整数(即变量,双精度,字符串等):Dictionary Method

案例3
您的数据包含超过200000个元素的非整数(即变量,双精度,字符串等):Collection Method

如果您必须选择一种算法,在我看来,该Collection方法仍然是最好的,因为它只需要几行代码,它是超级通用的,并且它足够快.

  • 使用`For each a in a ar`而不是`For i = 1 To arr.Count`循环收集整个集合,在我的PC上将Collection2的速度提高了> 700x! (2认同)
  • @ChaimG,我觉得约瑟夫伍德想要的是全面的.我对字典方法的测试(100,000个数组,值0到9,运行100次)给出了类似的结果(大约29秒)w/1或0.""毫不奇怪地花了几乎两倍的时间.vbNull与0或1-可能更快,但可能更快,但Excel崩溃了1000万个阵列.会留给别人进一步调查 - 做太多工作:) (2认同)