Excel VBA性能 - 100万行 - 在不到1分钟的时间内删除包含值的行

pau*_*ica 32 excel performance vba excel-vba

我试图在不到一分钟的时间内找到一种方法来过滤大数据并删除工作表中的行

目标:

  • 在第1列中查找包含特定文本的所有记录,然后删除整行
  • 保持所有单元格格式(颜色,字体,边框,列宽)和公式

.

测试数据:

测试数据:

.

代码如何工作:

  1. 首先关闭所有Excel功能
  2. 如果工作簿不为空,并且要删除的文本值存在于第1列中

    • 将列1的已使用范围复制到数组
    • 向后迭代数组中的每个值
    • 当找到匹配时:

      • 将单元格地址添加到格式的tmp字符串中 "A11,A275,A3900,..."
      • 如果tmp变量长度接近255个字符
      • 使用删除行 .Range("A11,A275,A3900,...").EntireRow.Delete Shift:=xlUp
      • 将tmp重置为空并继续前进到下一组行
  3. 最后,它将所有Excel功能重新打开

.

主要问题是删除操作,总持续时间应低于一分钟.任何基于代码的解决方案都是可以接受的,只要它在1分钟内执行即可.

这将范围缩小到极少数可接受的答案.已经提供的答案也非常简短,易于实施.一个人在大约30秒内执行操作,因此至少有一个答案提供了可接受的解决方案,其他人可能会发现它也很有用

.

我的主要初始功能:

Sub DeleteRowsWithValuesStrings()
    Const MAX_SZ As Byte = 240

    Dim i As Long, j As Long, t As Double, ws As Worksheet
    Dim memArr As Variant, max As Long, tmp As String

    Set ws = Worksheets(1)
    max = GetMaxCell(ws.UsedRange).Row
    FastWB True:    t = Timer

    With ws
        If max > 1 Then
            If IndexOfValInRowOrCol("Test String", , ws.UsedRange) > 0 Then
                memArr = .Range(.Cells(1, 1), .Cells(max, 1)).Value2
                For i = max To 1 Step -1

                    If memArr(i, 1) = "Test String" Then
                        tmp = tmp & "A" & i & ","
                        If Len(tmp) > MAX_SZ Then
                           .Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp
                           tmp = vbNullString

                        End If
                    End If

                Next
                If Len(tmp) > 0 Then
                    .Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp
                End If
                .Calculate
            End If
        End If
    End With
    FastWB False:   InputBox "Duration: ", "Duration", Timer - t
End Sub
Run Code Online (Sandbox Code Playgroud)

辅助功能(关闭和打开Excel功能):

Public Sub FastWB(Optional ByVal opt As Boolean = True)
    With Application
        .Calculation = IIf(opt, xlCalculationManual, xlCalculationAutomatic)
        .DisplayAlerts = Not opt
        .DisplayStatusBar = Not opt
        .EnableAnimations = Not opt
        .EnableEvents = Not opt
        .ScreenUpdating = Not opt
    End With
    FastWS , opt
End Sub

Public Sub FastWS(Optional ByVal ws As Worksheet = Nothing, _
                  Optional ByVal opt As Boolean = True)
    If ws Is Nothing Then
        For Each ws In Application.ActiveWorkbook.Sheets
            EnableWS ws, opt
        Next
    Else
        EnableWS ws, opt
    End If
End Sub

Private Sub EnableWS(ByVal ws As Worksheet, ByVal opt As Boolean)
    With ws
        .DisplayPageBreaks = False
        .EnableCalculation = Not opt
        .EnableFormatConditionsCalculation = Not opt
        .EnablePivotTable = Not opt
    End With
End Sub
Run Code Online (Sandbox Code Playgroud)

查找包含数据的最后一个单元格(感谢@ZygD - 现在我在几种情况下测试了它):

Public Function GetMaxCell(Optional ByRef rng As Range = Nothing) As Range

    'Returns the last cell containing a value, or A1 if Worksheet is empty

    Const NONEMPTY As String = "*"
    Dim lRow As Range, lCol As Range

    If rng Is Nothing Then Set rng = Application.ActiveWorkbook.ActiveSheet.UsedRange
    If WorksheetFunction.CountA(rng) = 0 Then
        Set GetMaxCell = rng.Parent.Cells(1, 1)
    Else
        With rng
            Set lRow = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
                                        After:=.Cells(1, 1), _
                                        SearchDirection:=xlPrevious, _
                                        SearchOrder:=xlByRows)
            If Not lRow Is Nothing Then
                Set lCol = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _
                                            After:=.Cells(1, 1), _
                                            SearchDirection:=xlPrevious, _
                                            SearchOrder:=xlByColumns)

                Set GetMaxCell = .Parent.Cells(lRow.Row, lCol.Column)
            End If
        End With
    End If
End Function
Run Code Online (Sandbox Code Playgroud)

返回数组中匹配项的索引,如果未找到匹配项,则返回0:

Public Function IndexOfValInRowOrCol( _
                                    ByVal searchVal As String, _
                                    Optional ByRef ws As Worksheet = Nothing, _
                                    Optional ByRef rng As Range = Nothing, _
                                    Optional ByRef vertical As Boolean = True, _
                                    Optional ByRef rowOrColNum As Long = 1 _
                                    ) As Long

    'Returns position in Row or Column, or 0 if no matches found

    Dim usedRng As Range, result As Variant, searchRow As Long, searchCol As Long

    result = CVErr(9999) '- generate custom error

    Set usedRng = GetUsedRng(ws, rng)
    If Not usedRng Is Nothing Then
        If rowOrColNum < 1 Then rowOrColNum = 1
        With Application
            If vertical Then
                result = .Match(searchVal, rng.Columns(rowOrColNum), 0)
            Else
                result = .Match(searchVal, rng.Rows(rowOrColNum), 0)
            End If
        End With
    End If
    If IsError(result) Then IndexOfValInRowOrCol = 0 Else IndexOfValInRowOrCol = result
End Function
Run Code Online (Sandbox Code Playgroud)

.

更新:

测试了6个解决方案(每个3个测试):Excel Hero的解决方案是迄今为止最快的(删除公式)

.

以下是最快到最慢的结果:

.

测试1.总共100,000条记录,10,000条要删除:

1. ExcelHero()                    - 1.5 seconds

2. DeleteRowsWithValuesNewSheet() - 2.4 seconds

3. DeleteRowsWithValuesStrings()  - 2.45 minutes
4. DeleteRowsWithValuesArray()    - 2.45 minutes
5. QuickAndEasy()                 - 3.25 minutes
6. DeleteRowsWithValuesUnion()    - Stopped after 5 minutes
Run Code Online (Sandbox Code Playgroud)

.

测试2.总共100万条记录,100,000条被删除:

1. ExcelHero()                    - 16 seconds (average)

2. DeleteRowsWithValuesNewSheet() - 33 seconds (average)

3. DeleteRowsWithValuesStrings()  - 4 hrs 38 min (16701.375 sec)
4. DeleteRowsWithValuesArray()    - 4 hrs 37 min (16626.3051757813 sec)
5. QuickAndEasy()                 - 5 hrs 40 min (20434.2104492188 sec)
6. DeleteRowsWithValuesUnion()    - N/A
Run Code Online (Sandbox Code Playgroud)

.

笔记:

  1. ExcelHero方法:易于实现,可靠,速度极快,但删除了公式
  2. NewSheet方法:易于实现,可靠,并符合目标
  3. 字符串方法:更加努力实现,可靠,但不符合要求
  4. 数组方法:类似于字符串,但是ReDims是一个数组(更快版本的Union)
  5. QuickAndEasy:易于实施(简短,可靠和优雅),但不符合要求
  6. Range Union:实现复杂度类似于2和3,但速度太慢

我还通过引入不寻常的值使测试数据更加真实:

  • 空单元格,范围,行和列
  • 特殊字符,如= [`〜!@#$%^&*()_- + {} []\|;:'",.<>/?,单独和多个组合
  • 空格,制表符,空公式,边框,字体和其他单元格格式
  • 带小数的大小数字(= 12.9999999999999 + 0.00000000000000001)
  • 超链接,条件格式规则
  • 内部和外部数据范围的空格式
  • 任何可能导致数据问题的事情

pau*_*ica 14

我提供第一个答案作为参考

如果没有其他选择,其他人可能会发现它很有用

  • 实现结果的最快方法是不使用Delete操作
  • 在100万条记录中,它平均删除了100,000行,平均为33秒

.

Sub DeleteRowsWithValuesNewSheet()  '100K records   10K to delete
                                    'Test 1:        2.40234375 sec
                                    'Test 2:        2.41796875 sec
                                    'Test 3:        2.40234375 sec
                                    '1M records     100K to delete
                                    'Test 1:        32.9140625 sec
                                    'Test 2:        33.1484375 sec
                                    'Test 3:        32.90625   sec
    Dim oldWs As Worksheet, newWs As Worksheet, rowHeights() As Long
    Dim wsName As String, t As Double, oldUsedRng As Range

    FastWB True:    t = Timer

    Set oldWs = Worksheets(1)
    wsName = oldWs.Name

    Set oldUsedRng = oldWs.Range("A1", GetMaxCell(oldWs.UsedRange))

    If oldUsedRng.Rows.Count > 1 Then                           'If sheet is not empty
        Set newWs = Sheets.Add(After:=oldWs)                    'Add new sheet
        With oldUsedRng
            .AutoFilter Field:=1, Criteria1:="<>Test String"
            .Copy                                               'Copy visible data
        End With
        With newWs.Cells
            .PasteSpecial xlPasteColumnWidths
            .PasteSpecial xlPasteAll                            'Paste data on new sheet
            .Cells(1, 1).Select                                 'Deselect paste area
            .Cells(1, 1).Copy                                   'Clear Clipboard
        End With
        oldWs.Delete                                            'Delete old sheet
        newWs.Name = wsName
    End If
    FastWB False:   InputBox "Duration: ", "Duration", Timer - t
End Sub
Run Code Online (Sandbox Code Playgroud)

.

在高层次:

  • 它会创建一个新工作表,并保留对初始工作表的引用
  • 自动筛选搜索文本上的第1列: .AutoFilter Field:=1, Criteria1:="<>Test String"
  • 复制初始工作表中的所有(可见)数据
  • 将列宽,格式和数据粘贴到新工作表
  • 删除初始表格
  • 将新工作表重命名为旧工作表名称

它使用问题中发布的相同帮助函数

AutoFilter使用99%的持续时间

.

到目前为止,我发现了一些限制,第一个可以解决:

  1. 如果初始工作表上有任何隐藏的行,则会取消隐藏它们

    • 需要一个单独的功能来隐藏它们
    • 根据实施情况,可能会显着延长持续时间
  2. VBA相关:

    • 它会更改工作表的代码名称; 其他引用Sheet1的VBA将被破坏(如果有的话)
    • 它删除与初始工作表关联的所有VBA代码(如果有)

.

关于使用像这样的大文件的一些注意事项:

  • 二进制格式(.xlsb)显着减小文件大小(从137 Mb到43 Mb)
  • 非托管条件格式规则可能会导致指数性能问题

    • 注释和数据验证相同
  • 从网络读取文件或数据比使用locall文件慢得多

  • AutoFilter似乎是最好的方法,很好的通话.如果您甚至可以打开1米行的工作表,则必须拥有功能强大的计算机.您可以使用VBE对象模型更改代号.它要求在前端启用"访问VBA对象模型",因此仅适用于您可以控制的计算机. (3认同)

Exc*_*ero 9

如果源数据不包含公式,或者方案允许(或希望)在条件行删除期间将公式转换为硬值,则可以实现速度的显着提高.

以上作为警告,我的解决方案使用范围对象的AdvancedFilter.它的速度大约是DeleteRowsWithValuesNewSheet()的两倍.

Public Sub ExcelHero()
    Dim t#, crit As Range, data As Range, ws As Worksheet
    Dim r&, fc As Range, lc As Range, fr1 As Range, fr2 As Range
    FastWB True
    t = Timer

        Set fc = ActiveSheet.UsedRange.Item(1)
        Set lc = GetMaxCell
        Set data = ActiveSheet.Range(fc, lc)
        Set ws = Sheets.Add
        With data
            Set fr1 = data.Worksheet.Range(fc, fc.Offset(, lc.Column))
            Set fr2 = ws.Range(ws.Cells(fc.Row, fc.Column), ws.Cells(fc.Row, lc.Column))
            With fr2
                fr1.Copy
                .PasteSpecial xlPasteColumnWidths: .PasteSpecial xlPasteAll
                .Item(1).Select
            End With
            Set crit = .Resize(2, 1).Offset(, lc.Column + 1)
            crit = [{"Column 1";"<>Test String"}]
            .AdvancedFilter xlFilterCopy, crit, fr2
            .Worksheet.Delete
        End With

    FastWB False
    r = ws.UsedRange.Rows.Count
    Debug.Print "Rows: " & r & ", Duration: " & Timer - t & " seconds"
End Sub
Run Code Online (Sandbox Code Playgroud)


Gar*_*ent 5

在我的老人戴尔Inspiron 1564(Win 7 Office 2007)上:

Sub QuickAndEasy()
    Dim rng As Range
    Set rng = Range("AA2:AA1000001")
    Range("AB1") = Now
    Application.ScreenUpdating = False
        With rng
            .Formula = "=If(A2=""Test String"",0/0,A2)"
            .Cells.SpecialCells(xlCellTypeFormulas, xlErrors).EntireRow.Delete
            .Clear
        End With
    Application.ScreenUpdating = True
    Range("AC1") = Now
End Sub
Run Code Online (Sandbox Code Playgroud)

跑了大概10秒钟.我假设AA列可用.

编辑#1:

请注意,此代码将" 计算 " 设置为"手动".如果在允许"帮助"列计算后将计算模式设置为手动,则性能将得到改善.