GRB*_*GRB 11 excel vba excel-vba trie
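I'm looking to implement a trie algorithm in VBA that can process a substantial English lexicon (~50,000 words) in a relatively short time (under 15-20 seconds). Since I am a C++ programmer by practice (and this is my first time doing any substantial VBA work), I built a quick proof-of-concept program in C++ that completes the task on my computer in about half a second. When I tested the VBA port, however, it took almost two minutes to do the same thing, which is unacceptably long for my purposes. The VBA code is below:
Node class module: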
Public letter As String
Public next_nodes As New Collection
Public is_word As Boolean
Main module:
Dim tree As Node

Sub build_trie()
    Set tree = New Node
    Dim file, a, b, c As Integer
    Dim current As Node
    Dim wordlist As Collection
    Set wordlist = New Collection
    file = FreeFile
    Open "C:\corncob_caps.txt" For Input As file
    Do While Not EOF(file)
        Dim line As String
        Line Input #file, line
        wordlist.add line
    Loop
    For a = 1 To wordlist.Count
        Set current = tree
        For b = 1 To Len(wordlist.Item(a))
            Dim match As Boolean
            match = False
            Dim char As String
            char = Mid(wordlist.Item(a), b, 1)
            For c = 1 To current.next_nodes.Count
                If char = current.next_nodes.Item(c).letter Then
                    Set current = current.next_nodes.Item(c)
                    match = True
                    Exit For
                End If
            Next c
            If Not match Then
                Dim new_node As Node
                Set new_node = New Node
                new_node.letter = char
                current.next_nodes.add new_node
                Set current = new_node
            End If
        Next b
        current.is_word = True
    Next a
End Sub
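My question is simply: can this algorithm be sped up? I read in a few places that VBA Collections are less efficient than Dictionaries, so I tried a Dictionary-based implementation instead, but it took just as long to complete and used even more memory (over 500 MB of RAM on my computer). As I said, I am extremely new to VBA, so my knowledge of its syntax and of its overall features and limitations is very limited, which is why I doubt this algorithm is as efficient as it could be. Any tips or suggestions would be greatly appreciated.
Thanks in advance.
Note: the lexicon file referred to by the code, "corncob_caps.txt", is available here (download the "all CAPS" file).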
chr*_*sen 17
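There are a number of small issues here and a few larger opportunities. You did say this is your first VBA work, so forgive me if I'm telling you things you already know.
Small things first:
Dim file, a, b, c As Integer declares file, a and b as Variants; only c is an Integer. Integer is 16-bit signed (maximum 32,767), so with a list of roughly 50,000 words there is a risk of overflow. Use Long instead.
Dim statements inside loops are counterproductive: unlike in C++, the variables are not scoped to the loop.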
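Both points can be checked in the Immediate window. The following is a minimal sketch; the procedure and variable names are made up for illustration and are not part of the original code.

```vba
' Hypothetical demo procedure; names are illustrative only.
Sub demo_dim_behaviour()
    ' Only b gets the Integer type; a has no type clause, so it is a Variant.
    Dim a, b As Integer
    Debug.Print TypeName(a)     ' prints "Empty" (an uninitialised Variant)
    Debug.Print TypeName(b)     ' prints "Integer"

    ' A Dim inside a loop is still procedure-scoped and is not
    ' re-initialised on each iteration, so counter keeps its value.
    Dim i As Long
    For i = 1 To 3
        Dim counter As Long
        counter = counter + 1
    Next i
    Debug.Print counter         ' prints 3
End Sub
```

Declaring each variable with its own As clause, once, at the top of the procedure avoids both surprises.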
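The real opportunity is:
Use For Each wherever you can to iterate over collections: it is faster than indexing, because For Each walks the items once while every Item(index) call is a separate lookup.
On my hardware, your original code ran in about 160 s. The code below runs in about 2.5 s (plus the time to load the word file into the collection, about 4 s).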
' Module-level declarations (place at the top of the module).
' GetTickCount is a Windows API function and must be declared before use.
#If VBA7 Then
    Private Declare PtrSafe Function GetTickCount Lib "kernel32" () As Long
#Else
    Private Declare Function GetTickCount Lib "kernel32" () As Long
#End If
' tree is the module-level Node variable from the question (Dim tree As Node).

Sub build_trie()
    Dim t1 As Long
    Dim wd As Variant
    Dim nd As Node
    Set tree = New Node
    ' Dim file, a, b, c As Integer : declares file, a, b as variant
    Dim file As Integer, a As Long, b As Long, c As Long ' Integer is 16 bit signed
    Dim current As Node
    Dim wordlist As Collection
    Set wordlist = New Collection
    file = FreeFile
    Open "C:\corncob_caps.txt" For Input As file
    ' no point in declaring these inside the loops, they are not scoped to the loop
    Dim line As String
    Dim match As Boolean
    Dim char As String
    Dim new_node As Node
    Do While Not EOF(file)
        'Dim line As String
        Line Input #file, line
        wordlist.Add line
    Loop
    Close file ' release the file handle once the list is loaded
    t1 = GetTickCount
    For Each wd In wordlist ' for each is faster
    'For a = 1 To wordlist.Count
        Set current = tree
        For b = 1 To Len(wd)
            'Dim match As Boolean
            match = False
            'Dim char As String
            char = Mid$(wd, b, 1)
            For Each nd In current.next_nodes
            'For c = 1 To current.next_nodes.Count
                If char = nd.letter Then
                'If char = current.next_nodes.Item(c).letter Then
                    Set current = nd
                    'Set current = current.next_nodes.Item(c)
                    match = True
                    Exit For
                End If
            Next nd
            If Not match Then
                'Dim new_node As Node
                Set new_node = New Node
                new_node.letter = char
                current.next_nodes.Add new_node
                Set current = new_node
            End If
        Next b
        current.is_word = True
    Next wd
    Debug.Print "Time = " & GetTickCount - t1 & " ms"
End Sub
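To see the effect of For Each in isolation, the two loop styles can be timed on a throwaway Collection. This is only a rough sketch: the procedure name and the collection size are arbitrary choices for illustration, and the timings will vary by machine.

```vba
' Hypothetical benchmark; names and sizes are illustrative only.
Sub compare_collection_loops()
    Const ITEM_COUNT As Long = 20000    ' arbitrary size chosen for the demo
    Dim items As Collection
    Dim i As Long, total As Long
    Dim v As Variant
    Dim t As Single

    Set items = New Collection
    For i = 1 To ITEM_COUNT
        items.Add i
    Next i

    ' Indexed access: one Item(i) lookup per iteration.
    t = Timer
    total = 0
    For i = 1 To items.Count
        total = total + items.Item(i)
    Next i
    Debug.Print "Indexed:  " & Format$(Timer - t, "0.000") & " s"

    ' For Each: the collection is enumerated once.
    t = Timer
    total = 0
    For Each v In items
        total = total + v
    Next v
    Debug.Print "For Each: " & Format$(Timer - t, "0.000") & " s"
End Sub
```

Timer is used here instead of GetTickCount so the sketch needs no API declaration; its resolution is coarser, which is acceptable for a comparison of this size.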
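EDIT:
Loading the word list into a dynamic array reduces the load time to under a second. Be aware that ReDim Preserve is expensive, so grow the array in large chunks rather than one element at a time.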
Dim i As Long, sz As Long
sz = 10000
Dim wordlist() As String
ReDim wordlist(0 To sz)
file = FreeFile
Open "C:\corncob_caps.txt" For Input As file
i = 0
Do While Not EOF(file)
    'Dim line As String
    Line Input #file, line
    wordlist(i) = line
    i = i + 1
    If i > sz Then
        ' array is full: grow it by another chunk
        sz = sz + 10000
        ReDim Preserve wordlist(0 To sz)
    End If
    'wordlist.Add line
Loop
Close file ' release the file handle once the list is loaded
' trim the unused tail (assumes the file contained at least one line)
ReDim Preserve wordlist(0 To i - 1)
Then loop through it like this:
For i = 0 To UBound(wordlist)
    wd = wordlist(i)
    ' ... same loop body as in build_trie above ...
Next i
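Whichever loading approach is used, a quick lookup is a convenient way to sanity-check the finished trie. The function below is a minimal sketch, not part of the original answer: trie_contains is a hypothetical name, and it assumes the module-level tree variable and the Node class (letter, next_nodes, is_word) from the question.

```vba
' Hypothetical helper: returns True if word was inserted by build_trie.
' Assumes the module-level tree variable and the Node class from the question.
' The comparison is case-sensitive under the default Option Compare Binary,
' so pass upper-case words when using the all-caps word list.
Function trie_contains(ByVal word As String) As Boolean
    Dim current As Node
    Dim nd As Node
    Dim i As Long
    Dim found As Boolean

    If tree Is Nothing Then Exit Function   ' trie not built yet: returns False
    Set current = tree
    For i = 1 To Len(word)
        found = False
        For Each nd In current.next_nodes
            If nd.letter = Mid$(word, i, 1) Then
                Set current = nd
                found = True
                Exit For
            End If
        Next nd
        If Not found Then Exit Function     ' no matching child: returns False
    Next i
    ' The path exists; it is a word only if the final node is flagged.
    trie_contains = current.is_word
End Function
```

After running build_trie, calling Debug.Print trie_contains("SOMEWORD") in the Immediate window with a word known to be in the file (and one known not to be) gives a simple check that the build worked.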