How do I iterate over individual characters in a Lua string?

gri*_*yvp 82 lua

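I have a string in Lua and want to iterate over the individual characters in it. But none of the code I've tried works, and the official manual only shows how to find and replace substrings :(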

str = "abcd"
for char in str do -- error
  print( char )
end

for i = 1, str:len() do
  print( str[ i ] ) -- nil
end

RBe*_*eig 120

In Lua 5.1, you can iterate over the characters of a string in a couple of ways.

The basic loop would be:

for i = 1, #str do
    local c = str:sub(i,i)
    -- do something with c
end

But it may be more efficient to use a pattern with string.gmatch() to get an iterator over the characters:

for c in str:gmatch"." do
    -- do something with c
end

Or even to use string.gsub() to call a function for each char:

str:gsub(".", function(c)
    -- do something with c
end)

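In all of the above, I've taken advantage of the fact that the string module is set as the __index of the metatable shared by all string values, so its functions can be called as members using the : notation. I've also used the (new to 5.1, IIRC) # operator to get the string length.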
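As a minimal sketch of that point (the variable name `str` is just taken from the question), the checks below show that the method-call form and the plain function form are interchangeable, and why `str[i]` from the question yields nil: the index is looked up as a key in the string table rather than as a position in the string.

```lua
local str = "abcd"

-- The : form is sugar for calling the string library function.
assert(str:sub(2, 2) == string.sub(str, 2, 2))

-- The length operator and string.len agree.
assert(#str == str:len())

-- String values share a metatable whose __index is the string table.
print(getmetatable(str).__index == string) --> true

-- So str[1] is looked up as string[1], which does not exist.
print(str[1]) --> nil
```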

The best answer for your application depends on a lot of factors, and benchmarks are your friend if performance is going to matter.

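You might want to evaluate why you need to iterate over the characters, and look at one of the regular expression modules that have been bound to Lua, or look into Roberto's lpeg module, which implements Parsing Expression Grammars for Lua.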

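  • Not to mention that [recent releases of SciTE](http://www.scintilla.org/SciTEDownload.html) (since 2.22) include Scintillua, an LPEG-based lexer, which means it works right out of the box with no recompilation required. (3 upvotes)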

Aar*_*ela 11

If you're using Lua 5, try:

for i = 1, string.len(str) do
    print( string.sub(str, i, i) )
end


Ele*_*rix 8

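There are already a lot of good approaches in the provided answers (here, here and here). If speed is what you are primarily after, you should definitely consider doing the job through Lua's C API, which is many times faster than raw Lua code. When working with preloaded chunks (e.g. the load function), the difference is not as big, but it is still considerable.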

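As for pure Lua solutions, let me share this small benchmark I've made. It covers every answer provided so far and adds a few optimizations. Still, the basic thing to consider is: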

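How many times will you need to iterate over the characters in the string?

  • If the answer is "once", then you should look at the first part of the benchmark ("Raw speed").
  • Otherwise, the second part will give a more precise estimate, because it parses the string into a table, which is much faster to iterate over. You should also consider writing a simple function for this, as @Jarriz suggested.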

Here is the full code:

-- Setup locals
local str = "Hello World!"
local attempts = 5000000
local reuses = 10 -- For the second part of benchmark: Table values are reused 10 times. Change this according to your needs.
local x, c, elapsed, tbl
-- "Localize" funcs to minimize lookup overhead
local stringbyte, stringchar, stringsub, stringgsub, stringgmatch = string.byte, string.char, string.sub, string.gsub, string.gmatch

print("-----------------------")
print("Raw speed:")
print("-----------------------")

-- Version 1 - string.sub in loop
x = os.clock()
for j = 1, attempts do
    for i = 1, #str do
        c = stringsub(str, i, i) -- Take exactly one character, not the rest of the string
    end
end
elapsed = os.clock() - x
print(string.format("V1: elapsed time: %.3f", elapsed))

-- Version 2 - string.gmatch loop
x = os.clock()
for j = 1, attempts do
    for c in stringgmatch(str, ".") do end
end
elapsed = os.clock() - x
print(string.format("V2: elapsed time: %.3f", elapsed))

-- Version 3 - string.gsub callback
x = os.clock()
for j = 1, attempts do
    stringgsub(str, ".", function(c) end)
end
elapsed = os.clock() - x
print(string.format("V3: elapsed time: %.3f", elapsed))

-- For version 4
local str2table = function(str)
    local ret = {}
    for i = 1, #str do
        ret[i] = stringsub(str, i, i) -- Note: This is a lot faster than using table.insert
    end
    return ret
end

-- Version 4 - function str2table
x = os.clock()
for j = 1, attempts do
    tbl = str2table(str)
    for i = 1, #tbl do -- Note: This type of loop is a lot faster than "pairs" loop.
        c = tbl[i]
    end
end
elapsed = os.clock() - x
print(string.format("V4: elapsed time: %.3f", elapsed))

-- Version 5 - string.byte
x = os.clock()
for j = 1, attempts do
    tbl = {stringbyte(str, 1, #str)} -- Note: This is about 15% faster than calling string.byte for every character.
    for i = 1, #tbl do
        c = tbl[i] -- Note: produces char codes instead of chars.
    end
end
elapsed = os.clock() - x
print(string.format("V5: elapsed time: %.3f", elapsed))

-- Version 5b - string.byte + conversion back to chars
x = os.clock()
for j = 1, attempts do
    tbl = {stringbyte(str, 1, #str)} -- Note: This is about 15% faster than calling string.byte for every character.
    for i = 1, #tbl do
        c = stringchar(tbl[i])
    end
end
elapsed = os.clock() - x
print(string.format("V5b: elapsed time: %.3f", elapsed))

print("-----------------------")
print("Creating cache table ("..reuses.." reuses):")
print("-----------------------")

-- Version 1 - string.sub in loop
x = os.clock()
for k = 1, attempts do
    tbl = {}
    for i = 1, #str do
        tbl[i] = stringsub(str, i, i) -- Note: This is a lot faster than using table.insert
    end
    for j = 1, reuses do
        for i = 1, #tbl do
            c = tbl[i]
        end
    end
end
elapsed = os.clock() - x
print(string.format("V1: elapsed time: %.3f", elapsed))

-- Version 2 - string.gmatch loop
x = os.clock()
for k = 1, attempts do
    tbl = {}
    local tblc = 1 -- Note: This is faster than table.insert
    for c in stringgmatch(str, ".") do
        tbl[tblc] = c
        tblc = tblc + 1
    end
    for j = 1, reuses do
        for i = 1, #tbl do
            c = tbl[i]
        end
    end
end
elapsed = os.clock() - x
print(string.format("V2: elapsed time: %.3f", elapsed))

-- Version 3 - string.gsub callback
x = os.clock()
for k = 1, attempts do
    tbl = {}
    local tblc = 1 -- Note: This is faster than table.insert
    stringgsub(str, ".", function(c)
        tbl[tblc] = c
        tblc = tblc + 1
    end)
    for j = 1, reuses do
        for i = 1, #tbl do
            c = tbl[i]
        end
    end
end
elapsed = os.clock() - x
print(string.format("V3: elapsed time: %.3f", elapsed))

-- Version 4 - str2table func before loop
x = os.clock()
for k = 1, attempts do
    tbl = str2table(str)
    for j = 1, reuses do
        for i = 1, #tbl do -- Note: This type of loop is a lot faster than "pairs" loop.
            c = tbl[i]
        end
    end
end
elapsed = os.clock() - x
print(string.format("V4: elapsed time: %.3f", elapsed))

-- Version 5 - string.byte to create table
x = os.clock()
for k = 1, attempts do
    tbl = {stringbyte(str,1,#str)}
    for j = 1, reuses do
        for i = 1, #tbl do
            c = tbl[i]
        end
    end
end
elapsed = os.clock() - x
print(string.format("V5: elapsed time: %.3f", elapsed))

-- Version 5b - string.byte to create table + string.char loop to convert bytes to chars
x = os.clock()
for k = 1, attempts do
    tbl = {stringbyte(str, 1, #str)}
    for i = 1, #tbl do
        tbl[i] = stringchar(tbl[i])
    end
    for j = 1, reuses do
        for i = 1, #tbl do
            c = tbl[i]
        end
    end
end
elapsed = os.clock() - x
print(string.format("V5b: elapsed time: %.3f", elapsed))

Sample output (Lua 5.3.4, Windows):

-----------------------
Raw speed:
-----------------------
V1: elapsed time: 3.713
V2: elapsed time: 5.089
V3: elapsed time: 5.222
V4: elapsed time: 4.066
V5: elapsed time: 2.627
V5b: elapsed time: 3.627
-----------------------
Creating cache table (10 reuses):
-----------------------
V1: elapsed time: 20.381
V2: elapsed time: 23.913
V3: elapsed time: 25.221
V4: elapsed time: 20.551
V5: elapsed time: 13.473
V5b: elapsed time: 18.046

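Results:

In my case, string.byte and string.sub were the fastest in terms of raw speed. When using a cache table and reusing it 10 times per loop, the string.byte version was the fastest even when converting the char codes back to chars (which isn't always necessary and depends on usage).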
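The string.byte variant can be sketched in isolation as follows. This is only an illustration of the V5/V5b idea, with the table names `codes` and `chars` chosen here for clarity. One caveat worth keeping in mind: string.byte returns every requested byte as a separate value, so for very long strings a single call over the whole string may fail with a "string slice too long" error; in that case calling it per index or per chunk is the safer route.

```lua
local str = "Hello World!"

-- One call returns all byte values; wrap them in a table (as in V5).
local codes = { string.byte(str, 1, #str) }

-- Optional step (as in V5b): convert the codes back to one-character strings.
local chars = {}
for i = 1, #codes do
    chars[i] = string.char(codes[i])
end

print(codes[1], chars[1])  --> 72	H
print(table.concat(chars)) --> Hello World!
```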

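As you have probably noticed, I've made some assumptions based on my previous benchmarks and applied them to the code:

  1. Library functions should always be localized if they are used inside loops, because it is a lot faster.
  2. Inserting a new element into a Lua table is much faster with tbl[idx] = value than with table.insert(tbl, value).
  3. Looping through a table with for i = 1, #tbl is a bit faster than for k, v in pairs(tbl).
  4. Always prefer the version with fewer function calls, because the call itself adds a little to the execution time.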
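A compact sketch of the first three points side by side may help. It does no timing, it only shows that the faster and slower forms build the same table; the names `by_index` and `by_insert` are made up for this illustration.

```lua
-- Point 1: localize library functions once, outside the loops.
local stringsub, tableinsert = string.sub, table.insert

local str = "Hello World!"

-- Point 2: append with an explicit index...
local by_index = {}
for i = 1, #str do
    by_index[i] = stringsub(str, i, i)
end

-- ...which gives the same result as table.insert, minus one call per element.
local by_insert = {}
for i = 1, #str do
    tableinsert(by_insert, stringsub(str, i, i))
end

-- Point 3: a numeric for loop; unlike pairs, it also guarantees the order.
local same = (#by_index == #by_insert)
for i = 1, #by_index do
    if by_index[i] ~= by_insert[i] then
        same = false
    end
end
print(same) --> true
```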

Hope it helps.


Ole*_*kov 6

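Depending on the task at hand, it might be easier to use string.byte. It is also the fastest way, because it avoids creating new substrings, which happens to be fairly expensive in Lua since each new string is hashed and checked against the strings that are already known. You can pre-calculate the codes of the symbols you are looking for with the same string.byte to maintain readability and portability.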

local str = "ab/cd/ef"
local target = string.byte("/")
for idx = 1, #str do
   if str:byte(idx) == target then
      print("Target found at:", idx)
   end
end
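If the positions are needed later rather than printed right away, the same loop can be wrapped in a small helper. The function name `find_byte_positions` is hypothetical and not part of any library; it is just one way to package the idea.

```lua
-- Hypothetical helper: returns a table of every index at which the
-- single-character string `ch` occurs in `str`, comparing byte codes only.
local function find_byte_positions(str, ch)
    local target = string.byte(ch) -- pre-calculate the code once
    local positions = {}
    for idx = 1, #str do
        if string.byte(str, idx) == target then
            positions[#positions + 1] = idx
        end
    end
    return positions
end

local hits = find_byte_positions("ab/cd/ef", "/")
print(table.concat(hits, ", ")) --> 3, 6
```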