How to generate a random string matching a regex in Julia?

Question

Tho*_*ert 8 regex julia

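The question is simple. I found many solutions for other languages, but could not find one for Julia. A related question:

Random Text generator based on regex

Also, Random.randstring does not accept a Regex as an argument.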

Answer 1

Lyn*_*ite 5

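Julia uses PCRE, which means its regexes are far more powerful than true regular expressions, and are arguably even Turing complete. I suspect there is a lot of interesting theoretical computer science around this, and that your task may be provably impossible for PCRE in general because of the halting problem. What we can still do is try a bunch of random strings and throw away the ones that do not match. For simple regexes that goes a long way, although it is not guaranteed to give an answer.

If one wanted stricter regular expressions, such as the ones covered by Automa.jl, then something better can probably be done, since you can walk the state machine and solve it one step at a time. Hopefully someone who knows Automa.jl can post their own answer.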

Code:

using Random: randstring

function rand_matching(regex; max_len=2^16, max_attempts=1000)
    for _ in 1:max_attempts
        str = randstring(max_len)
        m = match(regex, str)
        if m !== nothing
            # rather than return the whole random string,
            # return only the part that matched
            return m.match
        end
    end
    error("Could not find any string that matches regex")
end


Demo:

julia> @time rand_matching(r"\d\d")
  0.013517 seconds (34.34 k allocations: 1.998 MiB)
"38"

julia> @time rand_matching(r"\d\d")
  0.001497 seconds (11 allocations: 128.656 KiB)
"44"

julia> @time rand_matching(r"a\d\d")
  0.000670 seconds (11 allocations: 128.656 KiB)
"a19"

julia> @time rand_matching(r"a\d\d")
  0.000775 seconds (11 allocations: 128.656 KiB)
"a83"

julia> @time rand_matching(r"a\d\db")
  0.000670 seconds (11 allocations: 128.656 KiB)
"a44b"

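One limitation of the rejection approach is that randstring, by default, draws only from upper- and lower-case letters and the digits 0-9, so a regex that needs any other character can never match. A minimal sketch of a variant that takes the alphabet explicitly is given below; the name rand_matching_from and the parameters alphabet and len are hypothetical and not part of the answer above.

```julia
using Random: randstring

# Hypothetical variant of rand_matching: `alphabet` and `len` are
# illustrative parameters, not part of the original answer.
function rand_matching_from(regex::Regex, alphabet; len=2^10, max_attempts=1000)
    for _ in 1:max_attempts
        # draw a random string over the caller-supplied alphabet
        candidate = randstring(alphabet, len)
        m = match(regex, candidate)
        # keep only the matched portion, discard the rest
        m !== nothing && return m.match
    end
    error("Could not find any string that matches regex")
end

# The default randstring alphabet has no '-', so it is supplied explicitly here.
s = rand_matching_from(r"\d\d-\d\d", "0123456789-")
```

Restricting the alphabet to characters the regex can actually use also raises the hit rate per attempt, so a shorter candidate string is usually enough.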

Answer 2

phi*_*ler 5

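It should be possible to use Automa.jl to build a DFA and traverse it randomly. Automa uses a simpler syntax than PCRE, so the language you can describe with it should actually be regular.

I quickly threw together the following, mostly based on the code in dot.jl: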

julia> function rand_re(machine::Automa.Machine)
           out = IOBuffer()
           node = machine.start

           while true
               if node.state ∈ machine.final_states
                   (rand() ≤ 1 / (length(node.edges) + 1)) && break
               end

               edge, node = rand(node.edges)
               label = rand(collect(edge.labels))
               print(out, Char(label))
           end

           return String(take!(out))
       end
rand_re (generic function with 1 method)

julia> rand_re(Automa.compile(re"a[0-9][ab]+"))
"a6bbb"

julia> rand_re(Automa.compile(re"a[0-9][ab]+"))
"a9b"

julia> rand_re(Automa.compile(re"a[0-9][ab]+"))
"a3aa"

julia> rand_re(Automa.compile(re"a[0-9][ab]+"))
"a1a"

julia> rand_re(Automa.compile(re"a[0-9][ab]+"))
"a5ba"


Note that Automa uses byte-encoded sets for edge labels, so more care should be taken where I simply wrote Char(label).
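To illustrate why Char(label) needs care, here is a small self-contained sketch that does not use Automa at all. It assumes each label is a single byte (UInt8) of UTF-8 encoded text, as the note above suggests, and compares converting each byte to a Char with writing the raw bytes.

```julia
# The two bytes of the UTF-8 encoding of 'é'
bytes = UInt8[0xc3, 0xa9]

# Char(b) interprets each byte as a Unicode code point, so the
# two bytes become two separate characters.
wrong = let out = IOBuffer()
    for b in bytes
        print(out, Char(b))
    end
    String(take!(out))
end

# write(out, b) appends the raw byte, so the encoding is preserved.
right = let out = IOBuffer()
    for b in bytes
        write(out, b)
    end
    String(take!(out))
end
```

For pure ASCII patterns such as a[0-9][ab]+ the two approaches give the same result; they only differ once a label byte is 0x80 or above.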

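Since final states can still have outgoing edges, I chose to treat stopping and each edge with uniform probability. I think this will likely cause potentially infinite terms to come out either very short or very long. Google "Boltzmann samplers" for how to solve that (not to be confused with sampling from a Boltzmann distribution!), but the solution is mathematically rather involved.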
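The effect of the stopping rule on output length can be explored with a toy example. The sketch below is not a Boltzmann sampler and does not use Automa's Machine type: it walks a hand-written DFA for a[0-9][ab]+ and exposes the stopping probability as a hypothetical p_stop parameter. The names EDGES, FINAL_STATES and rand_walk are made up for illustration.

```julia
using Random

# Hand-written DFA for a[0-9][ab]+ (a toy stand-in, not Automa's Machine).
# Each state maps to a list of (allowed characters, next state) edges.
const EDGES = Dict(
    1 => [(['a'], 2)],
    2 => [(collect('0':'9'), 3)],
    3 => [(['a', 'b'], 4)],
    4 => [(['a', 'b'], 4)],
)
const FINAL_STATES = Set([4])

function rand_walk(; p_stop=0.5, rng=Random.default_rng())
    out = IOBuffer()
    state = 1
    while true
        # In a final state, stop with probability p_stop
        if state in FINAL_STATES && rand(rng) < p_stop
            break
        end
        # Otherwise pick an outgoing edge, then a character on that edge
        chars, state = rand(rng, EDGES[state])
        print(out, rand(rng, chars))
    end
    return String(take!(out))
end
```

With a fixed p_stop, the number of characters emitted after first reaching the final state is geometrically distributed with mean (1 - p_stop) / p_stop, so p_stop = 0.5 gives one extra character on average and smaller values give longer strings. If the final state has a single outgoing edge, as in this toy DFA, the uniform rule described above corresponds to p_stop = 0.5.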

Archived:	5 years, 9 months ago
Views:	127
Last recorded:	5 years, 9 months ago