DFA最小化

Rus*_*lan 4 regex lexer dfa nfa

我有一个关于DFA最小化的问题.所以我使用了众所周知的技术将正则表达式转换为NFA,然后使用goto/closure算法从中构造DFA.现在问题是如何最小化它?我在这里看过关于它的文章:http://www.youtube.com/watch? v = T9Z66NF5YRk ,我仍然无法理解.什么是DFA最小化?这只是合并IDENTICAL状态(状态在相同的字符上进入相同的状态)还是不同的东西?

所以,我开始使用以下语法:

%digit = '0'..'9'
%letter = 'a'..'z' | 'A'..'Z'
%exponent = ("e" | "E") ("+" | "-")? digit+

T_INT = digit+
T_FLOAT = T_INT exponent
T_IDENTIFIER = (letter | "$" | "_") (letter | "$" | "_" | digit)*
Run Code Online (Sandbox Code Playgroud)

最终得到以下DFA(表示为JSON):

{
    "START": [{
        "type": "range",
        "from": 36,
        "to": 36,
        "shift": "1"
    }, {
        "type": "range",
        "from": 48,
        "to": 57,
        "shift": "2"
    }, {
        "type": "range",
        "from": 65,
        "to": 90,
        "shift": "1"
    }, {
        "type": "range",
        "from": 95,
        "to": 95,
        "shift": "1"
    }, {
        "type": "range",
        "from": 97,
        "to": 122,
        "shift": "1"
    }],
    "1": [{
        "type": "range",
        "from": 36,
        "to": 36,
        "shift": "1"
    }, {
        "type": "range",
        "from": 48,
        "to": 57,
        "shift": "1"
    }, {
        "type": "range",
        "from": 65,
        "to": 90,
        "shift": "1"
    }, {
        "type": "range",
        "from": 95,
        "to": 95,
        "shift": "1"
    }, {
        "type": "range",
        "from": 97,
        "to": 122,
        "shift": "1"
    }, {
        "shift": ["t_identifier"]
    }],
    "2": [{
        "type": "range",
        "from": 48,
        "to": 57,
        "shift": "2"
    }, {
        "type": "range",
        "from": 69,
        "to": 69,
        "shift": "3"
    }, {
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "3"
    }, {
        "shift": ["t_int"]
    }],
    "3": [{
        "type": "range",
        "from": 43,
        "to": 43,
        "shift": "5"
    }, {
        "type": "range",
        "from": 45,
        "to": 45,
        "shift": "5"
    }, {
        "type": "range",
        "from": 48,
        "to": 57,
        "shift": "4"
    }],
    "4": [{
        "type": "range",
        "from": 48,
        "to": 57,
        "shift": "4"
    }, {
        "shift": ["t_float"]
    }],
    "5": [{
        "type": "range",
        "from": 48,
        "to": 57,
        "shift": "4"
    }]
}
Run Code Online (Sandbox Code Playgroud)

那么我该如何最小化呢?

更新:

好的,这是我的算法.鉴于以下DFA:

{
    0: [{
        from: 97,
        to: 97,
        shift: 1
    }],
    1: [{
        from: 97,
        to: 97,
        shift: 3
    }, {
        from: 98,
        to: 98,
        shift: 2
    }],
    2: [{
        from: 98,
        to: 98,
        shift: 4
    }],
    3: [{
        from: 97,
        to: 97,
        shift: 3
    }, {
        from: 98,
        to: 98,
        shift: 4
    }],
    4: [{
        from: 98,
        to: 98,
        shift: 4
    }]
}
Run Code Online (Sandbox Code Playgroud)

这是我做的最小化它:

  1. 对于每个状态(在我的示例中编号为0,1,2,3,4)获取标识此类状态的唯一散列(例如,对于state0,这将是:from = 97,to = 97,shift = 1,对于state1 this将是:from = 97,to = 97,shift = 3&from = 98,to = 98,shift = 2等等......)

  2. 比较获得的哈希值,如果找到两个相同的哈希值,则合并它.在我的例子中,state2哈希将是:from = 98&shift = 4&to = 98,state4 hash将是:from = 98&shift = 4&to = 98,它们是相同的,所以我们可以将它们合并到新的state5中,之后DFA将看起来像这样:

    {
    0: [{
        from: 97,
        to: 97,
        shift: 1
    }],
    1: [{
        from: 97,
        to: 97,
        shift: 3
    }, {
        from: 98,
        to: 98,
        shift: 5
    }],
    3: [{
        from: 97,
        to: 97,
        shift: 3
    }, {
        from: 98,
        to: 98,
        shift: 5
    }],
    5: [{
        from: 98,
        to: 98,
        shift: 5
    }]
    
    Run Code Online (Sandbox Code Playgroud)

    }

  3. 继续这个'直到我们没有变化,所以下一步(合并状态1和3)将DFA转换为以下形式:

    {
    0: [{
        from: 97,
        to: 97,
        shift: 6
    }],
    6: [{
        from: 97,
        to: 97,
        shift: 6
    }, {
        from: 98,
        to: 98,
        shift: 5
    }],
    5: [{
        from: 98,
        to: 98,
        shift: 5
    }]
    
    Run Code Online (Sandbox Code Playgroud)

    }

  4. 没有更多相同的状态,这意味着我们已经完成了.

第二次更新:

好的,所以给出以下正则表达式:'a'('ce')*('d'|'xa'|''AFe')+ | 'b'('ce')*('d'|'xa'|'AFe')+'ce'+我有以下DFA(START - >开始状态,["接受"] - >所以说过渡到接受状态):

{
    "START": [{
        "type": "range",
        "from": 98,
        "to": 98,
        "shift": "1.2"
    }, {
        "type": "range",
        "from": 97,
        "to": 97,
        "shift": "17.18"
    }],
    "1.2": [{
        "type": "range",
        "from": 120,
        "to": 120,
        "shift": "10"
    }, {
        "type": "range",
        "from": 100,
        "to": 100,
        "shift": "6.7"
    }, {
        "type": "range",
        "from": 65,
        "to": 65,
        "shift": "8"
    }, {
        "type": "range",
        "from": 99,
        "to": 99,
        "shift": "4"
    }],
    "10": [{
        "type": "range",
        "from": 97,
        "to": 97,
        "shift": "6.7"
    }],
    "6.7": [{
        "type": "range",
        "from": 99,
        "to": 99,
        "shift": "15"
    }, {
        "type": "range",
        "from": 120,
        "to": 120,
        "shift": "13"
    }, {
        "type": "range",
        "from": 100,
        "to": 100,
        "shift": "6.7"
    }, {
        "type": "range",
        "from": 65,
        "to": 65,
        "shift": "11"
    }],
    "15": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "14.accept"
    }],
    "14.accept": [{
        "type": "range",
        "from": 99,
        "to": 99,
        "shift": "16"
    }, {
        "shift": ["accept"]
    }],
    "16": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "14.accept"
    }],
    "13": [{
        "type": "range",
        "from": 97,
        "to": 97,
        "shift": "6.7"
    }],
    "11": [{
        "type": "range",
        "from": 70,
        "to": 70,
        "shift": "12"
    }],
    "12": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "6.7"
    }],
    "8": [{
        "type": "range",
        "from": 70,
        "to": 70,
        "shift": "9"
    }],
    "9": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "6.7"
    }],
    "4": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "2.3"
    }],
    "2.3": [{
        "type": "range",
        "from": 120,
        "to": 120,
        "shift": "10"
    }, {
        "type": "range",
        "from": 100,
        "to": 100,
        "shift": "6.7"
    }, {
        "type": "range",
        "from": 65,
        "to": 65,
        "shift": "8"
    }, {
        "type": "range",
        "from": 99,
        "to": 99,
        "shift": "5"
    }],
    "5": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "2.3"
    }],
    "17.18": [{
        "type": "range",
        "from": 120,
        "to": 120,
        "shift": "25"
    }, {
        "type": "range",
        "from": 100,
        "to": 100,
        "shift": "22.accept"
    }, {
        "type": "range",
        "from": 65,
        "to": 65,
        "shift": "23"
    }, {
        "type": "range",
        "from": 99,
        "to": 99,
        "shift": "20"
    }],
    "25": [{
        "type": "range",
        "from": 97,
        "to": 97,
        "shift": "22.accept"
    }],
    "22.accept": [{
        "type": "range",
        "from": 120,
        "to": 120,
        "shift": "28"
    }, {
        "type": "range",
        "from": 100,
        "to": 100,
        "shift": "22.accept"
    }, {
        "type": "range",
        "from": 65,
        "to": 65,
        "shift": "26"
    }, {
        "shift": ["accept"]
    }],
    "28": [{
        "type": "range",
        "from": 97,
        "to": 97,
        "shift": "22.accept"
    }],
    "26": [{
        "type": "range",
        "from": 70,
        "to": 70,
        "shift": "27"
    }],
    "27": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "22.accept"
    }],
    "23": [{
        "type": "range",
        "from": 70,
        "to": 70,
        "shift": "24"
    }],
    "24": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "22.accept"
    }],
    "20": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "18.19"
    }],
    "18.19": [{
        "type": "range",
        "from": 120,
        "to": 120,
        "shift": "25"
    }, {
        "type": "range",
        "from": 100,
        "to": 100,
        "shift": "22.accept"
    }, {
        "type": "range",
        "from": 65,
        "to": 65,
        "shift": "23"
    }, {
        "type": "range",
        "from": 99,
        "to": 99,
        "shift": "21"
    }],
    "21": [{
        "type": "range",
        "from": 101,
        "to": 101,
        "shift": "18.19"
    }]
}
Run Code Online (Sandbox Code Playgroud)

故事是一样的,我该如何最小化它?如果我遵循经典的Hopcroft算法和所有这些表格结构,确定难以区分的状态,将它们合并在一起等等,那么我将得到包含15个状态的DFA(使用此工具:http://regexvisualizer.apphb.com/用这个正则表达式a(ce)(d | xa | AFe)+ | b(ce)(d | xa | AFe)+ ce +来检查).以下是使用Hopcroft算法进行缩小后DFA的样子:

Hopcroft的DFA最小化

在我重新思考Hopcroft的算法之后,我提出的算法构建的DFA小于您上面看到的DFA(对于图像质量而言,我不得不一步一步地重新绘制它以了解为什么它更小):

我的算法

以下是它的工作原理,关于"状态等价"的决定是基于给定状态的哈希函数的结果(例如"START")构建短字符串,如果我们从该状态开始,可以从DFA构造.鉴于上面的DFA和START状态,我们可以构造以下字符串:98-> 120,98-> 100,98-> 65,98-> 99,97-> 120,97-> 100,97-> 65 ,97-> 99所以让它成为START状态的哈希函数的结果.如果我们为DFA中的每个状态运行此函数,我们将看到对于某些状态,此函数给出了相同的结果("1.2","6.7","2.3"和"10","13"和"15" ,"16"和"11","8","26","23"和"12","9","4","5","20","21"和"17.18"," 18.19"AND"25","28"和"27","24")所以我们需要做的就是将这些状态合并在一起.

我发现我在某处错了,但不明白我的算法产生的最小化DFA有什么问题?

小智 5

让您原来的 DFA 称为 M1。简单来说,构造一个最小化的 DFA(称为 M2)意味着将其转换为包含最少状态数的 DFA。因此,M2 中的状态数将少于 M1 中的状态数。这里需要注意的重要一点是 M1 和 M2 必须是等效的,这意味着它们必须定义相同的正则语言。构建最小化的 DFA 不仅涉及寻找相同的状态,还涉及以下内容:

  1. 删除“不可到达”状态:
    这涉及删除对于任何输入字符串从 DFA 的初始状态不可到达的状态。

  2. 删除“死”或“陷阱”状态:
    这涉及删除自行终止的非接受状态。它们也称为 TRAP 状态。

  3. 删除“不可区分”状态:
    这涉及删除对于任何输入字符串都无法相互区分的状态。

还有一些用于 DFA 最小化的流行算法:

了解这些算法可能是值得的!


Gun*_*her 5

您提出的算法不会完全最小化,因为它不会检测行为相同的复杂结构.要理解这个DFA(由JFLAP绘制):

在此输入图像描述

最小化将结合q1和q2,但概述的算法无法管理.

与此相反,Hopcroft的算法最初会像这样分区:

   {q0, q1, q2}, {q3}
Run Code Online (Sandbox Code Playgroud)

然后拆分第一组,因为q0没有转换到q3:

   {q0}, {q1, q2}, {q3}
Run Code Online (Sandbox Code Playgroud)

而不是进一步分裂,因为q1和q2表现相同.