Java regex optimization tips

dpk*_*kcv 11 java regex

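I am new to Java regular expressions. We use a pattern to match strings in order to validate a text field, and it meets our requirements. However, the matching has a performance problem.

Pattern: ([a-zA-Z0-9]+[ ]?(([_\-][a-zA-Z0-9 ])*)?[_\-]?)+

  1. The input text must start with a-zA-Z0-9.
  2. A single space is allowed between words.
  3. "_" and "-" are allowed, but not consecutively.

Our problem is that for some input strings the CPU time climbs and the thread hangs. We are also getting exceptions. Can anyone help me optimize the pattern, or suggest a new pattern that solves my problem?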

Exception details                              
============================================                           
Hung thread details, all the same:
[9/28/11 11:40:07:320 CDT] 00000003 ThreadMonitor W   WSVR0605W: Thread "WebContainer : 26" (0000004f) has been active for 709755 milliseconds and may be hung.  There is/are 1 thread(s) in total in the server that may be hung.
        at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3938)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:3801)
        at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
        at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
        at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
        at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
        at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
        at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
        at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
        at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
        at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:3801)
        at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
        at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
        at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
        at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
        at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
        at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
        at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
        at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
        at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
        at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
        at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
        at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
        at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
        at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
        at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
        at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4323)
        at java.util.regex.Pattern$Prolog.match(Pattern.java:4263)
        at java.util.regex.Matcher.match(Matcher.java:1139)
        at java.util.regex.Matcher.matches(Matcher.java:514)

Tim*_*ker 17

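Your problem is catastrophic backtracking, caused by nested quantifiers. It becomes a problem when the text does not fit the requirements, because the regex engine then has to work through an exponentially growing number of permutations before it can declare failure.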

([a-zA-Z0-9]+[ ]?(([_\-][a-zA-Z0-9 ])*)?[_\-]?)+
                                     ^         ^
                                     |         repetition
                                     repetition
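One way to observe the blow-up without hanging a thread is to count how many characters the matcher reads before it gives up. The sketch below is only an illustration: `BacktrackingDemo`, `CountingSequence` and `readsToReject` are hypothetical names introduced here, the input lengths are kept small on purpose, and the exact counts depend on the JDK version.

```java
import java.util.regex.Pattern;

public class BacktrackingDemo {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL =
        Pattern.compile("([a-zA-Z0-9]+[ ]?(([_\\-][a-zA-Z0-9 ])*)?[_\\-]?)+");

    /** Hypothetical helper: wraps a string and counts the charAt() calls made on it. */
    static final class CountingSequence implements CharSequence {
        private final String text;
        long reads = 0;

        CountingSequence(String text) { this.text = text; }

        @Override public char charAt(int index) { reads++; return text.charAt(index); }
        @Override public int length() { return text.length(); }
        @Override public CharSequence subSequence(int start, int end) {
            return text.subSequence(start, end);
        }
        @Override public String toString() { return text; }
    }

    /** Counts the character reads needed to reject n letters followed by '!'. */
    static long readsToReject(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        sb.append('!'); // '!' is not allowed, so the whole input must be rejected
        CountingSequence input = new CountingSequence(sb.toString());
        if (ORIGINAL.matcher(input).matches()) {
            throw new IllegalStateException("expected a non-match");
        }
        return input.reads;
    }

    public static void main(String[] args) {
        // Small lengths only: the work can grow very quickly with each extra letter.
        for (int n : new int[] {8, 12, 16}) {
            System.out.println(n + " letters + '!': " + readsToReject(n) + " character reads");
        }
    }
}
```

If the read count grows much faster than the input length, the pattern is backtracking heavily on failing input.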

Rebuild your regex like this:

(?i)^(?!.*(?:--|__))[A-Z0-9][\w-]*(?: [\w-]+)*$

In Java, with explanation:

boolean foundMatch = subjectString.matches(
    "(?ix)      # Case-insensitive, free-spacing (comments) mode:\n" +
    "^          # Start of string\n" +
    "(?!        # Assert that it's impossible to match\n" +
    " .*        # any number of characters\n" +
    " (?:--|__) # followed by -- or __\n" +
    ")          # End of lookahead assertion\n" +
    "[A-Z0-9]   # Match A-Z, a-z or 0-9\n" +
    "[\\w-]*    # Match 0 or more letters, digits, _ or -\n" +
    "(?:        # Match the following:\n" +
    " [\\ ]     # a single space\n" +
    " [\\w-]+   # followed by one or more letters, digits, _ or -\n" +
    ")*         # any number of times\n" +
    "$          # End of string");

Note that, per your requirements, the string must not end in a space. In case you are wondering, \w is shorthand for [A-Za-z0-9_].
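Since `String.matches()` compiles the pattern again on every call, a validator that runs on every request may prefer to compile it once. The following is a minimal sketch under that assumption, using the compact form of the regex from this answer; the class name `NameValidator` and the method `isValid` are hypothetical.

```java
import java.util.regex.Pattern;

public class NameValidator {
    // Compact form of the regex from this answer, compiled once and reused.
    private static final Pattern VALID = Pattern.compile(
        "(?i)^(?!.*(?:--|__))[A-Z0-9][\\w-]*(?: [\\w-]+)*$");

    /** Returns true if the whole input satisfies the pattern. */
    static boolean isValid(String input) {
        return VALID.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc def",      // single space between words
            "abc_def-ghi",  // single underscore and dash
            "abc  def",     // two consecutive spaces
            "abc--def",     // two consecutive dashes
            "_abc",         // does not start with a letter or digit
            "abc "          // trailing space
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }
    }
}
```

The first two samples are accepted and the remaining four are rejected.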

  • +1 for the comments inside the regexp. I didn't know about this feature and will use it too! (5 upvotes)

Boh*_*ian 6

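Your regex allows a phenomenon known as catastrophic backtracking.

Follow the link for a full description, but in brief: you have combinations of optional matches, which means the evaluation has to keep going back over combinations of the preceding characters. That leads to n! operations (I'm fairly sure it's n!), which will quickly blow your stack.

Try this regex: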

^(?!.*(__|--|  ))[a-zA-Z0-9][\w -]*(?<! )$

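Explanation:

  • ^(?!.*(__|--|  )) means the entire input must not contain two adjacent underscores, dashes or spaces (a better way to express "at most one space between words": forget the words and just check the spaces)
  • [a-zA-Z0-9][\w -]* means there must be a letter or digit at the start, and the rest can be any combination of letters, digits, underscores (\w = [a-zA-Z0-9_]), spaces and dashes (subject to the two conditions above and below)
  • (?<! )$ means the input must not end in a space (this was not stated, but seems reasonable; to forbid other trailing characters, put them in a character class inside the lookbehind, e.g. (?<![ -]), where a literal - must be the first or last character of the class)

This regex will not cause catastrophic backtracking.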
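As a rough check of that claim, the sketch below runs the regex from this answer against a few short inputs and one long invalid input. `LookaroundValidator`, `isValid` and `longInvalidInput` are hypothetical names, and the timing is only printed as an informal measurement, not a benchmark.

```java
import java.util.regex.Pattern;

public class LookaroundValidator {
    // Regex from this answer; note the two spaces in the third lookahead alternative.
    private static final Pattern VALID =
        Pattern.compile("^(?!.*(__|--|  ))[a-zA-Z0-9][\\w -]*(?<! )$");

    /** Returns true if the whole input satisfies the pattern. */
    static boolean isValid(String input) {
        return VALID.matcher(input).matches();
    }

    /** Builds n letters followed by a character the pattern does not allow. */
    static String longInvalidInput(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.append('!').toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc def", "a", "abc  def", "abc ", "-abc"}) {
            System.out.println("\"" + s + "\" -> " + isValid(s));
        }

        // A long failing input is the case that hung the original pattern.
        String bad = longInvalidInput(5000);
        long start = System.nanoTime();
        boolean result = isValid(bad);
        long micros = (System.nanoTime() - start) / 1000;
        System.out.println("5000 letters + '!' -> " + result + " (" + micros + " microseconds)");
    }
}
```

The pattern has no nested quantifiers, so on a failing input each quantifier only backs off one character at a time instead of retrying every way of splitting the input.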