我是Java正则表达式的新手.我们使用模式匹配字符串.我们使用它来验证文本字段,它符合我们的要求.但是匹配中存在性能问题.
图案: ([a-zA-Z0-9]+[ ]?(([_\-][a-zA-Z0-9 ])*)?[_\-]?)+
我们的问题是,对于某些输入字符串,CPU时间变高并导致挂起线程.我们也得到例外.任何人都可以帮助我优化模式或建议一个新的模式来解决我的问题.
Exception details
============================================
Hung thread details, all the same:
[9/28/11 11:40:07:320 CDT] 00000003 ThreadMonitor W WSVR0605W: Thread "WebContainer : 26" (0000004f) has been active for 709755 mi
lliseconds and may be hung. There is/are 1 thread(s) in total in the server that may be hung.
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3938)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3801)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3801)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.match(Pattern.java:4307)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4090)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4239)
at java.util.regex.Pattern$GroupCurly.match0(Pattern.java:4006)
at java.util.regex.Pattern$GroupCurly.match(Pattern.java:3928)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Branch.match(Pattern.java:4124)
at java.util.regex.Pattern$Ques.match(Pattern.java:3703)
at java.util.regex.Pattern$Curly.match0(Pattern.java:3794)
at java.util.regex.Pattern$Curly.match(Pattern.java:3756)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4180)
at java.util.regex.Pattern$Loop.matchInit(Pattern.java:4323)
at java.util.regex.Pattern$Prolog.match(Pattern.java:4263)
at java.util.regex.Matcher.match(Matcher.java:1139)
at java.util.regex.Matcher.matches(Matcher.java:514)
Run Code Online (Sandbox Code Playgroud)
Tim*_*ker 17
你的问题是灾难性的回溯,因为你有嵌套的量词.当文本与要求不匹配时,这开始成为一个问题,因为正则表达式引擎必须通过指数增加的排列数来声明失败.
([a-zA-Z0-9]+[ ]?(([_\-][a-zA-Z0-9 ])*)?[_\-]?)+
^ ^
| repetition
repetition
Run Code Online (Sandbox Code Playgroud)
像这样重建你的正则表达式:
(?i)^(?!.*(?:--|__))[A-Z0-9][\w-]*(?: [\w-]+)*$
Run Code Online (Sandbox Code Playgroud)
Java,有解释:
boolean foundMatch = subjectString.matches(
"(?ix) # Case-insensitive, multiline regex:\n" +
"^ # Start of string\n" +
"(?! # Assert that it's impossible to match\n" +
" .* # any number of characters\n" +
" (?:--|__) # followed by -- or __\n" +
") # End of lookahead assertion\n" +
"[A-Z0-9] # Match A-Z, a-z or 0-9\n" +
"[\\w-]* # Match 0 or more alnums/dash\n" +
"(?: # Match the following:\n" +
" [\\ ] # a single space\n" +
" [\\w-]+ # followed by one or more alnums or -\n" +
")* # any number of times\n" +
"$ # End of string");
Run Code Online (Sandbox Code Playgroud)
请注意,字符串不得根据您的要求在空格中结束.万一你想知道,\w是一个简写[A-Za-z0-9_].
你的正则表达式允许一种称为灾难性回溯的现象.
按照链接为一个完整的描述,但简要地说你有可选的匹配组合,这意味着评估必须保留通过前面的字符组合都回去,从而导致n!操作(我很肯定n!),这将迅速吹向你的筹码.
试试这个正则表达式:
^(?!.*(__|--| ))[a-zA-Z0-9][\w -]*(?<! )$
Run Code Online (Sandbox Code Playgroud)
说明:
^(?!.*(__|--| ))意味着整个输入不得包含2个相邻_或-空格(更好的方式表达"单词之间最多一个空格" - 忘记单词 - 检查空格)[a-zA-Z0-9][\w -]*意思是必须在开头有字母或数字,其余可以是字母,数字,下划线(\w = [a-zA-Z0-9_]),空格和短划线的任意组合(鉴于上述两个附带条件)[^ ]$意味着不是在一个空间中结束(没有说明,但似乎是合理的 - 像-你喜欢的那样在字符类中添加其他字符- 但如果使用的话必须是第一个或最后一个)这个正则表达式不会导致灾难性的回溯.
| 归档时间: |
|
| 查看次数: |
3844 次 |
| 最近记录: |