Tre*_*vor 6 java regex stack-overflow string
I'm trying to use a regex with Scanner to match strings in a file. The regex works on everything in the file except this line:
DNA="ITTTAITATIATYAAAYIYI[....]ITYTYITTIYAIAIYIT"
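In the actual file, the ellipsis stands for several thousand characters.
When the loop that reads the file reaches the line containing the bases, a stack overflow error occurs.
Here is the loop: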
while (scanFile.hasNextLine()) {
    final String currentLine = scanFile.findInLine(".*");
    System.out.println("trying to match '" + currentLine + "'");
    Scanner internalScanner = new Scanner(currentLine);
    String matchResult = internalScanner.findInLine(Constants.ANIMAL_INFO_REGEX);
    assert matchResult != null : "there's no reason not to find a match";
    matches.put(internalScanner.match().group(1), internalScanner.match().group(2));
    scanFile.nextLine();
}
And the regex:
static final String ANIMAL_INFO_REGEX = "([a-zA-Z]+) *= *\"(([a-zA-Z_.]| |\\.)+)";
Here is the failure trace:
java.lang.StackOverflowError
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3360)
at java.util.regex.Pattern$Branch.match(Pattern.java:4131)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4185)
at java.util.regex.Pattern$Loop.match(Pattern.java:4312)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4244)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4095)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3362)
at java.util.regex.Pattern$Branch.match(Pattern.java:4131)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4185)
at java.util.regex.Pattern$Loop.match(Pattern.java:4312)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4244)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4095)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3362)
at java.util.regex.Pattern$Branch.match(Pattern.java:4131)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4185)
at java.util.regex.Pattern$Loop.match(Pattern.java:4312)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4244)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4095)
...etc (it's all regex).
Thanks a lot!
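Try this simplified version of the regex. It removes some unnecessary | operators (which can make the regex engine do a lot of branching) and adds start-of-line and end-of-line anchors.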
static final String ANIMAL_INFO_REGEX = "^([a-zA-Z]+) *= *\"([a-zA-Z_. ]+)\"$";
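The repeating Loop / GroupHead / Branch / GroupTail frames in the trace suggest that each character consumed by the quantified alternation costs several nested calls, whereas a single character class under + can be matched without that per-character nesting. Here is a minimal sketch for checking this on a long line. The class name, the buildLine and valueLength helpers, and the 200,000-base test length are hypothetical and not part of the original code. Whether the original pattern actually overflows depends on the JVM version and thread stack size, so the sketch only reports it rather than assuming it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnimalInfoRegexDemo {
    // Simplified pattern from the answer: one character class, anchored at both ends.
    static final Pattern SIMPLIFIED =
            Pattern.compile("^([a-zA-Z]+) *= *\"([a-zA-Z_. ]+)\"$");

    // Original pattern from the question, kept only for comparison.
    static final Pattern ORIGINAL =
            Pattern.compile("([a-zA-Z]+) *= *\"(([a-zA-Z_.]| |\\.)+)");

    // Builds a made-up line of the form DNA="ITAYITAY..." with the given number of bases.
    static String buildLine(int bases) {
        StringBuilder sb = new StringBuilder("DNA=\"");
        String alphabet = "ITAY";
        for (int i = 0; i < bases; i++) {
            sb.append(alphabet.charAt(i % alphabet.length()));
        }
        return sb.append('"').toString();
    }

    // Returns the length of capture group 2, -1 if the pattern does not match,
    // or -2 if the regex engine ran out of stack while matching.
    static int valueLength(Pattern pattern, String line) {
        try {
            Matcher m = pattern.matcher(line);
            return m.find() ? m.group(2).length() : -1;
        } catch (StackOverflowError e) {
            return -2;
        }
    }

    public static void main(String[] args) {
        String line = buildLine(200_000);
        System.out.println("simplified: " + valueLength(SIMPLIFIED, line));
        System.out.println("original:   " + valueLength(ORIGINAL, line));
    }
}
```

Note that because of the ^ and $ anchors and the closing quote, the simplified pattern rejects a line whose quoted value contains any character outside [a-zA-Z_. ], where the original pattern would have matched a prefix of it.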