Java regex always fails

Zom*_*m-B 12 java regex unicode

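I have a Java regex pattern and a sentence that I want to match completely, but for some sentences the match fails when it shouldn't. Why is that? (For simplicity I'm not using my actual, more complex regex here, just ".*".)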

System.out.println(Pattern.matches(".*", "asdf"));
System.out.println(Pattern.matches(".*", "[11:04:34] <@Aimbotter> 1 more thing"));
System.out.println(Pattern.matches(".*", "[11:04:35] <@Aimbotter> Dialogue: 0,0:00:00.00,0:00:00.00,Default,{Orginal LV,0000,0000,0000,,[???]??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????} "));
System.out.println(Pattern.matches(".*", "[11:04:35] <@Aimbotter> Dialogue: 0,0:00:00.00,0:00:00.00,Default,{Orginal LV,0000,0000,0000,,[???]????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????} "));

Output:

true
true
true
false

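Note that the fourth sentence contains 10 Unicode control characters between the question marks, which are not displayed in ordinary fonts. The third and fourth sentences actually contain the same number of characters!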
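Because invisible characters do not survive copy and paste, here is a sketch that reproduces the failure with an explicit escape instead. It assumes the hidden characters are U+0085 (NEXT LINE), the character the answer below points to; the class name NelRepro and the helper firstStop are made up for illustration. firstStop uses Matcher.lookingAt() to show where a plain .* stops consuming input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: reproduce the failure with an explicit escape.
// Assumption: the invisible characters are U+0085 (NEXT LINE).
public class NelRepro {

    // Returns the index where a non-DOTALL ".*" stops consuming input,
    // i.e. the first line terminator, or s.length() if there is none.
    public static int firstStop(String s) {
        Matcher m = Pattern.compile(".*").matcher(s);
        m.lookingAt(); // ".*" can match the empty string, so this always succeeds
        return m.end();
    }

    public static void main(String[] args) {
        String plain = "[???]??";
        String withNel = "[???]?\u0085?"; // U+0085 hidden between question marks

        System.out.println(Pattern.matches(".*", plain));   // true
        System.out.println(Pattern.matches(".*", withNel)); // false

        int stop = firstStop(withNel);
        // Prints: stopped at index 6, char U+0085
        System.out.printf("stopped at index %d, char U+%04X%n", stop, (int) withNel.charAt(stop));
    }
}
```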

rur*_*uni 13

Use

Pattern.compile(".*",Pattern.DOTALL)

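if you want . to match characters such as \u0085 as well. By default, . matches any character except a line terminator, and Java counts \u0085 (NEXT LINE) as a line terminator, alongside \n, \r, \u2028 and \u2029.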
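The static Pattern.matches(String, CharSequence) takes no flags argument, so one way to apply DOTALL is to compile the pattern and call matches() on a Matcher. A minimal sketch; DotAllFix and matchesAll are hypothetical names, and the test input again assumes the hidden characters are U+0085:

```java
import java.util.regex.Pattern;

// Sketch: compile ".*" with DOTALL once and reuse it for whole-string matching.
public class DotAllFix {
    // Pattern instances are immutable and thread-safe, so one shared instance is fine.
    private static final Pattern ANY = Pattern.compile(".*", Pattern.DOTALL);

    // Hypothetical helper: true if the entire input matches ".*" in dotall mode.
    public static boolean matchesAll(String s) {
        return ANY.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("asdf"));          // true
        System.out.println(matchesAll("[???]?\u0085?")); // true: DOTALL lets '.' match U+0085
        System.out.println(matchesAll("line1\nline2"));  // true
    }
}
```

Keeping the compiled pattern in a static final field is a design choice rather than a requirement; it simply avoids recompiling the regex on every call.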

From the JavaDoc:

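"In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.

Dotall mode can also be enabled via the embedded flag expression (?s). (The s is a mnemonic for "single-line" mode, which is what this is called in Perl.)"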
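If you would rather keep the one-line Pattern.matches call from the question, the embedded flag mentioned in the quote can go directly into the pattern string. A small sketch; EmbeddedFlag and dotAllMatches are illustrative names, not part of any API:

```java
import java.util.regex.Pattern;

// Sketch: (?s) enables DOTALL from inside the pattern string, so the
// flag-less static Pattern.matches convenience method still works.
public class EmbeddedFlag {

    // Hypothetical helper: whole-string match of ".*" with dotall enabled inline.
    public static boolean dotAllMatches(String s) {
        return Pattern.matches("(?s).*", s);
    }

    public static void main(String[] args) {
        String withNel = "abc\u0085def"; // U+0085 NEXT LINE in the middle
        System.out.println(Pattern.matches(".*", withNel)); // false
        System.out.println(dotAllMatches(withNel));         // true
    }
}
```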

The relevant code in java.util.regex.Pattern (note your \u0085):

/**
 * Implements the Unicode category ALL and the dot metacharacter when
 * in dotall mode.
 */
static final class All extends CharProperty {
    boolean isSatisfiedBy(int ch) {
        return true;
    }
}

/**
 * Node class for the dot metacharacter when dotall is not enabled.
 */
static final class Dot extends CharProperty {
    boolean isSatisfiedBy(int ch) {
        return (ch != '\n' && ch != '\r'
                && (ch|1) != '\u2029'
                && ch != '\u0085');
    }
}
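To see exactly which characters the default . rejects, here is a standalone copy of the Dot predicate from the excerpt, under the hypothetical names DotPredicate and matchesDot, compared against java.util.regex over every non-surrogate BMP character. It is a sketch for exploration, not a replacement for the JDK class. The (ch | 1) check works because 0x2028 | 1 == 0x2029, so a single comparison rejects both U+2028 and U+2029.

```java
import java.util.regex.Pattern;

// Sketch: a standalone copy of the non-DOTALL '.' test shown above
// (hypothetical class/method names), checked against java.util.regex.
public class DotPredicate {

    // Mirrors Dot.isSatisfiedBy: rejects \n, \r, U+2028, U+2029 and U+0085.
    public static boolean matchesDot(int ch) {
        return ch != '\n' && ch != '\r'
                && (ch | 1) != 0x2029 // 0x2028 | 1 == 0x2029, so this rejects both
                && ch != 0x0085;
    }

    // Counts BMP characters (surrogates skipped) where the copy disagrees with Pattern.
    public static int countMismatches() {
        Pattern dot = Pattern.compile(".");
        int mismatches = 0;
        for (int c = 0; c <= 0xFFFF; c++) {
            if (Character.isSurrogate((char) c)) continue;
            boolean viaRegex = dot.matcher(String.valueOf((char) c)).matches();
            if (viaRegex != matchesDot(c)) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(matchesDot('a'));    // true
        System.out.println(matchesDot(0x0085)); // false: the character from the question
        System.out.println(matchesDot(0x2028)); // false
        System.out.println(countMismatches());  // 0: the copy agrees with the regex engine
    }
}
```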