Combining a whitelist and a blacklist in a Java regex

Question

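I'm having trouble constructing a regex that allows all UTF-8 characters except for 2 characters: '_' and '?'

So the whitelist is: ^[\u0000-\uFFFF] and the blacklist is: ^[^_%]

I need to combine these into a single expression.

I tried the following code, but it does not work the way I hoped: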

    String input = "this";
    Pattern p = Pattern
            .compile("^[\u0000-\uFFFF]+$ | ^[^_%]");
    Matcher m = p.matcher(input);
    boolean result = m.matches();
    System.out.println(result);

Input: this
Actual output: false
Expected output: true

Answer 1

Bra*_*raj 2

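Here is sample code that excludes certain characters from a range using lookahead and lookbehind zero-length assertions. These assertions do not consume characters in the string; they only assert whether a match is possible.

Sample code: (excludes m and n from the range a-z)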

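    // Needs java.util.regex.Pattern and java.util.regex.Matcher.
    String str = "abcdmnxyz";
    // (?![mn]) rejects the position if the next character is 'm' or 'n';
    // [a-z] then consumes one lowercase letter.
    Pattern p = Pattern.compile("(?![mn])[a-z]");
    Matcher m = p.matcher(str);
    while (m.find()) {
        System.out.println(m.group());
    }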

Output:

    a
    b
    c
    d
    x
    y
    z

You can solve your case the same way.
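As a rough sketch of how the lookahead idea might carry over to the question: the original attempt fails because the spaces around | are literal characters in the pattern, and because | means "either alternative", not "both conditions". The version below checks both conditions at every character instead. The class name LookaheadBlacklist and the method isAllowed are made up for illustration, and the excluded characters '_' and '%' are taken from the blacklist pattern in the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class LookaheadBlacklist {
    // For each character: the negative lookahead (?![_%]) rejects '_' and '%',
    // then [\u0000-\uFFFF] consumes one character from the whitelist range.
    // The doubled backslashes pass \uXXXX to the regex engine unchanged.
    static final Pattern ALLOWED =
            Pattern.compile("(?:(?![_%])[\\u0000-\\uFFFF])+");

    static boolean isAllowed(String input) {
        // matches() requires the whole input to match, so ^ and $ are not needed.
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("this"));   // true
        System.out.println(isAllowed("th_is"));  // false
        System.out.println(isAllowed("100%"));   // false
    }
}
```

Because the quantifier is +, an empty string is rejected; switch to * if empty input should pass.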

Regex explanation for (?![mn])[a-z]

  (?!                      look ahead to see if there is not:   
    [mn]                     any character of: 'm', 'n' 
  )                        end of look-ahead    
  [a-z]                    any character of: 'a' to 'z'

Alternatively, you can split the whole range into sub-ranges and solve the problem above with the regex ([a-l]|[o-z]) or [a-lo-z].
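A small sketch to check that the sub-range form finds the same characters as the lookahead form on the sample string. The class SubRangeDemo and its findAll helper are hypothetical names used only for this comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for comparing two patterns on the same input.
final class SubRangeDemo {
    // Collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String str = "abcdmnxyz";
        // Two sub-ranges in one character class: a-l and o-z, skipping m and n.
        System.out.println(findAll("[a-lo-z]", str));       // [a, b, c, d, x, y, z]
        // Lookahead form from the answer, for comparison.
        System.out.println(findAll("(?![mn])[a-z]", str));  // [a, b, c, d, x, y, z]
    }
}
```

Splitting into sub-ranges works well when only a few characters are excluded; with many scattered exclusions the class becomes harder to read.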

There is a better way: use character class intersection. (2 upvotes)
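The comment gives no code, so the following is only a guess at what it has in mind: java.util.regex supports intersection inside a character class with &&, so a range can be intersected with a negated class to subtract characters. The class name IntersectionBlacklist and the method isAllowed are illustrative, and '_' and '%' again come from the blacklist pattern in the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper, sketching the intersection approach.
final class IntersectionBlacklist {
    // [whitelist&&[^blacklist]]: characters in \u0000-\uFFFF that are
    // neither '_' nor '%'.
    static final Pattern ALLOWED =
            Pattern.compile("[\\u0000-\\uFFFF&&[^_%]]+");

    static boolean isAllowed(String input) {
        // matches() anchors the pattern to the whole input.
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("this"));   // true
        System.out.println(isAllowed("th_is"));  // false
        System.out.println(isAllowed("50%"));    // false
        // The same idea applied to the answer's example: a-z except m and n.
        System.out.println("abcdmnxyz".replaceAll("[^a-z&&[^mn]]", ""));  // abcdxyz
    }
}
```

Compared with the lookahead version, this keeps the whitelist and the blacklist in a single character class, which tends to be easier to read.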
