From wildcards to regular expressions

sp0*_*00m 5 java regex string wildcard filter

I want to allow two main wildcards, ? and *, for filtering my data.

This is what I'm doing right now (as I've seen on many websites):

public boolean contains(String data, String filter) {
    if(data == null || data.isEmpty()) {
        return false;
    }
    String regex = filter.replace(".", "[.]")
                         .replace("?", ".")
                         .replace("*", ".*");
    return Pattern.matches(regex, data);
}
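
To make the limitation of this conversion concrete, here is a minimal sketch. The class name `NaiveWildcardDemo` and the helpers `naiveRegex` and `naiveMatches` are made up for illustration; they only repeat the three replace calls above and show what happens when the filter contains a character such as +.

```java
import java.util.regex.Pattern;

class NaiveWildcardDemo {
    // Same conversion as in contains() above: only '.', '?' and '*' are handled.
    static String naiveRegex(String filter) {
        return filter.replace(".", "[.]")
                     .replace("?", ".")
                     .replace("*", ".*");
    }

    // Whole-string match against the converted filter.
    static boolean naiveMatches(String data, String filter) {
        return Pattern.matches(naiveRegex(filter), data);
    }

    public static void main(String[] args) {
        System.out.println(naiveMatches("report.txt", "*.txt")); // true
        System.out.println(naiveMatches("a+b", "a+b"));          // false: '+' is read as a quantifier
        System.out.println(naiveMatches("aab", "a+b"));          // true, although the filter has no wildcard
    }
}
```

A filter such as `(` would go further and make `Pattern.matches` throw a `PatternSyntaxException`.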

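But shouldn't we escape all the other regex special characters as well, such as | or (? And maybe we could keep ? and * as literals when they are preceded by a \? For example, something like: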

filter.replaceAll("([$|\\[\\]{}(),.+^-])", "\\\\$1") // 1. escape regex special chars, but ?, * and \
      .replaceAll("([^\\\\]|^)\\?", "$1.")           // 2. replace any ? that isn't preceded by a \ by .
      .replaceAll("([^\\\\]|^)\\*", "$1.*")          // 3. replace any * that isn't preceded by a \ by .*
      .replaceAll("\\\\([^?*]|$)", "\\\\\\\\$1");    // 4. replace any \ that isn't followed by a ? or a * (possibly due to step 2 and 3) by \\

What do you think about this? If you agree, am I missing any other regex special characters?
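
For comparison, one way to avoid listing the special characters at all is to walk the filter once and quote every literal run with `Pattern.quote`. This is only a sketch of that alternative, not the approach used in this post: the class `WildcardToRegex` and the methods `toRegex` and `flush` are hypothetical names, and it assumes that a backslash makes the following character literal.

```java
import java.util.regex.Pattern;

class WildcardToRegex {
    // Hypothetical helper: converts a wildcard filter to a regex in a single pass.
    // Assumption: a backslash makes the next character literal (so \? and \* are plain text).
    static String toRegex(String filter) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < filter.length(); i++) {
            char c = filter.charAt(i);
            if (c == '\\' && i + 1 < filter.length()) {
                // Escaped character: keep it as literal text.
                literal.append(filter.charAt(++i));
            } else if (c == '?' || c == '*') {
                // Unescaped wildcard: emit the pending literal, then the regex equivalent.
                flush(literal, regex);
                regex.append(c == '?' ? "." : ".*");
            } else {
                // Ordinary character (or a trailing lone backslash): literal text.
                literal.append(c);
            }
        }
        flush(literal, regex);
        return regex.toString();
    }

    // Appends the pending literal run, quoted so that no character is read as regex syntax.
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches(toRegex("a+b*"), "a+bcd"));   // true
        System.out.println(Pattern.matches(toRegex("a\\*b"), "a*b"));    // true
        System.out.println(Pattern.matches(toRegex("a\\*b"), "axb"));    // false
    }
}
```

Because the literal parts are quoted as a whole, no list of special characters has to be kept up to date. Note that `.` does not match line terminators unless the pattern is compiled with `Pattern.DOTALL`.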


Edit #1 (after taking dan1111's and m.buettner's suggestions into account):

// replace any even number of backslashes by a *
regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
// reduce redundant wildcards that aren't preceded by a \
regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
// escape regexps special chars, but \, ? and *
regex = regex.replaceAll("([|\\[\\]{}(),.^$+-])", "\\\\$1");
// replace ? that aren't preceded by a \ by .
regex = regex.replaceAll("(?<!\\\\)[?]", ".");
// replace * that aren't preceded by a \ by .*
regex = regex.replaceAll("(?<!\\\\)[*]", ".*");

How about this?


Edit #2 (after taking dan1111's suggestion into account):

// replace any even number of backslashes by a *
regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
// reduce redundant wildcards that aren't preceded by a \
regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
// escape regexps special chars (if not already escaped by user), but \, ? and *
regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
// replace ? that aren't preceded by a \ by .
regex = regex.replaceAll("(?<!\\\\)[?]", ".");
// replace * that aren't preceded by a \ by .*
regex = regex.replaceAll("(?<!\\\\)[*]", ".*");

Is the goal in sight?

sp0*_*00m 0

Here is the solution I finally adopted (using the Apache Commons Lang library):

public static boolean isFiltered(String data, String filter) {
    // no filter: return true
    if (StringUtils.isBlank(filter)) {
        return true;
    }
    // a filter but no data: return false
    else if (StringUtils.isBlank(data)) {
        return false;
    }
    // a filter and a data:
    else {
        // case insensitive
        data = data.toLowerCase();
        filter = filter.toLowerCase();
        // .matches() auto-anchors, so add [*] (i.e. "containing")
        String regex = "*" + filter + "*";
        // replace any pair of backslashes by [*]
        regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
        // minimize unescaped redundant wildcards
        regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
        // escape unescaped regexps special chars, but [\], [?] and [*]
        regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
        // replace unescaped [?] by [.]
        regex = regex.replaceAll("(?<!\\\\)[?]", ".");
        // replace unescaped [*] by [.*]
        regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
        // return whether data matches regex or not
        return data.matches(regex);
    }
}
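
To try the behavior without the Commons Lang dependency, here is a standalone sketch of the same steps. The class name `WildcardFilterDemo` is made up, and the local `isBlank` is only an approximation of `StringUtils.isBlank`, not the library method; the replaceAll calls are copied from the method above.

```java
class WildcardFilterDemo {
    // Stand-in for StringUtils.isBlank (an approximation, not the library method).
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Same steps as isFiltered() above, using only the standard library.
    static boolean isFiltered(String data, String filter) {
        if (isBlank(filter)) return true;   // no filter: everything passes
        if (isBlank(data)) return false;    // a filter but no data
        // surround with wildcards so the anchored match behaves like "contains"
        String regex = "*" + filter.toLowerCase() + "*";
        regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
        regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
        regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
        regex = regex.replaceAll("(?<!\\\\)[?]", ".");
        regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
        return data.toLowerCase().matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(isFiltered("Hello World", "wor"));   // true
        System.out.println(isFiltered("Hello World", "h?llo")); // true
        System.out.println(isFiltered("Hello World", "h*d"));   // true
        System.out.println(isFiltered("version 1x2", "1.2"));   // false: '.' is escaped
        System.out.println(isFiltered("a*b", "a\\*b"));         // true: \* is a literal star
        System.out.println(isFiltered("axb", "a\\*b"));         // false
    }
}
```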

Many thanks to @dan1111 and @m.buettner for their valuable help ;)