sp0*_*00m 5 java regex string wildcard filter
I would like to allow two main wildcards, ? and *, for filtering my data.
Here is what I am doing now (as I have seen on many websites):
    public boolean contains(String data, String filter) {
        if (data == null || data.isEmpty()) {
            return false;
        }
        String regex = filter.replace(".", "[.]")
                             .replace("?", ".")
                             .replace("*", ".*");
        return Pattern.matches(regex, data);
    }
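To make the concern concrete, here is a small sketch of how this conversion behaves on filters that contain other regex metacharacters. The class name `NaiveWildcardDemo` and the helper `compiles` are hypothetical, introduced only for illustration; `toRegex` is the same replacement chain as above.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class NaiveWildcardDemo {
    // Same conversion as contains() above: only '.', '?' and '*' are handled.
    static String toRegex(String filter) {
        return filter.replace(".", "[.]")
                     .replace("?", ".")
                     .replace("*", ".*");
    }

    // Hypothetical helper: reports whether a string is a valid Java regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(toRegex("a?c*.txt"));                   // a.c.*[.]txt
        System.out.println(Pattern.matches(toRegex("a|b"), "a"));  // true: '|' acts as alternation
        System.out.println(compiles(toRegex("(abc")));             // false: unclosed group
    }
}
```

So a filter such as `a|b` silently changes meaning, and a filter such as `(abc` makes `Pattern.matches` throw.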
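But shouldn't we also escape all the other regex special characters, such as | or (, and so on? And maybe we could keep ? and * as literal characters when they are preceded by a \? For example, something like: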
    filter.replaceAll("([$|\\[\\]{}(),.+^-])", "\\\\$1") // 1. escape regex special chars, except ?, * and \
          .replaceAll("([^\\\\]|^)\\?", "$1.") // 2. replace any ? that isn't preceded by a \ with .
          .replaceAll("([^\\\\]|^)\\*", "$1.*") // 3. replace any * that isn't preceded by a \ with .*
          .replaceAll("\\\\([^?*]|$)", "\\\\\\\\$1"); // 4. replace any \ that isn't followed by a ? or a * (possibly produced by steps 2 and 3) with \\
What do you think? If you agree, am I missing any other regex special characters?
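One way to sidestep the question of which characters need escaping is to not list them at all: walk the filter once, turn unescaped ? and * into . and .*, and pass every literal run through `Pattern.quote`. The following is only a minimal sketch of that alternative, not the approach taken in this post; the class `WildcardToRegex` and its methods are hypothetical names, and it assumes a \ makes the following character literal.

```java
import java.util.regex.Pattern;

class WildcardToRegex {
    // Translate a wildcard filter into a regex in a single pass.
    static String toRegex(String filter) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < filter.length(); i++) {
            char c = filter.charAt(i);
            if (c == '\\' && i + 1 < filter.length()) {
                // Assumption: a backslash makes the next character literal.
                literal.append(filter.charAt(++i));
            } else if (c == '?' || c == '*') {
                flush(literal, regex);
                regex.append(c == '?' ? "." : ".*");
            } else {
                // Ordinary character (or a trailing backslash): keep as literal.
                literal.append(c);
            }
        }
        flush(literal, regex);
        return regex.toString();
    }

    // Quote the pending literal run so no character in it is read as regex syntax.
    private static void flush(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    static boolean matches(String data, String filter) {
        return Pattern.matches(toRegex(filter), data);
    }

    public static void main(String[] args) {
        System.out.println(matches("a(b|c", "a(b|c"));   // true
        System.out.println(matches("abc", "a?c"));       // true
        System.out.println(matches("abc", "a\\?c"));     // false
        System.out.println(matches("a?c", "a\\?c"));     // true
    }
}
```

Because `Pattern.quote` wraps the literal parts, newly introduced regex metacharacters never need to be added to a hand-maintained list.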
Edit #1 (after taking dan1111's and m.buettner's suggestions into account):
    // replace any run of an even number of backslashes with a *
    regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
    // reduce redundant wildcards that aren't preceded by a \
    regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
    // escape regex special chars, except \, ? and *
    regex = regex.replaceAll("([|\\[\\]{}(),.^$+-])", "\\\\$1");
    // replace any ? that isn't preceded by a \ with .
    regex = regex.replaceAll("(?<!\\\\)[?]", ".");
    // replace any * that isn't preceded by a \ with .*
    regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
How about this?
Edit #2 (after taking dan1111's suggestion into account):
    // replace any run of an even number of backslashes with a *
    regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
    // reduce redundant wildcards that aren't preceded by a \
    regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
    // escape regex special chars (if not already escaped by the user), except \, ? and *
    regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
    // replace any ? that isn't preceded by a \ with .
    regex = regex.replaceAll("(?<!\\\\)[?]", ".");
    // replace any * that isn't preceded by a \ with .*
    regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
Is the goal in sight?
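The only difference between the two edits is the lookbehind in the escaping step. A small sketch of what it changes for a filter in which the user has already escaped a dot; the class and method names here are hypothetical, the two patterns are copied from Edit #1 and Edit #2:

```java
class EscapeLookbehindDemo {
    // Escaping step from Edit #1: escapes every special char, even after a backslash.
    static String escapeAlways(String s) {
        return s.replaceAll("([|\\[\\]{}(),.^$+-])", "\\\\$1");
    }

    // Escaping step from Edit #2: skips special chars already preceded by a backslash.
    static String escapeUnlessEscaped(String s) {
        return s.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
    }

    public static void main(String[] args) {
        String filter = "a\\.b"; // the four characters a\.b

        System.out.println(escapeAlways(filter));         // a\\.b
        System.out.println(escapeUnlessEscaped(filter));  // a\.b

        // With Edit #1 the regex means: 'a', a literal backslash, any char, 'b'.
        System.out.println("a.b".matches(escapeAlways(filter)));         // false
        // With Edit #2 the user's escaped dot stays a literal dot.
        System.out.println("a.b".matches(escapeUnlessEscaped(filter)));  // true
    }
}
```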
Here is the solution I finally went with (using the Apache Commons Lang library):
    public static boolean isFiltered(String data, String filter) {
        // no filter: return true
        if (StringUtils.isBlank(filter)) {
            return true;
        }
        // a filter but no data: return false
        else if (StringUtils.isBlank(data)) {
            return false;
        }
        // both a filter and data:
        else {
            // case insensitive
            data = data.toLowerCase();
            filter = filter.toLowerCase();
            // .matches() auto-anchors, so add [*] (i.e. "containing")
            String regex = "*" + filter + "*";
            // replace any run of backslash pairs with [*]
            regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
            // minimize unescaped redundant wildcards
            regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
            // escape unescaped regex special chars, except [\], [?] and [*]
            regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
            // replace unescaped [?] with [.]
            regex = regex.replaceAll("(?<!\\\\)[?]", ".");
            // replace unescaped [*] with [.*]
            regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
            // return whether data matches regex or not
            return data.matches(regex);
        }
    }
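To see what the five steps produce, the sketch below applies the same `replaceAll` chain to a few already lower-cased filters and prints the resulting regex. `FilterRegexDemo` and `buildRegex` are hypothetical names used only for this illustration; the null/blank checks and the Apache Commons Lang dependency are left out.

```java
class FilterRegexDemo {
    // Same five replaceAll steps as in isFiltered above.
    static String buildRegex(String filter) {
        String regex = "*" + filter + "*";
        regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
        regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
        regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
        regex = regex.replaceAll("(?<!\\\\)[?]", ".");
        regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
        return regex;
    }

    public static void main(String[] args) {
        System.out.println(buildRegex("lo.w"));    // .*lo\.w.*
        System.out.println(buildRegex("a**b"));    // .*a.*b.*
        System.out.println(buildRegex("a(b"));     // .*a\(b.*
        System.out.println(buildRegex("a\\*b"));   // .*a\*b.*

        System.out.println("hello.world".matches(buildRegex("lo.w")));  // true
        System.out.println("helloxworld".matches(buildRegex("lo.w")));  // false
    }
}
```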
Many thanks to @dan1111 and @m.buettner for their valuable help ;)