是否有"glob"类型模式的java.util.regex等价物?

Pau*_*lin 80 java glob

是否有标准(最好是Apache Commons或类似的非病毒)库用于在Java中进行"glob"类型的匹配?当我不得不在Perl中做类似的事情时,我只是将所有的" ."改为" \."," *"改为" .*"和" ?"改为" ."等等,但是我想知道是否有人做了为我工作.

类似的问题:从glob表达式创建正则表达式

fin*_*nnw 56

Globbing 也计划在Java 7中实现.

请参阅FileSystem.getPathMatcher(String)"查找文件"教程.

  • 奇妙.但是为什么这个实现仅限于"Path"对象?!?在我的情况下,我想匹配URI ... (21认同)
  • 从sun.nio的源头来看,全局匹配似乎是由[Globs.java]实现的(http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147 /sun/nio/fs/Globs.java)。不幸的是,这是专门为文件系统路径编写的,因此不能用于所有字符串(它对路径分隔符和非法字符进行了一些假设)。但这可能是一个有用的起点。 (2认同)

Dav*_*Ray 42

没有任何内置功能,但将类似glob的内容转换为正则表达式非常简单:

public static String createRegexFromGlob(String glob)
{
    String out = "^";
    for(int i = 0; i < glob.length(); ++i)
    {
        final char c = glob.charAt(i);
        switch(c)
        {
        case '*': out += ".*"; break;
        case '?': out += '.'; break;
        case '.': out += "\\."; break;
        case '\\': out += "\\\\"; break;
        default: out += c;
        }
    }
    out += '$';
    return out;
}
Run Code Online (Sandbox Code Playgroud)

这对我有用,但我不确定它是否涵盖了glob"标准",如果有的话:)

Paul Tomblin更新:我发现了一个执行全局转换的perl程序,并将其调整为Java我最终得到:

    private String convertGlobToRegEx(String line)
    {
    LOG.info("got line [" + line + "]");
    line = line.trim();
    int strLen = line.length();
    StringBuilder sb = new StringBuilder(strLen);
    // Remove beginning and ending * globs because they're useless
    if (line.startsWith("*"))
    {
        line = line.substring(1);
        strLen--;
    }
    if (line.endsWith("*"))
    {
        line = line.substring(0, strLen-1);
        strLen--;
    }
    boolean escaping = false;
    int inCurlies = 0;
    for (char currentChar : line.toCharArray())
    {
        switch (currentChar)
        {
        case '*':
            if (escaping)
                sb.append("\\*");
            else
                sb.append(".*");
            escaping = false;
            break;
        case '?':
            if (escaping)
                sb.append("\\?");
            else
                sb.append('.');
            escaping = false;
            break;
        case '.':
        case '(':
        case ')':
        case '+':
        case '|':
        case '^':
        case '$':
        case '@':
        case '%':
            sb.append('\\');
            sb.append(currentChar);
            escaping = false;
            break;
        case '\\':
            if (escaping)
            {
                sb.append("\\\\");
                escaping = false;
            }
            else
                escaping = true;
            break;
        case '{':
            if (escaping)
            {
                sb.append("\\{");
            }
            else
            {
                sb.append('(');
                inCurlies++;
            }
            escaping = false;
            break;
        case '}':
            if (inCurlies > 0 && !escaping)
            {
                sb.append(')');
                inCurlies--;
            }
            else if (escaping)
                sb.append("\\}");
            else
                sb.append("}");
            escaping = false;
            break;
        case ',':
            if (inCurlies > 0 && !escaping)
            {
                sb.append('|');
            }
            else if (escaping)
                sb.append("\\,");
            else
                sb.append(",");
            break;
        default:
            escaping = false;
            sb.append(currentChar);
        }
    }
    return sb.toString();
}
Run Code Online (Sandbox Code Playgroud)

我正在编辑这个答案,而不是自己创作,因为这个答案让我走上正轨.

  • 仅供参考:'globbing'的标准是POSIX Shell语言 - http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_13_01 (9认同)
  • 这不区分`*`和`**`通配字符...... (4认同)

Nei*_*aft 25

感谢大家的贡献.我写了一个比以前的答案更全面的转换:

/**
 * Converts a standard POSIX Shell globbing pattern into a regular expression
 * pattern. The result can be used with the standard {@link java.util.regex} API to
 * recognize strings which match the glob pattern.
 * <p/>
 * See also, the POSIX Shell language:
 * http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_13_01
 * 
 * @param pattern A glob pattern.
 * @return A regex pattern to recognize the given glob pattern.
 */
public static final String convertGlobToRegex(String pattern) {
    StringBuilder sb = new StringBuilder(pattern.length());
    int inGroup = 0;
    int inClass = 0;
    int firstIndexInClass = -1;
    char[] arr = pattern.toCharArray();
    for (int i = 0; i < arr.length; i++) {
        char ch = arr[i];
        switch (ch) {
            case '\\':
                if (++i >= arr.length) {
                    sb.append('\\');
                } else {
                    char next = arr[i];
                    switch (next) {
                        case ',':
                            // escape not needed
                            break;
                        case 'Q':
                        case 'E':
                            // extra escape needed
                            sb.append('\\');
                        default:
                            sb.append('\\');
                    }
                    sb.append(next);
                }
                break;
            case '*':
                if (inClass == 0)
                    sb.append(".*");
                else
                    sb.append('*');
                break;
            case '?':
                if (inClass == 0)
                    sb.append('.');
                else
                    sb.append('?');
                break;
            case '[':
                inClass++;
                firstIndexInClass = i+1;
                sb.append('[');
                break;
            case ']':
                inClass--;
                sb.append(']');
                break;
            case '.':
            case '(':
            case ')':
            case '+':
            case '|':
            case '^':
            case '$':
            case '@':
            case '%':
                if (inClass == 0 || (firstIndexInClass == i && ch == '^'))
                    sb.append('\\');
                sb.append(ch);
                break;
            case '!':
                if (firstIndexInClass == i)
                    sb.append('^');
                else
                    sb.append('!');
                break;
            case '{':
                inGroup++;
                sb.append('(');
                break;
            case '}':
                inGroup--;
                sb.append(')');
                break;
            case ',':
                if (inGroup > 0)
                    sb.append('|');
                else
                    sb.append(',');
                break;
            default:
                sb.append(ch);
        }
    }
    return sb.toString();
}
Run Code Online (Sandbox Code Playgroud)

单元测试证明它有效:

/**
 * @author Neil Traft
 */
public class StringUtils_ConvertGlobToRegex_Test {

    @Test
    public void star_becomes_dot_star() throws Exception {
        assertEquals("gl.*b", StringUtils.convertGlobToRegex("gl*b"));
    }

    @Test
    public void escaped_star_is_unchanged() throws Exception {
        assertEquals("gl\\*b", StringUtils.convertGlobToRegex("gl\\*b"));
    }

    @Test
    public void question_mark_becomes_dot() throws Exception {
        assertEquals("gl.b", StringUtils.convertGlobToRegex("gl?b"));
    }

    @Test
    public void escaped_question_mark_is_unchanged() throws Exception {
        assertEquals("gl\\?b", StringUtils.convertGlobToRegex("gl\\?b"));
    }

    @Test
    public void character_classes_dont_need_conversion() throws Exception {
        assertEquals("gl[-o]b", StringUtils.convertGlobToRegex("gl[-o]b"));
    }

    @Test
    public void escaped_classes_are_unchanged() throws Exception {
        assertEquals("gl\\[-o\\]b", StringUtils.convertGlobToRegex("gl\\[-o\\]b"));
    }

    @Test
    public void negation_in_character_classes() throws Exception {
        assertEquals("gl[^a-n!p-z]b", StringUtils.convertGlobToRegex("gl[!a-n!p-z]b"));
    }

    @Test
    public void nested_negation_in_character_classes() throws Exception {
        assertEquals("gl[[^a-n]!p-z]b", StringUtils.convertGlobToRegex("gl[[!a-n]!p-z]b"));
    }

    @Test
    public void escape_carat_if_it_is_the_first_char_in_a_character_class() throws Exception {
        assertEquals("gl[\\^o]b", StringUtils.convertGlobToRegex("gl[^o]b"));
    }

    @Test
    public void metachars_are_escaped() throws Exception {
        assertEquals("gl..*\\.\\(\\)\\+\\|\\^\\$\\@\\%b", StringUtils.convertGlobToRegex("gl?*.()+|^$@%b"));
    }

    @Test
    public void metachars_in_character_classes_dont_need_escaping() throws Exception {
        assertEquals("gl[?*.()+|^$@%]b", StringUtils.convertGlobToRegex("gl[?*.()+|^$@%]b"));
    }

    @Test
    public void escaped_backslash_is_unchanged() throws Exception {
        assertEquals("gl\\\\b", StringUtils.convertGlobToRegex("gl\\\\b"));
    }

    @Test
    public void slashQ_and_slashE_are_escaped() throws Exception {
        assertEquals("\\\\Qglob\\\\E", StringUtils.convertGlobToRegex("\\Qglob\\E"));
    }

    @Test
    public void braces_are_turned_into_groups() throws Exception {
        assertEquals("(glob|regex)", StringUtils.convertGlobToRegex("{glob,regex}"));
    }

    @Test
    public void escaped_braces_are_unchanged() throws Exception {
        assertEquals("\\{glob\\}", StringUtils.convertGlobToRegex("\\{glob\\}"));
    }

    @Test
    public void commas_dont_need_escaping() throws Exception {
        assertEquals("(glob,regex),", StringUtils.convertGlobToRegex("{glob\\,regex},"));
    }

}
Run Code Online (Sandbox Code Playgroud)

  • 我特此同意此答案中的代码属于公共领域。 (3认同)

Ada*_*ent 9

有几个库做类似Glob的模式匹配比现有的更现代:

Theres Ants 目录扫描仪 和弹簧AntPathMatcher

我推荐其他解决方案,因为Ant Style Globbing已经成为Java世界中的标准glob语法(Hudson,Spring,Ant和I think Maven).

  • 以下是 AntPathMatcher 工件的 Maven 坐标:https://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.springframework%22%20AND%20a%3A%22spring-core%22 和一些带有示例用法的测试:https://github.com/spring-projects/spring-framework/blob/master/spring-core/src/test/java/org/springframework/util/AntPathMatcherTests.java (3认同)

Vin*_*ert 6

我最近不得不这样做并使用\Q\E逃避glob模式:

private static Pattern getPatternFromGlob(String glob) {
  return Pattern.compile(
    "^" + Pattern.quote(glob)
            .replace("*", "\\E.*\\Q")
            .replace("?", "\\E.\\Q") 
    + "$");
}
Run Code Online (Sandbox Code Playgroud)

  • 如果在字符串中的某个地方有一个\ E,这不会破坏吗? (4认同)
  • @jmo我已经修复了使用Pattern.quote()的例子. (2认同)

Ton*_*mbe 5

这是一个简单的Glob实现,可以处理*和?在模式中

public class GlobMatch {
    private String text;
    private String pattern;

    public boolean match(String text, String pattern) {
        this.text = text;
        this.pattern = pattern;

        return matchCharacter(0, 0);
    }

    private boolean matchCharacter(int patternIndex, int textIndex) {
        if (patternIndex >= pattern.length()) {
            return false;
        }

        switch(pattern.charAt(patternIndex)) {
            case '?':
                // Match any character
                if (textIndex >= text.length()) {
                    return false;
                }
                break;

            case '*':
                // * at the end of the pattern will match anything
                if (patternIndex + 1 >= pattern.length() || textIndex >= text.length()) {
                    return true;
                }

                // Probe forward to see if we can get a match
                while (textIndex < text.length()) {
                    if (matchCharacter(patternIndex + 1, textIndex)) {
                        return true;
                    }
                    textIndex++;
                }

                return false;

            default:
                if (textIndex >= text.length()) {
                    return false;
                }

                String textChar = text.substring(textIndex, textIndex + 1);
                String patternChar = pattern.substring(patternIndex, patternIndex + 1);

                // Note the match is case insensitive
                if (textChar.compareToIgnoreCase(patternChar) != 0) {
                    return false;
                }
        }

        // End of pattern and text?
        if (patternIndex + 1 >= pattern.length() && textIndex + 1 >= text.length()) {
            return true;
        }

        // Go on to match the next character in the pattern
        return matchCharacter(patternIndex + 1, textIndex + 1);
    }
}
Run Code Online (Sandbox Code Playgroud)


mih*_*ihi 5

Tony Edgecombe回答类似,如果有人需要,这里有一个简短而简单的 globber 支持*并且?不使用正则表达式。

public static boolean matches(String text, String glob) {
    String rest = null;
    int pos = glob.indexOf('*');
    if (pos != -1) {
        rest = glob.substring(pos + 1);
        glob = glob.substring(0, pos);
    }

    if (glob.length() > text.length())
        return false;

    // handle the part up to the first *
    for (int i = 0; i < glob.length(); i++)
        if (glob.charAt(i) != '?' 
                && !glob.substring(i, i + 1).equalsIgnoreCase(text.substring(i, i + 1)))
            return false;

    // recurse for the part after the first *, if any
    if (rest == null) {
        return glob.length() == text.length();
    } else {
        for (int i = glob.length(); i <= text.length(); i++) {
            if (matches(text.substring(i), rest))
                return true;
        }
        return false;
    }
}
Run Code Online (Sandbox Code Playgroud)