shlex equivalent for Java

Geo*_*Geo 12 java bash shell tokenize

Is there a Java equivalent of shlex? I'd like to be able to split quote-delimited strings the way a shell would process them. For example, if I send:

one two "three four"
and perform the split, I'd like to receive the tokens
one
two
three four

Ray*_*ers 8

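I ran into a similar problem today, and it didn't look like any of the standard options such as StringTokenizer, StrTokenizer or Scanner would do the job. However, implementing the basics isn't hard.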

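This example handles all the edge cases currently raised in comments on the other answers. Note that I haven't checked it for full POSIX compliance. A Gist including unit tests is available on GitHub, released into the public domain under the Unlicense.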

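import java.util.ArrayList;
import java.util.List;

// Splits a string into tokens roughly the way a shell would: unquoted
// whitespace separates tokens, '...' and "..." group text (the quotes are
// removed), and a backslash escapes the next character except inside '...'.
public List<String> shellSplit(CharSequence string) {
    List<String> tokens = new ArrayList<>();
    boolean escaping = false;                    // previous char was an active backslash
    char quoteChar = ' ';                        // quote that opened the current quoted section
    boolean quoting = false;                     // currently inside quotes?
    int lastCloseQuoteIndex = Integer.MIN_VALUE; // lets "" produce an empty token
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < string.length(); i++) {
        char c = string.charAt(i);
        if (escaping) {
            // Take the escaped character literally.
            current.append(c);
            escaping = false;
        } else if (c == '\\' && !(quoting && quoteChar == '\'')) {
            // Backslash is literal inside single quotes, an escape elsewhere.
            escaping = true;
        } else if (quoting && c == quoteChar) {
            // Closing quote: remember its position so an empty quoted token is kept.
            quoting = false;
            lastCloseQuoteIndex = i;
        } else if (!quoting && (c == '\'' || c == '"')) {
            // Opening quote.
            quoting = true;
            quoteChar = c;
        } else if (!quoting && Character.isWhitespace(c)) {
            // Unquoted whitespace ends the current token, if there is one.
            if (current.length() > 0 || lastCloseQuoteIndex == (i - 1)) {
                tokens.add(current.toString());
                current = new StringBuilder();
            }
        } else {
            current.append(c);
        }
    }
    // Flush the final token.
    if (current.length() > 0 || lastCloseQuoteIndex == (string.length() - 1)) {
        tokens.add(current.toString());
    }

    return tokens;
}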
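To try the method on the question's input and on a few of the edge cases mentioned above, it can be dropped into a small class. This is a minimal harness, assuming the method is made `static`; the class name `ShellSplitDemo` and the sample inputs are illustrative and not part of the original Gist.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness: shellSplit is the answer's method, made static so
// main can call it without creating an instance.
public class ShellSplitDemo {

    public static List<String> shellSplit(CharSequence string) {
        List<String> tokens = new ArrayList<>();
        boolean escaping = false;
        char quoteChar = ' ';
        boolean quoting = false;
        int lastCloseQuoteIndex = Integer.MIN_VALUE;
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (escaping) {
                current.append(c);
                escaping = false;
            } else if (c == '\\' && !(quoting && quoteChar == '\'')) {
                escaping = true;
            } else if (quoting && c == quoteChar) {
                quoting = false;
                lastCloseQuoteIndex = i;
            } else if (!quoting && (c == '\'' || c == '"')) {
                quoting = true;
                quoteChar = c;
            } else if (!quoting && Character.isWhitespace(c)) {
                if (current.length() > 0 || lastCloseQuoteIndex == (i - 1)) {
                    tokens.add(current.toString());
                    current = new StringBuilder();
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0 || lastCloseQuoteIndex == (string.length() - 1)) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The example from the question; prints [one, two, three four]
        System.out.println(shellSplit("one two \"three four\""));
        // Quoted and unquoted text with no space between them form one token: [ab cd]
        System.out.println(shellSplit("a\"b c\"d"));
        // An empty pair of quotes still yields an (empty) token; prints [a, , b]
        System.out.println(shellSplit("a \"\" b"));
        // A backslash escapes a space outside quotes: [one two, three]
        System.out.println(shellSplit("one\\ two three"));
    }
}
```

The `lastCloseQuoteIndex` bookkeeping is what makes the third case work: without it, `""` would leave `current` empty and the token would be silently dropped.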


Chs*_*y76 6

Take a look at Apache Commons Lang:

org.apache.commons.lang.text.StrTokenizer should be able to do what you want:

import org.apache.commons.lang.text.StrTokenizer;
new StrTokenizer("one two \"three four\"", ' ', '"').getTokenArray();

  • Unfortunately, unlike `shlex`, commons.lang is not POSIX-compliant. `(-> (StrTokenizer. "\"foo \"'bar'baz") (.getTokenList))` returns a single entry containing `"foo"'bar'baz` instead of the (correct) `foobarbaz`. (2 upvotes)
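If adding a Commons Lang dependency is not an option, the JDK's own `java.io.StreamTokenizer` can be configured to handle the simple case from the question. This is a sketch under stated assumptions, not a POSIX implementation: the names `StreamTokenizerSplit` and `split` are hypothetical, a quote character also ends a word (so `a"b c"` gives two tokens rather than one), and backslash sequences such as `\n` inside quotes are interpreted C-style, even inside single quotes.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class StreamTokenizerSplit {

    // Splits on whitespace, treating '...' and "..." as single tokens.
    // Not POSIX: adjacent quoted and unquoted text is NOT joined.
    public static List<String> split(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();               // make every character "ordinary" (no numbers, no comments)
        st.wordChars(0x21, 0xFF);       // printable non-space characters form words
        st.whitespaceChars(0x00, 0x20); // control characters and space separate tokens
        st.quoteChar('"');              // "..." becomes one token (overrides the word setting)
        st.quoteChar('\'');             // '...' becomes one token
        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // Both words and quoted strings store their text in sval.
                tokens.add(st.sval);
            }
        } catch (IOException e) {
            // A StringReader does no real I/O, so this is not expected.
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [one, two, three four]
        System.out.println(split("one two \"three four\""));
    }
}
```

The trade-off is simplicity over fidelity: for input shaped like the question's it works with no dependencies, but for shell-style edge cases a hand-written state machine like the first answer's is the safer choice.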