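I ran into a similar problem today, and none of the standard options such as StringTokenizer, StrTokenizer, or Scanner seemed to fit. Implementing the basics is not hard, though.
This example handles all of the edge cases currently raised in comments on the other answers. Note that I have not checked it for full POSIX compliance. A Gist that includes unit tests is available on GitHub, released into the public domain under the Unlicense.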
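// Requires java.util.ArrayList and java.util.List.
public List<String> shellSplit(CharSequence string) {
    List<String> tokens = new ArrayList<String>();
    boolean escaping = false;   // previous character was an unquoted or double-quoted backslash
    char quoteChar = ' ';       // the quote character that opened the current quoted run
    boolean quoting = false;    // currently inside a quoted run
    int lastCloseQuoteIndex = Integer.MIN_VALUE; // lets "" and '' produce an empty token
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < string.length(); i++) {
        char c = string.charAt(i);
        if (escaping) {
            // An escaped character is always taken literally.
            current.append(c);
            escaping = false;
        } else if (c == '\\' && !(quoting && quoteChar == '\'')) {
            // A backslash escapes the next character, except inside single quotes.
            escaping = true;
        } else if (quoting && c == quoteChar) {
            // Closing quote: remember where it was so an empty token is not lost.
            quoting = false;
            lastCloseQuoteIndex = i;
        } else if (!quoting && (c == '\'' || c == '"')) {
            // Opening quote.
            quoting = true;
            quoteChar = c;
        } else if (!quoting && Character.isWhitespace(c)) {
            // Unquoted whitespace ends the current token, if there is one.
            if (current.length() > 0 || lastCloseQuoteIndex == (i - 1)) {
                tokens.add(current.toString());
                current = new StringBuilder();
            }
        } else {
            current.append(c);
        }
    }
    // Flush the final token, including a trailing empty quoted string.
    if (current.length() > 0 || lastCloseQuoteIndex == (string.length() - 1)) {
        tokens.add(current.toString());
    }
    return tokens;
}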
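To make the edge cases concrete, here is a minimal, self-contained sketch that wraps the same logic in a class and runs it on a few inputs. The class name `ShellSplitDemo`, the `static` modifier, and the sample inputs are my own assumptions for illustration; they are not taken from the Gist or its unit tests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; only the body of shellSplit comes from the answer above.
public class ShellSplitDemo {

    static List<String> shellSplit(CharSequence string) {
        List<String> tokens = new ArrayList<String>();
        boolean escaping = false;
        char quoteChar = ' ';
        boolean quoting = false;
        int lastCloseQuoteIndex = Integer.MIN_VALUE;
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < string.length(); i++) {
            char c = string.charAt(i);
            if (escaping) {
                current.append(c);
                escaping = false;
            } else if (c == '\\' && !(quoting && quoteChar == '\'')) {
                escaping = true;
            } else if (quoting && c == quoteChar) {
                quoting = false;
                lastCloseQuoteIndex = i;
            } else if (!quoting && (c == '\'' || c == '"')) {
                quoting = true;
                quoteChar = c;
            } else if (!quoting && Character.isWhitespace(c)) {
                if (current.length() > 0 || lastCloseQuoteIndex == (i - 1)) {
                    tokens.add(current.toString());
                    current = new StringBuilder();
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0 || lastCloseQuoteIndex == (string.length() - 1)) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "one two \"three four\"", // double-quoted token containing a space
            "a \"\" b",               // empty quoted string becomes an empty token
            "foo\\ bar baz",          // backslash-escaped space stays inside the token
            "'it\\s' \"a\\\"b\""      // backslash is literal in single quotes, an escape in double quotes
        };
        for (String input : inputs) {
            List<String> tokens = shellSplit(input);
            System.out.println(input + " -> " + tokens.size() + " tokens: " + tokens);
        }
    }
}
```

The second input is the case the `lastCloseQuoteIndex` bookkeeping exists for: without it, `""` would leave `current` empty and the token would be silently dropped.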
org.apache.commons.lang.text.StrTokenizer should be able to do what you want:
new StrTokenizer("one two \"three four\"", ' ', '"').getTokenArray();