Regular expression best practices

Question

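I'm learning how to use regular expressions.

I'm reading a text file that is divided into sections of two different kinds, delimited by <:==]:> and <:==}:>. I need to know whether each section is a ] section or a } section, so I can't just do

Pattern.compile("<:==]:>|<:==}:>").split(text)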

Doing this:

Pattern.compile("<:==").split(text)

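works, and I can then look at the first character of each substring, but that seems sloppy to me. I think I'm only resorting to it because I haven't fully grasped what I need to know about regular expressions.

What is the best practice here? Also, is there a way to split a string while leaving the delimiters in the resulting strings, so that each string starts with its delimiter?

Edit: The file is laid out like this: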

Old McDonald had a farm 
<:==}:> 
EIEIO. And on that farm he had a cow 
<:==]:> 
And on that farm he....

Answer 1

Tim*_*ker 6

It would probably be a better idea not to use split() here. You can do a match instead:

List<String> delimList = new ArrayList<String>();
List<String> sectionList = new ArrayList<String>();
Pattern regex = Pattern.compile(
    "(<:==[\\]}]:>)     # Match a delimiter, capture it in group 1.\n" +
    "(                  # Match and capture in group 2:\n" +
    " (?:               # the following group which matches...\n" +
    "  (?!<:==[\\]}]:>) # (unless we're at the start of another delimiter)\n" +
    "  .                # any character\n" +
    " )*                # any number of times.\n" +
    ")                  # End of group 2", 
    Pattern.COMMENTS | Pattern.DOTALL);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    delimList.add(regexMatcher.group(1));
    sectionList.add(regexMatcher.group(2));
}
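
On the second part of the question (keeping the delimiter at the start of each piece), one common approach is to split on a zero-width lookahead, so that split() cuts just before each delimiter without consuming it. The following is only a minimal sketch and not part of the original answer: the class name DelimiterSplitDemo and the helpers splitKeepingDelimiters and sectionType are made up for illustration, and it assumes Java 8 or later, where a zero-width match at the very start of the input does not produce an empty leading element.

```java
import java.util.regex.Pattern;

final class DelimiterSplitDemo {
    // Zero-width lookahead: matches the empty position just before a delimiter,
    // so split() cuts there and the delimiter stays at the start of the next piece.
    private static final Pattern BEFORE_DELIMITER =
            Pattern.compile("(?=<:==[\\]}]:>)");

    /** Splits text so that every piece after a cut starts with its delimiter. */
    static String[] splitKeepingDelimiters(String text) {
        return BEFORE_DELIMITER.split(text);
    }

    /** Returns ']' or '}' for a piece that starts with a delimiter, ' ' otherwise. */
    static char sectionType(String piece) {
        if (piece.startsWith("<:==]:>")) return ']';
        if (piece.startsWith("<:==}:>")) return '}';
        return ' '; // text before the first delimiter has no type
    }

    public static void main(String[] args) {
        String text = "Old McDonald had a farm\n<:==}:>\n"
                + "EIEIO. And on that farm he had a cow\n<:==]:>\n"
                + "And on that farm he....";
        for (String piece : splitKeepingDelimiters(text)) {
            System.out.println("type '" + sectionType(piece) + "': " + piece.trim());
        }
    }
}
```

Unlike the matching loop above, this variant also returns any text that precedes the first delimiter as its own piece, which may or may not be what you want.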
