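I found this regex in an answer about finding duplicate words in a string. But when I use it, it treats "This" and "is" as the same word and removes the "is".
Regex: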
"\\b(\\w+)\\b\\s+\\1"
Any idea why this happens?
Here is the code I use for duplicate removal:
public static String RemoveDuplicateWords(String input)
{
    String originalText = input;
    String output = "";
    Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
    //Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\1", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(input);
    if (!m.find())
        output = "No duplicates found, no changes made to data";
    else
    {
        while (m.find())
        {
            if (output == "")
                output = input.replaceFirst(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
        input = output;
        m = p.matcher(input);
        while (m.find())
        {
            output = "";
            if (output == "")
                output = input.replaceAll(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
    }
    return output;
}
Min*_*amy 21
Try this:
String pattern = "(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
String input = "your string";
Matcher m = r.matcher(input);
while (m.find()) {
    input = input.replaceAll(m.group(), m.group(1));
}
System.out.println(input);
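Java regular expressions are explained well in the API documentation for the Pattern class. Here is the regular expression with some spaces added to separate its parts: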
"(?i) \\b ([a-z]+) \\b (?: \\s+ \\1 \\b )+"

\b       match a word boundary
[a-z]+   match a word with one or more characters;
         the parentheses capture the word as a group
\b       match a word boundary
(?:      indicates a non-capturing group (which starts here)
\s+      match one or more white space characters
\1       is a back reference to the first (captured) group;
         so the word is repeated here
\b       match a word boundary
)+       indicates the end of the non-capturing group and
         allows it to occur one or more times
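As a rough illustration of the breakdown above, the same pattern can be handed to Matcher.replaceAll with "$1" as the replacement, so each run of repeated words collapses to its first word. This is only a sketch: the class name CollapseRepeatedWords and the helper collapse are made up for the example and do not appear in the answer.

```java
import java.util.regex.Pattern;

class CollapseRepeatedWords {
    // Same pattern as in the answer; the inline (?i) flag makes it case-insensitive.
    private static final Pattern REPEATED =
            Pattern.compile("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+");

    // Hypothetical helper: replaces every run of repeated words with group 1,
    // i.e. the first word of the run.
    static String collapse(String text) {
        return REPEATED.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        // prints "This is text another"
        System.out.println(collapse("This This is text text another another"));
        // prints "This is is fine" collapsed to "This is fine"
        System.out.println(collapse("This is is fine"));
    }
}
```

Because the replacement is applied only to the spans the pattern matched, the word boundaries in the pattern still protect words such as "This" that merely end in "is".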
You should have used \b(\w+)\b\s+\b\1\b; click here to see the result...
Hope this is what you wanted...
Well, this is the code that produces the output you have:
import java.util.regex.*;

public class MyDup {
    public static void main (String args[]) {
        String input="This This is text text another another";
        String originalText = input;
        String output = "";
        Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        System.out.println(m);
        // Note: this find() consumes the first match before the loop below starts.
        if (!m.find())
            output = "No duplicates found, no changes made to data";
        else
        {
            while (m.find())
            {
                if (output == "") {
                    output = input.replaceFirst(m.group(), m.group(1));
                } else {
                    output = output.replaceAll(m.group(), m.group(1));
                }
            }
            input = output;
            m = p.matcher(input);
            while (m.find())
            {
                output = "";
                if (output == "") {
                    output = input.replaceAll(m.group(), m.group(1));
                } else {
                    output = output.replaceAll(m.group(), m.group(1));
                }
            }
        }
        System.out.println("After removing duplicate the final string is " + output);
    }
}
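Run this code and look at the output you get... that should answer your question...
In output you are replacing the duplicates with a single word... aren't you?
When I put System.out.println(m.group() + " : " + m.group(1)); inside the first if condition, I get the output "text text : text", i.e. the duplicates are being replaced by a single word.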
else
{
    while (m.find())
    {
        if (output == "") {
            System.out.println(m.group() + " : " + m.group(1));
            output = input.replaceFirst(m.group(), m.group(1));
        } else {
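One possible explanation for the "This"/"is" symptom, offered here as an assumption rather than something the posts state: String.replaceAll treats its first argument as a regex and replaces it wherever it occurs in the whole string, so replacing the matched text "is is" also hits the "is is" hidden inside "This is". The sketch below compares that approach with Matcher.replaceAll, which only touches the spans the pattern actually matched. The class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllPitfall {
    static final Pattern DUP =
            Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.CASE_INSENSITIVE);

    // Mirrors the loop used in the posts: the matched text is reused as a
    // regex for String.replaceAll, so it can match inside other words.
    static String viaStringReplace(String text) {
        Matcher m = DUP.matcher(text);
        String result = text;
        while (m.find()) {
            result = result.replaceAll(m.group(), m.group(1));
        }
        return result;
    }

    // Replaces only the spans the pattern matched, keeping the \b checks.
    static String viaMatcher(String text) {
        return DUP.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = "It is is true. This is fine.";
        System.out.println(viaStringReplace(text)); // prints "It is true. This fine."
        System.out.println(viaMatcher(text));       // prints "It is true. This is fine."
    }
}
```

In the first result the second sentence loses its "is" even though the pattern itself never matched there.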
小智 7
The pattern below matches duplicated words no matter how many times they are repeated.
Pattern.compile("\\b(\\w+)(\\b\\W+\\b\\1\\b)*", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
For example, "This is my friend friend" will output "This is my friend".
Also, with this pattern a single pass of "while (m.find())" is enough.
\b(\w+)(\b\W+\1\b)*
Explanation:
\b    : Any word boundary
(\w+) : Select one or more word characters (letter, number, underscore)
Once all the words are selected, it is time to select the repeated words.
(     : Grouping starts
\b    : Any word boundary
\W+   : One or more non-word characters
\1    : Select the repeated word
\b    : Do not match if the repeated word is joined to another word
)     : Grouping ends
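The single-pass claim can be sketched with one while (m.find()) loop that rebuilds the string through appendReplacement and appendTail. This is a minimal sketch under the assumption that each match should be replaced by its first captured word; the class name SinglePassDedup and the method dedupe are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassDedup {
    // The pattern from this answer. Because of the trailing *, it matches
    // every word, repeated or not; non-repeated words are replaced by themselves.
    static final Pattern WORD_RUN =
            Pattern.compile("\\b(\\w+)(\\b\\W+\\b\\1\\b)*", Pattern.CASE_INSENSITIVE);

    static String dedupe(String text) {
        Matcher m = WORD_RUN.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Keep only the first word of each run; quoteReplacement guards
            // against '$' or '\' being interpreted in the replacement.
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "This is my friend"
        System.out.println(dedupe("This is my friend friend friend"));
    }
}
```

Since \W+ accepts any non-word characters between the repeats, this pattern also collapses repeats separated by punctuation, which may or may not be what you want.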
Reference: Example