Regex in Java for finding duplicate consecutive words

Question

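I found the regex below as an answer for finding duplicate words in a string. But when I use it, it treats This and is as the same word and deletes the is.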

The regex:

"\\b(\\w+)\\b\\s+\\1"

Any idea why this happens?
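The question doesn't show the input that went wrong, so the following only illustrates one likely failure mode, using a made-up input "the theory". Because the pattern has no \b after \1, the back reference can match just the beginning of the next word. The class and method names here are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackrefBoundaryDemo {
    // Returns the first case-insensitive match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "the theory";
        // Question's regex: \1 is not followed by \b, so "the" also matches the start of "theory".
        String loose = firstMatch("\\b(\\w+)\\b\\s+\\1", text);
        System.out.println(loose);                                          // the the
        // Replacing that match with group 1, as the question's code does, deletes part of a word.
        System.out.println(text.replaceFirst(loose, "the"));                // theory
        // With \b after \1, the repeated word must end at a word boundary, so there is no match.
        System.out.println(firstMatch("\\b(\\w+)\\b\\s+\\b\\1\\b", text));  // null
    }
}
```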

Here is the code I use to remove the duplicates:

public static String RemoveDuplicateWords(String input)
{
    String originalText = input;
    String output = "";
    Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE); 
    //Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\1", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(input);
    if (!m.find())
        output = "No duplicates found, no changes made to data";
    else
    {
        while (m.find())
        {
            if (output == "")
                output = input.replaceFirst(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
        input = output;
        m = p.matcher(input);
        while (m.find())
        {
            output = "";
            if (output == "")
                output = input.replaceAll(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
    }
    return output;
}

Answer 1

Min*_*amy 21

Try this:

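import java.util.regex.Matcher;
import java.util.regex.Pattern;

String pattern = "(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);

String input = "your string";
Matcher m = r.matcher(input);
// Replace every occurrence of the matched run with its first word (group 1)
while (m.find()) {
    input = input.replaceAll(m.group(), m.group(1));
}
System.out.println(input);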
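The same regex can also be applied in a single call with Matcher.replaceAll and the replacement "$1" (group 1). The following is a minimal sketch of that variant; the class and method names are hypothetical, and it is an alternative to the loop above, not the answerer's code.

```java
import java.util.regex.Pattern;

public class CollapseRepeats {
    // Same regex as the answer; the embedded (?i) already makes it case-insensitive.
    private static final Pattern REPEATED =
            Pattern.compile("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+");

    // Replaces each run of repeated words, in place, with its first word ($1 = group 1).
    static String collapse(String input) {
        return REPEATED.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("Goodbye goodbye goodbye world world")); // Goodbye world
        System.out.println(collapse("the the bathe theory"));                // the bathe theory
    }
}
```

One reason to prefer this: the loop above calls String.replaceAll(m.group(), ...), which replaces the matched text everywhere in the string, ignoring word boundaries. For "the the bathe theory" the loop prints "the batheory", because "the the" also occurs inside "bathe theory".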

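Java regular expressions are explained well in the API documentation of the Pattern class. Here is the regex with some spaces added to mark its different parts: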

"(?i) \\b ([a-z]+) \\b (?: \\s+ \\1 \\b )+"

(?i)     turn on case-insensitive matching for the rest of the pattern
\b       match a word boundary
[a-z]+   match a word of one or more letters;
         the parentheses capture the word as a group
\b       match a word boundary
(?:      indicates a non-capturing group (which starts here)
\s+      match one or more white space characters
\1       is a back reference to the first (captured) group;
         so the word is repeated here
\b       match a word boundary
)+       indicates the end of the non-capturing group and
         allows it to occur one or more times
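To see what the capturing group and the repeated non-capturing group produce, the sketch below prints the whole match and group 1 for each run. The sample text and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    static final Pattern P = Pattern.compile("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+");

    // For each match: the whole run (group 0), the captured word (group 1),
    // and the number of capturing groups; (?:...) does not add one.
    static List<String> describe(String text) {
        Matcher m = P.matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group() + " -> " + m.group(1) + " (groups: " + m.groupCount() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        describe("Hello hello hello world, bye bye").forEach(System.out::println);
        // Hello hello hello -> Hello (groups: 1)
        // bye bye -> bye (groups: 1)
    }
}
```

Because of the final )+, one match covers the whole run of three "hello"s, not just the first pair.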

The answer is perfect. Although it has been a while, could you explain the regex part in more detail? (2 upvotes)

Answer 2

Fah*_*kar 7

You should have used \b(\w+)\b\s+\b\1\b.

Hope this is what you want...

Update 1

Well, the output you have is:

After removing duplicate the final string is

import java.util.regex.*;

public class MyDup {
    public static void main(String[] args) {
        String input = "This This is text text another another";
        String originalText = input;
        String output = "";
        Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE + Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        System.out.println(m);
        // This find() consumes the first match, so the while loop below starts at the
        // second one; the first duplicate may only be handled by the second pass.
        if (!m.find())
            output = "No duplicates found, no changes made to data";
        else
        {
            while (m.find())
            {
                if (output.isEmpty()) {
                    output = input.replaceFirst(m.group(), m.group(1));
                } else {
                    output = output.replaceAll(m.group(), m.group(1));
                }
            }
            // Second pass over the partly cleaned string
            input = output;
            m = p.matcher(input);
            while (m.find())
            {
                output = "";
                if (output.isEmpty()) {
                    output = input.replaceAll(m.group(), m.group(1));
                } else {
                    output = output.replaceAll(m.group(), m.group(1));
                }
            }
        }
        System.out.println("After removing duplicate the final string is " + output);
    }
}

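Run this code and look at the output you get... your question will be resolved...

Note

In output you replace each duplicate with a single word... right?

When I put System.out.println(m.group() + " : " + m.group(1)); inside the first if condition, I get the output text text : text, i.e. the duplicate is replaced by a single word.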

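else
{
    while (m.find())
    {
        if (output.isEmpty()) {
            // Print the whole match and the captured word before replacing
            System.out.println(m.group() + " : " + m.group(1));
            output = input.replaceFirst(m.group(), m.group(1));
        } else {
            output = output.replaceAll(m.group(), m.group(1));
        }
    }
    // ... rest of the else branch unchanged ...
}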
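To see every pair the regex finds, including the first one, the matches can be printed in a loop that is not preceded by a separate find(). This is a minimal sketch on the answer's sample string; the class and method names are hypothetical, and MULTILINE is left out because the pattern contains no ^ or $.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrintMatches {
    // Collects "whole match : group 1" for every match, like the println in the answer.
    static List<String> matches(String input) {
        Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group() + " : " + m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        matches("This This is text text another another").forEach(System.out::println);
        // This This : This
        // text text : text
        // another another : another
    }
}
```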

Hope you get it now... :)

Good luck!!! Cheers!!!

Answer 3

小智 7

The pattern below matches a duplicated word no matter how many times it is repeated.

Pattern.compile("\\b(\\w+)(\\b\\W+\\b\\1\\b)*", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);

For example, "This is my friend friend" will become "This is my friend".

Also, with this pattern a single pass of while (m.find()) is enough.
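A single pass could look like the sketch below, which uses Matcher.appendReplacement and appendTail (the StringBuilder overloads need Java 9 or later). Because the group ends in *, every word matches, so single words are simply replaced by themselves. The class and method names are hypothetical, and MULTILINE is dropped since the pattern has no ^ or $.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SinglePass {
    static String dedupe(String input) {
        Pattern p = Pattern.compile("\\b(\\w+)(\\b\\W+\\b\\1\\b)*", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        StringBuilder sb = new StringBuilder();
        // One loop over all matches: each word, or run of the same word, becomes group 1.
        while (m.find()) {
            m.appendReplacement(sb, "$1");
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dedupe("This is my friend friend")); // This is my friend
        System.out.println(dedupe("friend, friend."));          // friend.
    }
}
```

Note that \W+ accepts punctuation as well as spaces, so "friend, friend." is collapsed too; whether that is wanted depends on the text.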

Answer 4

imb*_*ond 5

\b(\w+)(\b\W+\1\b)*

Explanation:

\b : Any word boundary
(\w+) : One or more word characters (letter, digit, underscore), captured as group 1

Once a word is selected, the next part selects its repetitions.

( : Grouping starts
\b : Any word boundary
\W+ : One or more non-word characters
\1 : The repeated word (back reference to group 1)
\b : Do not match if the repeated word is only the start of a longer word
) : Grouping ends
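The effect of the final \b can be checked by comparing the pattern with a hypothetical variant that drops it. This sketch assumes case-insensitive matching (the answer doesn't say), uses Matcher.replaceAll("$1") to collapse each match, and its class and field names are made up.

```java
import java.util.regex.Pattern;

public class TrailingBoundaryDemo {
    // The answer's pattern, compiled case-insensitively (an assumption).
    static final Pattern WITH_B =
            Pattern.compile("\\b(\\w+)(\\b\\W+\\1\\b)*", Pattern.CASE_INSENSITIVE);
    // Hypothetical variant without the final \b, for comparison only.
    static final Pattern WITHOUT_B =
            Pattern.compile("\\b(\\w+)(\\b\\W+\\1)*", Pattern.CASE_INSENSITIVE);

    // Replaces every match with its first word (group 1).
    static String dedupe(Pattern p, String s) {
        return p.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(dedupe(WITH_B, "the theory"));    // the theory
        System.out.println(dedupe(WITHOUT_B, "the theory")); // theory
        System.out.println(dedupe(WITH_B, "Hello, hello!")); // Hello!
    }
}
```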

Reference: example

Archived:	13 years, 9 months ago
Views:	22196
Last activity:	6 years, 10 months ago