Regular expression in Java for finding duplicate consecutive words

use*_*265 12 java regex

I saw this given as an answer for finding duplicate words in a string. But when I use it, it treats This and is as the same word and removes the is.

Regex

"\\b(\\w+)\\b\\s+\\1"

Any idea why this is happening?

Here is the code I am using for duplicate removal

public static String RemoveDuplicateWords(String input)
{
    String originalText = input;
    String output = "";
    Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
    //Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\1", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(input);
    if (!m.find())
        output = "No duplicates found, no changes made to data";
    else
    {
        while (m.find())
        {
            if (output == "")
                output = input.replaceFirst(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
        input = output;
        m = p.matcher(input);
        while (m.find())
        {
            output = "";
            if (output == "")
                output = input.replaceAll(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
    }
    return output;
}

Min*_*amy 21

Try this:

String pattern = "(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);

String input = "your string";
Matcher m = r.matcher(input);
while (m.find()) {
    input = input.replaceAll(m.group(), m.group(1));
}
System.out.println(input);

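Java regular expressions are explained very well in the API documentation for the Pattern class. Here is the regex with some spaces added to separate its parts: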

"(?i) \\b ([a-z]+) \\b (?: \\s+ \\1 \\b )+"

\b       match a word boundary
[a-z]+   match a word with one or more characters;
         the parentheses capture the word as a group    
\b       match a word boundary
(?:      indicates a non-capturing group (which starts here)
\s+      match one or more white space characters
\1       is a back reference to the first (captured) group;
         so the word is repeated here
\b       match a word boundary
)+       indicates the end of the non-capturing group and
         allows it to occur one or more times
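
As a minimal sketch of the pattern broken down above, the example below collapses each run of repeated words in a single call by using `Matcher.replaceAll` with the group reference `$1`, instead of the loop over `String.replaceAll` shown earlier. The class name `CollapseDemo` and the method `collapse` are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

class CollapseDemo {
    // Same regex as in the answer; the flag replaces the inline (?i).
    private static final Pattern REPEATED =
            Pattern.compile("\\b([a-z]+)\\b(?:\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Replaces every run of a repeated word with its first occurrence (group 1).
    static String collapse(String input) {
        return REPEATED.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        // prints: This is text another
        System.out.println(collapse("This this is text text text another another"));
    }
}
```

Because the replacement is `$1`, the casing of the first occurrence is what survives ("This this" becomes "This").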

  • The answer is perfect. I know it has been a long time, but could you elaborate on the regex part? (2 upvotes)

Fah*_*kar 7

You should have used \b(\w+)\b\s+\b\1\b; click here to see the result...

Hope this is what you want...
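
To illustrate why the word boundaries matter, here is a small sketch comparing three variants of the pattern. It is not a diagnosis of the exact inputs in the question; it only shows how "is is" can be matched inside "This is ..." when boundaries are missing. The class `BoundaryDemo` and the helper `firstMatch` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Returns the first substring matched by regex in text, or null if nothing matches.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String noBoundaries = "(\\w+)\\s+\\1";              // no \b anywhere
        String noTrailing   = "\\b(\\w+)\\b\\s+\\1";        // no \b after the back reference
        String allBounded   = "\\b(\\w+)\\b\\s+\\b\\1\\b";  // the pattern suggested above

        // "is" at the end of "This" plus the following "is" -> prints: is is
        System.out.println(firstMatch(noBoundaries, "This is fine"));
        // "is" plus the first two letters of "island" -> prints: is is
        System.out.println(firstMatch(noTrailing, "This is island"));
        // the trailing \b rejects the partial word -> prints: null
        System.out.println(firstMatch(allBounded, "This is island"));
        // a genuine duplicate is still found -> prints: text text
        System.out.println(firstMatch(allBounded, "text text"));
    }
}
```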

Update 1

Well, the output that you have is

the final string after removing duplicates

import java.util.regex.*;

public class MyDup {
    public static void main (String args[]) {
    String input="This This is text text another another";
    String originalText = input;
    String output = "";
    Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(input);
    System.out.println(m);
    if (!m.find())
        output = "No duplicates found, no changes made to data";
    else
    {
        while (m.find())
        {
            if (output == "") {
                output = input.replaceFirst(m.group(), m.group(1));
            } else {
                output = output.replaceAll(m.group(), m.group(1));
            }
        }
        input = output;
        m = p.matcher(input);
        while (m.find())
        {
            output = "";
            if (output == "") {
                output = input.replaceAll(m.group(), m.group(1));
            } else {
                output = output.replaceAll(m.group(), m.group(1));
            }
        }
    }
    System.out.println("After removing duplicate the final string is " + output);
    }
}

Run this code and look at the output you get... your question will be answered...

Note

In output, you are replacing the duplicates with a single word... aren't you?

When I put System.out.println(m.group() + " : " + m.group(1)); inside the first if condition, I get the output text text : text, i.e. the duplicates are being replaced by a single word.

else
    {
        while (m.find())
        {
            if (output == "") {
                System.out.println(m.group() + " : " + m.group(1));
                output = input.replaceFirst(m.group(), m.group(1));
            } else {

Hope you see what is going on now... :)

Good luck!!! Cheers!!!


小智 7

The pattern below matches a duplicated word no matter how many times it is repeated.

Pattern.compile("\\b(\\w+)(\\b\\W+\\b\\1\\b)*", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE); 

For example, "This is my friend friend" will output "This is my friend".

Also, with this pattern a single pass of "while (m.find())" is enough.
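
A minimal sketch of such a single pass, assuming the result is built with `Matcher.appendReplacement` and `Matcher.appendTail` (the answer itself does not say how the replacement is done). The class `SinglePassDemo` and the method `removeRepeats` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassDemo {
    static String removeRepeats(String input) {
        Pattern p = Pattern.compile("\\b(\\w+)(\\b\\W+\\b\\1\\b)*",
                Pattern.MULTILINE + Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        // One loop: each match is a word plus any immediate repeats of it,
        // and the whole match is replaced by the word itself (group 1).
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1)));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: This is my friend
        System.out.println(removeRepeats("This is my friend friend friend"));
    }
}
```

Because the repeat group uses `*`, every word matches, including words that are not repeated; those are simply replaced by themselves.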


imb*_*ond 5

\b(\w+)(\b\W+\1\b)*

Explanation:

\b : Any word boundary
(\w+) : One or more word characters (letter, number, underscore), captured as group 1

Once a word has been selected, it is time to select its repetitions.

( : Grouping starts
\b : Any word boundary
\W+ : One or more non-word characters
\1 : The repeated word (back reference to group 1)
\b : Rejects the match if the repeated word is joined to another word
) : Grouping ends
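
Since the separator here is \W+ rather than \s+, repeats separated by punctuation also count. The sketch below lists the words that are actually repeated, under the assumption that a match is a real duplicate only when the second group took part in it (with `*`, unrepeated words match too, leaving group 2 as null). The class `RepeatedWordFinder` and the method `repeatedWords` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedWordFinder {
    static List<String> repeatedWords(String text) {
        Pattern p = Pattern.compile("\\b(\\w+)(\\b\\W+\\1\\b)*", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            // group(2) is null when the word was not followed by a repeat of itself
            if (m.group(2) != null) {
                found.add(m.group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // prints: [Hello, world]
        System.out.println(repeatedWords("Hello, hello world -- world again"));
    }
}
```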

Reference: example