Java 16: Replacing several letters

Question

0 java string replace java-16

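I have a somewhat unusual problem. I'm currently writing a chat filter for Discord in Java 16.

I've run into the problem that, in German, there are several ways to write a word so that it slips past such a filter.

Take the insult "Hurensohn" as an example. You can simply write "Huränsohn" or "Hur3nsohn" in the chat and easily bypass the filter.

Since I don't want to add every possible spelling to the filter by hand, I thought about how to do it automatically. The first thing I did was create a HashMap containing all possible alternative characters for each letter. It looks like this: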

    Map<String, List<String>> alternativeCharacters = new HashMap<>();
    alternativeCharacters.put( "E", List.of( "ä", "3" ) );

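I then tried replacing the corresponding letters in a word and adding the results to the chat filter, which does work.

But here is the problem: to cover all possible combinations, replacing only one kind of alternative at a time doesn't get me very far.

If we take the word "Einschalter" and replace the letter "e", we can change each "e" to "3" or to "ä", which gives the following results: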

3inschalt3r
Einschalt3r
3inschalter

and

Äinschaltär
Einschaltär
Äinschalter

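But now I also want to create "mixed" words, for example "3inschaltär", where both the umlaut ("Ä"/"ä") and "3" are used in the same word. That would add the following combinations: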

3inschaltär
Äinschalt3r

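Does anyone know how I can achieve this? With the normal replace() method I haven't found a way to create "mixed" replacements.

I hope it's clear what my problem is and what I want to do. :D

The method currently used for the replacement: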

    public static List<String> replace( String word, String from, String... to ) {

        final int[] index = { 0 };
        List<String> strings = new ArrayList<>();

        /* Replaces all letters */
        List.of( to ).forEach( value -> strings.add( word.replaceAll( from, value ) ) );


        /* Here is the problem. Here only one letter is edited at a time and thus changed in the word */
        List.of( to ).forEach( value -> {
            List.of( word.split( "" ) ).forEach( letters -> {
                if ( letters.equalsIgnoreCase( from ) ) {
                    strings.add( word.substring( 0, index[0] ) + value + "" + word.substring( index[0] + 1 ) );
                }
                index[0]++;
            } );
            index[0] = 0;
        } );

        return strings;
    }
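
To make the goal concrete: the "mixed" spellings described above are the Cartesian product of the choices at each letter position (keep the letter, or use one of its alternatives). The following is only a minimal sketch of that idea, not part of the original post. The class name `VariantGenerator`, the method `variants`, and the `Map<Character, List<String>>` shape are assumptions made for illustration. For simplicity it looks letters up case-insensitively and does not produce an uppercase "Ä" for a leading "E".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class VariantGenerator {

    /** Returns every spelling obtained by keeping or substituting each mapped letter (hypothetical helper). */
    public static List<String> variants(String word, Map<Character, List<String>> alternatives) {
        List<String> results = new ArrayList<>();
        results.add("");
        for (char ch : word.toCharArray()) {
            // Choices for this position: the original letter plus its alternatives,
            // looked up by the lowercase form of the letter.
            List<String> choices = new ArrayList<>();
            choices.add(String.valueOf(ch));
            choices.addAll(alternatives.getOrDefault(Character.toLowerCase(ch), List.of()));

            // Extend every prefix built so far with every choice for this position.
            List<String> next = new ArrayList<>(results.size() * choices.size());
            for (String prefix : results) {
                for (String choice : choices) {
                    next.add(prefix + choice);
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        // "\u00e4" is the letter a with diaeresis, written as an escape to keep the source ASCII-only.
        Map<Character, List<String>> alternatives = Map.of('e', List.of("\u00e4", "3"));
        List<String> all = variants("Einschalter", alternatives);
        System.out.println(all.size() + " variants: " + all);
    }
}
```

The number of results is the product of the number of choices at every position, so it grows exponentially with the number of replaceable letters in a word.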

Answer 1

Hol*_*ger 6

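As others have said, you can't keep up with people's creativity. But if you want to keep using this kind of check, you should use the right tool for the job, namely a RuleBasedCollator.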

    RuleBasedCollator c = new RuleBasedCollator("<i,I=1=!<e=ä,E=3=Ä<o=0,O");
    c.setStrength(Collator.PRIMARY);

    String a = "3inschaltär", b = "Einschalter";
    if(c.compare(a, b) == 0) {
       System.out.println(a + " matches " + b);
    }

    3inschaltär matches Einschalter

This class even allows efficient hash lookups:

    // using c from above

    // prepare map
    var map = new HashMap<CollationKey, String>();
    for(String s: List.of("Einschalter", "Hicks-Boson")) {
        map.put(c.getCollationKey(s), s);
    }

    // use map for lookup
    for(String s: List.of("Ä!nschalt3r", "H1cks-B0sOn")) {
        System.out.println(s);
        String match = map.get(c.getCollationKey(s));
        if(match != null) System.out.println("\ta variant of " + match);
    }

    Ä!nschalt3r
            a variant of Einschalter
    H1cks-B0sOn
            a variant of Hicks-Boson
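
The lookup above could be wrapped into a small filter that checks every word of a chat message. This is only a sketch under assumptions: the class name `ChatFilter`, the method `findBlockedWord`, and the whitespace-based tokenization are made up for illustration, and a real filter would probably need smarter tokenization (punctuation, words glued together, and so on).

```java
import java.text.CollationKey;
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ChatFilter {
    private final RuleBasedCollator collator;
    private final Map<CollationKey, String> blocked = new HashMap<>();

    public ChatFilter(List<String> blockedWords) {
        try {
            // Same rules as above; the two umlaut letters are written as Unicode escapes.
            collator = new RuleBasedCollator("<i,I=1=!<e=\u00e4,E=3=\u00c4<o=0,O");
        } catch (ParseException e) {
            throw new IllegalStateException("invalid collation rules", e);
        }
        collator.setStrength(Collator.PRIMARY);
        for (String word : blockedWords) {
            blocked.put(collator.getCollationKey(word), word);
        }
    }

    /** Returns the blocked word that the first matching whitespace-separated token is a variant of. */
    public Optional<String> findBlockedWord(String message) {
        for (String token : message.trim().split("\\s+")) {
            String match = blocked.get(collator.getCollationKey(token));
            if (match != null) {
                return Optional.of(match);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ChatFilter filter = new ChatFilter(List.of("Einschalter", "Hicks-Boson"));
        System.out.println(filter.findBlockedWord("bitte den \u00c4!nschalt3r holen"));
        System.out.println(filter.findBlockedWord("hallo welt"));
    }
}
```

Each token costs one collation key computation and one hash lookup, regardless of how many words are on the block list.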

While a Collator can be used for sorting, you're only interested in identifying equal strings. So I didn't bother to specify a useful order, which simplifies the rules, since we only need to specify which characters should be equal.

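The RuleBasedCollator documentation explains the syntax; in short, I=1=! defines the characters I, 1, and ! as equal, while the preceding i, defines i as a different case of those characters. Likewise, e=ä,E=3=Ä defines e as equal to ä, and both as a different case of the characters E, 3, and Ä. Finally, the < separator defines characters as different. It also defines a sort order which, as mentioned, we don't care about in this usage.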
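
To see how the three separators interact with the collator strength, a small experiment like the following may help. It is only a sketch: the class `RuleDemo` and its `compare` helper are names made up for illustration, and the rules are the same as above with the umlauts written as Unicode escapes.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class RuleDemo {
    // The rules from the answer, with the two umlaut letters as Unicode escapes.
    static final String RULES = "<i,I=1=!<e=\u00e4,E=3=\u00c4<o=0,O";

    /** Compares a and b with a collator built from RULES at the given strength (hypothetical helper). */
    public static int compare(int strength, String a, String b) {
        try {
            RuleBasedCollator c = new RuleBasedCollator(RULES);
            c.setStrength(strength);
            return c.compare(a, b);
        } catch (ParseException e) {
            throw new IllegalStateException("invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        // '=' declares E and 3 equal
        System.out.println("E vs 3 (PRIMARY):  " + compare(Collator.PRIMARY, "E", "3"));
        // ',' is a tertiary (case-like) difference: ignored at PRIMARY, significant at TERTIARY
        System.out.println("e vs E (PRIMARY):  " + compare(Collator.PRIMARY, "e", "E"));
        System.out.println("e vs E (TERTIARY): " + compare(Collator.TERTIARY, "e", "E"));
        // '<' is a primary difference: significant even at PRIMARY
        System.out.println("e vs o (PRIMARY):  " + compare(Collator.PRIMARY, "e", "o"));
    }
}
```

A result of 0 means the collator treats the two strings as equal at that strength; setting the strength to PRIMARY is what makes the filter ignore the case-like differences introduced by the commas.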

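As an addendum, the following can be used to remove accents and other marks from characters, except for umlauts, since you want to match German words. This removes the need to handle an exploding number of obfuscating character combinations, especially from people who know about Zalgo text converters: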

    String s = "òñę ảëîöū";
    String n = Normalizer.normalize(s, Normalizer.Form.NFD)
        .replaceAll("(?!(?<=[aou])\\u0308)\\p{Mn}", "");
    System.out.println(s + " -> " + n);

    òñę ảëîöū -> one aeiöu
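
The normalization step could be packaged as a helper that runs before the collator lookup. This is a sketch, not part of the original snippet: the class `MarkStripper` and method `stripMarks` are hypothetical names, and the final NFC step is an addition that only puts the surviving umlauts back into precomposed form.

```java
import java.text.Normalizer;

public class MarkStripper {

    /**
     * Decomposes the input, removes every combining mark except a diaeresis
     * that directly follows a, o or u, and recomposes the result (hypothetical helper).
     */
    public static String stripMarks(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("(?!(?<=[aou])\\u0308)\\p{Mn}", "");
        return Normalizer.normalize(stripped, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "sch", then o with diaeresis, then an extra combining acute accent, then "n"
        String noisy = "sch\u00f6\u0301n";
        System.out.println(stripMarks(noisy));
    }
}
```

Note that the character class `[aou]` is lowercase only, so with this pattern an uppercase umlaut such as "Ä" loses its diaeresis and becomes "A"; extending the class to `[aouAOU]` would be one way to keep it, if that is wanted.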
