Epa*_*aga 66 java regex string
Apparently, Java's regex flavor counts umlauts and other special characters as non-"word characters" when I use a regex.
"TESTÜTEST".replaceAll( "\\W", "" )
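returns "TESTTEST" for me. What I want is for only the truly non-"word characters" to be removed. Is there any way to do this without having something like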
"[^A-Za-z0-9äöüÄÖÜßéèáàúùóò]"
only to realize I forgot ô?
Tim*_*ker 155
Use [^\p{L}\p{Nd}]+ - this matches all (Unicode) characters that are neither letters nor (decimal) digits.
In Java:
String resultString = subjectString.replaceAll("[^\\p{L}\\p{Nd}]+", "");
Edit:
I changed \p{N} to \p{Nd} because the former also matches some number symbols such as ¼; the latter doesn't. See it on regex101.com.
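To make the difference between these character classes concrete, here is a minimal sketch. The class and method names are made up for illustration, and the non-ASCII characters are written as Unicode escapes so the example does not depend on the source file encoding. The third method is an alternative not mentioned above: the embedded (?U) flag (Pattern.UNICODE_CHARACTER_CLASS, Java 7 and later) makes \W itself Unicode-aware, although \w then still includes the underscore.

```java
final class UnicodeWordFilter {

    // Removes everything that is not a Unicode letter or a decimal digit.
    static String keepLettersAndDecimalDigits(String s) {
        return s.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    }

    // Same idea with \p{N}: also keeps other numeric characters such as U+00BC.
    static String keepLettersAndAllNumbers(String s) {
        return s.replaceAll("[^\\p{L}\\p{N}]+", "");
    }

    // Alternative: (?U) switches on Pattern.UNICODE_CHARACTER_CLASS,
    // so \W no longer treats accented letters as non-word characters.
    static String stripNonWordUnicodeAware(String s) {
        return s.replaceAll("(?U)\\W+", "");
    }

    public static void main(String[] args) {
        // U+00DC is U-umlaut, U+00BC is the fraction one quarter.
        String input = "TEST-\u00DC TEST 1\u00BC";
        System.out.println(input.replaceAll("\\W", "")); // ASCII-only \W: TESTTEST1
        System.out.println(keepLettersAndDecimalDigits(input));
        System.out.println(keepLettersAndAllNumbers(input));
        System.out.println(stripNonWordUnicodeAware("TEST\u00DCTEST"));
    }
}
```

With this input, the \p{Nd} variant keeps the Ü and the 1 but drops the ¼, while the \p{N} variant keeps the ¼ as well.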
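I was trying to achieve the exact opposite when I came across this thread. I know it's quite old, but here is my solution nonetheless. You can use blocks, see here. In this case, compile the following code (with the right imports):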
String s = "äêìóblah";
Pattern p = Pattern.compile("[\\p{InLatin-1Supplement}]+"); // this regex uses a block
Matcher m = p.matcher(s);
System.out.println(m.find());
System.out.println(s.replaceAll(p.pattern(), "#"));
You should see the following output:
true
#blah
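One thing to keep in mind with blocks is that a block is a range of code points, not a category: Latin-1 Supplement (U+0080 to U+00FF) also contains symbols, and accented letters outside that range are not matched. The following sketch compares the block with \p{L} on three sample characters; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class BlockVersusLetter {

    // Matches any single code point in the Latin-1 Supplement block.
    static final Pattern LATIN1_BLOCK = Pattern.compile("\\p{InLatin-1Supplement}");

    // Matches any single code point whose general category is a letter.
    static final Pattern LETTER = Pattern.compile("\\p{L}");

    static boolean inLatin1Block(String ch) {
        return LATIN1_BLOCK.matcher(ch).matches();
    }

    static boolean isLetter(String ch) {
        return LETTER.matcher(ch).matches();
    }

    public static void main(String[] args) {
        // U+00E4: a with diaeresis, U+00F7: division sign,
        // U+0151: o with double acute (Latin Extended-A block).
        String[] samples = {"\u00E4", "\u00F7", "\u0151"};
        for (String ch : samples) {
            System.out.printf("U+%04X block=%b letter=%b%n",
                    ch.codePointAt(0), inLatin1Block(ch), isLetter(ch));
        }
    }
}
```

This prints block=true letter=true for U+00E4, block=true letter=false for U+00F7, and block=false letter=true for U+0151.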
Best,
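Sometimes you don't want to simply remove the characters, but only strip the accents. I came up with the following utility class, which I use in my Java REST web projects whenever I need to include a String in a URL: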
import java.text.Normalizer;
import java.text.Normalizer.Form;

import org.apache.commons.lang.StringUtils;

/**
 * Utility class for String manipulation.
 *
 * @author Stefan Haberl
 */
public abstract class TextUtils {

    private static String[] searchList = { "Ä", "ä", "Ö", "ö", "Ü", "ü", "ß" };
    private static String[] replaceList = { "Ae", "ae", "Oe", "oe", "Ue", "ue",
            "sz" };

    /**
     * Normalizes a String by removing all accents to original 127 US-ASCII
     * characters. This method handles German umlauts and "sharp-s" correctly
     *
     * @param s
     *            The String to normalize
     * @return The normalized String
     */
    public static String normalize(String s) {
        if (s == null)
            return null;
        String n = null;
        n = StringUtils.replaceEachRepeatedly(s, searchList, replaceList);
        n = Normalizer.normalize(n, Form.NFD).replaceAll("[^\\p{ASCII}]", "");
        return n;
    }

    /**
     * Returns a clean representation of a String which might be used safely
     * within an URL. Slugs are a more human friendly form of URL encoding a
     * String.
     * <p>
     * The method first normalizes a String, then converts it to lowercase and
     * removes ASCII characters, which might be problematic in URLs:
     * <ul>
     * <li>all whitespaces
     * <li>dots ('.')
     * <li>(semi-)colons (';' and ':')
     * <li>equals ('=')
     * <li>ampersands ('&')
     * <li>slashes ('/')
     * <li>angle brackets ('<' and '>')
     * </ul>
     *
     * @param s
     *            The String to slugify
     * @return The slugified String
     * @see #normalize(String)
     */
    public static String slugify(String s) {
        if (s == null)
            return null;
        String n = normalize(s);
        n = StringUtils.lowerCase(n);
        n = n.replaceAll("[\\s.:;&=<>/]", "");
        return n;
    }
}
As a German speaker, I've included proper handling of German umlauts as well - the list should be easy to extend for other languages.
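For readers without Apache Commons Lang on the classpath, the same idea can be sketched with the JDK alone. This is an illustrative variant of the class above, not a drop-in replacement: the class name SlugSketch is made up, the replacements are applied in a single pass with String.replace, and the input is first composed to NFC (an added assumption) so that an umlaut stored as a base letter plus a combining mark still hits the German replacement list. Non-ASCII characters are written as Unicode escapes to keep the source encoding-independent.

```java
import java.text.Normalizer;
import java.util.Locale;

final class SlugSketch {

    // A-, O-, U-umlaut in upper and lower case, plus sharp s.
    private static final String[] SEARCH =
            {"\u00C4", "\u00E4", "\u00D6", "\u00F6", "\u00DC", "\u00FC", "\u00DF"};
    private static final String[] REPLACE =
            {"Ae", "ae", "Oe", "oe", "Ue", "ue", "sz"};

    static String normalize(String s) {
        if (s == null) {
            return null;
        }
        // Compose first so decomposed umlauts match the search list.
        String n = Normalizer.normalize(s, Normalizer.Form.NFC);
        for (int i = 0; i < SEARCH.length; i++) {
            n = n.replace(SEARCH[i], REPLACE[i]);
        }
        // Decompose, then drop everything outside US-ASCII (the accents).
        return Normalizer.normalize(n, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    static String slugify(String s) {
        if (s == null) {
            return null;
        }
        return normalize(s)
                .toLowerCase(Locale.ROOT)
                .replaceAll("[\\s.:;&=<>/]", "");
    }

    public static void main(String[] args) {
        System.out.println(normalize("M\u00FCller caf\u00E9"));           // Mueller cafe
        System.out.println(slugify("\u00DCber uns: H\u00E4user & Caf\u00E9s")); // ueberunshaeusercafes
    }
}
```

Locale.ROOT is used for lowercasing so the result does not change with the default locale of the machine (for example, the Turkish dotless i).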
HTH
Edit: Note that it may not be safe to include the returned String in a URL. You should at least HTML-encode it to prevent XSS attacks.
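As a rough illustration of that last point, the sketch below shows the two kinds of encoding that usually come up: percent-encoding a value before placing it in a URL query string, and escaping it before writing it into HTML. It assumes Java 10 or later for URLEncoder.encode(String, Charset). The htmlEscape helper is a hypothetical, minimal escaper for illustration only; in a real project a well-tested library is the safer choice.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class OutputEncoding {

    // Form-style percent-encoding for a query parameter value (space becomes '+').
    static String urlEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Minimal HTML escaping of the five characters that matter in markup.
    static String htmlEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(urlEncode("a b&c"));             // a+b%26c
        System.out.println(htmlEscape("<a href=\"x\">"));   // &lt;a href=&quot;x&quot;&gt;
    }
}
```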