jb.*_*jb. 13 .net c# cultureinfo
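Replace German characters (umlauts, accents) with English equivalents
I need to remove any German-specific characters from various text fields so that they can be passed to another system that does not accept them as valid.
The characters I am aware of are:
ßäöüÄÖÜ
At the moment I replace them manually: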
myGermanString.Replace("ä","a").Replace("ö","o").Replace("ü","u").....
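But I am hoping there is a simpler or more efficient way, because each run processes thousands of strings and 99% of them will not contain any of these characters.
Perhaps an approach involving some kind of CultureInfo?
(For example, according to MS, the following comparison treats the two strings as equal: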
String.Compare("Straße", "Strasse", StringComparison.CurrentCulture);
So some kind of conversion table must exist?)
Bar*_*aye 27
This process is called removing "diacritics". See "Removing diacritics (accents) from strings", which uses the following code:
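// Requires: using System.Globalization; using System.Text;
public static String RemoveDiacritics(String s)
{
    // FormD splits each accented letter into a base letter plus combining marks.
    String normalizedString = s.Normalize(NormalizationForm.FormD);
    StringBuilder stringBuilder = new StringBuilder();
    for (int i = 0; i < normalizedString.Length; i++)
    {
        Char c = normalizedString[i];
        // Keep everything except the combining (non-spacing) marks.
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            stringBuilder.Append(c);
    }
    return stringBuilder.ToString();
}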
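One point worth checking against the character list in the question: ß has no Unicode decomposition, so decomposing and dropping combining marks leaves it untouched. The sketch below repeats the approach from this answer and adds a hypothetical helper, `ToAsciiGerman` (not part of the original answer), that maps ß to "ss" afterwards. The class name `DiacriticsDemo` is likewise only an illustration.

```csharp
using System.Globalization;
using System.Text;

public static class DiacriticsDemo
{
    // Same technique as the answer: decompose (FormD), then drop combining marks.
    public static string RemoveDiacritics(string s)
    {
        string normalized = s.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(normalized.Length);
        foreach (char c in normalized)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Hypothetical helper (an assumption, not from the answer):
    // ß survives RemoveDiacritics, so replace it explicitly.
    public static string ToAsciiGerman(string s)
    {
        return RemoveDiacritics(s).Replace("ß", "ss");
    }
}
```

With this sketch, `DiacriticsDemo.RemoveDiacritics("Straße")` returns the input unchanged, while `DiacriticsDemo.ToAsciiGerman("Straße")` gives "Strasse". The umlauts lose their dots, so "Äpfel für Öl" becomes "Apfel fur Ol".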
Luk*_*lin 10
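Inspired by @Barry Kaye's answer, I extended the function a little (and turned it into a string extension method).
The reason is that we need to convert German umlauts into combinations of ASCII characters, e.g. ä = ae.
It still uses a StringBuilder, so it should be fast enough.
You can call it like this: myStringVariable.RemoveDiacritics();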
using System.Collections.Generic;
using System.Globalization;
using System.Text;

namespace Core.Extensions
{
    public static class StringExtensions
    {
        // German characters that expand to ASCII letter pairs instead of just losing their mark.
        // Keys are stored in decomposed form (FormD) so they match the normalized input below.
        public static readonly IReadOnlyDictionary<string, string> SPECIAL_DIACRITICS = new Dictionary<string, string>
        {
            { "ä".Normalize(NormalizationForm.FormD), "ae" },
            { "Ä".Normalize(NormalizationForm.FormD), "Ae" },
            { "ö".Normalize(NormalizationForm.FormD), "oe" },
            { "Ö".Normalize(NormalizationForm.FormD), "Oe" },
            { "ü".Normalize(NormalizationForm.FormD), "ue" },
            { "Ü".Normalize(NormalizationForm.FormD), "Ue" },
            { "ß".Normalize(NormalizationForm.FormD), "ss" },
        };

        public static string RemoveDiacritics(this string s)
        {
            if (string.IsNullOrWhiteSpace(s))
                return s;

            var stringBuilder = new StringBuilder(s.Normalize(NormalizationForm.FormD));

            // Replace certain special chars with special combinations of ascii chars (eg. german umlauts and german double s)
            foreach (KeyValuePair<string, string> keyValuePair in SPECIAL_DIACRITICS)
                stringBuilder.Replace(keyValuePair.Key, keyValuePair.Value);

            // Remove other diacritic chars eg. non spacing marks https://www.compart.com/en/unicode/category/Mn
            // Iterate backwards so that removing a char never skips the one that follows it.
            for (int i = stringBuilder.Length - 1; i >= 0; i--)
            {
                if (CharUnicodeInfo.GetUnicodeCategory(stringBuilder[i]) == UnicodeCategory.NonSpacingMark)
                    stringBuilder.Remove(i, 1);
            }

            return stringBuilder.ToString();
        }
    }
}
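The question mentions that about 99% of the strings contain none of these characters. One possible way to take advantage of that, sketched here as an assumption rather than something either answer proposes, is to scan for non-ASCII characters first and skip normalization when there are none. The names `AsciiFastPath`, `IsAscii`, and `Convert` are hypothetical, and the `transliterate` delegate stands for whichever conversion is in use (for example `s => s.RemoveDiacritics()`).

```csharp
using System;

public static class AsciiFastPath
{
    // True when every char is in the 7-bit ASCII range (0..127).
    public static bool IsAscii(string s)
    {
        foreach (char c in s)
        {
            if (c > 127)
                return false;
        }
        return true;
    }

    // Returns the input untouched when it is null, empty, or pure ASCII;
    // otherwise delegates to the supplied conversion.
    public static string Convert(string s, Func<string, string> transliterate)
    {
        if (string.IsNullOrEmpty(s) || IsAscii(s))
            return s;
        return transliterate(s);
    }
}
```

A call such as `AsciiFastPath.Convert(text, t => t.RemoveDiacritics())` then allocates nothing for strings that are already plain ASCII. Whether this is measurably faster for a given workload would need to be benchmarked.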