Replace German characters (umlauts, accents) with English equivalents

jb.*_*jb. 13 .net c# cultureinfo

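I need to strip any German-specific characters from various text fields so that the text can be passed on to another system that does not accept them as valid.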

The characters I am aware of are:

ßäöüÄÖÜ

At the moment I replace them manually:

myGermanString.Replace("ä","a").Replace("ö","o").Replace("ü","u").....

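But I am hoping there is a simpler and more efficient way, because on every run I process thousands of strings, 99% of which will not contain any of these characters.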
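Because most strings contain none of these characters, one option is to check for them first and only build a new string when needed. The following is a minimal sketch, not a definitive implementation: the class and method names are hypothetical, the mapping is the simple one-letter replacement from the question (plus ß to "ss"), and it assumes the input uses precomposed characters.

```csharp
using System;
using System.Text;

public static class GermanCharReplacer
{
    // The characters listed in the question (precomposed forms only).
    private static readonly char[] GermanChars = { 'ß', 'ä', 'ö', 'ü', 'Ä', 'Ö', 'Ü' };

    public static string ReplaceGermanChars(string input)
    {
        // Fast path: nothing to replace, so return the original instance unchanged.
        if (string.IsNullOrEmpty(input) || input.IndexOfAny(GermanChars) < 0)
            return input;

        // Slow path: a single pass over the string instead of one pass per Replace call.
        var sb = new StringBuilder(input.Length + 4);
        foreach (char c in input)
        {
            switch (c)
            {
                case 'ä': sb.Append('a'); break;
                case 'ö': sb.Append('o'); break;
                case 'ü': sb.Append('u'); break;
                case 'Ä': sb.Append('A'); break;
                case 'Ö': sb.Append('O'); break;
                case 'Ü': sb.Append('U'); break;
                case 'ß': sb.Append("ss"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }
}
```

With this shape, strings that contain none of the listed characters cost one `IndexOfAny` scan and no allocation.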

Perhaps there is an approach that involves some kind of CultureInfo?

(For example, according to MS, the following comparison reports the two strings as equal:

String.Compare("Straße", "Strasse", StringComparison.CurrentCulture);

So some kind of conversion table must exist?)

Bar*_*aye 27

This process is known as removing "diacritics". See "Removing diacritics (accents) from strings", which uses the following code:

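// Requires: using System.Globalization; using System.Text;
public static String RemoveDiacritics(String s)
{
  // FormD splits each accented letter into a base letter plus combining marks.
  String normalizedString = s.Normalize(NormalizationForm.FormD);
  StringBuilder stringBuilder = new StringBuilder();

  for (int i = 0; i < normalizedString.Length; i++)
  {
    Char c = normalizedString[i];
    // Keep everything except the combining marks (the diacritics themselves).
    if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
      stringBuilder.Append(c);
  }

  return stringBuilder.ToString();
}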

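  • This does not work for the 'ß' character; it is simply returned as-is. (5 upvotes)
  • Could you summarize the article here? That helps keep the information in one place and guards against link rot. (2 upvotes)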
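The 'ß' case comes up because 'ß' has no canonical decomposition, so `NormalizationForm.FormD` leaves it as a single character and there is no combining mark to strip. A minimal sketch of one way to handle it, assuming "ss" is an acceptable replacement for the target system (the class and method names are hypothetical):

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsHelper
{
    public static string RemoveDiacriticsAndEszett(string s)
    {
        if (string.IsNullOrEmpty(s))
            return s;

        // Decompose accented letters into base letter + combining marks.
        string normalized = s.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(normalized.Length + 2);

        foreach (char c in normalized)
        {
            if (c == 'ß')
            {
                // Assumption: "ss" is the desired replacement; FormD does not change 'ß'.
                sb.Append("ss");
            }
            else if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                // Keep base letters and everything else that is not a combining mark.
                sb.Append(c);
            }
        }

        return sb.ToString();
    }
}
```

Usage would look like `DiacriticsHelper.RemoveDiacriticsAndEszett(myGermanString)`. Other accented letters such as 'é' are reduced to their base letter as well, which may or may not be wanted.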

Luk*_*lin 10

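Inspired by @Barry Kaye's answer, I extended the function slightly (and made it a string extension). The reason is that we needed to convert German umlauts into combinations of ASCII characters, e.g. ä = ae.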

It still uses a StringBuilder, so it should be fast enough.

You can call it like this: myStringVariable.RemoveDiacritics();

using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

namespace Core.Extensions
{
    public static class StringExtensions
    {
        public static IReadOnlyDictionary<string, string> SPECIAL_DIACRITICS = new Dictionary<string, string>
        {
            { "ä".Normalize(NormalizationForm.FormD), "ae".Normalize(NormalizationForm.FormD) },
            { "Ä".Normalize(NormalizationForm.FormD), "Ae".Normalize(NormalizationForm.FormD) },
            { "ö".Normalize(NormalizationForm.FormD), "oe".Normalize(NormalizationForm.FormD) },
            { "Ö".Normalize(NormalizationForm.FormD), "Oe".Normalize(NormalizationForm.FormD) },
            { "ü".Normalize(NormalizationForm.FormD), "ue".Normalize(NormalizationForm.FormD) },
            { "Ü".Normalize(NormalizationForm.FormD), "Ue".Normalize(NormalizationForm.FormD) },
            { "ß".Normalize(NormalizationForm.FormD), "ss".Normalize(NormalizationForm.FormD) },
        };

        public static string RemoveDiacritics(this string s)
        {
            if (string.IsNullOrWhiteSpace(s))
                return s;

            var stringBuilder = new StringBuilder(s.Normalize(NormalizationForm.FormD));

            // Replace certain special chars with special combinations of ascii chars (eg. german umlauts and german double s)
            foreach (KeyValuePair<string, string> keyValuePair in SPECIAL_DIACRITICS)
                stringBuilder.Replace(keyValuePair.Key, keyValuePair.Value);

            // Remove other diacritic chars eg. non spacing marks https://www.compart.com/en/unicode/category/Mn
            // Iterate backwards so that removing a char never skips the char that follows it.
            for (int i = stringBuilder.Length - 1; i >= 0; i--)
            {
                char c = stringBuilder[i];

                if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                    stringBuilder.Remove(i, 1);
            }

            return stringBuilder.ToString();
        }
    }
}


Joe*_*Joe 9

@Barry's answer is fine if you simply want to strip the diacritics.

But in German it is customary to replace ü => ue, ö => oe, and so on.

Here is a link to a similar question.
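The customary German replacements can be sketched as a small lookup table that touches only the German-specific characters and leaves every other character, including other accented letters, unchanged. This is an illustrative sketch under stated assumptions: the class name, method name, and table are hypothetical, and the capitalisation of the replacements ("Ae", "Oe", "Ue") is one possible choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class GermanTransliteration
{
    // Hypothetical table of the customary two-letter replacements.
    private static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        { 'ä', "ae" }, { 'ö', "oe" }, { 'ü', "ue" },
        { 'Ä', "Ae" }, { 'Ö', "Oe" }, { 'Ü', "Ue" },
        { 'ß', "ss" },
    };

    public static string Transliterate(string input)
    {
        if (string.IsNullOrEmpty(input))
            return input;

        // Compose first so that "a" followed by U+0308 is seen as the single char 'ä'.
        string composed = input.Normalize(NormalizationForm.FormC);
        var sb = new StringBuilder(composed.Length + 4);

        foreach (char c in composed)
        {
            // Append the replacement when the char is in the table, otherwise the char itself.
            if (Map.TryGetValue(c, out var replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }

        return sb.ToString();
    }
}
```

For example, `GermanTransliteration.Transliterate("Müller")` would be expected to give "Mueller" rather than "Muller".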