Is there such a thing as a "user-defined encoding fallback"?

joe*_*joe 11 c# encoding fallback ascii

When I use ASCII encoding to encode a string to bytes, a character like ö comes out as ?.

Encoding encoding = Encoding.GetEncoding("us-ascii");     // or: Encoding encoding = Encoding.ASCII;
data = encoding.GetBytes(s);

I'm looking for a way to substitute different characters for these, rather than just a question mark.
Examples:

ä -> ae
ö -> oe
ü -> ue
ß -> ss

If it isn't possible to replace one character with several, I'd settle for replacing each with a single character (ö -> o).

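There are several implementations of EncoderFallback, but I don't understand how they work.
A quick and dirty solution would be to replace all of these characters before passing the string to Encoding.GetBytes(), but that doesn't seem like the "correct" way.
I wish I could give the encoding object a replacement table.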

How can I do this?

Mic*_*eld 9

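The "most correct" way to get what you want is to implement a custom encoder fallback that does a best-fit fallback. The one built into .NET is, for a variety of reasons, very conservative about which characters it will try to best-fit (there are security implications, depending on what you plan to do with the re-encoded string). Your custom fallback strategy can do best-fit according to whatever rules you like.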

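That said, in your fallback class you are going to end up writing a large case statement covering all the non-encodable Unicode code points and manually mapping them to their best-fit replacements. You can achieve the same result by simply looping through the string beforehand and swapping the unsupported characters for their replacements. The main benefit of the fallback strategy is performance: you loop through the string only once instead of at least twice. Unless your strings are huge, though, I wouldn't worry much about it.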
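
As a rough sketch of the "loop through the string first" approach, the example below swaps the characters from the question before handing the string to the ASCII encoding. The class name AsciiTransliterator, the method GetAsciiBytes, and the contents of the table are illustrative assumptions, not part of any library.

```csharp
using System.Collections.Generic;
using System.Text;

public static class AsciiTransliterator
{
    // Hypothetical replacement table taken from the question; extend as needed.
    static readonly Dictionary<char, string> Replacements = new Dictionary<char, string>
    {
        { 'ä', "ae" }, { 'ö', "oe" }, { 'ü', "ue" }, { 'ß', "ss" }
    };

    public static byte[] GetAsciiBytes(string s)
    {
        // First pass: swap every mapped character for its replacement.
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            string replacement;
            if (Replacements.TryGetValue(c, out replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }

        // Second pass: the encoder itself. Anything still outside ASCII
        // becomes '?' through the default replacement fallback.
        return Encoding.ASCII.GetBytes(sb.ToString());
    }
}
```

Calling AsciiTransliterator.GetAsciiBytes("Größe") should produce the bytes for "Groesse". The string is traversed twice, which is the cost mentioned above.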

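If you do want to implement a custom fallback strategy, you should definitely read the article from my comment: Character Encoding in the .NET Framework. It isn't really hard, but you have to understand how encoding fallback works.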

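You pass an instance of your custom class, which must derive from EncoderFallback, to the Encoding.GetEncoding method. That class, though, is basically just a wrapper around the real work, which is done in an EncoderFallbackBuffer. You need a buffer because fallback is not necessarily a one-to-one process; in your example, you may end up mapping a single Unicode character to two ASCII characters.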
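
To show where the fallback object plugs in, here is a minimal sketch that uses the built-in EncoderReplacementFallback instead of a custom class. FallbackDemo and EncodeWithReplacement are names made up for illustration; the overload of Encoding.GetEncoding that takes an encoder fallback and a decoder fallback is the real one.

```csharp
using System.Text;

public static class FallbackDemo
{
    public static string EncodeWithReplacement(string s, string replacement)
    {
        // The second argument is the encoder fallback; a custom class
        // derived from EncoderFallback would be passed in the same place.
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(replacement),
            DecoderFallback.ReplacementFallback);

        byte[] data = ascii.GetBytes(s);

        // Decode again only so the result is easy to inspect.
        return Encoding.ASCII.GetString(data);
    }
}
```

With this sketch, FallbackDemo.EncodeWithReplacement("schön", "*") should give "sch*n". The built-in fallback uses the same replacement string for every non-encodable character, which is why a per-character mapping needs a custom class.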

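When the encoding process first runs into a problem and needs to fall back on your strategy, it uses your EncoderFallback implementation to create an instance of your EncoderFallbackBuffer. It then calls the Fallback method of your custom buffer.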

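Internally, your buffer builds up the set of characters to return in place of the non-encodable character, and then returns true. From there, the encoder calls GetNextChar repeatedly, as long as Remaining > 0 and/or until GetNextChar returns code point 0 ('\u0000'), and puts those characters into the encoded result.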
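
The call sequence can be approximated by driving a fallback buffer by hand. This is only a sketch of the protocol as described here, not the encoder's actual internal code; FallbackProtocolDemo and Drain are hypothetical names.

```csharp
using System.Text;

public static class FallbackProtocolDemo
{
    // Roughly mimics what the encoder does for one non-encodable character.
    public static string Drain(EncoderFallback fallback, char unknown)
    {
        // Step 1: the fallback object creates its buffer.
        EncoderFallbackBuffer buffer = fallback.CreateFallbackBuffer();
        var result = new StringBuilder();

        // Step 2: ask the buffer to prepare replacement characters.
        if (buffer.Fallback(unknown, 0))
        {
            // Step 3: read the replacements until the buffer is empty.
            while (buffer.Remaining > 0)
            {
                char c = buffer.GetNextChar();
                if (c == '\u0000') break;
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

For example, FallbackProtocolDemo.Drain(new EncoderReplacementFallback("oe"), 'ö') should return "oe". Any class derived from EncoderFallback can be passed in the same way, which makes this a convenient way to test a custom buffer in isolation.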

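The article includes an almost exact implementation of what you are trying to do; I've copied the basic framework below, which should get you started.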

using System.Text;

public class CustomMapper : EncoderFallback
{
   // Users can override the "replacement character", so track what they
   // give us.
   public string DefaultString;

   public CustomMapper() : this("*")
   {   
   }

   public CustomMapper(string defaultString)
   {
      this.DefaultString = defaultString;
   }

   public override EncoderFallbackBuffer CreateFallbackBuffer()
   {
      return new CustomMapperFallbackBuffer(this);
   }

   // This is the length of the largest possible replacement string we can
   // return for a single Unicode code point.
   public override int MaxCharCount
   {
      get { return 2; }
   } 
}

public class CustomMapperFallbackBuffer : EncoderFallbackBuffer
{
   CustomMapper fb;

   // Replacement characters for the current fallback, and how many of
   // them the encoder has already read.
   string replacement = string.Empty;
   int position = 0;

   public CustomMapperFallbackBuffer(CustomMapper fallback)
   {
      // We can use the same custom buffer with different fallbacks, e.g.
      // we might have different sets of replacement characters for different
      // cases. This is just a reference to the parent in case we want it.
      this.fb = fallback;
   }

   public override bool Fallback(char charUnknown, int index)
   {
      // Figure out what sequence of characters should replace charUnknown.
      // index is the position of this character in the original string,
      // in case that's relevant. The replacements must themselves be
      // encodable in the target encoding.
      switch (charUnknown)
      {
         case 'ä': replacement = "ae"; break;
         case 'ö': replacement = "oe"; break;
         case 'ü': replacement = "ue"; break;
         case 'ß': replacement = "ss"; break;
         // Failure case: use DefaultString from this.fb instead of
         // returning false.
         default: replacement = fb.DefaultString ?? string.Empty; break;
      }
      position = 0;

      // If we generated a sequence of replacement characters, return true
      // and the encoder will start calling GetNextChar. Otherwise return
      // false.
      return replacement.Length > 0;
   }

   public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
   {
      // Same as above, except we have a UTF-16 surrogate pair. For an
      // ASCII-type encoding there is most likely nothing to map the pair
      // to, so this sketch treats it as a failure case and uses
      // DefaultString; returning false here is the other option.
      replacement = fb.DefaultString ?? string.Empty;
      position = 0;
      return replacement.Length > 0;
   }

   public override char GetNextChar()
   {
      // Return the next character in our internal buffer of replacement
      // characters waiting to be put into the encoded byte stream. If
      // we're all out of characters, return '\u0000'.
      return position < replacement.Length ? replacement[position++] : '\u0000';
   }

   public override bool MovePrevious()
   {
      // Back up to the previous character we returned and get ready
      // to return it again. If that's possible, return true; if that's
      // not possible (e.g. we have no previous character) return false.
      if (position > 0)
      {
         position--;
         return true;
      }
      return false;
   }

   public override int Remaining
   {
      // Return the number of characters that we've got waiting
      // for the encoder to read.
      get { return replacement.Length - position; }
   }

   public override void Reset()
   {
      // Reset our internal state back to the initial one.
      replacement = string.Empty;
      position = 0;
   }
}
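
Since the question asks for a replacement table, one possible variation is a fallback that takes its mappings from a dictionary instead of a case statement. This is a sketch under that assumption; TableFallback, TableFallbackBuffer, and TableFallbackDemo are hypothetical names, and the table contents come from the question.

```csharp
using System.Collections.Generic;
using System.Text;

public class TableFallback : EncoderFallback
{
    readonly Dictionary<char, string> table;
    readonly string defaultString;
    readonly int maxCharCount;

    public TableFallback(Dictionary<char, string> table, string defaultString)
    {
        this.table = table;
        this.defaultString = defaultString;
        // Longest replacement we can ever return for one character.
        maxCharCount = defaultString.Length;
        foreach (string value in table.Values)
            if (value.Length > maxCharCount) maxCharCount = value.Length;
    }

    public override int MaxCharCount { get { return maxCharCount; } }

    public override EncoderFallbackBuffer CreateFallbackBuffer()
    {
        return new TableFallbackBuffer(table, defaultString);
    }

    sealed class TableFallbackBuffer : EncoderFallbackBuffer
    {
        readonly Dictionary<char, string> table;
        readonly string defaultString;
        string current = string.Empty;
        int position;

        public TableFallbackBuffer(Dictionary<char, string> table, string defaultString)
        {
            this.table = table;
            this.defaultString = defaultString;
        }

        public override bool Fallback(char charUnknown, int index)
        {
            // Look the character up; use the default string if unmapped.
            if (!table.TryGetValue(charUnknown, out current))
                current = defaultString;
            position = 0;
            return current.Length > 0;
        }

        public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
        {
            // Surrogate pairs are not in the table; use the default string.
            current = defaultString;
            position = 0;
            return current.Length > 0;
        }

        public override char GetNextChar()
        {
            return position < current.Length ? current[position++] : '\u0000';
        }

        public override bool MovePrevious()
        {
            if (position == 0) return false;
            position--;
            return true;
        }

        public override int Remaining { get { return current.Length - position; } }

        public override void Reset()
        {
            current = string.Empty;
            position = 0;
        }
    }
}

public static class TableFallbackDemo
{
    public static byte[] Encode(string s)
    {
        var table = new Dictionary<char, string>
        {
            { 'ä', "ae" }, { 'ö', "oe" }, { 'ü', "ue" }, { 'ß', "ss" }
        };
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii", new TableFallback(table, "?"), DecoderFallback.ReplacementFallback);
        return ascii.GetBytes(s);
    }
}
```

With this table, TableFallbackDemo.Encode("Fußgänger") should yield the ASCII bytes for "Fussgaenger". Every replacement string, including the default, has to be encodable in the target encoding, otherwise the encoder would need to fall back again while already handling a fallback.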