And*_*Dog 3 c# ascii character-encoding
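I just stumbled on another question in which someone suggested using new ASCIIEncoding().GetBytes(someString) to convert a string to bytes. It seemed obvious to me that this would not work for non-ASCII characters. As it turns out, though, ASCIIEncoding happily replaces invalid characters with '?'. I find this very confusing, because it breaks the principle of least surprise. In Python the equivalent is u"some unicode string".encode("ascii"), and the conversion is strict by default, so non-ASCII characters would raise an exception in this example.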
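To make the behavior described above concrete, here is a minimal sketch of the default replacement. The class and method names (AsciiDefaultDemo, EncodeWithDefaultFallback) are hypothetical and only serve the illustration; the input string is an assumed example containing two non-ASCII characters around an 'X'.

```csharp
using System;
using System.Text;

public static class AsciiDefaultDemo
{
    // Hypothetical helper: encodes with a default-constructed ASCIIEncoding,
    // which uses a replacement fallback rather than throwing.
    public static byte[] EncodeWithDefaultFallback(string text)
    {
        return new ASCIIEncoding().GetBytes(text);
    }

    public static void Main()
    {
        // U+00AB and U+00BB are outside the ASCII range; 'X' is inside it.
        byte[] bytes = EncodeWithDefaultFallback("\u00abX\u00bb");

        // Each unencodable character becomes '?' (0x3F); 'X' stays 0x58.
        Console.WriteLine(BitConverter.ToString(bytes)); // prints "3F-58-3F"
    }
}
```

No exception is thrown, so the data loss goes unnoticed unless the caller inspects the output.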
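Two questions:
.NET offers the option of throwing an exception when an encoding conversion fails. You need to create the encoding with the EncoderExceptionFallback class, which throws an EncoderFallbackException if an input character cannot be converted to an encoded output byte sequence. The following code is from the documentation of that class: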
Encoding ae = Encoding.GetEncoding(
    "us-ascii",
    new EncoderExceptionFallback(),
    new DecoderExceptionFallback());
Then use that encoding to perform the conversion:
// The input string consists of the Unicode characters LEFT POINTING
// DOUBLE ANGLE QUOTATION MARK (U+00AB), 'X' (U+0058), and RIGHT POINTING
// DOUBLE ANGLE QUOTATION MARK (U+00BB).
// The encoding can only encode characters in the US-ASCII range of U+0000
// through U+007F. Consequently, the characters bracketing the 'X' character
// cause an exception.
string inputString = "\u00abX\u00bb";
byte[] encodedBytes = new byte[ae.GetMaxByteCount(inputString.Length)];
int numberOfEncodedBytes = 0;
try
{
    numberOfEncodedBytes = ae.GetBytes(inputString, 0, inputString.Length,
                                       encodedBytes, 0);
}
catch (EncoderFallbackException e)
{
    Console.WriteLine("bad conversion");
}
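The MSDN page "Character Encoding in the .NET Framework" discusses, to some extent, the rationale behind the default conversion behavior. In short, they did not want to break legacy applications that depend on this behavior. They do, however, recommend overriding the default.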
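One way to override the default on an existing encoding, sketched here as an alternative to Encoding.GetEncoding, is to clone Encoding.ASCII and assign exception fallbacks to the writable copy. The names StrictAsciiDemo and CreateStrictAscii are hypothetical, and the sample inputs are assumptions chosen for illustration. The sketch also exercises the decoding direction.

```csharp
using System;
using System.Text;

public static class StrictAsciiDemo
{
    // Hypothetical helper: returns a writable copy of Encoding.ASCII whose
    // fallbacks throw instead of substituting '?'.
    public static Encoding CreateStrictAscii()
    {
        // Encoding.ASCII itself is read-only; Clone() yields a writable copy.
        Encoding strict = (Encoding)Encoding.ASCII.Clone();
        strict.EncoderFallback = EncoderFallback.ExceptionFallback;
        strict.DecoderFallback = DecoderFallback.ExceptionFallback;
        return strict;
    }

    public static void Main()
    {
        Encoding strict = CreateStrictAscii();

        try
        {
            // U+00AB cannot be represented in ASCII, so encoding throws.
            strict.GetBytes("\u00abX\u00bb");
        }
        catch (EncoderFallbackException e)
        {
            // CharUnknown and Index identify the offending input character.
            Console.WriteLine("cannot encode U+{0:X4} at index {1}",
                              (int)e.CharUnknown, e.Index);
        }

        try
        {
            // 0xAB is not a valid ASCII byte, so decoding throws as well.
            strict.GetString(new byte[] { 0x58, 0xAB });
        }
        catch (DecoderFallbackException e)
        {
            Console.WriteLine("cannot decode byte at index {0}", e.Index);
        }
    }
}
```

Cloning keeps the shared Encoding.ASCII instance untouched, so other code that relies on the lenient default is unaffected.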