Char returns the wrong value for 29 Unicode characters - need a .NET cast/convert from nchar to char

pap*_*zzo 1 .net t-sql sql-server unicode

I need a .NET equivalent of SQL's cast/convert from nchar to char.
More specifically, a cast from an nchar (Unicode) value to a char (ASCII) value.

What complicates this is that SQL char uses the full byte,
not just the 128 values of pure ASCII.
The T-SQL function ASCII returns 0-255.

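Ideally there would be a NormalizationForm called FormByte.
Not an exact textual value, but the closest logical value, or ? if there is none.
In other words, whatever SQL uses to convert from nchar to char.
(FormByte is a wish: NormalizationForm only defines FormC, FormD, FormKC and FormKD.)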

Encode/decode is not working for me; I have tried various flavors.

In SQL, many chars (bytes) map to 63, which is ?.
It is not only chars above 255 that map to 63:
130 through 140 all map to 63 as well.

Characters 160-255 all return 160-255.

Not everything above 255 maps to 63.
For example, many characters with diacritics are mapped to their ASCII base letter.

T-SQL has UNICODE and ASCII functions,
so I simply loaded every Unicode character into both a char column and an nchar column.

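The char SQL returns is wrong for 29 characters.
And the ASCII() value returned for the bad characters makes no sense: they are all control characters in the 128-159 range.
I checked the binary for the bad 29, and what is stored is what ASCII() returns.
For 27 of them, what comes back from char is the nchar, and for the other 2 it is not even the correct nchar. They should all map to ? or an ASCII equivalent:
“ and ” should map to " (but I would accept ?)
‘ and ’ should map to '
– (en dash) and — (em dash) should map to -
...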

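I know you don't believe me.
Insert 'Œ' into a char column and select it: it returns 'Œ'.
You can even search for it: char = 'Œ' returns true.
SELECT ASCII('Œ') returns 140, which is what is actually stored (check the binary).
The Unicode definition of 140/0x8C is PARTIAL LINE BACKWARD, a control character.
I checked the binary value of that char and it is 0x8C (140).
What comes back is the Unicode 'Œ', Int16 338.
It looks like SQL is doing some mapping on the way in and out and getting it wrong.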

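The ASCII function is correct for the 575 Unicode characters that are not mapped to ?.
The char value matches ASCII, and they all make sense.
E.g. 12 different forms of u all map to u.
The remaining 32163 characters are mapped to ? (63).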

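Below are the 29 characters that return the wrong value.
Column order (each line starts with the label, followed by (char)ASCII(char), an invisible C1 control character):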
char
nchar
ASCII(char)
UNICODE(nchar)

     sqlCharASCIIbackToString did not match  Œ Œ 140 338
     sqlCharASCIIbackToString did not match  œ œ 156 339
     sqlCharASCIIbackToString did not match  Š Š 138 352
     sqlCharASCIIbackToString did not match  š š 154 353
     sqlCharASCIIbackToString did not match  Ÿ Ÿ 159 376
     sqlCharASCIIbackToString did not match  Ž Ž 142 381
     sqlCharASCIIbackToString did not match  ž ž 158 382
     sqlCharASCIIbackToString did not match  ƒ ? 131 401
     sqlCharASCIIbackToString did not match  ƒ ƒ 131 402
     sqlCharASCIIbackToString did not match  ˆ ˆ 136 710
     sqlCharASCIIbackToString did not match  ˜ ˜ 152 732
     sqlCharASCIIbackToString did not match  – – 150 8211
     sqlCharASCIIbackToString did not match  — — 151 8212
     sqlCharASCIIbackToString did not match  ‘ ‘ 145 8216
     sqlCharASCIIbackToString did not match  ’ ’ 146 8217
     sqlCharASCIIbackToString did not match  ‚ ‚ 130 8218
     sqlCharASCIIbackToString did not match  “ “ 147 8220
     sqlCharASCIIbackToString did not match  ” ” 148 8221
     sqlCharASCIIbackToString did not match  „ „ 132 8222
     sqlCharASCIIbackToString did not match  † † 134 8224
     sqlCharASCIIbackToString did not match  ‡ ‡ 135 8225
     sqlCharASCIIbackToString did not match  • • 149 8226
     sqlCharASCIIbackToString did not match  … … 133 8230
     sqlCharASCIIbackToString did not match  ‰ ‰ 137 8240
     sqlCharASCIIbackToString did not match  ‹ ‹ 139 8249
     sqlCharASCIIbackToString did not match  › › 155 8250
     sqlCharASCIIbackToString did not match  € € 128 8364
     sqlCharASCIIbackToString did not match  ™ ™ 153 8482
     sqlCharASCIIbackToString did not match  ˜ ? 152 8776
     count63 =  32163 countMis =  29 countCorrect =  575

Run the following .NET code to see SQL return "Œ":

char char338 = (char)338;
System.Diagnostics.Debug.WriteLine(char338);
sqlCmd.CommandText = "select [char] from [charNchar] where [char] = @char;";
sqlCmd.Parameters.Add("@char", SqlDbType.Char).Value = char338;
string string338= sqlCmd.ExecuteScalar().ToString();
char338 = string338.ToCharArray()[0];
System.Diagnostics.Debug.WriteLine(char338 + " " + ((Int16)char338).ToString());

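The code above prints Œ 338.
SQL returns a value larger than a byte for a data type that should be stored as a byte.
If I search for (char)140, then ? (63) is returned.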

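Interestingly, searching for 'Œ' versus N'Œ' gives different results against char.
'Œ' searches on (140) Œ and returns only Œ.
N'Œ' searches on (338) Œ and returns both Œ and œ.
nchar finds both rows with either input.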

  SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
                ,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
  FROM [test].[dbo].[charNchar]
  where [char] = 'Œ'
  SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
                ,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
  FROM [test].[dbo].[charNchar]
  where [char] = N'Œ'
  SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
                ,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
  FROM [test].[dbo].[charNchar]
  where [nchar] = 'Œ'
  SELECT [int16],RTRIM([char]) as [char], ASCII([char]) as 'ASCII'
                ,RTRIM([nchar]) as [nchar], UNICODE([nchar]) as 'UNICODE'
  FROM [test].[dbo].[charNchar]
  where [nchar] = N'Œ'


int16  char                                               ASCII       nchar                                              UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338    Œ                                                  140         Œ                                                  338

int16  char                                               ASCII       nchar                                              UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338    Œ                                                  140         Œ                                                  338
339    œ                                                  156         œ                                                  339

int16  char                                               ASCII       nchar                                              UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338    Œ                                                  140         Œ                                                  338
339    œ                                                  156         œ                                                  339

int16  char                                               ASCII       nchar                                              UNICODE
------ -------------------------------------------------- ----------- -------------------------------------------------- -----------
338    Œ                                                  140         Œ                                                  338
339    œ                                                  156         œ                                                  339

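A search for ≈ finds nothing with any of the four queries. I checked a character chart: ≈ is the correct character for 8776, ALMOST EQUAL TO (a math symbol).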

˜ pastes into SSMS as a zero-width character, but something does get pasted, because FROM turns from blue to black.

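What am I missing? This looks like a bug to me.
It is not just a wrong value, it is an invalid value:
an Int16 is returned.
Say I want to use a byte to store a char to save space: SQL char breaks that, because 29 characters do not come back as a byte.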

Here is the code I used:

public void SQLchar()
{
    // Requires: using System; using System.Data; using System.Data.SqlClient;
    // connString is defined elsewhere; table charNchar has columns int16, char and nchar.
    try
    {
        using (SqlConnection sqlCon = new SqlConnection(connString))
        using (SqlCommand sqlCmd = sqlCon.CreateCommand())
        {
            sqlCon.Open();
            sqlCmd.CommandType = System.Data.CommandType.Text;
            sqlCmd.CommandText = "delete charNchar";
            sqlCmd.ExecuteNonQuery();

            // Insert code points 0 through 32766 into both a char and an nchar column.
            sqlCmd.CommandText = "insert into charNchar (int16,char,nchar) values (@int16, @char, @nchar);";
            for (Int16 i = 0; i < Int16.MaxValue; i++)
            {
                sqlCmd.Parameters.Clear();
                sqlCmd.Parameters.Add("@int16", SqlDbType.Int).Value = i;
                sqlCmd.Parameters.Add("@char", SqlDbType.Char).Value = (char)i;
                sqlCmd.Parameters.Add("@nchar", SqlDbType.NChar).Value = (char)i;
                sqlCmd.ExecuteNonQuery();
            }

            // Read everything back and compare the char column with (char)ASCII(char).
            sqlCmd.Parameters.Clear();
            sqlCmd.CommandText = "select char,nchar,ASCII(char),UNICODE(nchar) from charNchar order by int16;";
            int count63 = 0;
            int countMis = 0;
            int countCorrect = 0;
            using (SqlDataReader rdr = sqlCmd.ExecuteReader())
            {
                while (rdr.Read())
                {
                    string sqlChar = rdr.IsDBNull(0) ? "dbNull" : rdr.GetString(0);
                    string sqlNChar = rdr.IsDBNull(1) ? "dbNull" : rdr.GetString(1);
                    Int16 sqlCharASCII = rdr.IsDBNull(2) ? (Int16)(-1) : (Int16)rdr.GetInt32(2);
                    Int16 sqlNCharUnicode = rdr.IsDBNull(3) ? (Int16)(-1) : (Int16)rdr.GetInt32(3);
                    if (sqlCharASCII == 63 && sqlNCharUnicode != 63)
                    {
                        count63++;
                        continue;  // stored as ?
                    }
                    if (sqlCharASCII < 0)
                    {
                        System.Diagnostics.Debug.WriteLine("ASCII(char) null for " + sqlChar + " " + sqlNChar);
                        continue;
                    }
                    // Casts the byte value returned by ASCII() straight to a UTF-16 char.
                    string sqlCharASCIIbackToString = ((char)sqlCharASCII).ToString();
                    if (string.CompareOrdinal(sqlChar, sqlCharASCIIbackToString) != 0)
                    {
                        countMis++;
                        System.Diagnostics.Debug.WriteLine(" sqlCharASCIIbackToString did not match " + sqlCharASCIIbackToString + " " + sqlChar + " " + sqlNChar + " " + sqlCharASCII + " " + sqlNCharUnicode);
                    }
                    else
                    {
                        countCorrect++;
                    }
                }
            }
            System.Diagnostics.Debug.WriteLine("count63 =  " + count63 + " countMis =  " + countMis + " countCorrect =  " + countCorrect);
        }
    }
    catch (Exception Ex)
    {
        System.Diagnostics.Debug.WriteLine(Ex.Message);
    }
}

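As for why:
I am parsing string data in .NET, and that data is a foreign key (FK).
Rather than making a round trip to SQL to get the FK's ID, I use a .NET Dictionary for speed.
The Dictionary is used as a reverse lookup, to get the key from the value.
The parser holds the char as an Int16, because the parser already uses that.
So if the ASCII for a char is wrong, the reverse lookup fails.
I guess I could hard-code a fix for the incorrect ASCII results,
but before I go down a path that starts with patches, I want to understand what is going on here.
Is there some fundamental flaw with char?
I could just use nchar, but we prefer char.
The nature of the application is that we want matching:
having all 6 diacritic forms of u match the ASCII u is a good thing.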

Esa*_*ija 10

You are heavily confusing code point values with the byte values of their encoding.

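The code point U+0152 (338, Œ) is encoded in Windows-1252 as the byte 0x8C, decimal 140, which is what the misnamed ASCII() function returns. It just so happens that many code points are encoded in Windows-1252 as a byte whose value equals the code point value.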
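A minimal C# sketch of that distinction, assuming .NET 5+ (where the Windows code pages must be registered through CodePagesEncodingProvider before Encoding.GetEncoding can return Windows-1252). It prints each character's code point next to the byte Windows-1252 encodes it as; only some of the pairs coincide.

```csharp
using System;
using System.Text;

// .NET 5+: Windows code pages such as 1252 must be registered before use.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding win1252 = Encoding.GetEncoding("windows-1252");

// Compare each character's UTF-16 code point with the single byte that
// Windows-1252 uses to encode it (the value T-SQL ASCII() reports).
foreach (char c in "Aé€Œ™")
{
    byte b = win1252.GetBytes(c.ToString())[0];
    Console.WriteLine($"{c}  code point {(int)c,5}  byte {b,3}  equal: {(int)c == b}");
}
```

Only A (65) and é (233) report equal: True; €, Œ and ™ encode to 128, 140 and 153, the same values as the ASCII(char) column in the question's table.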

Windows-1252 can encode only these code points:

0-127
160-255

and these, which are not neatly in one range:

338,339,352,353,376,381,382,402,
710,732,8211,8212,8216,8217,8218,
8220,8221,8222,8224,8225,8226,
8230,8240,8249,8250,8364,8482

None of the code points in the second batch is encoded with byte value == code point value, which is exactly what you were expecting.
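To check the second batch, here is a sketch (same .NET 5+ assumption) that encodes each listed code point with an exception fallback, so an unmappable one would throw instead of silently becoming ?. Each one encodes to a single byte in 0x80-0x9F, and those bytes line up with the ASCII(char) column in the question (338 → 140, 8364 → 128, 8482 → 153, and so on).

```csharp
using System;
using System.Linq;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
// ExceptionFallback makes an unmappable code point throw instead of becoming '?'.
Encoding win1252 = Encoding.GetEncoding("windows-1252",
    EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

// The "not neatly in a range" code points from the list above.
int[] codePoints = { 338, 339, 352, 353, 376, 381, 382, 402,
                     710, 732, 8211, 8212, 8216, 8217, 8218,
                     8220, 8221, 8222, 8224, 8225, 8226,
                     8230, 8240, 8249, 8250, 8364, 8482 };

// Single() asserts that each code point encodes to exactly one byte.
byte[] encoded = codePoints
    .Select(cp => win1252.GetBytes(((char)cp).ToString()).Single())
    .ToArray();

for (int i = 0; i < codePoints.Length; i++)
    Console.WriteLine($"U+{codePoints[i]:X4} {(char)codePoints[i]} -> 0x{encoded[i]:X2} ({encoded[i]})");
```

Together with 0-127 and 160-255 that is 128 + 96 + 27 = 251 characters.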

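Windows-1252 cannot encode the code points 128-159, so trying to convert anything in that range (e.g. 130 or 140) just gets encoded as ? (0x3F). That range is mostly useless C1 control characters anyway.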
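A sketch of the plain replacement behaviour, again assuming .NET 5+; ToCodePageByte is a hypothetical helper name. It passes EncoderFallback.ReplacementFallback explicitly so that anything Windows-1252 cannot represent becomes ? (0x3F). The question's results (≈ coming back as ˜, accented letters stored as their base letter) suggest SQL Server additionally applies best-fit mappings; this sketch does not reproduce those.

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
// ReplacementFallback substitutes "?" for anything Windows-1252 cannot represent,
// instead of a best-fit substitute.
Encoding strict1252 = Encoding.GetEncoding("windows-1252",
    EncoderFallback.ReplacementFallback, DecoderFallback.ReplacementFallback);

// Hypothetical helper: the single Windows-1252 byte for c, or 0x3F ('?') if unmappable.
byte ToCodePageByte(char c) => strict1252.GetBytes(c.ToString())[0];

Console.WriteLine(ToCodePageByte('Œ'));       // 140: defined mapping
Console.WriteLine(ToCodePageByte('\u2248'));  // 63: ≈ ALMOST EQUAL TO has no mapping
Console.WriteLine(ToCodePageByte('\u4E2D'));  // 63: 中 has no mapping
```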

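It also doesn't use the full 256-value space it has: it encodes only 251 distinct characters. So you can't use it as a pseudo-byte, because 5 byte values are invalid Windows-1252. If that is what you are trying to do, it won't work.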


It's not really clear what you are trying to do at a high level, so I'll guess.

If you want accent insensitivity, just use an accent-insensitive collation. Then ü, ú, ù etc. will all match u. This has nothing to do with encoding.

CREATE TABLE Mytable (
    Mycolumn NVARCHAR(10) COLLATE Latin1_General_CI_AI
)

INSERT INTO Mytable (myColumn) VALUES( 'ü' ), ('ú'), ( 'ù' )

SELECT Mycolumn
FROM Mytable
WHERE Mycolumn = 'u'

--Results

MYCOLUMN
ü
ú
ù

Here is a demo: http://sqlfiddle.com/#!3/67752/2.
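If the same accent-insensitive matching is also needed on the .NET side, for example for the Dictionary reverse lookup described in the question, one common technique (not what SQL Server's collation does internally, and not the only option) is to decompose to NormalizationForm.FormD and drop the non-spacing combining marks. RemoveDiacritics is a hypothetical helper name; on Linux this assumes ICU is available (not globalization-invariant mode). It only strips combining accents, so Œ, dashes and curly quotes pass through unchanged.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper: decompose (FormD) so that "ü" becomes "u" + U+0308 COMBINING DIAERESIS,
// drop the non-spacing combining marks, then recompose (FormC) whatever is left.
string RemoveDiacritics(string s)
{
    var sb = new StringBuilder(s.Length);
    foreach (char c in s.Normalize(NormalizationForm.FormD))
    {
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            sb.Append(c);
    }
    return sb.ToString().Normalize(NormalizationForm.FormC);
}

foreach (string word in new[] { "ü", "ú", "ù", "û", "Œ", "\u2013" })
    Console.WriteLine($"{word} -> {RemoveDiacritics(word)}");
```

Normalized strings can then be used as Dictionary keys, so all the u variants land on the same entry.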


To convert the SQL ASCII() value back to 'Œ', try this:

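using System;
using System.Text;

public static class Program
{
    // cp is a Windows-1252 byte value, e.g. what T-SQL ASCII() returns.
    public static char Windows1252CPtoChar(int cp)
    {
        Encoding win1252 = Encoding.GetEncoding("Windows-1252"); //this could be made static
        return win1252.GetString(new byte[] { (byte)cp })[0];
    }

    public static void Main(string[] args) {
        // .NET Core/.NET 5+ only: Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Console.WriteLine(Windows1252CPtoChar(140) == 'Œ');
    }
}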

So instead of:

sqlCharASCIIbackToString = ((char)sqlCharASCII).ToString();
use:

sqlCharASCIIbackToString = (Windows1252CPtoChar(sqlCharASCII)).ToString();
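Rather than hard-coding fixes for the 29 values, a lookup table built once from the encoding can replace the (char) cast. This is a sketch assuming .NET 5+; byteToChar and asciiFromSql are hypothetical names. Note that bytes 131 and 152 decode to ƒ and ˜, so the two best-fit cases from the question (401 and 8776 ≈) cannot be recovered: that information was already lost when the value was converted to char on insert.

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding win1252 = Encoding.GetEncoding("windows-1252");

// Decode all 256 byte values once; a single-byte code page yields one char per byte.
byte[] allBytes = new byte[256];
for (int i = 0; i < allBytes.Length; i++)
    allBytes[i] = (byte)i;
char[] byteToChar = win1252.GetChars(allBytes);

// Map an ASCII(char) value from SQL back to its character by index,
// instead of casting the byte value with (char).
int asciiFromSql = 140;
char original = byteToChar[asciiFromSql];
Console.WriteLine($"{asciiFromSql} -> {original} (U+{(int)original:X4})");
```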