Posts by non*_*one

Converting a non-ASCII character to int in C: the extra bits are filled with 1s instead of 0s

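While coding in C, I accidentally found that when a non-ASCII character is converted from char (1 byte) to int (4 bytes), the extra bits (3 bytes) are filled with 1s rather than 0s. (For ASCII characters, the extra bits are filled with 0s.) For example: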

char c[] = "ā";
int i = c[0];
printf("%x\n", i);


The result is ffffffc4 rather than just c4. (The UTF-8 encoding of ā is \xc4\x81.)

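A related problem: when the right shift operator >> is applied to a non-ASCII character, the extra bits on the left are also filled with 1s rather than 0s, even when the char variable is explicitly cast to unsigned int (as for signed int, the extra bits are filled with 1s on my OS). For example: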

char c[] = "ā";
unsigned int u_c;
int i = c[0];
unsigned int u_i = c[0];

c[0] = (unsigned int)c[0] >> 1; 
u_c = (unsigned int)c[0] >> 1;      
i = i >> 1;
u_i = u_i >> 1;
printf("c=%x\n", (unsigned int)c[0]); // result: ffffffe2. The same with the signed int i.
printf("u_c=%x\n", …


c string utf-8 type-conversion non-ascii-characters
