Printing a wchar to the Linux console?

dav*_*avy 8 c linux console wchar-t wchar

My C program is pasted below. When I run it in bash, it prints "char is" but the Ω is not printed. My locales are all set to en_US.utf8.

#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>

int main() {
   int r;
   wchar_t myChar1 = L'Ω';
   r = wprintf(L"char is %c\n", myChar1);
}

vst*_*stm 14

This is very interesting. Apparently the compiler converts the omega from UTF-8 to a Unicode wide character, but somehow libc messes it up.

First: the `%c` format specifier expects a `char` (even in the `wprintf` version), so you have to use `%lc` (and, correspondingly, `%ls` for strings).

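Second, when your program starts, the locale is set to `C` (it is not picked up from the environment automatically). You have to call `setlocale` with an empty string to take the locale from the environment, and then libc is happy again.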

#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
#include <locale.h>

int main() {
    int r;
    wchar_t myChar1 = L'Ω';
    setlocale(LC_CTYPE, "");
    r = wprintf(L"char is %lc (%x)\n", (wint_t)myChar1, (unsigned int)myChar1);
}

  • Actually, this is expected. libc doesn't mess anything up; it just follows the standard. (3 upvotes)

Rus*_*hPL 6

As an alternative to the answer that suggests fixing libc, you can do this:

#include <stdio.h>
#include <stdint.h>
#include <wchar.h>

// NOTE: *NOT* thread safe, not re-entrant
const char* unicode_to_utf8(wchar_t c)
{
    static char b_static[5];
    unsigned char* b = (unsigned char*)b_static;
    // wchar_t is a signed 32-bit type on Linux; converting to uint32_t
    // turns negative values into huge ones that fall through below
    uint32_t u = (uint32_t)c;

    if (u < 0x80)            // 7-bit code point: plain ASCII, 1 byte
    {
        *b++ = (unsigned char)u;
    }
    else if (u < 0x800)      // up to 11 bits: 2 UTF-8 bytes
    {
        *b++ = (unsigned char)((u >> 6) | 0xC0);
        *b++ = (unsigned char)((u & 0x3F) | 0x80);
    }
    else if (u < 0x10000)    // up to 16 bits: 3 UTF-8 bytes
    {
        *b++ = (unsigned char)((u >> 12) | 0xE0);
        *b++ = (unsigned char)(((u >> 6) & 0x3F) | 0x80);
        *b++ = (unsigned char)((u & 0x3F) | 0x80);
    }
    else if (u < 0x110000)   // up to U+10FFFF (the Unicode maximum): 4 UTF-8 bytes
    {
        *b++ = (unsigned char)((u >> 18) | 0xF0);
        *b++ = (unsigned char)(((u >> 12) & 0x3F) | 0x80);
        *b++ = (unsigned char)(((u >> 6) & 0x3F) | 0x80);
        *b++ = (unsigned char)((u & 0x3F) | 0x80);
    }
    // anything else is not a Unicode code point: return an empty string
    *b = '\0';
    return b_static;
}


int main() {
    wchar_t myChar1 = L'Ω';
    printf("char is %s\n", unicode_to_utf8(myChar1));
    return 0;
}

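  • This answer is silly; the only point of using `wchar_t` in the first place is that, in theory, you can support different output encodings in different locales. If you want to hard-code UTF-8, just use `char *myChar1 = "Ω";` and then `printf` with `%s`... (2 upvotes)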