Saving a Unicode code point to a file as UTF-8

Question

Lar*_*rry 3 c io utf-8

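Context

On 64-bit Debian, I am trying to write an int (for example 233) to a file and have it show up as the text "é".

Question

I can't work out how to write the UTF-8 equivalent of a character such as "é", or any other UTF-8 character that is wider than the char type. The file should be human-readable, so that it can be sent over the network.

My goal is to write an int to a file and get its UTF-8 equivalent.

I don't know what I am doing.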

Code

FILE * dd = fopen("/myfile.txt","w");
fprintf(dd, "%s", 233); /* The file should print "é" */
fclose(dd);

Thanks

Update:

Following Biffen's comment, here is another snippet that writes "E9" (the hexadecimal value of "é"):

int p = 233;
char r[5];
sprintf(r,"%x",p);
printf("%s\n",r);
fwrite(r,1,strlen(r),dd);
fclose(dd);

How do I convert this to "é"?

Update: final working code:

UFILE * dd = u_fopen("/myfile.txt","wb", NULL, NULL);
UChar32 c = 233;
u_fputc(c,dd);
u_fclose(dd);

Answer 1

unw*_*ind 5

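You seem to expect printf() to understand UTF-8, but it does not.

You can implement the UTF-8 encoding yourself; it is a fairly simple encoding, after all.

A solution could look like this: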

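#include <stdint.h>
#include <stdio.h>

/* Write one Unicode code point to f, encoded as UTF-8 (1 to 4 bytes). */
void put_utf8(FILE *f, uint32_t codepoint)
{
    if (codepoint <= 0x7f) {
       /* 1 byte: 0xxxxxxx */
       fprintf(f, "%c", (char) (codepoint & 0x7f));
    }
    else if (codepoint <= 0x7ff) {
       /* 2 bytes: 110xxxxx 10xxxxxx */
       fprintf(f, "%c%c", (char) (0xc0 | (codepoint >> 6)),
                          (char) (0x80 | (codepoint & 0x3f)));
    }
    else if (codepoint <= 0xffff) {
       /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
          (surrogates U+D800..U+DFFF are not rejected here) */
       fprintf(f, "%c%c%c", (char) (0xe0 | (codepoint >> 12)),
                            (char) (0x80 | ((codepoint >> 6) & 0x3f)),
                            (char) (0x80 | (codepoint & 0x3f)));
    }
    else if (codepoint <= 0x10ffff) {
       /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
          (U+10FFFF is the highest Unicode code point) */
       fprintf(f, "%c%c%c%c", (char) (0xf0 | (codepoint >> 18)),
                              (char) (0x80 | ((codepoint >> 12) & 0x3f)),
                              (char) (0x80 | ((codepoint >> 6) & 0x3f)),
                              (char) (0x80 | (codepoint & 0x3f)));
    }
    else {
        /* invalid code point (above U+10FFFF): nothing is written */
    }
}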

You would use it like this:

FILE *f = fopen("mytext.txt", "wb");
put_utf8(f, 233);
fclose(f);

This writes the two bytes 0xC3 and 0xA9 to f.
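
To see where those two bytes come from: 233 is 0xE9 (binary 11101001), which is above 0x7F and so needs the two-byte form 110xxxxx 10xxxxxx. The lead byte is 0xC0 | (0xE9 >> 6) = 0xC0 | 0x03 = 0xC3, and the continuation byte is 0x80 | (0xE9 & 0x3F) = 0x80 | 0x29 = 0xA9. Below is a minimal sketch of the same idea that encodes into a buffer instead of a FILE, so the bytes can be inspected before they are written. The name utf8_encode is hypothetical; it is not part of the answer's code or of any library. Unlike put_utf8, this sketch also assumes you want UTF-16 surrogates rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the original answer).
 * Encodes one code point into out, which must hold at least 4 bytes.
 * Returns the number of bytes written, or 0 if cp is not a valid
 * Unicode scalar value (a surrogate, or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp <= 0x7f) {                       /* 0xxxxxxx */
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp <= 0x7ff) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xc0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3f));
        return 2;
    }
    if (cp <= 0xffff) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xd800 && cp <= 0xdfff)
            return 0;                       /* surrogates are not encodable */
        out[0] = (unsigned char) (0xe0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3f));
        out[2] = (unsigned char) (0x80 | (cp & 0x3f));
        return 3;
    }
    if (cp <= 0x10ffff) {                   /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xf0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3f));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3f));
        out[3] = (unsigned char) (0x80 | (cp & 0x3f));
        return 4;
    }
    return 0;                               /* above U+10FFFF */
}
```

With an unsigned char buf[4], calling utf8_encode(233, buf) returns 2 and leaves 0xC3 0xA9 in buf; the result can then be written with fwrite(buf, 1, n, f), where n is the return value. Returning 0 for invalid input lets the caller decide how to handle it rather than silently writing nothing.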

For more details on UTF-8, see Wikipedia.