What is the difference between Linux and Windows .txt files (Unicode encoding)

Question

19 windows linux ascii

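I am only using the 128-character set defined in the original ANSI standard.

But in general, how do the implementations of these files differ?

I am not concerned with display, i.e. whether a tab is shown as 6 or 8 characters, only with the actual internal representation in memory.

One difference I have heard of is \r\n (Windows) versus \n (Linux) for line termination.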

Answer 1

Ign*_*ams 20

"Unicode" on Windows is UTF-16LE, where each character takes 2 or 4 bytes. Linux uses UTF-8, where each character takes between 1 and 4 bytes.
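The byte counts above can be checked directly. The following is a minimal sketch using Python's built-in codecs; the sample characters and the helper name `encoded_sizes` are chosen here for illustration and are not part of the original answer.

```python
# Compare how many bytes a single character needs in UTF-8 and UTF-16LE.
# The sample characters are illustrative: ASCII, Latin-1, BMP, and non-BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def encoded_sizes(ch):
    """Return (utf8_length, utf16le_length) in bytes for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in samples:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16LE = {utf16_len} bytes")

# For text restricted to the 128 ASCII characters, UTF-8 is byte-for-byte
# identical to ASCII, while UTF-16LE stores a 00 byte after each character.
ascii_text = "plain text"
same_as_ascii = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
utf16_bytes = ascii_text.encode("utf-16-le")
```

For the asker's case (pure ASCII content), this suggests a file saved as UTF-8 without a signature is indistinguishable from an ASCII file, whereas a UTF-16LE file is twice as large.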

"The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"

Answer 2

use*_*971 15

Line Breaks

Windows uses CRLF (\r\n, 0D 0A) line endings, while Unix uses only LF (\n, 0A).
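To make the difference in stored bytes concrete, here is a small sketch that builds the same two lines with each convention and converts one form to the other. The helper names `join_lines` and `to_unix` are hypothetical, and the conversion assumes the input contains no lone CR bytes.

```python
# Build the same two-line text with Windows and Unix line endings
# and inspect the resulting bytes.
text_lines = ["first", "second"]


def join_lines(lines, newline):
    """Terminate every line with the given newline and encode as ASCII."""
    return "".join(line + newline for line in lines).encode("ascii")


def to_unix(data):
    """Convert CRLF line endings to LF (assumes no stray CR bytes)."""
    return data.replace(b"\r\n", b"\n")


windows_bytes = join_lines(text_lines, "\r\n")
unix_bytes = join_lines(text_lines, "\n")

print(windows_bytes.hex(" "))  # each line ends in 0d 0a
print(unix_bytes.hex(" "))     # each line ends in 0a
```

Reading and writing in binary mode, as here, avoids any automatic newline translation that a text-mode file API might perform.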

Character Encoding

Most modern (i.e., since 2004 or so) Unix-like systems make UTF-8 the default character encoding.

Windows, however, lacks native support for UTF-8. It internally works in UTF-16, and assumes that char-based strings are in a legacy code page. Fortunately, Notepad is capable of reading UTF-8 files; unfortunately, "ANSI" encoding is still the default.
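The effect of reading UTF-8 bytes as a legacy code page can be sketched as follows. Windows-1252 (`cp1252`) is used here only as an assumed example of an "ANSI" code page; the actual code page depends on the system locale.

```python
# UTF-8 bytes for a string containing one non-ASCII character.
utf8_bytes = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9'

# Misinterpreting those bytes as Windows-1252 (assumed legacy code page)
# turns the two-byte sequence C3 A9 into two separate characters.
misread = utf8_bytes.decode("cp1252")

# Pure ASCII text is unaffected: the first 128 code points have the same
# byte values in ASCII, UTF-8, and Windows-1252.
ascii_bytes = "cafe".encode("ascii")
ascii_roundtrip = ascii_bytes.decode("cp1252") == ascii_bytes.decode("utf-8")

print(repr(misread))
```

For files limited to the 128 ASCII characters, as in the question, the choice between UTF-8 and such a code page makes no difference to the stored bytes.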

Problematic Special Characters

U+001A SUBSTITUTE

Windows (rarely) uses Ctrl+Z as an end-of-file character. For example, if you print a file with the TYPE command at the command prompt, the output is truncated at the first 1A byte.

On Unix, Ctrl+Z is nothing special.
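The truncation described above can be imitated in a few lines. This is only an emulation of the described behavior under the assumption that everything from the first 1A byte onward is dropped; the function name `truncate_at_ctrl_z` is hypothetical and this is not how Windows itself implements it.

```python
CTRL_Z = b"\x1a"  # U+001A SUBSTITUTE, the legacy end-of-file marker


def truncate_at_ctrl_z(data: bytes) -> bytes:
    """Return the bytes before the first 1A byte, or all of them if none."""
    index = data.find(CTRL_Z)
    return data if index == -1 else data[:index]


sample = b"visible text\x1ahidden text"
print(truncate_at_ctrl_z(sample))
```

A check like this can help spot files that would display differently in tools that honor the marker.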

U+FEFF ZERO WIDTH NO-BREAK SPACE (Byte-Order Mark)

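On Windows, UTF-8 files often begin with a "byte-order mark", EF BB BF, to distinguish them from ANSI files.

On Linux, using a BOM is discouraged because it breaks things such as shebang lines in shell scripts. Besides, a UTF-8 signature is pointless when UTF-8 is already the default encoding.

It is worth mentioning that the pseudo-term "ANSI code page", although it still appears in programs such as Notepad, is a complete misnomer, as Microsoft acknowledged long ago. For details, see http://en.wikipedia.org/wiki/Windows_code_page. (3 upvotes)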
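A short sketch of the signature bytes and of why they interfere with a shebang line, using Python's standard `utf-8-sig` codec. The script content is an invented example.

```python
import codecs

# Encoding with "utf-8-sig" prepends the three-byte UTF-8 signature.
script_text = "#!/bin/sh\necho hello\n"
with_bom = script_text.encode("utf-8-sig")
without_bom = script_text.encode("utf-8")

has_signature = with_bom.startswith(codecs.BOM_UTF8)  # EF BB BF

# The kernel looks for the two bytes "#!" at the very start of the file,
# so a leading signature hides the shebang.
shebang_visible_with_bom = with_bom.startswith(b"#!")
shebang_visible_without_bom = without_bom.startswith(b"#!")

# Decoding with plain "utf-8" keeps U+FEFF as a character;
# "utf-8-sig" strips it if present.
kept = with_bom.decode("utf-8")
stripped = with_bom.decode("utf-8-sig")

print(with_bom[:3].hex(" "))
```

When reading files of unknown origin, decoding with `utf-8-sig` is a convenient choice because it accepts input with or without the signature.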

Archived:	14 years, 5 months ago
Views:	72017
Last activity:	7 years, 3 months ago