How to check the character encoding of a file in Linux

Question

You*_*ung 6 linux encoding utf-8 character-encoding

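I have some text files in different character encodings, such as ascii, utf-8, big5, and gb2312.

I want to know the exact encoding of each file so that I can open it in a text editor with the right setting; otherwise the text shows up garbled.

Searching online, I found that the file command can show a file's character encoding, for example: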

$ file -bi *
text/plain; charset=iso-8859-1
text/plain; charset=us-ascii
text/plain; charset=iso-8859-1
text/plain; charset=utf-8

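Unfortunately, the files encoded in big5 and gb2312 are both reported as charset=iso-8859-1, so I still can't tell them apart. Is there a better way to check the character encoding of a text file?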
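A likely reason for the iso-8859-1 result, offered as an assumption about file's behavior rather than a description of its implementation: bytes that are not valid UTF-8 fall through to an 8-bit guess, and iso-8859-1 accepts almost any byte sequence. The Python sketch below illustrates that point with a made-up sample string; `decodes_as` is a hypothetical helper, not part of any tool mentioned here.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data decodes without error in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


sample = "中文"  # hypothetical sample text, not taken from the files above

for source in ("big5", "gb2312"):
    raw = sample.encode(source)
    # The same text yields different bytes per encoding; neither byte
    # sequence is valid UTF-8, but both decode "successfully" as iso-8859-1.
    print(source, raw.hex(),
          "utf-8:", decodes_as(raw, "utf-8"),
          "iso-8859-1:", decodes_as(raw, "iso-8859-1"))
```

Because iso-8859-1 never rejects these bytes, a check based only on byte validity cannot separate big5 from gb2312; a detector has to look at which byte pairs are statistically plausible for each language.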

Answer 1

You*_*ung 7

@ewcz's suggestions work, to some extent.

$ uchardet *
big5.txt: BIG5
conf: ASCII
gb2312-windows.txt: GB18030
gb.txt: GB18030
test.java: UTF-8

and

$ enca -L chinese *
big5.txt: Traditional Chinese Industrial Standard; Big5
conf: 7bit ASCII characters
gb2312-windows.txt: Simplified Chinese National Standard; GB2312
  CRLF line terminators
gb.txt: Simplified Chinese National Standard; GB2312
test.java: Universal transformation format 8 bits; UTF-8
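The two tools label the same files differently (GB18030 versus GB2312). That is not necessarily a conflict, since GB18030 is a superset of GB2312. One way to sanity-check a reported name is to decode the file strictly with it and re-encode as UTF-8. The Python sketch below does this; `to_utf8` is a hypothetical helper, the byte strings stand in for real file contents, and it assumes the reported names are also valid Python codec names (true for BIG5, GB18030, GB2312, UTF-8, and ASCII).

```python
def to_utf8(data: bytes, reported_encoding: str) -> bytes:
    """Decode data with the reported encoding and re-encode it as UTF-8.

    Raises UnicodeDecodeError if the bytes are invalid in that encoding,
    which suggests the detector's guess was wrong.
    """
    text = data.decode(reported_encoding)  # strict error handling by default
    return text.encode("utf-8")


# Hypothetical stand-ins for the contents of gb.txt and big5.txt
gb_bytes = "简体中文".encode("gb2312")
big5_bytes = "繁體中文".encode("big5")

# GB2312 data decodes correctly under the GB18030 label as well
assert to_utf8(gb_bytes, "GB18030") == "简体中文".encode("utf-8")
assert to_utf8(gb_bytes, "GB2312") == "简体中文".encode("utf-8")
assert to_utf8(big5_bytes, "BIG5") == "繁體中文".encode("utf-8")
```

A clean decode does not prove the guess is right, because some byte sequences are valid in several encodings, so it is still worth reading the converted text.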

Archived:	7 years, 9 months ago
Viewed:	19826 times
Last activity:	4 years, 6 months ago