Vim's encoding options

Roo*_*ook 17 vim

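While Vim's help is a treasure trove of information, in some cases I find it mind-boggling. Its explanation of the various encoding-related options is one such case.

Could someone please explain to me, in simple terms, what the `encoding`, `fileencoding` and `fileencodings` settings do, and how I can
a) view the encoding of the current file?
b) change the encoding of the current file?
c) do anything else that is commonly needed but slips my mind right now?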

Ben*_*oit 29

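  • `encoding` is used by Vim to know which character sets it supports and how characters are stored internally.

    You should not really modify this setting; it should default to something Unicode-ish. Otherwise you cannot read and write files that use an extended character set. If you are unsure,
    put `:set encoding=utf-8` at the start of your vimrc and never touch that setting again, unless you have to read a huge file in a 1-byte encoding for a single session.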

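  • `fileencoding` stores the encoding of the current buffer.
    You can read and write this option, and it will do what you want.
    When you change it, the buffer is marked as modified, and when you save it to disk (`:w` or `:up`) the file is written in the encoding you specified.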

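  • `fileencodings` tells Vim how to detect the encoding of each file you read (that is, how to determine the value of `fileencoding`). It is a list of encodings that are tried in order, and the first one that is consistent with the binary content of the file is assumed to be the file's encoding.
    Set it once and forget it. You may want to change it if you know you are about to open lots of files that all use the same encoding and you do not want to waste time checking other encodings. By default it is `ucs-bom,utf8,latin1`, which is fine IMO if you are in Western Europe, because almost all files will be opened in the right encoding. However, with this setting, when you open a pure ASCII file (i.e. one whose bytes mean the same thing in UTF-8 and in any Latin-based code page), it is assumed to be UTF-8 and saved as such.
    Example: if you set `fileencodings` to `latin1,utf8`, every file you open will be read as `latin1`, because reading a file as latin1 never fails: there is a bijection between the 256 possible byte values and the characters of the character set.
    Conversely, if you use `fileencodings=ucs-bom,utf8,latin1`, Vim first checks for a byte order mark and decodes Unicode files that have one; if that fails (no BOM) it tries to read your file as UTF-8; and if that fails too (because some byte sequences are invalid in UTF-8) it opens your file as `latin1`.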

  • To reload a file with the correct encoding (for the cases where `fileencodings` did not detect it properly), you can do: `:e! ++enc=<the_encoding>`.
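
The one-time setup described in the bullets above could look roughly like this in a vimrc. This is a minimal sketch rather than a recommended configuration; the detection order is simply the one discussed above, and you may need a different fallback than latin1 for your locale.

```vim
" Near the top of the vimrc: keep Vim's internal representation Unicode.
set encoding=utf-8

" Detection order for files being read: BOM first, then strict UTF-8,
" then latin1 as the catch-all. latin1 never fails, so anything listed
" after it would never be tried.
set fileencodings=ucs-bom,utf-8,latin1
```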

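TL;DR:

  1. View the encoding of the current file: `:echo &fileencoding` (shorter: `:echo &fenc`, `:set fenc?`, `:verb set fenc?`)
  2. Change the encoding of the current file: `:set fenc=…`, then `:w` as many times as needed.
  3. Reload your file with the appropriate encoding: `:e! ++enc=…`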
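
Put together, the three steps might be used like this in a session. It is only a sketch: `cp1251` is an arbitrary example of "the encoding the file is really in", not something taken from the question.

```vim
" 1. Show the encoding Vim detected for the current buffer, and where
"    the option was last set.
:verbose set fileencoding?

" 2. The text looks garbled: discard the buffer and re-read the file from
"    disk, forcing cp1251 (example value) instead of auto-detection.
:edit! ++enc=cp1251

" 3. The text now looks right: mark the buffer for conversion to UTF-8
"    and write it out.
:set fileencoding=utf-8
:write
```

Note that step 2 re-reads the file and throws away unsaved changes, whereas step 3 changes nothing on disk until you write.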

  • @Benoit: or in one step: `:w ++enc=utf8`. (2 upvotes)

mee*_*ern 6

encoding:
The internal representation. To view or set it:

:set encoding
:set encoding=utf-8

fileencoding:

The representation that will be used when the file is written. To view or set it:

:set fileencoding
:set fileencoding=utf-8

fileencodings:

The list of candidate encodings that are tried when a file is read. To view or set it:

:set fileencodings
:set fileencodings=utf-8,latin1,cp1251

Here is the list of possible encodings from the Vim documentation (mbyte-encoding):

Supported 'encoding' values are:            *encoding-values*
1   latin1  8-bit characters (ISO 8859-1, also used for cp1252)
1   iso-8859-n  ISO_8859 variant (n = 2 to 15)
1   koi8-r  Russian
1   koi8-u  Ukrainian
1   macroman    MacRoman (Macintosh encoding)
1   8bit-{name} any 8-bit encoding (Vim specific name)
1   cp437   similar to iso-8859-1
1   cp737   similar to iso-8859-7
1   cp775   Baltic
1   cp850   similar to iso-8859-4
1   cp852   similar to iso-8859-1
1   cp855   similar to iso-8859-2
1   cp857   similar to iso-8859-5
1   cp860   similar to iso-8859-9
1   cp861   similar to iso-8859-1
1   cp862   similar to iso-8859-1
1   cp863   similar to iso-8859-8
1   cp865   similar to iso-8859-1
1   cp866   similar to iso-8859-5
1   cp869   similar to iso-8859-7
1   cp874   Thai
1   cp1250  Czech, Polish, etc.
1   cp1251  Cyrillic
1   cp1253  Greek
1   cp1254  Turkish
1   cp1255  Hebrew
1   cp1256  Arabic
1   cp1257  Baltic
1   cp1258  Vietnamese
1   cp{number}  MS-Windows: any installed single-byte codepage
2   cp932   Japanese (Windows only)
2   euc-jp  Japanese (Unix only)
2   sjis    Japanese (Unix only)
2   cp949   Korean (Unix and Windows)
2   euc-kr  Korean (Unix only)
2   cp936   simplified Chinese (Windows only)
2   euc-cn  simplified Chinese (Unix only)
2   cp950   traditional Chinese (on Unix alias for big5)
2   big5    traditional Chinese (on Windows alias for cp950)
2   euc-tw  traditional Chinese (Unix only)
2   2byte-{name} Unix: any double-byte encoding (Vim specific name)
2   cp{number}  MS-Windows: any installed double-byte codepage
u   utf-8   32 bit UTF-8 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2   16 bit UCS-2 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2le like ucs-2, little endian
u   utf-16  ucs-2 extended with double-words for more characters
u   utf-16le    like utf-16, little endian
u   ucs-4   32 bit UCS-4 encoded Unicode (ISO/IEC 10646-1)
u   ucs-4le like ucs-4, little endian

The {name} can be any encoding name that your system supports.  It is passed
to iconv() to convert between the encoding of the file and the current locale.
For MS-Windows "cp{number}" means using codepage {number}.
Examples:
    :set encoding=8bit-cp1252
    :set encoding=2byte-cp932

The MS-Windows codepage 1252 is very similar to latin1.  For practical reasons
the same encoding is used and it's called latin1.  'isprint' can be used to
display the characters 0x80 - 0xA0 or not.

Several aliases can be used, they are translated to one of the names above.
An incomplete list:

1   ansi    same as latin1 (obsolete, for backward compatibility)
2   japan   Japanese: on Unix "euc-jp", on MS-Windows cp932
2   korea   Korean: on Unix "euc-kr", on MS-Windows cp949
2   prc     simplified Chinese: on Unix "euc-cn", on MS-Windows cp936
2   chinese     same as "prc"
2   taiwan  traditional Chinese: on Unix "euc-tw", on MS-Windows cp950
u   utf8    same as utf-8
u   unicode same as ucs-2
u   ucs2be  same as ucs-2 (big endian)
u   ucs-2be same as ucs-2 (big endian)
u   ucs-4be same as ucs-4 (big endian)
u   utf-32  same as ucs-4
u   utf-32le    same as ucs-4le
    default     stands for the default value of 'encoding', depends on the
    environment

For the UCS codes the byte order matters.  This is tricky, use UTF-8 whenever
you can. The default is to use big-endian (most significant byte comes
first):
    name    bytes       char 
    ucs-2         11 22     1122
    ucs-2le       22 11     1122
    ucs-4   11 22 33 44 11223344
    ucs-4le 44 33 22 11 11223344

On MS-Windows systems you often want to use "ucs-2le", because it uses little
endian UCS-2.

There are a few encodings which are similar, but not exactly the same.  Vim
treats them as if they were different encodings, so that conversion will be
done when needed.  You might want to use the similar name to avoid conversion
or when conversion is not possible:

    cp932, shift-jis, sjis
    cp936, euc-cn
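
The byte-order remarks in the excerpt above matter mostly for files that have no BOM, where auto-detection has nothing to go on. A minimal sketch of handling such a file, assuming `encoding` is utf-8 and using a hypothetical file name `notes.txt`:

```vim
" notes.txt (hypothetical) was written by a Windows tool as little-endian
" UTF-16 without a byte order mark, so tell Vim the encoding explicitly.
:edit ++enc=utf-16le notes.txt

" Convert it to UTF-8 without a BOM on the next write.
:setlocal fileencoding=utf-8 nobomb
:write
```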