wc command counts extra characters

Jen*_*yen 7 linux osx wc

cat > file
Amy looked at her watch. He was late. The sun was setting but Jake didn’t care.

wc file
1      16      82 file

Can someone explain why the wc command reports 3 extra characters in this case?

tec*_*raf 36

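wc shows 3 extra characters because your example file contains a fancy Unicode apostrophe (most likely because you copied the text from a browser or a text editor):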

$ cat file
Amy looked at her watch. He was late. The sun was setting but Jake didn’t care.
$ wc file
1      16      82 file

With a plain ASCII apostrophe ':

$ cat file2
Amy looked at her watch. He was late. The sun was setting but Jake didn't care.
$ wc file2
1      16      80 file2
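The size difference between the two apostrophes can be checked directly. The following is a minimal sketch, assuming the curly apostrophe in the question is U+2019 encoded as UTF-8 (bytes e2 80 99, written here as octal escapes); the helper name `byte_count` is made up for this illustration.

```shell
# byte_count is a hypothetical helper: it prints the number of bytes in its argument.
# The $(( ... )) wrapper strips the padding some wc implementations print before the number.
byte_count() {
    echo $(( $(printf '%s' "$1" | wc -c) ))
}

ascii_apostrophe="'"
# Assumed: U+2019 RIGHT SINGLE QUOTATION MARK as its three UTF-8 bytes (e2 80 99).
curly_apostrophe=$(printf '\342\200\231')

byte_count "$ascii_apostrophe"   # prints 1
byte_count "$curly_apostrophe"   # prints 3
```

The two extra bytes are exactly the gap between the 80 and 82 reported for file2 and file.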

By default wc reports bytes, as the manual says:

newline, word, and byte counts for each file

To count characters instead, use the -m option:

$ cat file
Amy looked at her watch. He was late. The sun was setting but Jake didn’t care.
$ wc -m file
      80 file
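The result of `wc -m` depends on the current locale, so it only counts multibyte characters correctly when a UTF-8 locale is active. As a rough, locale-independent cross-check (a sketch that assumes the input is valid UTF-8), the characters can be counted by dropping the UTF-8 continuation bytes 0x80 to 0xBF and counting what remains. The function names below are invented for this example.

```shell
# Hypothetical helper: reproduces the question's line with the curly apostrophe
# written as the UTF-8 bytes e2 80 99 (octal 342 200 231).
sentence() {
    printf 'Amy looked at her watch. He was late. The sun was setting but Jake didn\342\200\231t care.\n'
}

# Count all bytes on stdin (what plain wc and wc -c report).
count_bytes_stdin() {
    echo $(( $(wc -c) ))
}

# Count UTF-8 characters on stdin: every character has exactly one byte that is
# not a continuation byte (octal 200-277), so delete those and count the rest.
count_chars_stdin() {
    echo $(( $(LC_ALL=C tr -d '\200-\277' | wc -c) ))
}

sentence | count_bytes_stdin   # prints 82
sentence | count_chars_stdin   # prints 80
```

Both counts include the trailing newline, as wc does.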

  • Yes, because `wc` counts *bytes*, not *characters*. http://pubs.opengroup.org/onlinepubs/009604499/utilities/wc.html (7 upvotes)
  • Use `wc -m` to count characters; `wc -c` and the third column of plain `wc` output count bytes, not characters. (5 upvotes)

Rab*_*bin 12

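Pipe your file through xxd to see the hex output side by side with the ASCII. This lets you check whether the file contains extra characters that are invisible or non-printable.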

$ cat file
one? and ?two

$ cat file | wc
      1       3      18

$ cat file | xxd
00000000: 6f6e 65e2 808f 2061 6e64 20e2 808f 7477  one... and ...tw
00000010: 6f0a                                     o.
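Where xxd is not installed, the POSIX `od` utility gives a similar byte-level view. This is only a sketch: the `sample` function rebuilds the file from the hex dump above (e2 80 8f written as octal escapes), and `non_ascii_bytes` is a made-up helper that counts bytes outside 7-bit ASCII.

```shell
# Hypothetical helper: the same 18 bytes shown in the xxd dump above.
sample() {
    printf 'one\342\200\217 and \342\200\217two\n'
}

# One hex byte per column; -An suppresses the offset column.
# The exact spacing of the output differs between od implementations.
sample | od -An -tx1

# Count bytes with the high bit set by deleting everything in the ASCII range.
non_ascii_bytes() {
    echo $(( $(LC_ALL=C tr -d '\000-\177' | wc -c) ))
}

sample | non_ascii_bytes   # prints 6
```

A non-zero result means the file holds something other than plain ASCII, which here is the two hidden three-byte sequences.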