Jas*_*ett (score 8) · tags: file-command, mime-types
Why doesn't the following return text/csv?
$ printf 'foo,bar\nbaz,quux\n' > temp.csv; file -b --mime temp.csv
text/plain; charset=us-ascii
I used this example for clarity, but I've had the same problem with other CSV files too.
$ file -b --mime '/Users/jasonswett/projects/client_work/gd/spec/test_files/wtf.csv'
text/plain; charset=us-ascii
Why doesn't it recognize a CSV as a CSV? Is there anything I can do to the CSV to make file return the "correct" thing?
Anonymous answer (score 7)
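The MIME type is determined by what the Unix man pages call "magic numbers". Many file formats (mostly binary ones) carry a magic number that identifies the file's type and format. The following excerpt is from the man page of the file command: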
The magic number tests are used to check for files with data in partic-
ular fixed formats. The canonical example of this is a binary exe-
cutable (compiled program) a.out file, whose format is defined in
a.out.h and possibly exec.h in the standard include directory. These
files have a 'magic number' stored in a particular place near the
beginning of the file that tells the UNIX operating system that the
file is a binary executable, and which of several types thereof. The
concept of 'magic number' has been applied by extension to data files.
Any file with some invariant identifier at a small fixed offset into
the file can usually be described in this way. The information identi-
fying these files is read from the compiled magic file
/usr/share/file/magic.mgc , or /usr/share/file/magic if the compile
file does not exist. In addition file will look in $HOME/.magic.mgc ,
or $HOME/.magic for magic entries.
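To make the magic-number idea concrete, here is a minimal shell sketch; it is not how file is implemented. A hypothetical guess_mime function reads a file's first four bytes as hex and compares them against a tiny hard-coded table (PNG, PDF, ZIP, gzip) instead of magic.mgc. A CSV has no such signature, so it falls through to "unknown", which is roughly the point where file moves on to its text tests.

```shell
#!/bin/sh
# guess_mime is a hypothetical helper, not part of file(1): it mimics a
# magic-number lookup using a tiny hard-coded table instead of magic.mgc.
guess_mime() {
  # Dump the first 4 bytes as lowercase hex and strip spaces/newlines,
  # e.g. "%PDF" becomes 25504446.
  sig=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  case "$sig" in
    89504e47) echo "image/png" ;;        # \211 P N G
    25504446) echo "application/pdf" ;;  # % P D F
    504b0304) echo "application/zip" ;;  # P K \003 \004
    1f8b*)    echo "application/gzip" ;; # gzip: 1f 8b, then the method byte
    *)        echo "unknown" ;;          # no signature matched
  esac
}

# Demo in a scratch directory.
tmp=$(mktemp -d)
printf '%%PDF-1.4\n' > "$tmp/sample.pdf"
printf 'foo,bar\nbaz,quux\n' > "$tmp/temp.csv"
guess_mime "$tmp/sample.pdf"   # prints application/pdf
guess_mime "$tmp/temp.csv"     # prints unknown: plain text has no magic number
```

The real magic database also checks offsets other than 0, string and regex tests, and nested rules, so treat this only as an illustration of the first step.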
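The man page also says that a file which matches no entry in the magic file is examined to see whether it is text, and its character set (ASCII, ISO-8859-x, non-ISO 8-bit extended ASCII, and so on, whichever fits best) is reported: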
If a file does not match any of the entries in the magic file, it is
examined to see if it seems to be a text file. ASCII, ISO-8859-x, non-
ISO 8-bit extended-ASCII character sets (such as those used on Macin-
tosh and IBM PC systems), UTF-8-encoded Unicode, UTF-16-encoded Uni-
code, and EBCDIC character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text in each
set. If a file passes any of these tests, its character set is
reported. ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are iden-
tified as ''text'' because they will be mostly readable on nearly any
terminal
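The text fallback can be approximated the same way. The sketch below covers only the ASCII case: a hypothetical is_ascii_text function treats a file as ASCII text when every byte is printable ASCII, tab, LF, or CR. It shows why a CSV comes out as text/plain; charset=us-ascii: nothing in this kind of test knows about commas or columns. The real file checks more encodings (ISO-8859-x, UTF-8, UTF-16, EBCDIC) and tolerates a few more control characters than this.

```shell
#!/bin/sh
# is_ascii_text is a hypothetical helper, not part of file(1). It succeeds
# when the file contains only tab, LF, CR, and printable ASCII (0x20-0x7E).
is_ascii_text() {
  # Delete every allowed byte; whatever is left is non-ASCII or a control byte.
  leftover=$(LC_ALL=C tr -d '\011\012\015\040-\176' < "$1" | wc -c | tr -d ' ')
  [ "$leftover" -eq 0 ]
}

tmp=$(mktemp -d)
printf 'foo,bar\nbaz,quux\n' > "$tmp/temp.csv"
printf 'caf\303\251,bar\n' > "$tmp/utf8.csv"   # "café" with é encoded as UTF-8

is_ascii_text "$tmp/temp.csv" && echo "temp.csv: text/plain; charset=us-ascii"
is_ascii_text "$tmp/utf8.csv" || echo "utf8.csv: not ASCII, would go on to the UTF-8 test"
```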
Suggestion
Use the mimetype command instead of file:
mimetype temp.csv
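Why this can help: the mimetype command (usually the one shipped with Perl's File::MimeInfo, which reads the freedesktop.org shared-mime-info database) also applies file-name glob rules, so a .csv extension can be enough to get text/csv even though the contents look like plain text. A rough shell sketch of extension-based lookup, using a hypothetical mime_from_name function and a deliberately tiny table:

```shell
#!/bin/sh
# mime_from_name is a hypothetical helper: it guesses a MIME type from the
# file name alone, like a glob rule in a MIME database, and never opens the file.
mime_from_name() {
  case "$1" in
    *.[cC][sS][vV])     echo "text/csv" ;;
    *.[jJ][sS][oO][nN]) echo "application/json" ;;
    *.[tT][xX][tT])     echo "text/plain" ;;
    *)                  echo "application/octet-stream" ;;  # no rule matched
  esac
}

mime_from_name temp.csv      # prints text/csv
mime_from_name REPORT.CSV    # prints text/csv
mime_from_name notes         # prints application/octet-stream
```

The trade-off mirrors file's: a name-based guess is cheap and gets CSV right, but it trusts the extension, so a misnamed file is misreported.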
Link for further reading:
http://unixhelp.ed.ac.uk/CGI/man-cgi?file
Anonymous answer (score 6)
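Unfortunately, you probably can't make file produce the output you want.
The file command tests the first few bytes of a file against a database of magic numbers. This check is easy for binary files such as images or executables, which carry a specific identifier at the start of the file.
If the file is not binary, file checks its encoding and looks for certain keywords in the file to determine the type, but only for a limited number of file types (mostly programming languages).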
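A rough illustration of that keyword step, assuming a hypothetical sniff_text function that only looks at the first line; the labels are illustrative and are not file's exact output. A CSV line such as foo,bar contains none of the keywords, so it stays plain text.

```shell
#!/bin/sh
# sniff_text is a hypothetical helper: it looks for a few telltale keywords
# on the first line, in the spirit of file(1)'s text checks.
sniff_text() {
  first=$(head -n 1 "$1")
  case "$first" in
    '#!'*)     echo "script (shebang line)" ;;
    '<?php'*)  echo "PHP script" ;;
    '<?xml'*)  echo "XML document" ;;
    *)         echo "plain text" ;;   # e.g. CSV: no keyword to latch onto
  esac
}

tmp=$(mktemp -d)
printf '#!/bin/sh\necho hi\n' > "$tmp/run"
printf 'foo,bar\nbaz,quux\n' > "$tmp/temp.csv"
sniff_text "$tmp/run"        # prints script (shebang line)
sniff_text "$tmp/temp.csv"   # prints plain text
```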
Views: 5250