How do I handle CRLF line endings in grep?

Question

Suppose I have arbitrary text input with CRLF line endings:

$ curl -sI http://unix.stackexchange.com | head -4
HTTP/1.1 200 OK
Cache-Control: public, max-age=60
Content-Length: 80551
Content-Type: text/html; charset=utf-8

$ curl -sI http://unix.stackexchange.com | head -4 | hexdump -C
00000000  48 54 54 50 2f 31 2e 31  20 32 30 30 20 4f 4b 0d  |HTTP/1.1 200 OK.|
00000010  0a 43 61 63 68 65 2d 43  6f 6e 74 72 6f 6c 3a 20  |.Cache-Control: |
00000020  70 75 62 6c 69 63 2c 20  6d 61 78 2d 61 67 65 3d  |public, max-age=|
00000030  36 30 0d 0a 43 6f 6e 74  65 6e 74 2d 4c 65 6e 67  |60..Content-Leng|
00000040  74 68 3a 20 38 30 39 30  32 0d 0a 43 6f 6e 74 65  |th: 80902..Conte|
00000050  6e 74 2d 54 79 70 65 3a  20 74 65 78 74 2f 68 74  |nt-Type: text/ht|
00000060  6d 6c 3b 20 63 68 61 72  73 65 74 3d 75 74 66 2d  |ml; charset=utf-|
00000070  38 0d 0a                                          |8..|
00000073


GNU grep 2.26 does not cope well with input that has such line endings:

$ curl -sI http://unix.stackexchange.com | head -4 | grep '200 OK$'
$ curl -sI http://unix.stackexchange.com | head -4 | grep '200 OK.$'
HTTP/1.1 200 OK


This is a bit annoying. I can of course work around it by adding dos2unix to the pipeline:

$ curl -sI http://unix.stackexchange.com | head -4 | dos2unix | grep '200 OK$'
HTTP/1.1 200 OK


But this feels a bit clunky (and not very portable).
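As a rough sketch of a more portable preprocessing step, `tr` (which POSIX specifies) can stand in for dos2unix. The `sample` function below is a hypothetical stand-in for the curl output, so the example runs without network access. Note that `tr -d '\r'` deletes every CR byte, not only those at the end of a line.

```shell
# Hypothetical sample input with CRLF line endings (stands in for the curl output).
sample() {
  printf 'HTTP/1.1 200 OK\r\nCache-Control: public, max-age=60\r\n'
}

# Delete all CR bytes before grep sees the data, so that $ matches at the LF.
stripped=$(sample | tr -d '\r' | grep '200 OK$')
printf '%s\n' "$stripped"
```

This still preprocesses the input, so it only addresses the portability concern, not the wish for grep to handle CRLF by itself.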

~~Oddly, the grep(1) man page claims that the tool strips any CR from the input unless the input has been detected as binary:~~

-U, --binary Treat the file(s) as binary. By default, under MS-DOS and MS-Windows, grep guesses whether a file is text or binary as described for the --binary-files option. If grep decides the file is a text file, it strips the CR characters from the original file contents (to make regular expressions with ^ and $ work correctly). Specifying -U overrules this guesswork, causing all files to be read and passed to the matching mechanism verbatim; if the file is a text file with CR/LF pairs at the end of each line, this will cause some regular expressions to fail. This option has no effect on platforms other than MS-DOS and MS-Windows.

Edit: As stated in the man page, this behavior is specific to MS-DOS and MS-Windows.

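~~Can grep handle CRLF (and CR) line endings transparently, without the input being preprocessed? If not, is this something that should be patched, or is there a good reason for it?~~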

Answer 1

小智 3

Based on this page, try these solutions:

/sf/ask/5168341/

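# Append a literal CR to the pattern
curl -sI http://unix.stackexchange.com | head -4 | grep "200 OK$(printf '\r')"

# Recursively list non-binary files that contain a CR ($'\r' is bash syntax)
grep -IUlr $'\r'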


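I can of course use `$(printf '\r')`, or `$'\r'` in bash, to insert a literal CR into the pattern. What I am asking, however, is whether there is a way to avoid having to do that. I want line endings to be matched transparently (i.e. regardless of whether they consist of CR, LF, or CRLF). (2 upvotes)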
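For illustration only, here is a minimal sketch of a pattern that accepts both LF and CRLF endings by making the CR optional. It does not remove the need to put a CR into the pattern, and it does not cover bare-CR line endings, since grep splits its input into lines at LF. The variable name `cr` and the sample inputs are assumptions made for this example.

```shell
# Hold a literal carriage return; command substitution strips only trailing newlines.
cr=$(printf '\r')

# With -E, "?" makes the preceding CR optional, so the same pattern
# matches a line ending in CRLF as well as one ending in a plain LF.
crlf_line=$(printf 'HTTP/1.1 200 OK\r\n' | grep -E "200 OK${cr}?\$")
lf_line=$(printf 'HTTP/1.1 200 OK\n' | grep -E "200 OK${cr}?\$")
```

The matched CRLF line still carries its trailing CR in the output, because grep prints matching lines unchanged.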

Archived:	9 years ago
Views:	6830
Last activity:	9 years ago