How do I run grep on epub/mobi files?

Question

Is there a way to run grep on epub/mobi files, especially on a set of multiple epub/mobi files in one directory?

Answer 1

You can easily grep these files by supplying the -a option, which makes grep treat binary files as text:

grep -a "author" *.epub *.mobi

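The above works on all of my 1000+ EPUB and MOBI files and gives the expected results.

Both EPUB and MOBI are container formats. An EPUB is essentially a .zip file with certain structural requirements; a MOBI is a file in Palm Database format. Both formats allow compressed or uncompressed data to be stored in the container.

If the data you are looking for is in a "file" inside the container, and that file is compressed, you need to supply the compressed string, not the expanded, uncompressed version of it. In particular, if you read an EPUB/MOBI on an e-reader, you will not find the word "abcde" that you just read by running grep -a 'abcde' over all your EPUB and MOBI files, because the book's content is most likely in a compressed "file" inside the container (likely but not guaranteed: compression is only an efficiency measure).

This is not a case of grep being unable to search these files; it is a case of not supplying the right search string. The same thing happens if you read a file containing Japanese text through Japanese-to-English translation software and then expect to find the English words by grepping the original file. With -a and the correct Japanese (binary) word pattern, grep works fine.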

Answer 2

mos*_*osh 5

This works on Windows 7 + Cygwin; it searches for text inside zip archives.

c:\> zipgrep "regex" file.epub

zipgrep is a shell script located at c:/cygwin/bin/zipgrep. This also works:

c:\> unzip -p "*.epub" | grep -a --color regex

The -p option makes unzip write the extracted file contents to standard output (a pipe) instead of to disk.

The grep-epub.sh script:

#!/bin/bash
# Usage: grep-epub.sh PAT file.epub...
PAT=${1:?"Usage: grep-epub PAT *.epub files to grep"}
shift
: "${1:?Need epub files to grep}"
for i in "$@"; do   # "$@" keeps file names containing spaces intact
  echo "$0" "$i"    # print a header line for each book
  unzip -p "$i" "*.htm*" "*.xml" "*.opf" |  # unzip only html and content files to stdout
    perl -lpe 's![<][^>]{1,200}?[>]!!g;' | # get rid of small html <b>tags
    grep -Pinaso  ".{0,60}$PAT.{0,60}" | # keep some context around matches
    grep -Pi --color "$PAT"              # color the matches.
done

Archived:	11 years, 7 months ago
Viewed:	2871 times
Last activity:	7 years, 1 month ago