xia*_*amx 24 python epub ibooks
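I'm trying to create an epub uploader for iBooks in Python. I need a Python library to extract book information. Before implementing it myself, I'd like to know whether anyone knows of an existing Python library that does this.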
Hug*_*ell 40
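An .epub file is a ZIP archive containing a META-INF directory. That directory holds a file named container.xml, which points to another file, usually named Content.opf, which indexes all the other files that make up the e-book. (Summary based on http://www.jedisaber.com/eBooks/tutorial.asp ; full spec at http://www.idpf.org/2007/opf/opf2.0/download/)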
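To make the container.xml → .opf indirection concrete, here is a minimal sketch using only the standard library (`zipfile` and `xml.etree.ElementTree` instead of lxml). The helper names `find_opf_path` and `make_demo_epub` are hypothetical, and the in-memory "book" holds only the two files needed for the illustration, so it is not a valid, complete epub.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml (same URI as in the answer's code).
CONTAINER_NS = {'n': 'urn:oasis:names:tc:opendocument:xmlns:container'}


def find_opf_path(zf):
    """Return the package (.opf) path listed in META-INF/container.xml."""
    root = ET.fromstring(zf.read('META-INF/container.xml'))
    rootfile = root.find('n:rootfiles/n:rootfile', CONTAINER_NS)
    return rootfile.get('full-path')


def make_demo_epub():
    """Build a tiny, hypothetical archive in memory with just container.xml
    and an (empty) package file, to illustrate the lookup."""
    container = (
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        '</container>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr('META-INF/container.xml', container)
        zf.writestr('OEBPS/content.opf', '<package/>')
    buf.seek(0)
    return buf


with zipfile.ZipFile(make_demo_epub()) as zf:
    print(zf.namelist())      # ['META-INF/container.xml', 'OEBPS/content.opf']
    print(find_opf_path(zf))  # OEBPS/content.opf
```

The .opf path is read from container.xml rather than hard-coded because, as noted above, the file is only *usually* named Content.opf and may live in a subdirectory.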
The following Python code extracts basic metadata from an .epub file and returns it as a dict.
```python
import zipfile
from lxml import etree

def get_epub_info(fname):
    ns = {
        'n': 'urn:oasis:names:tc:opendocument:xmlns:container',
        'pkg': 'http://www.idpf.org/2007/opf',
        'dc': 'http://purl.org/dc/elements/1.1/'
    }

    # prepare to read from the .epub file (closed automatically)
    with zipfile.ZipFile(fname) as zf:
        # find the contents metafile
        txt = zf.read('META-INF/container.xml')
        tree = etree.fromstring(txt)
        cfname = tree.xpath('n:rootfiles/n:rootfile/@full-path', namespaces=ns)[0]

        # grab the metadata block from the contents metafile
        cf = zf.read(cfname)
    tree = etree.fromstring(cf)
    p = tree.xpath('/pkg:package/pkg:metadata', namespaces=ns)[0]

    # repackage the data; a field missing from the metadata maps to None
    res = {}
    for s in ['title', 'language', 'creator', 'date', 'identifier']:
        values = p.xpath('dc:%s/text()' % s, namespaces=ns)
        res[s] = values[0] if values else None
    return res
```
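If lxml is not available, roughly the same lookup can be done with the standard library's `xml.etree.ElementTree`, which supports namespace prefixes in `find`/`findtext` (though not full XPath). This is a sketch, not the answer's code: `get_epub_info_stdlib` and `make_demo_epub` are hypothetical names, and the demo archive contains only the files the function reads.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Same namespace prefixes as the lxml version.
NS = {
    'n': 'urn:oasis:names:tc:opendocument:xmlns:container',
    'pkg': 'http://www.idpf.org/2007/opf',
    'dc': 'http://purl.org/dc/elements/1.1/',
}
FIELDS = ['title', 'language', 'creator', 'date', 'identifier']


def get_epub_info_stdlib(fname):
    """Hypothetical stdlib-only variant; fname is a path or binary file object."""
    with zipfile.ZipFile(fname) as zf:
        container = ET.fromstring(zf.read('META-INF/container.xml'))
        opf_path = container.find('n:rootfiles/n:rootfile', NS).get('full-path')
        package = ET.fromstring(zf.read(opf_path))
    metadata = package.find('pkg:metadata', NS)
    # findtext returns None when the element is absent
    return {f: metadata.findtext('dc:' + f, namespaces=NS) for f in FIELDS}


def make_demo_epub():
    """Build a tiny, hypothetical archive in memory for demonstration."""
    container = (
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        '</container>'
    )
    opf = (
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0">'
        '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
        '<dc:title>Example Title</dc:title>'
        '<dc:creator>Jane Doe</dc:creator>'
        '<dc:language>en</dc:language>'
        '</metadata></package>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr('META-INF/container.xml', container)
        zf.writestr('OEBPS/content.opf', opf)
    buf.seek(0)
    return buf


print(get_epub_info_stdlib(make_demo_epub()))
# {'title': 'Example Title', 'language': 'en', 'creator': 'Jane Doe', 'date': None, 'identifier': None}
```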
Sample output:

```
{
  'date': '2009-12-26T17:03:31',
  'identifier': '25f96ff0-7004-4bb0-b1f2-d511ca4b2756',
  'creator': 'John Grisham',
  'language': 'UND',
  'title': 'Ford County'
}
```
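Dublin Core elements such as `dc:creator` may appear more than once in the OPF metadata (for example, a co-authored book), and the code above keeps only the first match for each field. A possible extension, sketched here with the standard library and a hypothetical helper name, collects every value into a list; it takes the raw .opf bytes, i.e. what `zip.read(cfname)` returns.

```python
import xml.etree.ElementTree as ET

NS = {
    'pkg': 'http://www.idpf.org/2007/opf',
    'dc': 'http://purl.org/dc/elements/1.1/',
}


def get_all_dc_values(opf_xml,
                      fields=('title', 'language', 'creator', 'date', 'identifier')):
    """Hypothetical helper: map each Dublin Core field to a list of all its
    non-empty text values (an empty list if the field is absent)."""
    metadata = ET.fromstring(opf_xml).find('pkg:metadata', NS)
    return {
        field: [el.text for el in metadata.findall('dc:' + field, NS) if el.text]
        for field in fields
    }


opf = b'''<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example Title</dc:title>
    <dc:creator>First Author</dc:creator>
    <dc:creator>Second Author</dc:creator>
  </metadata>
</package>'''

print(get_all_dc_values(opf)['creator'])  # ['First Author', 'Second Author']
```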