Related questions and solutions (0)

Where can I find a mapping from Identity-H encoded characters to ASCII or Unicode characters?

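I have a PDF produced by a third party. I'm trying to get the text out of it, but neither pdf2text nor copy and paste produces readable text. After a little digging in the output (of either one), I found that each character on the screen consists of three bytes. For example, "A" is the bytes ef, 81, 81. The metadata on the PDF claims it is encoded in Identity-H, so I assume what I'm seeing is a set of characters encoded in Identity-H. I have a partial mapping built from the documents I already have, but I'd like to make a more complete one. To do that, I need something like an ASCII table for Identity-H.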
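One observation may help here: read as UTF-8, the bytes ef 81 81 decode to U+F041, a Private Use Area code point that sits exactly 0xF000 above the ASCII code for "A". The sketch below checks that arithmetic. The fixed 0xF000 offset and the helper name `unshift_pua` are assumptions drawn from this single example, not a property of Identity-H, which only maps 2-byte character codes to CIDs and does not itself define a Unicode mapping.

```python
# The three bytes reported for "A" in the extracted text.
raw = bytes([0xEF, 0x81, 0x81])

# Interpreting them as UTF-8 yields a single code point.
ch = raw.decode("utf-8")
cp = ord(ch)

# U+E000..U+F8FF is the Unicode Private Use Area.
in_private_use_area = 0xE000 <= cp <= 0xF8FF


def unshift_pua(text, base=0xF000):
    """Hypothetical helper: map U+F000..U+F0FF back to U+0000..U+00FF.

    Assumes the extracted text was shifted by a constant offset, which
    is only a guess based on the single "A" example above.
    """
    out = []
    for c in text:
        code = ord(c)
        if base <= code <= base + 0xFF:
            out.append(chr(code - base))
        else:
            out.append(c)  # leave everything else untouched
    return "".join(out)


print(hex(cp), in_private_use_area, unshift_pua(ch))
```

If other characters in the document follow the same pattern, this kind of shift could replace a hand-built table. If they do not, the font's ToUnicode CMap (when present) is the place a PDF normally records the code-to-Unicode mapping.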

pdf unicode encoding text character-encoding

11
Votes
1
Answers
20k
Views

struct.error: unpack requires a string argument of length 16

I get the following error when processing a PDF file (2.pdf) with pdfminer (pdf2txt.py):

pdf2txt.py 2.pdf 

Traceback (most recent call last):
  File "/usr/local/bin/pdf2txt.py", line 115, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "/usr/local/bin/pdf2txt.py", line 109, in main
    interpreter.process_page(page)
  File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 832, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 843, in render_contents
    self.init_resources(resources)
  File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 347, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 195, in get_font
    font = self.get_font(None, subspec)
  File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 186, in get_font
    font = PDFCIDFont(self, spec)
  File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdffont.py", line 654, in __init__ …
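For context on the error message itself: `struct.unpack` raises `struct.error` whenever the buffer it receives is not exactly the size its format string requires, which typically happens when a read returns fewer bytes than expected. The sketch below reproduces that condition with a 16-byte record. The format `">4sLLL"`, the sample bytes, and the helper `read_record` are illustrative assumptions, not necessarily the call that failed in the truncated traceback above.

```python
import io
import struct

# Hypothetical 16-byte record: a 4-byte tag followed by three
# big-endian unsigned 32-bit integers.
RECORD_FORMAT = ">4sLLL"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def read_record(fp):
    """Read one record, returning None if the stream ends early."""
    data = fp.read(RECORD_SIZE)
    if len(data) != RECORD_SIZE:
        return None  # short read: avoid calling unpack on a partial buffer
    return struct.unpack(RECORD_FORMAT, data)


# A stream with only 6 bytes, standing in for truncated or damaged data.
truncated = io.BytesIO(b"glyf\x00\x00")
# A stream holding one complete record.
complete = io.BytesIO(b"glyf" + struct.pack(">LLL", 1, 2, 3))

# Unpacking the short buffer directly raises struct.error.
try:
    struct.unpack(RECORD_FORMAT, truncated.getvalue())
    raised = False
except struct.error:
    raised = True

print(RECORD_SIZE, raised, read_record(truncated), read_record(complete))
```

Checking the length before unpacking turns the exception into a recoverable condition, which can be useful when diagnosing which part of the input is shorter than the parser expects.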

python pdf pdf-parsing pdftotext pdfminer

7
Votes
1
Answers
3984
Views