Win32com: saving a PDF as XML with Acrobat Pro X > com_error "-2147467263, 'Not implemented'"

Question

use*_*272 3 windows acrobat pywin32 python-2.7

Python 2.7 (r27:82525, Jul  4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
Windows XP SP3
Python 2.7 pywin32-218
Adobe Acrobat X 10.0.0

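I want to use Python to automate Acrobat Pro and export a PDF as XML. I have already done this by hand through the "Save As" dialog in the running program, and now want to do it from a Python script. I have read many pages, including parts of the Adobe SDK, the SDK forums, and VB forums, with no luck.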

I read Blish's question here: "Not implemented" exception when using pywin32 to control Adobe Acrobat

and this page: timgolden python/win32_how_do_i/generate-a-static-com-proxy.html

What am I missing? My code is:

import win32com.client
import win32com.client.makepy

win32com.client.makepy.GenerateFromTypeLibSpec('Acrobat')
adobe = win32com.client.DispatchEx('AcroExch.App')
avDoc = win32com.client.DispatchEx('AcroExch.AVDoc')
# Raw strings keep backslash sequences such as \a from being read as escapes
avDoc.Open(r'C:\Documents and Settings\PC\Desktop\a_PDF.pdf', r'C:\Documents and Settings\PC\Desktop')
pdDoc = avDoc.GetPDDoc()
jObject = pdDoc.GetJSObject()
jObject.SaveAs(r'C:\Documents and Settings\PC\Desktop\a_PDF.xml', "com.adobe.acrobat.xml-1-00")

The full error is:

Traceback (most recent call last):
  File "<pyshell#31>", line 1, in <module>
    jObject.SaveAs('C:\Documents and Settings\PC\Desktop\a_PDF.xml', "com.adobe.acrobat.xml-1-00")
  File "C:\Python27\lib\site-packages\win32com\client\dynamic.py", line 511, in __getattr__
    ret = self._oleobj_.Invoke(retEntry.dispid,0,invoke_type,1)
com_error: (-2147467263, 'Not implemented', None, None)

My guess is that it has something to do with makepy, but I don't understand how to apply it in my code.

I removed this line from my code and got the same error when running it:

win32com.client.makepy.GenerateFromTypeLibSpec('Acrobat')

Then I changed these two lines from 'DispatchEx' to 'Dispatch' and got the same error:

adobe = win32com.client.Dispatch('AcroExch.App')
avDoc = win32com.client.Dispatch('AcroExch.AVDoc')

When I run the Dispatch calls on their own and then echo the resulting objects, I get:

>>> adobe = win32com.client.DispatchEx('AcroExch.App')
>>> adobe
<win32com.gen_py.Adobe Acrobat 10.0 Type Library.CAcroApp instance at 0x18787784>
>>> avDoc = win32com.client.Dispatch('AcroExch.AVDoc')
>>> avDoc
<win32com.gen_py.Adobe Acrobat 10.0 Type Library.CAcroAVDoc instance at 0x20365224>

Does this mean I should only call Dispatch once? I removed:

adobe = win32com.client.Dispatch('AcroExch.App')

and got the same error.

This Adobe page says:

AVDoc    
Product availability: Acrobat, Reader
Platform availability: Macintosh, Windows, UNIX
Syntax
typedef struct _t_AVDoc* AVDoc;

A view of a PDF document in a window. There is one AVDoc per displayed document. Unlike a PDDoc, an AVDoc has a window associated with it.

acrobat_sdk/9.1/Acrobat9_1_HTMLHelp/API_References/Acrobat_API_Reference/AV_Layer/AVDoc.html#AVDocSaveParams

The PDDoc page says:

A PDDoc object represents a PDF document. There is a correspondence between a PDDoc and an ASFile. Also, every AVDoc has an associated PDDoc, although a PDDoc may not be associated with an AVDoc.

/9.1/Acrobat9_1_HTMLHelp/API_References/Acrobat_API_Reference/PD_Layer/PDDoc.html

I also tried the following code and got the same error:

import win32com.client
import win32com.client.makepy

pdDoc = win32com.client.Dispatch('AcroExch.PDDoc')
# Raw strings keep backslash sequences such as \a from being read as escapes
pdDoc.Open(r'C:\Documents and Settings\PC\Desktop\a_PDF.pdf')
jObject = pdDoc.GetJSObject()
jObject.SaveAs(r'C:\Documents and Settings\PC\Desktop\a_PDF.xml', "com.adobe.acrobat.xml-1-00")

I get the same error if I change:

pdDoc = win32com.client.Dispatch('AcroExch.PDDoc')

to

pdDoc = win32com.client.gencache.EnsureDispatch('AcroExch.PDDoc')

as in this question: win32com.client.Dispatch works but not win32com.client.gencache.EnsureDispatch

Answer 1

小智 5

user2993272, you were almost there: the code you have is just one line away from working.

I'll try to answer in the same spirit as your question and give you as much detail as I can.

This thread holds the key to the solution you are looking for: https://mail.python.org/pipermail/python-win32/2002-March/000260.html

I admit this post is not the easiest to find (perhaps Google ranks it low because of its age?).

Specifically, applying this piece of advice will do the trick for you: https://mail.python.org/pipermail/python-win32/2002-March/000265.html
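Applied to the snippet from the question, the advice boils down to something like the sketch below. The traceback shows the error being raised from the Invoke call in dynamic.py's __getattr__; as far as I can tell, an HRESULT listed in ERRORS_BAD_CONTEXT makes win32com retry the attribute as a method call instead of raising, and E_NOTIMPL (-2147467263, i.e. 0x80004001) is not in that list by default. The helper names allow_hresult and save_pdf_as_xml are my own, purely for illustration, and I have not run this exact variant against Acrobat.

```python
# Signed 32-bit form of HRESULT 0x80004001, the value shown in the traceback.
# It should match winerror.E_NOTIMPL.
E_NOTIMPL = -2147467263


def allow_hresult(bad_context, hresult=E_NOTIMPL):
    """Append hresult to the given list in place (only once) and return the list."""
    if hresult not in bad_context:
        bad_context.append(hresult)
    return bad_context


def save_pdf_as_xml(pdf_path, xml_path):
    """Hypothetical wrapper around the question's code, with the patch applied first."""
    # Imported here so allow_hresult can be used without pywin32 installed.
    import win32com.client
    from win32com.client import dynamic

    # The one missing line: let late-bound dispatch treat E_NOTIMPL as "bad context".
    allow_hresult(dynamic.ERRORS_BAD_CONTEXT)

    av_doc = win32com.client.Dispatch('AcroExch.AVDoc')
    if not av_doc.Open(pdf_path, pdf_path):
        raise IOError('Acrobat could not open %s' % pdf_path)
    try:
        js_object = av_doc.GetPDDoc().GetJSObject()
        js_object.SaveAs(xml_path, 'com.adobe.acrobat.xml-1-00')
    finally:
        av_doc.Close(True)
```

The patch has to run before the first SaveAs call; because the list is changed in place, doing it once per process is enough.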

For completeness, this code should get the job done without requiring you to patch dynamic.py by hand (the snippet should work out of the box):

# Finds every file under ROOT_INPUT_PATH that matches INPUT_FILE_EXTENSION and tries to
# extract its text into ROOT_OUTPUT_PATH, keeping the input file name but replacing
# its extension with OUTPUT_FILE_EXTENSION
from win32com.client import Dispatch
from win32com.client.dynamic import ERRORS_BAD_CONTEXT

import winerror

# Use scandir's walk if it is installed, as it is considerably faster than the stock os.walk
try:
    from scandir import walk
except ImportError:
    from os import walk

import fnmatch

import sys
import os

ROOT_INPUT_PATH = None
ROOT_OUTPUT_PATH = None
INPUT_FILE_EXTENSION = "*.pdf"
OUTPUT_FILE_EXTENSION = ".txt"

def acrobat_extract_text(f_path, f_path_out, f_basename, f_ext):
    avDoc = Dispatch("AcroExch.AVDoc") # Connect to Adobe Acrobat

    # Open the input file (as a pdf)
    ret = avDoc.Open(f_path, f_path)
    assert ret # FIXME: Documentation says "-1 if the file was opened successfully, 0 otherwise", but this is a bool in practice?

    pdDoc = avDoc.GetPDDoc()

    dst = os.path.join(f_path_out, ''.join((f_basename, f_ext)))

    # Adobe documentation says "For that reason, you must rely on the documentation to know what functionality is available through the JSObject interface. For details, see the JavaScript for Acrobat API Reference"
    jsObject = pdDoc.GetJSObject()

    # Here you can save as many other types by using, for instance: "com.adobe.acrobat.xml"
    jsObject.SaveAs(dst, "com.adobe.acrobat.accesstext")

    pdDoc.Close()
    avDoc.Close(True) # We want this to close Acrobat, as otherwise Acrobat is going to refuse processing any further files after a certain threshold of open files are reached (for example 50 PDFs)
    del pdDoc

if __name__ == "__main__":
    assert(5 == len(sys.argv)), sys.argv # <script name>, <script_file_input_path>, <script_file_input_extension>, <script_file_output_path>, <script_file_output_extension>

    #$ python get.txt.from.multiple.pdf.py 'C:\input' '*.pdf' 'C:\output' '.txt'

    ROOT_INPUT_PATH = sys.argv[1]
    INPUT_FILE_EXTENSION = sys.argv[2]
    ROOT_OUTPUT_PATH = sys.argv[3]
    OUTPUT_FILE_EXTENSION = sys.argv[4]

    # tuples are of schema (path_to_file, filename)
    matching_files = ((os.path.join(_root, filename), os.path.splitext(filename)[0]) for _root, _dirs, _files in walk(ROOT_INPUT_PATH) for filename in fnmatch.filter(_files, INPUT_FILE_EXTENSION))

    # The one line that gets everything working: add E_NOTIMPL to ERRORS_BAD_CONTEXT,
    # as per https://mail.python.org/pipermail/python-win32/2002-March/000265.html
    # (the list is modified in place, so no 'global' statement is needed)
    ERRORS_BAD_CONTEXT.append(winerror.E_NOTIMPL)

    for filename_with_path, filename_without_extension in matching_files:
        print("Processing '{}'".format(filename_without_extension))
        acrobat_extract_text(filename_with_path, ROOT_OUTPUT_PATH, filename_without_extension, OUTPUT_FILE_EXTENSION)

I tested this on WinPython x64 2.7.6.3 with Acrobat X Pro.
