AWS Textract - UnsupportedDocumentException - PDF

Question

gmw*_*934 6 python amazon-web-services boto3 amazon-textract

I'm using boto3 (the AWS SDK for Python) to analyze a document (a PDF) and extract its form key:value pairs.
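
For reference: with FeatureTypes=["FORMS"], Textract returns form fields as `KEY_VALUE_SET` blocks. A KEY block points to its VALUE block through a `VALUE` relationship, and each of them points to its `WORD` (or `SELECTION_ELEMENT`, for checkboxes) blocks through `CHILD` relationships. The sketch below flattens a response's `Blocks` list into a plain dict under that assumption. `block_text` and `extract_key_values` are hypothetical helper names, not boto3 functions, and the code works on plain dicts, so it does not need boto3 itself.

```python
from typing import Dict, List


def block_text(block: dict, blocks_by_id: Dict[str, dict]) -> str:
    """Join the text of a block's CHILD words (and checkbox states) into one string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                # Checkboxes carry no text; report SELECTED / NOT_SELECTED instead
                words.append(child.get("SelectionStatus", ""))
    return " ".join(words)


def extract_key_values(blocks: List[dict]) -> Dict[str, str]:
    """Map the text of each form KEY block to the text of its linked VALUE block(s)."""
    blocks_by_id = {b["Id"]: b for b in blocks}
    pairs = {}
    for block in blocks:
        # Only KEY_VALUE_SET blocks tagged as KEY start a pair
        if block["BlockType"] != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(block_text(blocks_by_id[v], blocks_by_id) for v in rel["Ids"])
        pairs[block_text(block, blocks_by_id)] = value_text
    return pairs
```

In this sketch, two fields with identical key text overwrite each other; collecting a list of (key, value) tuples instead would keep both.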

import boto3

def process_text_analysis(bucket, document):
    # Get the document from S3 (optional: Textract reads the S3 object directly)
    s3_connection = boto3.resource('s3')
    s3_object = s3_connection.Object(bucket, document)
    s3_response = s3_object.get()
    # Analyze the document with the synchronous AnalyzeDocument API
    client = boto3.client('textract')
    response = client.analyze_document(Document={'S3Object': {'Bucket': bucket, 'Name': document}},
                                       FeatureTypes=["FORMS"])
    return response


process_text_analysis('francismorgan-01', '709 Privado M SURESTE.pdf')

I followed the AWS documentation for AnalyzeDocument, but when I run my function I get this error:

botocore.errorfactory.UnsupportedDocumentException: An error occurred (UnsupportedDocumentException) when calling the AnalyzeDocument operation: Request has unsupported document format

Am I missing something?

Answer 1

syu*_*maK 11

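AnalyzeDocument is a synchronous API, and it only supports PNG or JPG images. (AWS has since extended the synchronous operations to accept single-page PDF and TIFF files; multi-page PDFs still require the asynchronous API.)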
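
Because the failure depends on the file's actual format, a caller can check the format before picking an API. A minimal sketch, assuming the PNG/JPG rule stated above and identifying files by their leading signature ("magic") bytes rather than their extension. `detect_format`, `needs_async_api` and `SYNC_FORMATS` are hypothetical names chosen for illustration.

```python
# Leading "magic" bytes that identify each format
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"
PDF_MAGIC = b"%PDF-"

# Assumption: formats the synchronous AnalyzeDocument call accepts, per the answer above
SYNC_FORMATS = {"png", "jpeg"}


def detect_format(header: bytes) -> str:
    """Classify a document from its first bytes instead of trusting the file extension."""
    if header.startswith(PNG_MAGIC):
        return "png"
    if header.startswith(JPEG_MAGIC):
        return "jpeg"
    if header.startswith(PDF_MAGIC):
        return "pdf"
    return "unknown"


def needs_async_api(header: bytes) -> bool:
    """True when the document should go through StartDocumentAnalysis instead of AnalyzeDocument."""
    return detect_format(header) not in SYNC_FORMATS
```

In the question's code, the first eight bytes could be fetched with a ranged GetObject request, `s3_object.get(Range="bytes=0-7")["Body"].read()`, instead of downloading the whole object.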

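Since you want to process a PDF file, you need to use the Amazon Textract asynchronous APIs, such as StartDocumentAnalysis (which accepts FeatureTypes such as FORMS) or StartDocumentTextDetection, and then fetch the results with the matching GetDocumentAnalysis or GetDocumentTextDetection call.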
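
The asynchronous flow is: start a job, wait for it to finish, then page through the results. A minimal polling sketch, assuming a boto3 Textract client is passed in as `client`. The function name `analyze_pdf_async` and the 5-second poll interval are illustrative choices, not AWS defaults; the request and response fields (`DocumentLocation`, `JobId`, `JobStatus`, `NextToken`, `Blocks`) are those of the StartDocumentAnalysis and GetDocumentAnalysis operations.

```python
import time
from typing import List


def analyze_pdf_async(client, bucket: str, key: str, poll_seconds: float = 5.0) -> List[dict]:
    """Run StartDocumentAnalysis on a PDF in S3 and return the blocks from every result page.

    `client` is expected to behave like boto3.client("textract").
    """
    job = client.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["FORMS"],
    )
    job_id = job["JobId"]

    # Poll until the job leaves IN_PROGRESS (SUCCEEDED, PARTIAL_SUCCESS or FAILED)
    while True:
        result = client.get_document_analysis(JobId=job_id)
        status = result["JobStatus"]
        if status != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)

    if status == "FAILED":
        raise RuntimeError(f"Textract job {job_id} failed: {result.get('StatusMessage', '')}")

    # Results are paginated: keep following NextToken until it is absent
    blocks = list(result.get("Blocks", []))
    next_token = result.get("NextToken")
    while next_token:
        page = client.get_document_analysis(JobId=job_id, NextToken=next_token)
        blocks.extend(page.get("Blocks", []))
        next_token = page.get("NextToken")
    return blocks
```

With boto3 installed, `analyze_pdf_async(boto3.client("textract"), "francismorgan-01", "709 Privado M SURESTE.pdf")` would be the asynchronous counterpart of the question's call. Polling keeps the sketch short; for long jobs, StartDocumentAnalysis also accepts a `NotificationChannel` (an SNS topic and IAM role) so completion can be signalled instead of polled.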

Posted:	5 years, 12 months ago
Views:	7066
Last activity:	4 years, 3 months ago