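Tag: cloud-document-ai

Firebase deploy fails - Could not find functions.yaml. Must use http discovery

I am trying to deploy a Firebase Cloud Function but keep getting this error. The strangest part is that I had it working, but then I switched from having Firebase talk to Cloud Vision to having it talk to Google Document AI, and suddenly this error appeared. I have tried several different versions of firebase-tools and Node.js but have not been able to resolve the problem. The error is below.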

[2023-02-28T17:29:15.917Z] Building nodejs source
[2023-02-28T17:29:15.922Z] Could not find functions.yaml. Must use http discovery
[2023-02-28T17:29:15.935Z] Found firebase-functions binary at 'C:\Users\crisb\source\repos\Javascriptcouldfunction4\functions\node_modules\.bin\firebase-functions'
[2023-02-28T17:29:17.570Z] Serving at port 8704

[2023-02-28T17:29:19.519Z] Got response from /__/functions.yaml Failed to generate manifest from function source: TypeError [ERR_INVALID_ARG_TYPE]: The "id" argument must be of type string. Received an instance of Object
[2023-02-28T17:29:19.522Z] Failed to parse functions.yamlincomplete explicit mapping pair; a key node is missed; or followed by a …


javascript firebase cloud-document-ai

cri*_*unt

2023 08-06

6
votes

1
answer

3371
views

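Google Document AI gives different output for the same file

I am using the Document OCR API to extract text from a PDF file, but part of the output is inaccurate. I found that the cause may be the presence of some Chinese characters.

Below is an example I made up: I cropped part of the region where the extracted text was wrong and added some Chinese characters to reproduce the problem.

Input file

When I use the website version, I cannot get the Chinese characters, but the remaining characters are correct.

Website version OCR result

When I extract the text with Python, I get the Chinese characters correctly, but some of the remaining characters are wrong.

Program result

The actual string I got.

Actual result

Are the Document AI versions used by the website and the API different? How can I get all characters correctly?

Update:

When I print detected_languages after printing the text, I get the following output. (I don't know why, but with lines = page.lines the detected_languages for both lines are empty lists; I first had to change it to page.blocks or page.paragraphs.)

Language codes

Code: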

from google.cloud import documentai_v1beta3 as documentai

project_id= 'secret-medium-xxxxxx'
location = 'us' # Format is 'us' or 'eu'
processor_id = 'abcdefg123456' #  Create processor in Cloud Console

opts = {}
if location == "eu":
    opts = {"api_endpoint": "eu-documentai.googleapis.com"}
client = documentai.DocumentProcessorServiceClient(client_options=opts)

def get_text(doc_element: dict, document: dict):
    """
    Document AI …
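
The function above is cut off in this listing, so it is not completed here. As a rough illustration of the underlying idea (layout elements point into the document's full text by offsets), here is a minimal sketch that works on the JSON form of a document loaded as plain dicts. The camelCase keys (`textAnchor`, `textSegments`, `startIndex`, `endIndex`), the helper name `text_from_anchor`, and the sample data are assumptions for illustration, not the asker's code.

```python
def text_from_anchor(element: dict, document_text: str) -> str:
    """Join the text slices referenced by an element's text anchor."""
    segments = element.get("textAnchor", {}).get("textSegments", [])
    parts = []
    for segment in segments:
        # Offsets may arrive as strings in JSON; a missing startIndex is treated as 0.
        start = int(segment.get("startIndex", 0))
        end = int(segment["endIndex"])
        parts.append(document_text[start:end])
    return "".join(parts)


# Hypothetical sample data, not taken from the question.
sample_text = "Name: 王小明\nTotal: 42\n"
first_line = {"textAnchor": {"textSegments": [{"endIndex": "10"}]}}
second_line = {
    "textAnchor": {"textSegments": [{"startIndex": "10", "endIndex": "20"}]}
}

if __name__ == "__main__":
    print(repr(text_from_anchor(first_line, sample_text)))
    print(repr(text_from_anchor(second_line, sample_text)))
```

Comparing the slices produced this way for the website output and the API output may help narrow down whether the difference lies in the recognized text itself or in how the offsets are applied.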


python ocr google-api-python-client google-cloud-platform cloud-document-ai

ite*_*r07

2021 08-18

5
votes

1
answer

989
views

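Google DocumentAI Java example fails with io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Request contains an invalid argument

I wasted several hours trying the Google Document AI Java example from https://cloud.google.com/document-ai/docs/quickstart-client-libraries

If you enter your project ID, location, and processor ID like this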

        String projectId = "6493xxxxxxxx";
        String location = "eu";
        String processorId = "74451xxxxxx";
        String filePath = "/Users/schube/Desktop/file.pdf";


and run the example, you just get an InvalidArgumentException:

Exception in thread "main" com.google.api.gax.rpc.InvalidArgumentException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Request contains an invalid argument.
    at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:49)
    at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
    at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
    at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
    at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
    at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1133)
    at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
    at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1277)
    at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:1038)
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:808)
    at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:563)
    at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:533)
    at io.grpc.internal.DelayedClientCall$DelayedListener$3.run(DelayedClientCall.java:463)
    at io.grpc.internal.DelayedClientCall$DelayedListener.delayOrExecute(DelayedClientCall.java:427)
    at io.grpc.internal.DelayedClientCall$DelayedListener.onClose(DelayedClientCall.java:460)
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:557)
    at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:69)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:738)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:717)
    at …
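
One thing that may be worth checking when the location is "eu" is whether the client is pointed at a regional endpoint. This is a guess at a possible cause, not a confirmed fix. The Python question earlier in this listing sets api_endpoint to "eu-documentai.googleapis.com" for that location, and the sketch below generalizes that pattern. The helper names `api_endpoint` and `processor_name` are hypothetical, and treating "us" as the default host is an assumption.

```python
def api_endpoint(location: str) -> str:
    """Return the Document AI host assumed for a processor location."""
    # Assumption: "us" uses the default host; other locations use a
    # location-prefixed host such as "eu-documentai.googleapis.com".
    if location == "us":
        return "documentai.googleapis.com"
    return f"{location}-documentai.googleapis.com"


def processor_name(project_id: str, location: str, processor_id: str) -> str:
    """Build the full processor resource name from its three parts."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    print(api_endpoint("eu"))
    print(processor_name("my-project", "eu", "my-processor"))
```

Printing both values before sending a request makes it easy to confirm that the location in the resource name and the host the client talks to agree with each other.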


java google-cloud-platform cloud-document-ai

sch*_*ube

lucky-day

5
votes

1
answer

4891
views

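Google Document AI training fails because of an error that has already been addressed

I am training a model with Google's Document AI. Training fails with the following error (for simplicity I have included only part of the JSON file, but the error is the same for every document in the dataset):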

"trainingDatasetValidation": {
      "documentErrors": [
        {
          "code": 3,
          "message": "Invalid document.",
          "details": [
            {
              "@type": "type.googleapis.com/google.rpc.ErrorInfo",
              "reason": "INVALID_DOCUMENT",
              "domain": "documentai.googleapis.com",
              "metadata": {
                "num_fields": "0",
                "num_fields_needed": "1",
                "document": "5e88c5e4cc05ddb8.json",
                "annotation_name": "INCOME_ADJUSTMENTS",
                "field_name": "entities.text_anchor.text_segments"
              }
            }
          ]
        }


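What I understand from this error is that the model expects the field INCOME_ADJUSTMENTS to appear (at least) once in the document, but it found zero instances of it.

That would be understandable, except that I have already defined the field INCOME_ADJUSTMENTS in the schema as "Optional once", meaning the field may appear zero times or once.

Am I missing something? Why does the error persist even though it has been addressed in the schema?

P.S. I also tried "Optional multiple" (as well as "Required once" and "Required multiple"), but the error persists.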
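
Another possible reading of the metadata, offered here only as an assumption, is that field_name points at entities.text_anchor.text_segments, so the validator may be counting text segments on an INCOME_ADJUSTMENTS entity rather than occurrences of the field in the schema. A minimal sketch for checking that on an exported document JSON follows. The camelCase keys (`entities`, `type`, `mentionText`, `textAnchor`, `textSegments`), the helper name, and the sample data are hypothetical and not taken from the asker's files.

```python
def entities_without_text_segments(document: dict) -> list:
    """Return (type, mentionText) pairs for entities that have no text segments."""
    missing = []
    for entity in document.get("entities", []):
        segments = entity.get("textAnchor", {}).get("textSegments", [])
        if not segments:
            missing.append((entity.get("type"), entity.get("mentionText")))
    return missing


# Hypothetical labeled document: one entity is anchored to text, one is not.
sample_document = {
    "text": "Wages 1000\n",
    "entities": [
        {
            "type": "WAGES",
            "mentionText": "1000",
            "textAnchor": {
                "textSegments": [{"startIndex": "6", "endIndex": "10"}]
            },
        },
        {"type": "INCOME_ADJUSTMENTS", "mentionText": "", "textAnchor": {}},
    ],
}

if __name__ == "__main__":
    print(entities_without_text_segments(sample_document))
```

If a check like this lists INCOME_ADJUSTMENTS for every document, the labels themselves rather than the schema occurrence setting would be the next place to look.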

Edit: As requested, here is what one of the JSON files looks like. Note that there is no PII here, since the details (names, SSNs, etc.) are synthetic data.

google-cloud-platform cloud-document-ai

Ave*_*nus

2023 01-14

5
votes

1
answer

924
views

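Which argument is invalid for the Google Document AI client library for Node.js?

I am trying to run Google's Document OCR from a Node.js application, so I used the Node.js client library @google-cloud/documentai.

I did everything as in the documentation sample.

Here is my code: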

const projectId = '*******';
const location = 'eu'; // Format is 'us' or 'eu'
const processor = '******'; // Create processor in Cloud Console
const keyFilename = './secret/******.json';

const { DocumentProcessorServiceClient } = require('@google-cloud/documentai').v1beta3;

const client = new DocumentProcessorServiceClient({projectId, keyFilename});

async function start(encodedImage) {

  console.log("Google AI Started")
  const name = `projects/${projectId}/locations/${location}/processors/${processor}`;

  const request = {
    name,
    document: {
      content: encodedImage,
      mimeType: 'application/pdf',
    },
  }

  try {
    const …


node.js google-cloud-platform google-ai-platform cloud-document-ai

nik*_*gan

2023 04-14

3
votes

1
answer

2121
views

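Permission denied when calling Document AI v1beta3 from Cloud SDK interactive Python (ipython or IPython for short), following the GCP tutorial for Form_Parser

I am following the tutorial at https://codelabs.developers.google.com/codelabs/docai-form-parser-v3-python#7, and I followed all the steps they specify...

I used the Cloud SDK for development as specified in the tutorial, but then

The code they provide is as follows: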

project_id= 'YOUR_PROJECT_ID' 
location = 'YOUR_PROJECT_LOCATION' # Format is 'us' or 'eu'
processor_id = 'YOUR_PROCESSOR_ID' # Create processor in Cloud Console
file_path = 'form.pdf' # The local file in your current working directory

from google.cloud import documentai_v1beta3 as documentai
from google.cloud import storage

def process_document(
    project_id=project_id, location=location, processor_id=processor_id,  file_path=file_path
):

    # Instantiates a client
    client = documentai.DocumentProcessorServiceClient()

    # The full resource name of the processor, e.g.:
    # projects/project-id/locations/location/processor/processor-id
    # You must create new processors …


google-cloud-platform cloud-document-ai

Cod*_*e99

2021 05-27

3
votes

1
answer

1194
views

How to convert a "google.cloud.documentai_v1.types.document" object to JSON

I am using the invoice parser of Google Cloud Document AI. The API response is a google.cloud.documentai_v1.types.Document object. I tried the following approaches to convert it to JSON, but none of them worked:

json.dumps(), but it raises a JSONDecodeError
google.cloud.documentai_v1.Document.to_json()

python google-cloud-platform cloud-document-ai

kus*_*gra

2021 06-29

3
votes

1
answer

2144
views

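Document AI: google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument

I get this error when trying to implement Document OCR from Google Cloud in Python, as described here: https://cloud.google.com/document-ai/docs/ocr

When I run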

   result = client.process_document(request=request)


I get this error

Traceback (most recent call last):
  File "/Users/Niolo/Desktop/untitled/Desktop/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 73, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/Users/Niolo/Desktop/untitled/Desktop/lib/python3.8/site-packages/grpc/_channel.py", line 923, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/Users/Niolo/Desktop/untitled/Desktop/lib/python3.8/site-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "Request contains an invalid argument."
    debug_error_string = "{"created":"@1614769280.332675000","description":"Error received from peer ipv4:142.250.180.138:443","file":"src/core/lib/surface/call.cc","file_line":1068,"grpc_message":"Request contains an invalid argument.","grpc_status":3}"
>
The above exception was the …


python google-cloud-platform cloud-document-ai

Meg*_*d45

2021 03-04

2
votes

1
answer

894
views