我正在尝试使用计算机视觉从 pdf/图像发票中提取数据。为此,我使用了基于 ocr 的 pytesseract。\n这是示例发票\n
\n您可以在下面找到相同的代码
import pytesseract\n\n\nimg = Image.open("invoice-sample.jpg")\n\ntext = pytesseract.image_to_string(img)\n\nprint(text)\nRun Code Online (Sandbox Code Playgroud)\n通过使用 pytesseract 我得到以下输出
\nhttp://mrsinvoice.com\n\n \n\n\xe2\x80\x99 Invoice\n\nYour Company LLC Address 123, State, My Country P 111-222-333, F 111-222-334\n\n\nBILLTO:\n\nfofin Oe Invoice # 00001\n\nAlpha Bravo Road 33 Invoice Date 32/12/2001\n\nP: 111-292-333, F: 111-222-334\n\nclient@example.net Nomecof Reps Bob\nContact Phone 101-102-103\n\nSHIPPING TO:\n\neine ce Payment Terms ash on Delivery\n\nOffice Road 38\nP: 111-333-222, F: 122-222-334 Amount Due: $4,170\noffice@example.net\n\nNO PRODUCTS / SERVICE QUANTITY / RATE / UNIT AMOUNT\nHOURS: PRICE\n\n1 tye 2 $20 $40\n\n2__| Steering …Run Code Online (Sandbox Code Playgroud) python ocr tesseract python-imaging-library document-layout-analysis