How to parse the content of a PDF page with Swift

Tom*_*ers 5 pdf parsing ios swift

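The documentation isn't very clear to me. So far I gather that I need to set up a CGPDFOperatorTable and then, for each PDF page, call CGPDFContentStreamCreateWithPage and CGPDFScannerCreate.

The documentation mentions setting up callbacks, but it's not clear to me how. How do I actually get the content from a page?

Here is my code so far.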

    let pdfURL = NSBundle.mainBundle().URLForResource("titleofdocument", withExtension: "pdf")

    // Create pdf document
    let pdfDoc = CGPDFDocumentCreateWithURL(pdfURL)

    // Number of pages in this PDF
    let numberOfPages = CGPDFDocumentGetNumberOfPages(pdfDoc) as Int

    if numberOfPages <= 0 {
        // The number of pages is zero
        return
    }

    let myTable = CGPDFOperatorTableCreate()

    // Let's go through every page
    for pageNr in 1...numberOfPages {

        let thisPage = CGPDFDocumentGetPage(pdfDoc, pageNr)
        let myContentStream = CGPDFContentStreamCreateWithPage(thisPage)
        let myScanner = CGPDFScannerCreate(myContentStream, myTable, nil)

        CGPDFScannerScan(myScanner)

        // Search for Content here?
        // ??

        CGPDFScannerRelease(myScanner)
        CGPDFContentStreamRelease(myContentStream)

    }

    // Release Table
    CGPDFOperatorTableRelease(myTable)

Here is a similar question: "PDF parsing SWIFT", but it has no answers yet.

Dav*_*che 1

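You have actually already spelled out how to do this; all you need to do is put the pieces together and experiment until it works.

First, you need to set up a table with callbacks, as you stated at the start of your question (all code here is in Objective-C, not Swift):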

    CGPDFOperatorTableRef operatorTable = CGPDFOperatorTableCreate();
    CGPDFOperatorTableSetCallback(operatorTable, "q", &op_q);
    CGPDFOperatorTableSetCallback(operatorTable, "Q", &op_Q);

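The table holds the list of PDF operators you want to be notified about and associates a callback with each of them. These callbacks are just functions that you define elsewhere: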

    static void op_q(CGPDFScannerRef s, void *info) {
        // Do whatever you have to do in here
        // info is whatever you passed to CGPDFScannerCreate
    }

    static void op_Q(CGPDFScannerRef s, void *info) {
        // Do whatever you have to do in here
        // info is whatever you passed to CGPDFScannerCreate
    }
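Since the question asks about Swift, the same registration might look roughly like the sketch below. It assumes a recent Swift toolchain on an Apple platform; the names `opSaveState`, `opRestoreState` and `makeOperatorTable` are made up for illustration and are not part of Core Graphics. Top-level Swift functions can be handed to `CGPDFOperatorTableSetCallback` because they capture no context, which is what a C function pointer requires.

```swift
import CoreGraphics

// Hypothetical callback for the "q" operator (save graphics state).
// `info` is whatever was passed to CGPDFScannerCreate.
func opSaveState(scanner: CGPDFScannerRef, info: UnsafeMutableRawPointer?) {
    // Do whatever you have to do in here
}

// Hypothetical callback for the "Q" operator (restore graphics state).
func opRestoreState(scanner: CGPDFScannerRef, info: UnsafeMutableRawPointer?) {
    // Do whatever you have to do in here
}

// Builds an operator table that maps "q" and "Q" to the functions above.
// The caller is responsible for calling CGPDFOperatorTableRelease on the result.
func makeOperatorTable() -> CGPDFOperatorTableRef? {
    guard let table = CGPDFOperatorTableCreate() else { return nil }
    CGPDFOperatorTableSetCallback(table, "q", opSaveState)
    CGPDFOperatorTableSetCallback(table, "Q", opRestoreState)
    return table
}
```

A closure literal also works as the callback, as long as it does not capture any local variables.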

Then you create the scanner and run it, passing it the table you just defined along with your info pointer.

    // Passing "self" is just an example; you can pass whatever you want, and it will be
    // handed to your callback each time the scanner calls it.
    // Under ARC, an object pointer has to be bridged to void *.
    CGPDFScannerRef contentStreamScanner = CGPDFScannerCreate(contentStream, operatorTable, (__bridge void *)self);

    CGPDFScannerScan(contentStreamScanner);
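Putting the steps together in Swift, a minimal sketch for pulling text out of each page could look like the following. Several things here are assumptions rather than part of the answer above: the helper class `TextCollector`, the function names, and the choice to listen for the text-showing operators `Tj` and `TJ` instead of `q` and `Q`. `CGPDFStringCopyTextString` may also return wrong characters for fonts with custom encodings, so treat the result as raw fragments, not clean text in reading order.

```swift
import Foundation
import CoreGraphics

// Hypothetical helper that plays the role of `info` and gathers the results.
final class TextCollector {
    var fragments: [String] = []
}

// Recovers the collector from the raw `info` pointer handed to each callback.
func collector(from info: UnsafeMutableRawPointer?) -> TextCollector? {
    guard let info = info else { return nil }
    return Unmanaged<TextCollector>.fromOpaque(info).takeUnretainedValue()
}

// Callback for "Tj": the operand is a single string.
func opShowText(scanner: CGPDFScannerRef, info: UnsafeMutableRawPointer?) {
    var pdfString: CGPDFStringRef?
    guard CGPDFScannerPopString(scanner, &pdfString),
          let text = CGPDFStringCopyTextString(pdfString) else { return }
    collector(from: info)?.fragments.append(text as String)
}

// Callback for "TJ": the operand is an array of strings and spacing numbers.
func opShowTextArray(scanner: CGPDFScannerRef, info: UnsafeMutableRawPointer?) {
    var array: CGPDFArrayRef?
    guard CGPDFScannerPopArray(scanner, &array), let items = array else { return }
    var line = ""
    for index in 0..<CGPDFArrayGetCount(items) {
        var pdfString: CGPDFStringRef?
        // Entries that are not strings are spacing adjustments; skip them.
        if CGPDFArrayGetString(items, index, &pdfString),
           let text = CGPDFStringCopyTextString(pdfString) {
            line += text as String
        }
    }
    collector(from: info)?.fragments.append(line)
}

// Scans every page of the PDF at `url` and returns the collected text fragments.
func extractTextFragments(from url: URL) -> [String] {
    guard let document = CGPDFDocument(url as CFURL),
          document.numberOfPages > 0,
          let table = CGPDFOperatorTableCreate() else { return [] }
    defer { CGPDFOperatorTableRelease(table) }
    CGPDFOperatorTableSetCallback(table, "Tj", opShowText)
    CGPDFOperatorTableSetCallback(table, "TJ", opShowTextArray)

    let sink = TextCollector()
    let info = Unmanaged.passUnretained(sink).toOpaque()

    for pageNumber in 1...document.numberOfPages {
        guard let page = document.page(at: pageNumber) else { continue }
        let stream = CGPDFContentStreamCreateWithPage(page)
        let scanner = CGPDFScannerCreate(stream, table, info)
        _ = CGPDFScannerScan(scanner)  // the callbacks fire during this call
        CGPDFScannerRelease(scanner)
        CGPDFContentStreamRelease(stream)
    }
    return sink.fragments
}
```

The key point for the original question is that nothing needs to happen after `CGPDFScannerScan` returns: the content arrives inside the callbacks while the scan runs, so the callbacks have to store whatever they find in the object passed as `info`.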

If you want to see a complete example, with source code, of how to find and process images, have a look at this website.