How do I call tabula (JAR) from Java?

Question

emd*_*emd 3 java tabula

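Tabula looks like a good tool for extracting tabular data from PDFs. There are plenty of examples of how to call it from the command line or use it from Python, but there does not seem to be any documentation for using it from Java. Does anyone have a working example?

Note that tabula does provide source code, but it seems inconsistent between versions. For example, the sample on GitHub references a TableExtractor class that does not appear to exist in the JAR.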

https://github.com/tabulapdf/tabula-java

Answer 1

小智 6

You can use the following code to call tabula from Java. Hope this helps.

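import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;

import technology.tabula.ObjectExtractor;
import technology.tabula.Page;
import technology.tabula.RectangularTextContainer;
import technology.tabula.Table;
import technology.tabula.extractors.SpreadsheetExtractionAlgorithm;

// The class name is arbitrary; only main() comes from the original snippet.
public class TabulaExample {
    public static void main(String[] args) throws IOException {
        final String FILENAME = "../test.pdf";

        // PDDocument.load(File) is the PDFBox 2.x call; try-with-resources closes the file.
        try (PDDocument pd = PDDocument.load(new File(FILENAME))) {
            int totalPages = pd.getNumberOfPages();
            System.out.println("Total Pages in Document: " + totalPages);

            ObjectExtractor oe = new ObjectExtractor(pd);
            SpreadsheetExtractionAlgorithm sea = new SpreadsheetExtractionAlgorithm();
            Page page = oe.extract(1); // page numbers start at 1

            // detect the tables on the page, then print the text of each cell
            List<Table> tables = sea.extract(page);
            for (Table table : tables) {
                List<List<RectangularTextContainer>> rows = table.getRows();
                for (List<RectangularTextContainer> cells : rows) {
                    for (RectangularTextContainer cell : cells) {
                        System.out.print(cell.getText() + "|");
                    }
                    System.out.println(); // end each table row on its own line
                }
            }
        }
    }
}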
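
The loop above separates cells with "|", which becomes ambiguous if a cell's own text contains that character. As a minimal sketch of one way to handle this (not part of tabula), the helper below applies CSV-style quoting to plain strings. The class name RowFormatter and both method names are hypothetical, and the sample rows are stand-ins for the values that cell.getText() would return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, independent of tabula: it only works on plain strings.
public class RowFormatter {
    /** Quotes a cell if it contains the delimiter, a double quote, or a line break. */
    static String escape(String cell, char delimiter) {
        boolean needsQuotes = cell.indexOf(delimiter) >= 0
                || cell.indexOf('"') >= 0
                || cell.indexOf('\n') >= 0
                || cell.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return cell;
        }
        // Double any embedded quotes, then wrap the whole cell in quotes.
        return "\"" + cell.replace("\"", "\"\"") + "\"";
    }

    /** Joins one row of cell texts into a single delimited line. */
    static String formatRow(List<String> cells, char delimiter) {
        List<String> escaped = new ArrayList<>();
        for (String cell : cells) {
            escaped.add(escape(cell, delimiter));
        }
        return String.join(String.valueOf(delimiter), escaped);
    }

    public static void main(String[] args) {
        // Stand-in data; in the answer's loop these strings would come from cell.getText().
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Name", "Amount"),
                Arrays.asList("Widget|Large", "1,200"));
        for (List<String> row : rows) {
            System.out.println(formatRow(row, '|'));
        }
    }
}
```

Unlike the answer's loop, this joins cells without a trailing delimiter, so each output line has exactly one separator between neighbouring cells.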

