How can I read shape groups as images from a Word document (.doc or .docx) using Apache POI?

Kar*_*san 6 java hwpf apache-poi xwpf

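I have a simple requirement: extract all images and diagrams drawn in an MS Word file. I can extract the images, but not shape groups (such as use case diagrams or activity diagrams). I want to save all diagrams as images.

I am using Apache POI.

Here is the code I wrote: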

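import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.poi.openxml4j.opc.PackagePart;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;

public class worddocreader {
public static void main(String args[]) {
    // try-with-resources closes the input stream once the document is loaded
    try (FileInputStream fs = new FileInputStream("F:/1.docx")) {
        XWPFDocument docx = new XWPFDocument(fs);

        // Pass 1: pictures embedded in the document
        List<XWPFPictureData> piclist = docx.getAllPictures();
        int i = 0;
        for (XWPFPictureData pic : piclist) {
            byte[] bytepic = pic.getData();
            BufferedImage imag = ImageIO.read(new ByteArrayInputStream(bytepic));
            // ImageIO.read returns null for formats it cannot decode (e.g. EMF/WMF)
            if (imag != null) {
                // ImageIO expects an informal format name ("jpg"), not a MIME type
                ImageIO.write(imag, "jpg", new File("F:/docParsing/imagefromword" + i + ".jpg"));
            }
            i++;
        }

        // Pass 2: try to decode every part of the package as an image
        List<PackagePart> packArrayList = docx.getPackage().getParts();
        int size = packArrayList.size();
        System.out.println("Array List Size : " + size);

        while (size-- > 0) {
            PackagePart packagePart = packArrayList.get(size);

            System.out.println(packagePart.getContentType());

            try (InputStream in = packagePart.getInputStream()) {
                BufferedImage bfrImage = ImageIO.read(in);
                // Non-image parts (XML, styles, etc.) decode to null and are skipped
                if (bfrImage != null) {
                    ImageIO.write(bfrImage, "png", new File("F:/docParsing_emb/size" + size + ".png"));
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        System.out.println("Done");
    } catch (Exception e) {
        e.printStackTrace();
    }
}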
}

It only extracts the images, not the shapes.

Does anyone know how I can do this?

小智 0

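If you mean Office Art objects, then:

The org.apache.poi.hwpf.HWPFDocument class has a field, _officeDrawingsMain, that holds the Office Art objects; it is exposed through getOfficeDrawingsMain(). Note that HWPF handles only the binary .doc format, not .docx.

See this link: https://poi.apache.org/apidocs/org/apache/poi/hwpf/HWPFDocument.html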
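
For the .docx case in the question, a likely reason the code finds only pictures is that a shape group is not stored as an image part at all. It is vector markup inside word/document.xml, typically a DrawingML `wpg:wgp` element, often paired with a VML `v:group` fallback for older readers. The sketch below uses only the JDK (no POI) to check whether such groups are present. The class name `GroupShapeProbe` and its method names are hypothetical, and it assumes the conventional `wpg` and `v` namespace prefixes and UTF-8 encoding, which Word normally writes but the format does not guarantee.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Apache POI.
public class GroupShapeProbe {

    /** Counts opening tags with the given qualified name, e.g. "wpg:wgp". */
    public static int countOpeningTags(String xml, String qualifiedName) {
        // The tag name must be followed by whitespace, '>' or '/' so that
        // "v:group" does not also match a longer name such as "v:groupX".
        Pattern p = Pattern.compile("<" + Pattern.quote(qualifiedName) + "[\\s>/]");
        Matcher m = p.matcher(xml);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    /** Reads word/document.xml from a .docx (a ZIP archive), assuming UTF-8. */
    public static String readDocumentXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the question; pass another path as the first argument.
        String xml = readDocumentXml(args.length > 0 ? args[0] : "F:/1.docx");
        System.out.println("DrawingML groups (wpg:wgp): " + countOpeningTags(xml, "wpg:wgp"));
        System.out.println("VML groups (v:group): " + countOpeningTags(xml, "v:group"));
    }
}
```

The two counts are reported separately because the same diagram may appear once as `wpg:wgp` and again as its `v:group` fallback, and nested VML groups each add to the second count. Non-zero counts confirm that the diagrams exist only as vector markup, so saving them as images would require a separate rendering step, which this sketch does not attempt.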