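I am currently working on a Java project that uses Apache POI. In this project I want to convert a .doc file to a PDF file. The conversion completes successfully, but the PDF contains only the plain text, without any of the text styles or text colors. The PDF looks black and white, although my .doc file is in color and uses different text styles.
Here is my code: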
POIFSFileSystem fs = null;
Document document = new Document();
try {
System.out.println("Starting the test");
fs = new POIFSFileSystem(new FileInputStream("/document/test2.doc"));
HWPFDocument doc = new HWPFDocument(fs);
WordExtractor we = new WordExtractor(doc);
OutputStream file = new FileOutputStream(new File("/document/test.pdf"));
PdfWriter writer = PdfWriter.getInstance(document, file);
Range range = doc.getRange();
document.open();
writer.setPageEmpty(true);
document.newPage();
writer.setPageEmpty(true);
String[] paragraphs = we.getParagraphText();
for (int i = 0; i < paragraphs.length; i++) {
org.apache.poi.hwpf.usermodel.Paragraph pr = range.getParagraph(i);
// CharacterRun run = pr.getCharacterRun(i);
// run.setBold(true);
// run.setCapitalized(true);
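// run.setItalic(true); …

I have a simple requirement: extract all the images and diagrams drawn in an MS Word file. I can extract only the images, not the grouped shapes (such as use case diagrams or activity diagrams). I want to save all the diagrams as images.
I used Apache POI.
Here is the code I wrote: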
public class worddocreader {
public static void main(String args[]) {
FileInputStream fis;
try {
FileInputStream fs = new FileInputStream("F:/1.docx");
XWPFDocument docx = new XWPFDocument(fs);
List<XWPFPictureData> piclist = docx.getAllPictures();
Iterator<XWPFPictureData> iterator = piclist.iterator();
int i = 0;
while (iterator.hasNext()) {
XWPFPictureData pic = iterator.next();
byte[] bytepic = pic.getData();
BufferedImage imag = ImageIO.read(new ByteArrayInputStream(
bytepic));
// ImageIO.write expects an informal format name ("jpg"), not a MIME type ("image/jpeg")
ImageIO.write(imag, "jpg", new File("F:/docParsing/imagefromword" + i + ".jpg"));
i++;
}
ArrayList<PackagePart> packArrayList = docx.getPackageRelationship().getPackage().getParts();
int size = packArrayList.size();
System.out.println("Array List Size : " …

I am using Apache POI.
I can read the text of a document file using "org.apache.poi.hwpf.extractor.WordExtractor",
and even get its tables using "org.apache.poi.hwpf.usermodel.Table",
but please advise how I can get the bold/italic formatting of the text.
Thanks in advance.

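I use Apache POI's WordToFoConverter class to convert a .doc file to .fo. I have converted the images in the Word file to base64, but how do I attach them to the XSL-FO code generated by Apache POI?
Consider this sample .fo file generated by Apache POI: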
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<fo:simple-page-master master-name="page-page0" page-height="11.0in" page-width="8.5in">
<fo:region-body margin="1.0in 1.0in 1.0in 1.0in"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:declarations>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="">
<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">CA, Inc.</dc:creator>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
</fo:declarations>
<fo:page-sequence master-reference="page-page0">
<fo:flow flow-name="xsl-region-body">
<fo:block hyphenate="true" linefeed-treatment="preserve" space-after="10pt" text-align="start" white-space-collapse="false">
***<!--Image link to '0.jpg' can be here-->
<fo:inline font-family="Times New Roman" font-size="11" font-style="normal" font-weight="normal"> </fo:inline>
<!--Image link to '9ab33.png' can be here-->
<fo:leader/>
</fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root>
How can I insert the images at the *** position?
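
One possible approach, sketched here with the JDK only, is to post-process the generated FO: parse it, find each "Image link to '…' can be here" comment, and replace it with an fo:external-graphic element whose src is a base64 data: URI. The class name FoImageInliner and its methods are hypothetical names made up for this illustration, not part of Apache POI. The sketch also assumes that your FO processor accepts data: URIs in src, which is worth checking for the version you use. POI's converter has a PicturesManager hook as well, which may be a more direct route, but it is not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: swaps the placeholder comments for fo:external-graphic elements.
class FoImageInliner {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Matches the placeholder comment text, e.g. "Image link to '0.jpg' can be here".
    static final Pattern PLACEHOLDER =
            Pattern.compile("Image link to '([^']+)' can be here");

    // Builds an RFC 2397 data URI from raw image bytes.
    static String toDataUri(String mimeType, byte[] bytes) {
        return "data:" + mimeType + ";base64,"
                + Base64.getEncoder().encodeToString(bytes);
    }

    // dataUris maps the picture name found in the comment to its data URI.
    static String inlineImages(String foXml, Map<String, String> dataUris)
            throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(foXml)));

        // Collect first, then mutate, so the tree is not changed while it is walked.
        List<Comment> comments = new ArrayList<>();
        collectComments(doc.getDocumentElement(), comments);

        for (Comment comment : comments) {
            Matcher m = PLACEHOLDER.matcher(comment.getData());
            if (!m.find()) {
                continue; // an unrelated comment
            }
            String uri = dataUris.get(m.group(1));
            if (uri == null) {
                continue; // no data for this picture: keep the placeholder
            }
            Element graphic = doc.createElementNS(FO_NS, "fo:external-graphic");
            // Quotes inside url(...) are optional in XSL-FO; base64 needs none.
            graphic.setAttribute("src", "url(" + uri + ")");
            comment.getParentNode().replaceChild(graphic, comment);
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Depth-first walk that gathers every comment node below the given node.
    static void collectComments(Node node, List<Comment> out) {
        for (Node n = node.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.COMMENT_NODE) {
                out.add((Comment) n);
            } else {
                collectComments(n, out);
            }
        }
    }
}
```

To use it, build the map from the picture names that appear in the comments ('0.jpg', '9ab33.png') to the data URIs of the bytes you already extracted, then call FoImageInliner.inlineImages(foText, map) before handing the result to the FO processor. Placeholders with no matching entry are left in place, so a missing image does not break the document.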