资讯专栏INFORMATION COLUMN

Java中常用的几种DOCX转PDF方法

zgbgx / 2680人阅读

摘要:而利用接口进行读取与生成的方式性能较好,适用于对于格式要求不是很高的情况。

本文从属于笔者的Java入门与最佳实践系列文章。

DOCX2PDF

将DOCX文档转化为PDF是项目中常见的需求之一,目前主流的方法可以分为两大类,一类是利用各种Office应用进行转换,譬如Microsoft Office、WPS以及LiberOffice,另一种是利用各种语言提供的对于Office文档读取的接口(譬如Apache POI)然后使用专门的PDFGenerator库,譬如IText进行PDF构建。总的来说,从样式上利用Office应用可以保证较好的样式,不过相对而言效率会比较低。其中Microsoft Office涉及版权,不可轻易使用(笔者所在公司就被抓包了),WPS目前使用比较广泛,不过存在超链接截断问题,即超过256个字符的超链接会被截断,LiberOffice的样式排版相对比较随意。而利用POI接口进行读取与生成的方式性能较好,适用于对于格式要求不是很高的情况。另外还有一些封装好的在线工具或者命令行工具,譬如docx2pdf与OfficeToPDF。

MicroSoft Office

本部分的核心代码如下,全部代码参考这里:

private ActiveXComponent oleComponent = null;
private Dispatch activeDoc = null;
private final static String APP_ID = "Word.Application";

// Constants that map onto Word"s WdSaveOptions enumeration and that
// may be passed to the close(int) method
public static final int DO_NOT_SAVE_CHANGES = 0;
public static final int PROMPT_TO_SAVE_CHANGES = -2;
public static final int SAVE_CHANGES = -1;

// These constant values determine whether or not tha application
// instance will be displyed on the users screen or not.
public static final boolean VISIBLE = true;
public static final boolean HIDDEN = false;

/**
 * Create a new instance of the JacobWordSearch class using the following
 * parameters.
 *
 * @param visibility A primitive boolean whose value will determine whether
 *                   or not the Word application will be visible to the user. Pass true
 *                   to display Word, false otherwise.
 */
public OfficeConverter(boolean visibility) {
    this.oleComponent = new ActiveXComponent(OfficeConverter.APP_ID);
    this.oleComponent.setProperty("Visible", new Variant(visibility));
}

/**
 * Open ana existing Word document.
 *
 * @param docName An instance of the String class that encapsulates the
 *                path to and name of a valid Word file. Note that there are a few
 *                limitations applying to the format of this String; it must specify
 *                the absolute path to the file and it must not use the single forward
 *                slash to specify the path separator.
 */
public void openDoc(String docName) {
    Dispatch disp = null;
    Variant var = null;
    // First get a Dispatch object referencing the Documents collection - for
    // collections, think of ArrayLists of objects.
    var = Dispatch.get(this.oleComponent, "Documents");
    disp = var.getDispatch();
    // Now call the Open method on the Documents collection Dispatch object
    // to both open the file and add it to the collection. It would be possible
    // to open a series of files and access each from the Documents collection
    // but for this example, it is simpler to store a reference to the
    // active document in a private instance variable.
    var = Dispatch.call(disp, "Open", docName);
    this.activeDoc = var.getDispatch();
}

/**
 * There is more than one way to convert the document into PDF format, you
 * can either explicitly use a FileConvertor object or call the
 * ExportAsFixedFormat method on the active document. This method opts for
 * the latter and calls the ExportAsFixedFormat method passing the name
 * of the file along with the integer value of 17. This value maps onto one
 * of Word"s constants called wdExportFormatPDF and causes the application
 * to convert the file into PDF format. If you wanted to do so, for testing
 * purposes, you could add another value to the args array, a Boolean value
 * of true. This would open the newly converted document automatically.
 *
 * @param filename
 */
public void publishAsPDF(String filename) {
    // The code to expoort as a PDF is 17
    //Object args = new Object{filename, new Integer(17), new Boolean(true)};
    Object args = new Object {
        filename, new Integer(17)
    } ;
    Dispatch.call(this.activeDoc, "ExportAsFixedFormat", args);
}

/**
 * Called to close the active document. Note that this method simply
 * calls the overloaded closeDoc(int) method passing the value 0 which
 * instructs Word to close the document and discard any changes that may
 * have been made since the document was opened or edited.
 */
public void closeDoc() {
    this.closeDoc(JacobWordSearch.DO_NOT_SAVE_CHANGES);
}

/**
 * Called to close the active document. It is possible with this overloaded
 * version of the close() method to specify what should happen if the user
 * has made changes to the document that have not been saved. There are three
 * possible value defined by the following manifest constants;
 * DO_NOT_SAVE_CHANGES - Close the document and discard any changes
 * the user may have made.
 * PROMPT_TO_SAVE_CHANGES - Display a prompt to the user asking them
 * how to proceed.
 * SAVE_CHANGES - Save the changes the user has made to the document.
 *
 * @param saveOption A primitive integer whose value indicates how the close
 *                   operation should proceed if the user has made changes to the active
 *                   document. Note that no checks are made on the value passed to
 *                   this argument.
 */
public void closeDoc(int saveOption) {
    Object args = {new Integer(saveOption)};
    Dispatch.call(this.activeDoc, "Close", args);
}

/**
 * Called once processing has completed in order to close down the instance
 * of Word.
 */
public void quit() {
    Dispatch.call(this.oleComponent, "Quit");
}
WPS

Java调用WPS或pdfcreator的com接口实现doc转pdf

本文的核心代码如下,完整代码查看这里:

        @Override
        public boolean convert(String word, String pdf) {
            File pdfFile = new File(pdf);
            File wordFile = new File(word);
            boolean convertSuccessfully = false;

            ActiveXComponent wps = null;
            ActiveXComponent doc = null;


            try {
                wps = new ActiveXComponent("KWPS.Application");

//                Dispatch docs = wps.getProperty("Documents").toDispatch();
//                Dispatch d = Dispatch.call(docs, "Open", wordFile.getAbsolutePath(), false, true).toDispatch();
//                Dispatch.call(d, "SaveAs", pdfFile.getAbsolutePath(), 17);
//                Dispatch.call(d, "Close", false);

                doc = wps.invokeGetComponent("Documents")
                        .invokeGetComponent("Open", new Variant(wordFile.getAbsolutePath()));

                try {
                    doc.invoke("SaveAs",
                            new Variant(new File("C:UserslotucDocumentsmmm.pdf").getAbsolutePath()),
                            new Variant(17));
                    convertSuccessfully = true;
                } catch (Exception e) {
                    logger.warning("生成PDF失败");
                    e.printStackTrace();
                }

                File saveAsFile = new File("C:UserslotucDocumentssaveasfile.doc");
                try {
                    doc.invoke("SaveAs", saveAsFile.getAbsolutePath());
                    logger.info("成功另存为" + saveAsFile.getAbsolutePath());
                } catch (Exception e) {
                    logger.info("另存为" + saveAsFile.getAbsolutePath() + "失败");
                    e.printStackTrace();
                }
            } finally {
                if (doc == null) {
                    logger.info("打开文件 " + wordFile.getAbsolutePath() + " 失败");
                } else {
                    try {
                        logger.info("释放文件 " + wordFile.getAbsolutePath());
                        doc.invoke("Close");
                        doc.safeRelease();
                    } catch (Exception e1) {
                        logger.info("释放文件 " + wordFile.getAbsolutePath() + " 失败");
                    }
                }

                if (wps == null) {
                    logger.info("加载 WPS 控件失败");
                } else {
                    try {
                        logger.info("释放 WPS 控件");
                        wps.invoke("Quit");
                        wps.safeRelease();
                    } catch (Exception e1) {
                        logger.info("释放 WPS 控件失败");
                    }
                }
            }

            return convertSuccessfully;
        }
LiberOffice

Convert Microsoft Word to PDF - using Java and LibreOffice (UNO API)

LiberOffice本身提供了一个命令行工具进行转换,在你安装好了LiberOffice之后

/usr/local/bin/soffice --convert-to pdf:writer_pdf_Export /Users/lotuc/Downloads/test.doc

如果有打开的libreoffice实例, 要穿入env选项指定一个工作目录

/usr/local/bin/soffice "-env:UserInstallation=file:///tmp/LibreOffice_Conversion_abc" --convert-to pdf:writer_pdf_Export /Users/lotuc/Downloads/test.doc

首先我们需要安装好LiberOffice,然后将依赖的Jar包添加到classpath中:

Install Libre Office

Create a Java project in your favorite editor and add these to your class path:
  [Libre Office Dir]/URE/java/juh.jar
  [Libre Office Dir]/URE/java/jurt.jar
  [Libre Office Dir]/URE/java/ridl.jar
  [Libre Office Dir]/program/classes/unoil.jar

然后我们需要启动一个LiberOffice进程:

import java.util.Date;
import java.io.File;
import com.sun.star.beans.PropertyValue;
import com.sun.star.comp.helper.Bootstrap;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.frame.XDesktop;
import com.sun.star.frame.XStorable;
import com.sun.star.lang.XComponent;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.text.XTextDocument;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;
import com.sun.star.util.XReplaceDescriptor;
import com.sun.star.util.XReplaceable;

public class MailMergeExample {

public static void main(String[] args) throws Exception {

 // Initialise
 XComponentContext xContext = Bootstrap.bootstrap();

 XMultiComponentFactory xMCF = xContext.getServiceManager();
 
 Object oDesktop = xMCF.createInstanceWithContext(
      "com.sun.star.frame.Desktop", xContext);
 
 XDesktop xDesktop = (XDesktop) UnoRuntime.queryInterface(
      XDesktop.class, oDesktop);

接下来我们需要加载目标Doc文档:

// Load the Document
String workingDir = "C:/projects/";
String myTemplate = "letterTemplate.doc";

if (!new File(workingDir + myTemplate).canRead()) {
 throw new RuntimeException("Cannot load template:" + new File(workingDir + myTemplate));
}

XComponentLoader xCompLoader = (XComponentLoader) UnoRuntime
 .queryInterface(com.sun.star.frame.XComponentLoader.class, xDesktop);

String sUrl = "file:///" + workingDir + myTemplate;

PropertyValue[] propertyValues = new PropertyValue[0];

propertyValues = new PropertyValue[1];
propertyValues[0] = new PropertyValue();
propertyValues[0].Name = "Hidden";
propertyValues[0].Value = new Boolean(true);

XComponent xComp = xCompLoader.loadComponentFromURL(
 sUrl, "_blank", 0, propertyValues);


然后我们可以使用如下方式对内容进行替换:

// Search and replace
XReplaceDescriptor xReplaceDescr = null;
XReplaceable xReplaceable = null;

XTextDocument xTextDocument = (XTextDocument) UnoRuntime
  .queryInterface(XTextDocument.class, xComp);

xReplaceable = (XReplaceable) UnoRuntime
  .queryInterface(XReplaceable.class, xTextDocument);

xReplaceDescr = (XReplaceDescriptor) xReplaceable
  .createReplaceDescriptor();

// mail merge the date
xReplaceDescr.setSearchString("");
xReplaceDescr.setReplaceString(new Date().toString());
xReplaceable.replaceAll(xReplaceDescr);

// mail merge the addressee
xReplaceDescr.setSearchString("");
xReplaceDescr.setReplaceString("Best Friend");
xReplaceable.replaceAll(xReplaceDescr);

// mail merge the signatory
xReplaceDescr.setSearchString("");
xReplaceDescr.setReplaceString("Your New Boss");
xReplaceable.replaceAll(xReplaceDescr);

然后可以输出到PDF中:

// save as a PDF
XStorable xStorable = (XStorable) UnoRuntime
  .queryInterface(XStorable.class, xComp);

propertyValues = new PropertyValue[2];
propertyValues[0] = new PropertyValue();
propertyValues[0].Name = "Overwrite";
propertyValues[0].Value = new Boolean(true);
propertyValues[1] = new PropertyValue();
propertyValues[1].Name = "FilterName";
propertyValues[1].Value = "writer_pdf_Export";

// Appending the favoured extension to the origin document name
String myResult = workingDir + "letterOutput.pdf";
xStorable.storeToURL("file:///" + myResult, propertyValues);

System.out.println("Saved " + myResult);

xdocreport

本文的核心代码如下,完整代码查看这里:

/**
 * @param inpuFile 输入的文件流
 * @param outFile  输出的文件对象
 * @return
 * @function 利用Apache POI从输入的文件中生成PDF文件
 */
@SneakyThrows
public static void convertWithPOI(InputStream inpuFile, File outFile) {

    //从输入的文件流创建对象
    XWPFDocument document = new XWPFDocument(inpuFile);

    //创建PDF选项
    PdfOptions pdfOptions = PdfOptions.create();//.fontEncoding("windows-1250")

    //为输出文件创建目录
    outFile.getParentFile().mkdirs();

    //执行PDF转化
    PdfConverter.getInstance().convert(document, new FileOutputStream(outFile), pdfOptions);

}

/**
 * @param inpuFile
 * @param outFile
 * @param renderParams
 * @function 先将渲染参数填入模板DOCX文件然后生成PDF
 */
@SneakyThrows
public static void convertFromTemplateWithFreemarker(InputStream inpuFile, File outFile, Map renderParams) {

    //创建Report实例
    IXDocReport report = XDocReportRegistry.getRegistry().loadReport(
            inpuFile, TemplateEngineKind.Freemarker);

    //创建上下文
    IContext context = report.createContext();

    //填入渲染参数
    renderParams.forEach((s, o) -> {
        context.put(s, o);
    });

    //创建输出流
    outFile.getParentFile().mkdirs();

    //创建转化参数
    Options options = Options.getTo(ConverterTypeTo.PDF).via(
            ConverterTypeVia.XWPF);

    //执行转化过程
    report.convert(context, options, new FileOutputStream(outFile));
}

文章版权归作者所有,未经允许请勿转载,若此文章存在违规行为,您可以联系管理员删除。

转载请注明本文地址:https://www.ucloud.cn/yun/65087.html

相关文章

  • python 实用程序 | PDF Word

    摘要:虽然现在市面上有很多转软件,比如,但大多数的软件是要收费的,并且价格不菲。于是乎我就想到了利用来写个程序,把转成文档。具体的程序逻辑,可以去查看原文。本文首发于公众号痴海,每天分享干货,后台回复,领取最新教程。 showImg(https://segmentfault.com/img/remote/1460000015686184); 阅读文本大概需要 6 分钟。 现在网上有很多文档是...

    sorra 评论0 收藏0
  • JS下载文件常用的方式

    摘要:下载附件,,,,,,应该是实际工作中经常遇到一个问题这里使用过几种方式分享出来仅供参考初次写可能存在问题有问题望指出主要了解的几个知识点响应头设置这里只需要涉及跨域的时才使用,用于暴露中能够获取到响应头字段先来介绍常用方式这里下载文 下载附件(image,doc,docx, excel,zip,pdf),应该是实际工作中经常遇到一个问题;这里使用过几种方式分享出来仅供参考; 初次写可能...

    alaege 评论0 收藏0
  • SpringBoot使用LibreOfficePDF

    摘要:用将文档转换本例使用。在和环境下测试通过。转换命令源文件放在或者封装了一组转换命令,通过调用相关服务。安装检查已有字体库复制字体新建文件夹把系统的字体复制进去。 用LibreOffice将Office文档转换PDF 本例使用 LibreOffice-6.0.4、jodconverter-4.2.0、spring-boot-1.5.9.RELEASE。 在CentOS7 + ope...

    mcterry 评论0 收藏0

发表评论

0条评论

zgbgx

|高级讲师

TA的文章

阅读更多
最新活动
阅读需要支付1元查看
<