Merge multiple Word documents into one with Open XML

Jul*_*ary 20 c# merge docx openxml openxml-sdk

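I have around 10 Word documents that I generate using Open XML and other tools. Now I would like to create another Word document and append each of them, one by one, to this newly created document. I would like to use Open XML; any pointers would be appreciated. Below is my code: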

 private void CreateSampleWordDocument()
    {
        //string sourceFile = Path.Combine("D:\\GeneralLetter.dot");
        //string destinationFile = Path.Combine("D:\\New.doc");
        string sourceFile = Path.Combine("D:\\GeneralWelcomeLetter.docx");
        string destinationFile = Path.Combine("D:\\New.docx");
        try
        {
            // Create a copy of the template file and open the copy
            //File.Copy(sourceFile, destinationFile, true);
            using (WordprocessingDocument document = WordprocessingDocument.Open(destinationFile, true))
            {
                // Change the document type to Document
                document.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
                //Get the Main Part of the document
                MainDocumentPart mainPart = document.MainDocumentPart;
                mainPart.Document.Save();
            }
        }
        catch
        {
        }
    }

Update (using AltChunks):

using (WordprocessingDocument myDoc = WordprocessingDocument.Open("D:\\Test.docx", true))
        {
            string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2) ;
            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            using (FileStream fileStream = File.Open("D:\\Test1.docx", FileMode.Open))
                chunk.FeedData(fileStream);
            AltChunk altChunk = new AltChunk();
            altChunk.Id = altChunkId;
            mainPart.Document
                .Body
                .InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
            mainPart.Document.Save();
        } 

Why does this code overwrite the content of the last file when I use multiple files?

Update 2:

 using (WordprocessingDocument myDoc = WordprocessingDocument.Open("D:\\Test.docx", true))
        {

            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 3);
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            using (FileStream fileStream = File.Open("d:\\Test1.docx", FileMode.Open))
            {
                chunk.FeedData(fileStream);
                AltChunk altChunk = new AltChunk();
                altChunk.Id = altChunkId;
                mainPart.Document
                    .Body
                    .InsertAfter(altChunk, mainPart.Document.Body
                    .Elements<Paragraph>().Last());
                mainPart.Document.Save();
            }
            using (FileStream fileStream = File.Open("d:\\Test2.docx", FileMode.Open))
            {
                chunk.FeedData(fileStream);
                AltChunk altChunk = new AltChunk();
                altChunk.Id = altChunkId;
                mainPart.Document
                    .Body
                    .InsertAfter(altChunk, mainPart.Document.Body
                    .Elements<Paragraph>().Last());
            }
            using (FileStream fileStream = File.Open("d:\\Test3.docx", FileMode.Open))
            {
                chunk.FeedData(fileStream);
                AltChunk altChunk = new AltChunk();
                altChunk.Id = altChunkId;
                mainPart.Document
                    .Body
                    .InsertAfter(altChunk, mainPart.Document.Body
                    .Elements<Paragraph>().Last());
            } 
        }

This code appends the Test2 data twice, once in place of the Test1 data. That is, I get:

Test
Test2
Test2

Instead of:

Test
Test1
Test2

Chr*_*ris 20

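Using only the Open XML SDK, you can use the AltChunk element to merge multiple documents into one.

These two articles, "The Easy Way to Assemble Multiple Word Documents" and "How to Use altChunk for Document Assembly", provide some examples.

EDIT 1

Based on the code using altChunk in your updated question (Update #1), here is the VB.Net code I tested, which works like a charm for me: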

Using myDoc = DocumentFormat.OpenXml.Packaging.WordprocessingDocument.Open("D:\\Test.docx", True)
        Dim altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2)
        Dim mainPart = myDoc.MainDocumentPart
        Dim chunk = mainPart.AddAlternativeFormatImportPart(
            DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML, altChunkId)
        Using fileStream As IO.FileStream = IO.File.Open("D:\\Test1.docx", IO.FileMode.Open)
            chunk.FeedData(fileStream)
        End Using
        Dim altChunk = New DocumentFormat.OpenXml.Wordprocessing.AltChunk()
        altChunk.Id = altChunkId
        mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements(Of DocumentFormat.OpenXml.Wordprocessing.Paragraph).Last())
        mainPart.Document.Save()
End Using

EDIT 2

The second issue (Update #2),

"This code appends the Test2 data twice, in place of the Test1 data",

is related to the altChunkId.

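For each document you want to merge into the main document, you need to:

  1. Add an AlternativeFormatImportPart to the MainDocumentPart with an Id, which must be unique. This part contains the inserted data.
  2. Add an AltChunk element to the body, with its Id set to reference that AlternativeFormatImportPart.

In your code, you use the same Id for all the AltChunks. That is why you see the same text several times.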

I am not sure the altChunkId your code produces is unique: string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2);
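To illustrate the doubt above, here is a minimal sketch that evaluates the same ID expression for two dates several months apart. The class and method names (ChunkIdDemo, MakeId) are hypothetical and exist only for this illustration. Substring(0, 2) keeps the most significant digits of the tick count, which change only once every several years, so the resulting IDs are identical.

```csharp
using System;

// Hypothetical helper, not part of the Open XML SDK.
public static class ChunkIdDemo
{
    // Mirrors the ID expression from the question: a fixed prefix plus
    // the first `digits` characters of the tick count.
    public static string MakeId(DateTime time, int digits)
    {
        return "AltChunkId" + time.Ticks.ToString().Substring(0, digits);
    }

    public static void Main()
    {
        string first = MakeId(new DateTime(2024, 1, 1), 2);
        string second = MakeId(new DateTime(2024, 6, 1), 2);

        // Both tick counts start with the same leading digits,
        // so the two IDs compare equal.
        Console.WriteLine(first == second); // prints "True"
    }
}
```

Taking the last digits instead would vary more, but it still would not guarantee uniqueness.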

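If you do not need to set a specific value, I recommend not setting the AltChunkId explicitly when you add the AlternativeFormatImportPart. Instead, get the one generated by the SDK, like this: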

VB.Net

Dim chunk As AlternativeFormatImportPart = mainPart.AddAlternativeFormatImportPart(DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML)
Dim altchunkid As String = mainPart.GetIdOfPart(chunk)

C#

AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML);
string altchunkid = mainPart.GetIdOfPart(chunk);
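Putting the two steps together for several files could look like the sketch below. It is only an illustration under some assumptions: the class and method names (DocxMerger, AppendDocuments) are hypothetical, the target document is assumed to already contain a body, and each chunk is inserted before the body's final section properties (or appended when there are none) rather than after the last paragraph, so that the files appear in the order given.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class, not part of the Open XML SDK.
public static class DocxMerger
{
    // Appends every source document to the end of the target document.
    public static void AppendDocuments(string targetPath, IEnumerable<string> sourcePaths)
    {
        using (WordprocessingDocument target = WordprocessingDocument.Open(targetPath, true))
        {
            MainDocumentPart mainPart = target.MainDocumentPart;
            Body body = mainPart.Document.Body;

            foreach (string sourcePath in sourcePaths)
            {
                // Step 1: one new import part per file; the SDK generates a unique ID.
                AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                    AlternativeFormatImportPartType.WordprocessingML);
                string chunkId = mainPart.GetIdOfPart(chunk);

                using (FileStream stream = File.Open(sourcePath, FileMode.Open, FileAccess.Read))
                {
                    chunk.FeedData(stream);
                }

                // Step 2: an AltChunk element that references this part's ID.
                AltChunk altChunk = new AltChunk { Id = chunkId };

                // Keep the final section properties, if present, as the last child.
                SectionProperties sectionProperties =
                    body.Elements<SectionProperties>().LastOrDefault();
                if (sectionProperties != null)
                {
                    body.InsertBefore(altChunk, sectionProperties);
                }
                else
                {
                    body.AppendChild(altChunk);
                }
            }

            mainPart.Document.Save();
        }
    }
}
```

A call such as DocxMerger.AppendDocuments("D:\\Test.docx", new[] { "D:\\Test1.docx", "D:\\Test2.docx" }) would then add one import part and one AltChunk per source file.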


Flo*_*ing 9

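There is a nice wrapper API (Document Builder 2.2) around Open XML that is specially designed for merging documents, with the flexibility to choose which paragraphs to merge, and so on. You can download it from here (update: moved to GitHub).

The documentation and screencasts on how to use it are here.

Update: code sample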

 var sources = new List<Source>();
 //Document Streams (File Streams) of the documents to be merged.
 foreach (var stream in documentstreams)
 {
        var tempms = new MemoryStream();
        stream.CopyTo(tempms);
        sources.Add(new Source(new WmlDocument(stream.Length.ToString(), tempms), true));
 }

  var mergedDoc = DocumentBuilder.BuildDocument(sources);
  mergedDoc.SaveAs(@"C:\TargetFilePath");

The types Source and WmlDocument come from the Document Builder API.

You can even add the file paths directly if you prefer:

sources.Add(new Source(new WmlDocument(@"C:\FileToBeMerged1.docx")));
sources.Add(new Source(new WmlDocument(@"C:\FileToBeMerged2.docx")));

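I found a nice comparison between the AltChunk and Document Builder approaches to merging documents; it is helpful for choosing one based on your requirements.

You can also use the DocX library to merge documents, but I prefer Document Builder for this.

Hope this helps.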