How to speed up Word Interop processing?

Chr*_*ore 5 c#

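I'm very new to C# and have written some rather clunky code. I've taken a lot of courses online, and many people say there are multiple ways to solve a problem. I have now made a program that loads a .doc Word file and then searches for the relevant information using if statements.

The problem with my solution is that this program takes forever! I'm talking 30 minutes to 1 hour to complete the following code.

Any ideas on how to make my little program less clunky? I hope the answers to this will greatly increase my knowledge, so thanks in advance, everyone!

Regards, Chris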

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace WindowsFormsApplication3
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
        public int id = 0;
        public int[] iD = new int[100];
        public string[] timeOn = new string[100];
        public string[] timeOff = new string[100];
        public string[] dutyNo = new string[100];
        public string[] day = new string[100];

        private void button1_Click(object sender, EventArgs e)
        {



            Microsoft.Office.Interop.Word.Application application = new Microsoft.Office.Interop.Word.Application();
            Microsoft.Office.Interop.Word.Document document = application.Documents.Open("c:\\Users\\Alien\\Desktop\\TESTJOBS.doc");
            //the following for will loop for all words

            int count = document.Words.Count;
            for (int i = 1; i <= count; i++)
            {
                // Look for the word "On"; in the file it is followed by a time
                // such as 04:00, which is read as three words (i+2, i+3, i+4).
                if (document.Words[i].Text == "On")
                {
                    iD[id] = id;
                   // Console.WriteLine("ID Number ={0}", iD[id]);
                    dutyNo[id] = document.Words[i - 14].Text;
                   // Console.WriteLine("duty No set to:{0}", dutyNo[id]);
                    timeOn[id] = document.Words[i + 2].Text + document.Words[i + 3].Text + document.Words[i + 4].Text;
                   // Console.WriteLine("on time set to:{0}", timeOn[id]);
                    // The else-if below runs when the word is not "On" and checks for "Off", which follows "On" in the file format;
                    // it is likewise followed by a time such as 04:00 (thus i+2/3/4 respectively).
                }
                else if (document.Words[i].Text == "Off")
                {
                    timeOff[id] = document.Words[i + 2].Text + document.Words[i + 3].Text + document.Words[i + 4].Text;
                    //Console.WriteLine("off time set to:{0}", timeOff[id]);
                    // The else-if below runs when the word is not "Off" and checks for "Days", which follows "Off" in the file format;
                    // the day itself is read from i+2.
                }
                else if (document.Words[i].Text == "Days" && !(document.Words[i + 3].Text == "Type"))
                {

                    day[id] = document.Words[i + 2].Text;
                    //Console.WriteLine("day set to:{0}", day[id]);
                    //we then print the whole new duty out to ListBox1
                    listBox1.Items.Add(string.Format("new duty ID:{0} Time on:{1} Time off:{2} Duty No:{3} Day:{4}", iD[id], timeOn[id], timeOff[id], dutyNo[id], day[id]));
                    id++;
                }


            }

            // Print every duty that was collected (indices 0 .. id - 1).
            for (int i = 0; i < id; i++)
            {
                Console.WriteLine("new duty ID:{0} Time on:{1} Time off:{2} Duty No:{3} Day:{4}", iD[i], timeOn[i], timeOff[i], dutyNo[i], day[i]);
            }


        }
    }
}

Eug*_*kal 3

Office Interop is fairly slow.

OpenXML could be faster, but the file is a .doc, so it probably won't be able to handle it.


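But, as with Excel in this question, there is a way to improve performance: don't access each word by Range index, because as far as I know each access creates a separate Range instance wrapped in an RCW, and that is a prime candidate for the performance bottleneck in your application.

This means your best bet for improving performance is to load all the words (their .Text) into some indexable collection of String before the actual processing, and only then use that collection to create the output.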
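To illustrate the second half of that idea, here is a minimal sketch of the question's scan running over an in-memory list instead of `document.Words[i]`. It involves no Interop at all. `DutyScan`, `Scan` and `At` are hypothetical names, and the offsets (`i - 14`, `i + 2` and so on) are simply copied from the question, so they are only right if the list holds the same words in the same order as the `Words` collection. Note that the list is 0-based while the Word collection is 1-based; the relative offsets stay the same.

```csharp
using System;
using System.Collections.Generic;

public static class DutyScan
{
    // Hypothetical helper: returns words[i], or "" when i is out of range,
    // so the look-behind/look-ahead offsets cannot throw.
    private static string At(IReadOnlyList<string> words, int i)
    {
        return i >= 0 && i < words.Count ? words[i] : "";
    }

    // Scans the word list with the same offsets as the question's loop
    // and returns one formatted line per duty found.
    public static List<string> Scan(IReadOnlyList<string> words)
    {
        var results = new List<string>();
        string dutyNo = "", timeOn = "", timeOff = "";

        for (int i = 0; i < words.Count; i++)
        {
            string w = words[i];
            if (w == "On")
            {
                dutyNo = At(words, i - 14);
                timeOn = At(words, i + 2) + At(words, i + 3) + At(words, i + 4);
            }
            else if (w == "Off")
            {
                timeOff = At(words, i + 2) + At(words, i + 3) + At(words, i + 4);
            }
            else if (w == "Days" && At(words, i + 3) != "Type")
            {
                results.Add(string.Format(
                    "new duty ID:{0} Time on:{1} Time off:{2} Duty No:{3} Day:{4}",
                    results.Count, timeOn, timeOff, dutyNo, At(words, i + 2)));
                // Start the next duty with empty fields.
                dutyNo = timeOn = timeOff = "";
            }
        }
        return results;
    }
}
```

Every comparison here is an ordinary string lookup in managed memory, so the only COM calls left are the ones that fill the list in the first place.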

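How to do that in the fastest way? I'm not exactly sure, but you can try getting all the words from the _Document.Words enumerator (it may or may not be more efficient, but at least you will be able to see how long it takes just to retrieve the words):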

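// Requires: using System.Linq; using Microsoft.Office.Interop.Word;
// Enumerate the Words collection once and copy each word's text into a list.
var words = document
    .Words
    .Cast<Range>()
    .Select(r => r.Text)
    .ToList();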

Or you can try using the Text of the _Document.Content range, but then you will have to separate the individual words yourself.
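A rough sketch of what "separating the words yourself" could look like, assuming a simple rule: runs of letters or digits form one word, and each punctuation mark is a word of its own (so `04:00` becomes `04`, `:`, `00`). `WordSplitter` and `Split` are hypothetical names, and this rule is only an approximation. Word's own `Words` collection may split the text differently (for example around whitespace and paragraph marks), so the offsets used in the question would need to be re-checked against the real file.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordSplitter
{
    // Splits plain text into tokens: a run of word characters (\w+),
    // or a single character that is neither a word character nor whitespace.
    public static List<string> Split(string text)
    {
        return Regex.Matches(text ?? "", @"\w+|[^\w\s]")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

With Interop this would be called once, as `var words = WordSplitter.Split(document.Content.Text);`, so the whole document text crosses the COM boundary in a single call instead of once per word.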