Reading an XLS with a protected workbook and worksheet via HSSF.EventUserModel

Ste*_*ven 5 c# java excel apache-poi npoi

End goal: efficiently (in a single pass) read all CellRecords of a huge (30,000+ rows), protected Worksheet.

Question: Using HSSF.EventUserModel, how can I read every Record (including the CellRecords) of an XLS file that has both workbook and worksheet protection?

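Creating the input spreadsheet (in Excel 2010):

  1. Create a new blank workbook.
  2. Set the value of A1 to the number: 50
  3. Set the value of A2 to the string: 50
  4. Set the value of A3 to the formula: =25*2
  5. Review (ribbon) -> Protect Sheet -> Password: pass1
  6. Review (ribbon) -> Protect Workbook -> Password: pass1
  7. File (ribbon) -> Save As... -> Save as type: Excel 97-2003 Workbook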

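Progress so far:

  • The XLS file opens in Excel without a password, so you should not need the password to open it in POI.
  • The XLS file opens successfully with new HSSFWorkbook(Stream fs). However, I need the efficiency of EventUserModel for my actual spreadsheet.
  • Setting NPOI.HSSF.Record.Crypto.Biff8EncryptionKey.CurrentUserPassword = "pass1"; does not work.
  • The ProcessRecord( ) function catches a PasswordRecord, but I cannot find any documentation on how to handle it properly.
  • Perhaps the EncryptionInfo and Decryptor classes may be of some use.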

Note:
I am using NPOI. However, I can translate any Java example to C#.

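Code:
I use the following code to capture Record events. My Book1-unprotected.xls (no protection) shows all Record events (including cell values). My Book1-protected.xls shows some records and then throws an exception.

I simply inspect processedRecords in the debugger.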

using System;
using System.Collections.Generic;
using System.IO;

using NPOI.HSSF.Record;
using NPOI.HSSF.Model;
using NPOI.HSSF.UserModel;
using NPOI.HSSF.EventUserModel;
using NPOI.POIFS;
using NPOI.POIFS.FileSystem;

namespace NPOI_small {
    class myListener : IHSSFListener {
        List<Record> processedRecords;

        private Stream fs;

        public myListener(Stream fs) {
            processedRecords = new List<Record>();
            this.fs = fs;

            HSSFEventFactory factory = new HSSFEventFactory();
            HSSFRequest request = new HSSFRequest();

            MissingRecordAwareHSSFListener mraListener;
            FormatTrackingHSSFListener fmtListener;
            EventWorkbookBuilder.SheetRecordCollectingListener recListener;
            mraListener = new MissingRecordAwareHSSFListener(this);
            fmtListener = new FormatTrackingHSSFListener(mraListener);
            recListener = new EventWorkbookBuilder.SheetRecordCollectingListener(fmtListener);
            request.AddListenerForAllRecords(recListener);

            POIFSFileSystem poifs = new POIFSFileSystem(this.fs);

            factory.ProcessWorkbookEvents(request, poifs);
        }

        public void ProcessRecord(Record record) {
            processedRecords.Add(record);
        }
    }
    class Program {
        static void Main(string[] args) {
            Stream fs = File.OpenRead(@"c:\users\me\desktop\xx\Book1-protected.xls");

            myListener testListener = new myListener(fs); // Use EventModel 
            //HSSFWorkbook book = new HSSFWorkbook(fs); // Use UserModel

            Console.Read();
        }
    }
}

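Update (for Juan Mellado): The exception is below. My best guess right now (based on Victor Petrykin's answer) is that HSSFEventFactory uses a RecordInputStream, which cannot by itself decrypt protected records. After the exception is thrown, processedRecords contains 22 records, including the following potentially significant ones:

  • processedRecords[5] is a WriteAccessRecord with a garbled (probably encrypted) value for its name
  • processedRecords[22] is a RefreshAllRecord and is the last Record in the list

Exception: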

NPOI.Util.RecordFormatException was unhandled
  HResult=-2146233088
  Message=Unable to construct record instance
  Source=NPOI
  StackTrace:
       at NPOI.HSSF.Record.RecordFactory.ReflectionConstructorRecordCreator.Create(RecordInputStream in1)
       at NPOI.HSSF.Record.RecordFactory.CreateSingleRecord(RecordInputStream in1)
       at NPOI.HSSF.Record.RecordFactory.CreateRecord(RecordInputStream in1)
       at NPOI.HSSF.EventUserModel.HSSFRecordStream.GetNextRecord()
       at NPOI.HSSF.EventUserModel.HSSFRecordStream.NextRecord()
       at NPOI.HSSF.EventUserModel.HSSFEventFactory.GenericProcessEvents(HSSFRequest req, RecordInputStream in1)
       at NPOI.HSSF.EventUserModel.HSSFEventFactory.ProcessEvents(HSSFRequest req, Stream in1)
       at NPOI.HSSF.EventUserModel.HSSFEventFactory.ProcessWorkbookEvents(HSSFRequest req, POIFSFileSystem fs)
       at NPOI_small.myListener..ctor(Stream fs) in c:\Users\me\Documents\Visual Studio 2012\Projects\myTest\NPOI_small\Program.cs:line 35
       at NPOI_small.Program.Main(String[] args) in c:\Users\me\Documents\Visual Studio 2012\Projects\myTest\NPOI_small\Program.cs:line 80
       at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
       at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
       at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart()
  InnerException: NPOI.Util.RecordFormatException
       HResult=-2146233088
       Message=Expected to find a ContinueRecord in order to read remaining 137 of 144 chars
       Source=NPOI
       StackTrace:
            at NPOI.HSSF.Record.RecordInputStream.ReadStringCommon(Int32 requestedLength, Boolean pIsCompressedEncoding)
            at NPOI.HSSF.Record.RecordInputStream.ReadUnicodeLEString(Int32 requestedLength)
            at NPOI.HSSF.Record.FontRecord..ctor(RecordInputStream in1)

小智 4

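I think this is a bug in the NPOI library code. As far as I understand, HSSFEventFactory uses the wrong stream type: it uses RecordInputStream instead of RecordFactoryInputStream, which has the decryption features. The original POI library uses RecordFactoryInputStream there, and so does the HSSFWorkbook UserModel (which is why that one works).

This code also works, but it is not event logic: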

POIFSFileSystem poifs = new POIFSFileSystem(fs);
Entry document = poifs.Root.GetEntry("Workbook");
DocumentInputStream docStream = new DocumentInputStream((DocumentEntry)document);
//RecordFactory factory = new RecordFactory();
//List<Record> records = RecordFactory.CreateRecords(docStream);
RecordFactoryInputStream recFacStream = new RecordFactoryInputStream(docStream, true);
Record currRecord;
while ((currRecord = recFacStream.NextRecord()) != null) 
   ProcessRecord(currRecord);
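To get something closer to the event model, the loop above could be wrapped so that each record is handed to an IHSSFListener, such as the myListener class from the question. This is only a sketch: RecordPump and Pump are hypothetical names (not part of NPOI), it uses only the NPOI calls already shown above, and it bypasses HSSFRequest, so there is no per-record-type filtering and the listener receives every record.

```csharp
using System.IO;
using NPOI.HSSF.EventUserModel;
using NPOI.HSSF.Record;
using NPOI.POIFS.FileSystem;

// Hypothetical helper, not part of NPOI.
static class RecordPump {
    // Reads the "Workbook" stream record by record and forwards each
    // record to the listener, one at a time.
    public static void Pump(Stream fs, IHSSFListener listener) {
        POIFSFileSystem poifs = new POIFSFileSystem(fs);
        Entry document = poifs.Root.GetEntry("Workbook");
        DocumentInputStream docStream = new DocumentInputStream((DocumentEntry)document);

        // Same constructor call as in the snippet above; this is the
        // stream type that has the decryption features.
        RecordFactoryInputStream recStream = new RecordFactoryInputStream(docStream, true);

        Record record;
        while ((record = recStream.NextRecord()) != null) {
            listener.ProcessRecord(record);
        }
    }
}
```

With this, the listener's constructor would no longer need to call HSSFEventFactory; calling RecordPump.Pump(fs, listener) from Main should be enough. Wrappers such as MissingRecordAwareHSSFListener also implement IHSSFListener, so they could presumably be passed in the same way, though I have not checked that.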