How to read the last line of a text file using Delphi

Ham*_*sby 2 delphi ascii pascal delphi-xe2

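I need to read the last line of some very large text files (to get the timestamp from the data). TStringList is the simple approach, but it returns an out-of-memory error. I'm trying to use Seek and BlockRead, but the characters in the buffer are all nonsense. Is this something to do with Unicode?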

    Function TForm1.ReadLastLine2(FileName: String): String;
    var
      FileHandle: File;
      s,line: string;
      ok: 0..1;
      Buf: array[1..8] of Char;
      k: longword;
      i,ReadCount: integer;
    begin
      AssignFile (FileHandle,FileName);
      Reset (FileHandle);           // or for binary files: Reset (FileHandle,1);
      ok := 0;
      k := FileSize (FileHandle);
      Seek (FileHandle, k-1);
      s := '';
      while ok<>1 do begin
        BlockRead (FileHandle, buf, SizeOf(Buf)-1, ReadCount);  //BlockRead ( var FileHandle : File; var Buffer; RecordCount : Integer {; var RecordsRead : Integer} ) ;
        if ord (buf[1]) <>13 then         //Arg to integer
          s := s + buf[1]
        else
          ok := ok + 1;
        k := k-1;
        seek (FileHandle,k);
      end;
      CloseFile (FileHandle);

      // Reverse the order in the line read
      setlength (line,length(s));
      for i:=1 to length(s) do
        line[length(s) - i+1 ] := s[i];
      Result := Line;
    end;

Based on www.delphipages.com/forum/showthread.php?t=102965

The test file is a simple CSV that I created in Excel (this is not the 100 MB file I ultimately need to read).

    a,b,c,d,e,f,g,h,i,j,blank
    A,B,C,D,E,F,G,H,I,J,blank
    1,2,3,4,5,6,7,8,9,0,blank
    Mary,had,a,little,lamb,His,fleece,was,white,as,snow
    And,everywhere,that,Mary,went,The,lamb,was,sure,to,go

Ari*_*The 5

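You really have to read the file in LARGE chunks, from tail to head. Since the file is too large to fit into memory, reading it line by line from start to end would be very slow. With ReadLn it would be twice as slow.

You also have to be prepared for the last line either ending with an EOL or not.

Personally, I would also account for three possible EOL sequences:

  • CR/LF aka #13#10 = ^M^J - DOS/Windows style
  • CR without LF - just #13 = ^M - Classic MacOS files
  • LF without CR - just #10 = ^J - UNIX style, including MacOS version 10

If you are sure your CSV files can only ever be generated by native Windows programs, it is safe to assume a full CR/LF. But if other producers are possible (Java programs, non-Windows platforms, mobile programs), I would be less sure. Of course, a pure CR without LF is the least probable case of them all.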
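The three EOL styles listed above can be handled with two independent checks on the tail of the text. The following is only a minimal sketch: `StripTrailingEOL` is a hypothetical helper name (not part of the RTL), and it assumes single-byte (ANSI/ASCII) text held in an `AnsiString`.

```pascal
program StripEolDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper (not an RTL routine): removes at most one trailing EOL,
// whether it is CR/LF, a lone CR, or a lone LF.
function StripTrailingEOL(const S: AnsiString): AnsiString;
var
  Len: Integer;
begin
  Len := Length(S);
  if (Len > 0) and (S[Len] = #10) then Dec(Len); // LF, alone or after CR
  if (Len > 0) and (S[Len] = #13) then Dec(Len); // CR, alone or before LF
  Result := Copy(S, 1, Len);
end;

begin
  Writeln(StripTrailingEOL('Windows line'#13#10));
  Writeln(StripTrailingEOL('Classic MacOS line'#13));
  Writeln(StripTrailingEOL('UNIX line'#10));
  Writeln(StripTrailingEOL('no EOL at all'));
end.
```

Checking LF first and CR second is what lets one pass cover all three styles; the same order is used for the trailing-EOL test in `FindLastLine`.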

    uses System.SysUtils, System.Types, System.IOUtils, System.Math, System.Classes;

    type FileChar = AnsiChar; FileString = AnsiString; // for non-Unicode files
    // type FileChar = WideChar; FileString = UnicodeString;// for UTF16 and UCS-2 files
    const FileCharSize = SizeOf(FileChar);
    // somewhere later in the code add: Assert(FileCharSize = SizeOf(FileString[1]));

    function ReadLastLine(const FileName: String): FileString; overload; forward;

    const PageSize = 4*1024;
    // the minimal read atom of most modern HDD and the memory allocation atom of Win32
    // since the chances your file would have lines longer than 4Kb are very small - I would not increase it to several atoms.

    function ReadLastLine(const Lines: TStringDynArray): FileString; overload;
    var i: integer;
    begin
      Result := '';
      i := High(Lines);
      if i < Low(Lines) then exit; // empty array - empty file

      Result := Lines[i];
      if Result > '' then exit; // we got the line

      Dec(i); // skip the empty ghost line, in case last line was CRLF-terminated
      if i < Low(Lines) then exit; // that ghost was the only line in the empty file
      Result := Lines[i];
    end;

    // scan for EOLs in not-yet-scanned part
    function FindLastLine(buffer: TArray<FileChar>; const OldRead : Integer;
         const LastChunk: Boolean; out Line: FileString): boolean;
    var i, tailCRLF: integer; c: FileChar;
    begin
      Result := False;
      if Length(Buffer) = 0 then exit;

      i := High(Buffer);
      tailCRLF := 0; // test for trailing CR/LF
      if Buffer[i] = ^J then begin // LF - single, or after CR
         Dec(i);
         Inc(tailCRLF);
      end;
      if (i >= Low(Buffer)) and (Buffer[i] = ^M) then begin // CR, alone or before LF
         Inc(tailCRLF);
      end;

      i := High(Buffer) - Max(OldRead, tailCRLF);
      if i - Low(Buffer) < 0 then exit; // no new data to read - results would be like before

      if OldRead > 0 then Inc(i); // the CR/LF pair could be sliced between new and previous buffer - so need to start a bit earlier

      for i := i downto Low(Buffer) do begin
          c := Buffer[i];
          if (c=^J) or (c=^M) then begin // found EOL
             SetString( Line, @Buffer[i+1], High(Buffer) - tailCRLF - i);
             exit(True);
          end;
      end;

      // we did not find non-terminating EOL in the buffer (except maybe trailing),
      // now we should ask for more file content, if there is still left any
      // or take the entire file (without trailing EOL if any)

      if LastChunk then begin
         SetString( Line, @Buffer[ Low(Buffer) ], Length(Buffer) - tailCRLF);
         Result := true;
      end;
    end;


    function ReadLastLine(const FileName: String): FileString; overload;
    var Buffer, tmp: TArray<FileChar>;
        // dynamic arrays - eases memory management and protect from stack corruption
        FS: TFileStream; FSize, NewPos: Int64;
        OldRead, NewLen : Integer; EndOfFile: boolean;
    begin
      Result := '';
      FS := TFile.OpenRead(FileName);
      try
        FSize := FS.Size;
        if FSize <= PageSize then begin // small file, we can be lazy!
           FreeAndNil(FS);  // free the handle and avoid double-free in finally
           Result := ReadLastLine( TFile.ReadAllLines( FileName, TEncoding.ANSI ));
              // or TEncoding.Unicode for UTF-16 files
              // warning - TFile is not share-aware, if the file is being written to by another app
           exit;
        end;

        SetLength( Buffer, PageSize div FileCharSize);
        OldRead := 0;
        repeat
          NewPos := FSize - Length(Buffer)*FileCharSize;
          EndOfFile := NewPos <= 0;
          if NewPos < 0 then NewPos := 0;
          FS.Position := NewPos;

          FS.ReadBuffer( Buffer[Low(Buffer)], (Length(Buffer) - OldRead)*FileCharSize);

          if FindLastLine(Buffer, OldRead, EndOfFile, Result) then
             exit; // done !

          tmp := Buffer; Buffer := nil; // flip-flop: preparing to broaden our mouth

          OldRead := Length(tmp); // need not to re-scan the tail again and again when expanding our scanning range
          NewLen := Min( 2*Length(tmp), FSize div FileCharSize );

          SetLength(Buffer, NewLen); // this may trigger EOutOfMemory...
          Move( tmp[Low(tmp)], Buffer[High(Buffer)-OldRead+1], OldRead*FileCharSize);
          tmp := nil; // free old buffer
        until EndOfFile;
      finally
        FS.Free;
      end;
    end;

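PS. Note one more special case: if you use Unicode chars (two-byte ones) and are given an odd-length file (3 bytes, 5 bytes, etc.), you will never be able to scan the starting single byte (half a wide char). Maybe you should add an extra guard there, like Assert( 0 = FS.Size mod FileCharSize)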
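One way to write such a guard is sketched below. `RequireWholeChars` is a hypothetical name, and the sketch raises an exception rather than calling `Assert`, on the assumption that the check should also survive builds where assertions are switched off.

```pascal
program OddSizeGuardDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

type
  FileChar = WideChar; // two-byte characters, as in the UTF-16 variant of the code

// Hypothetical guard: rejects files whose byte size is not a whole number
// of FileChar units (for example a 3- or 5-byte "UTF-16" file).
procedure RequireWholeChars(const FileName: string);
var
  FS: TFileStream;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    if FS.Size mod SizeOf(FileChar) <> 0 then
      raise EInOutError.CreateFmt(
        '%s: size of %d bytes is not a multiple of %d',
        [FileName, FS.Size, SizeOf(FileChar)]);
  finally
    FS.Free;
  end;
end;

begin
  if ParamCount >= 1 then
    RequireWholeChars(ParamStr(1)); // raises EInOutError for odd-sized files
end.
```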

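PPS. As a rule of thumb, you'd better keep such functions out of the form class. Why mix them? In general, you should separate concerns into small blocks. Reading a file has nothing to do with user interaction, so it is better offloaded to a separate UNIT. Then you can use the functions from that unit in one form or in 10 forms, in the main thread or in a multi-threaded application. Like LEGO parts, they give you flexibility by being small and separate.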

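PPP. Another approach here would be to use memory-mapped files. Google for MMF implementations for Delphi and for articles about the benefits and problems of the MMF approach. Personally, I think rewriting the code above to use MMF would greatly simplify it, removing several "special cases" and the troublesome memory-copying flip-flop. OTOH it would require you to be very strict with pointer arithmetic.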
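For illustration only, a memory-mapped variant might look roughly like the sketch below. It is not the code from this answer rewritten: `ReadLastLineMapped` is a hypothetical name, it is Windows-only, it assumes a single-byte (ANSI/ASCII) file, and it maps just one tail window of between one and two allocation granules, so it assumes the last line fits inside that window.

```pascal
program LastLineMmfDemo;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils, System.Classes;

// Hypothetical helper. Assumptions: Windows, single-byte text,
// last line shorter than the mapped tail window.
function ReadLastLineMapped(const FileName: string): AnsiString;
var
  FS: TFileStream;
  Info: TSystemInfo;
  Mapping: THandle;
  View: PAnsiChar;
  Gran, Offset: Int64;
  Len, Last, i: Integer;
begin
  Result := '';
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    if FS.Size = 0 then Exit; // an empty file cannot be mapped

    // A view offset must be a multiple of the allocation granularity.
    GetSystemInfo(Info);
    Gran := Info.dwAllocationGranularity;
    Offset := (FS.Size div Gran - 1) * Gran;
    if Offset < 0 then Offset := 0;
    Len := Integer(FS.Size - Offset); // always below 2 * Gran

    Mapping := CreateFileMapping(FS.Handle, nil, PAGE_READONLY, 0, 0, nil);
    if Mapping = 0 then RaiseLastOSError;
    try
      View := MapViewOfFile(Mapping, FILE_MAP_READ,
        Int64Rec(Offset).Hi, Int64Rec(Offset).Lo, Len);
      if View = nil then RaiseLastOSError;
      try
        Last := Len - 1;
        if (Last >= 0) and (View[Last] = #10) then Dec(Last); // trailing LF
        if (Last >= 0) and (View[Last] = #13) then Dec(Last); // trailing CR
        i := Last;
        while (i >= 0) and (View[i] <> #10) and (View[i] <> #13) do
          Dec(i); // walk back to the previous EOL or the window start
        SetString(Result, View + i + 1, Last - i);
      finally
        UnmapViewOfFile(View);
      end;
    finally
      CloseHandle(Mapping);
    end;
  finally
    FS.Free;
  end;
end;

begin
  if ParamCount >= 1 then
    Writeln(ReadLastLineMapped(ParamStr(1)));
end.
```

The buffer doubling and the `Move` flip-flop disappear because the view is addressed directly, but every index into `View` has to stay within `0..Len-1`, which is the strictness about pointer arithmetic mentioned above.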