lke*_*ler 9 · Tags: delphi, parsing, pascal, token
I have a huge file that I must parse line by line. Speed is of the essence.
An example line:
Token-1   Here-is-the-Next-Token   Last-Token-on-Line
      ^                        ^
      Current Position         Position after GetToken
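Calling GetToken returns "Here-is-the-Next-Token" and sets CurrentPosition to the position of the last character of that token, so that it is ready for the next call to GetToken. Tokens are separated by one or more spaces.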
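The GetToken contract described in the question can be pinned down with a small, deliberately unoptimized sketch. The signature GetToken(const Line: string; var CurPos: Integer): string is an assumption, since the question only names GetToken and CurrentPosition. CurPos is a 1-based index that starts at 0 and, after each call, holds the index of the last character of the returned token, matching the diagram. It uses Copy for readability, so it documents the behaviour rather than competing for the fastest version.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
program GetTokenSketch;
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper mirroring the question's contract. CurPos is a
// 1-based index: 0 before the first call, afterwards the index of the
// last character of the token just returned. Returns '' at end of line.
function GetToken(const Line: string; var CurPos: Integer): string;
var
  Start, Len: Integer;
begin
  Len := Length(Line);
  Start := CurPos + 1;
  // Skip the one or more spaces that separate tokens.
  while (Start <= Len) and (Line[Start] = ' ') do
    Inc(Start);
  if Start > Len then
  begin
    CurPos := Len; // nothing left on this line
    Result := '';
    Exit;
  end;
  // Advance CurPos to the last character of the token.
  CurPos := Start;
  while (CurPos < Len) and (Line[CurPos + 1] <> ' ') do
    Inc(CurPos);
  Result := Copy(Line, Start, CurPos - Start + 1);
end;

var
  Line, Tok: string;
  P: Integer;
begin
  Line := 'Token-1   Here-is-the-Next-Token   Last-Token-on-Line';
  P := 0; // before the first token
  repeat
    Tok := GetToken(Line, P);
    if Tok <> '' then
      WriteLn(Tok, ' (ends at ', P, ')');
  until Tok = '';
end.
```

Run on the example line, it reports Token-1 ending at 7, Here-is-the-Next-Token at 32 and Last-Token-on-Line at 53, so each call leaves CurPos on the last character of the token it returned.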
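Assume the file is already in memory in a StringList. It fits into memory easily, say 200 MB.
I am only concerned with the execution time of the parsing. What code will give the absolute fastest execution in Delphi (Pascal)?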
Bar*_*lly 33
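Here is a sample lexer that should be quite efficient, but it assumes that all the source data is in a single string. Reworking it to handle buffers is fairly tricky because tokens can be very long (a single token may straddle the boundary between two buffers).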
type
  TLexer = class
  private
    FData: string;
    FTokenStart: PChar;
    FCurrPos: PChar;
    function GetCurrentToken: string;
  public
    constructor Create(const AData: string);
    function GetNextToken: Boolean;
    property CurrentToken: string read GetCurrentToken;
  end;
{ TLexer }
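constructor TLexer.Create(const AData: string);
begin
  FData := AData; // keep a reference so the PChars below stay valid
  FCurrPos := PChar(FData);
  FTokenStart := FCurrPos; // CurrentToken is '' until GetNextToken is called
end;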
function TLexer.GetCurrentToken: string;
begin
  SetString(Result, FTokenStart, FCurrPos - FTokenStart);
end;
function TLexer.GetNextToken: Boolean;
var
  cp: PChar;
begin
  cp := FCurrPos; // copy to local to permit register allocation
  // skip whitespace; this test could be converted to an unsigned int
  // subtraction and compare for only a single branch
  while (cp^ > #0) and (cp^ <= #32) do
    Inc(cp);
  // the null terminator marks the end of the data
  FTokenStart := cp; // CurrentToken becomes '' once the data is exhausted
  Result := cp^ <> #0;
  if Result then
  begin
    Inc(cp);
    while cp^ > #32 do
      Inc(cp);
  end;
  FCurrPos := cp;
end;
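The comment in GetNextToken mentions folding the whitespace test into a single unsigned subtraction and compare. A minimal sketch of that idea, assuming range and overflow checking are off: subtracting 1 from the character code maps #1..#32 onto 0..31 and turns #0 into -1, which becomes a very large value once cast to Cardinal, so one "< 32" comparison replaces the two bounds checks. The helper names IsSepTwoTests and IsSepOneTest are hypothetical; the program only checks that both forms agree for every Char value.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
program WhitespaceTestSketch;
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}
{$R-}{$Q-} // the single-branch form relies on an unchecked wrap-around cast

// Two comparisons, as in TLexer.GetNextToken: #1..#32 count as separators.
function IsSepTwoTests(C: Char): Boolean;
begin
  Result := (C > #0) and (C <= #32);
end;

// One comparison: Ord(C) - 1 maps #1..#32 to 0..31, while #0 becomes -1,
// which the Cardinal cast turns into a huge unsigned value.
function IsSepOneTest(C: Char): Boolean;
begin
  Result := Cardinal(Ord(C) - 1) < 32;
end;

var
  I: Integer;
begin
  // Compare both forms for every possible Char value.
  for I := 0 to Ord(High(Char)) do
    if IsSepTwoTests(Char(I)) <> IsSepOneTest(Char(I)) then
    begin
      WriteLn('Mismatch at #', I);
      Halt(1);
    end;
  WriteLn('Both tests agree for all ', Ord(High(Char)) + 1, ' Char values');
end.
```

Inside GetNextToken the skip loop would then read: while Cardinal(Ord(cp^) - 1) < 32 do Inc(cp); Whether this is measurably faster depends on the compiler and on the data, so it is worth timing against the real 200 MB input before adopting it.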