How to get the "text" of an HTML page? (WebBrowser - Delphi)

Question


I am using a WebBrowser to get the source of HTML pages. Our page source contains some text and some HTML tags, like this:

FONT></P><P align=center><FONT color=#ccffcc size=3>**Hello There , This is a text in our html page** </FONT></P><P align=center> </P>


The HTML tags are random and we cannot guess them. So is there any way to get only the text, separated from the HTML tags?

Answer 1

RRU*_*RUZ 8

You can use a TWebBrowser instance to parse the HTML code and extract the plain text from it.

See this sample:

uses
  Variants, // VarArrayCreate (needed in Delphi 6 and later)
  MSHTML,
  SHDocVw,
  ActiveX;

function GetPlainText(const Html: string): string;
var
  DummyWebBrowser: TWebBrowser;
  Document       : IHtmlDocument2;
  DummyVar       : Variant;
begin
  Result := '';
  DummyWebBrowser := TWebBrowser.Create(nil);
  try
    // Open a blank page so the browser creates an IHtmlDocument2 instance
    DummyWebBrowser.Navigate('about:blank');
    Document := DummyWebBrowser.Document as IHtmlDocument2;
    if Assigned(Document) then // Check that the document is available
    begin
      // Create a one-element variant array to pass the HTML to IHtmlDocument2.Write
      DummyVar    := VarArrayCreate([0, 0], varVariant);
      DummyVar[0] := Html; // Assign the HTML code to the variant array
      Document.Write(PSafeArray(TVarData(DummyVar).VArray)); // Write the HTML into the document
      Document.Close;
      // Read back the body as plain text, with all markup stripped
      if Assigned(Document.body) then
        Result := (Document.body as IHTMLBodyElement).createTextRange.text;
    end;
  finally
    DummyWebBrowser.Free;
  end;
end;
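
A minimal usage sketch for the function above. It assumes a VCL application with a form that has a button and a memo; the names TForm1, Button1 and Memo1 are hypothetical and do not come from the original answer, and GetPlainText is assumed to be declared in the same unit. The sample markup is modelled on the fragment in the question.

```delphi
// Hypothetical event handler: TForm1, Button1 and Memo1 are illustrative
// names, not part of the original answer.
procedure TForm1.Button1Click(Sender: TObject);
const
  // Sample markup modelled on the fragment shown in the question.
  SampleHtml =
    '<P align=center><FONT color=#ccffcc size=3>' +
    'Hello There , This is a text in our html page' +
    '</FONT></P><P align=center> </P>';
begin
  // GetPlainText loads the markup into a hidden browser document and
  // returns the text of its body without the tags.
  Memo1.Lines.Text := GetPlainText(SampleHtml);
end;
```

Because the tags are parsed by the browser itself, the caller does not need to know which tags appear in the page, which is the situation described in the question.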

