HTML Agility Pack parsing

Rhy*_*hys 5 c# xml linq html-agility-pack

I want to parse an HTML table with LINQ to XML and display the contents in a bound ListBox.

I am using the HTML Agility Pack with this code:

    HtmlWeb web = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.SourceURL");
    HtmlNode rateNode = doc.DocumentNode.SelectSingleNode("//div[@id='FlightInfo_FlightInfoUpdatePanel']");
    string rate = rateNode.InnerText;
    this.richTextBox1.Text = rate;

The HTML looks like this:

<div id="FlightInfo_FlightInfoUpdatePanel">

   <table cellspacing="0" cellpadding="0"><tbody>
     <tr class="">
     <td class="airline"><img src="/images/airline logos/NZ.gif" title="AIR NEW ZEALAND LIMITED. " alt="AIR NEW ZEALAND LIMITED. " /></td>
     <td class="flight">NZ8</td>
     <td class="codeshare">&nbsp;</td>
     <td class="origin">San Francisco</td>
     <td class="date">01 Sep</td>
     <td class="time">17:15</td>
     <td class="est">18:00</td>
     <td class="status">DEPARTED</td>
     </tr>

But it returns this:

NZ8&nbsp;San Francisco01 Sep17:1518:00DEPARTEDAC6103NZ8San Francisco01 Sep17:1518:00DEPARTEDCO6754NZ8San Francisco01 Sep17:1518:00DEPARTEDLH7157NZ8San Francisco01 Sep17:1518:00DEPARTEDUA6754NZ8San Francisco01 Sep17:1518:00DEPARTEDUS5308NZ8San Francisco01 Sep17:1518:00DEPARTEDVS7408NZ8San Francisco01 Sep17:1518:00DEPARTEDEK407&nbsp;Melbourne/Dubai01 Sep17:5017:50DEPARTEDEK413&nbsp;Sydney/Dubai01 Sep18:0018:00DEPARTEDQF44&nbsp;Sydney01 

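What I want is to parse it into XML and then use LINQ to XML to turn that XML into the item source of a bound ListBox.

I think I need to use a variation of the line below for each class, but I would like some help.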

HtmlNodeCollection cols = rows[i].SelectNodes(".//td[@class='flight']");

Ode*_*ded 5

You are using InnerText, which strips out the HTML markup.

Use InnerHtml instead:

string rate = rateNode.InnerHtml;

You can create an XML document from this string (assuming it is valid XML).
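One caveat with the "valid XML" assumption: the sample markup contains `&nbsp;`, which is not one of XML's predefined entities, so an XML parser rejects it. Below is a minimal sketch of working around that single issue. It assumes the rest of the markup is already well-formed, which real pages often are not. `FragmentXml` and `ParseFragment` are hypothetical names, and the inline string stands in for `rateNode.InnerHtml`.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class FragmentXml
{
    // Parses an HTML fragment as XML after replacing the one HTML-only
    // entity seen in the sample. Assumes the fragment is otherwise well-formed.
    public static XElement ParseFragment(string html)
    {
        // XML only predefines &lt; &gt; &amp; &apos; &quot;, so swap
        // &nbsp; for its numeric character reference before parsing.
        string xml = html.Replace("&nbsp;", "&#160;");
        return XElement.Parse(xml);
    }
}

public static class FragmentDemo
{
    public static void Main()
    {
        // Stand-in for rateNode.InnerHtml, trimmed from the sample in the question.
        string html =
            "<table><tbody><tr class=\"\">" +
            "<td class=\"flight\">NZ8</td>" +
            "<td class=\"codeshare\">&nbsp;</td>" +
            "<td class=\"origin\">San Francisco</td>" +
            "</tr></tbody></table>";

        XElement table = FragmentXml.ParseFragment(html);
        Console.WriteLine(table.Descendants("td").Count());
    }
}
```

Other HTML-only entities or unclosed tags would need similar handling, which is one reason to query the HtmlNode directly instead.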

You can also query rateNode the same way you retrieved it, by selecting its child nodes:

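// XPath positions are 1-based, so the first row is tr[1], not tr[0].
var firstRow = rateNode.SelectSingleNode("./table/tbody/tr[1]");
// SelectSingleNode returns an HtmlNode; read InnerText to get the cell text.
string origin = firstRow.SelectSingleNode("./td[@class='origin']").InnerText;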
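Extending that idea to every row, here is a rough sketch that reads each `<tr>` into a small object without going through XML at all. The `Flight` class and `FlightParser` helper are hypothetical names introduced for illustration. The XPath assumes the page structure shown in the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical container type, not part of the original post.
public class Flight
{
    public string Number { get; set; }
    public string Origin { get; set; }
    public string Status { get; set; }
}

public static class FlightParser
{
    // Builds one Flight per <tr> found under the update panel div.
    public static List<Flight> Parse(HtmlDocument doc)
    {
        var flights = new List<Flight>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(
            "//div[@id='FlightInfo_FlightInfoUpdatePanel']//tr");
        if (rows == null) return flights;

        foreach (HtmlNode row in rows)
        {
            flights.Add(new Flight
            {
                Number = Cell(row, "flight"),
                Origin = Cell(row, "origin"),
                Status = Cell(row, "status")
            });
        }
        return flights;
    }

    // Returns the decoded, trimmed text of the <td> with the given class,
    // or an empty string when the row has no such cell.
    private static string Cell(HtmlNode row, string cssClass)
    {
        HtmlNode td = row.SelectSingleNode("./td[@class='" + cssClass + "']");
        return td == null ? "" : HtmlEntity.DeEntitize(td.InnerText).Trim();
    }
}
```

The resulting list can then be bound to the list box, for example through `DataSource` and `DisplayMember` in Windows Forms or `ItemsSource` in WPF, depending on which UI framework is in use.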


Ale*_*tin 5

If you want to use LINQ to XML, you can convert the HtmlDocument to an XML string:

HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.SourceURL");  
doc.OptionOutputAsXml = true;
System.IO.StringWriter sw = new System.IO.StringWriter();
System.Xml.XmlTextWriter xw = new System.Xml.XmlTextWriter(sw);
doc.Save(xw);
string result = sw.ToString();

Then you just need to create an XDocument object and load it from the XML string:

System.Xml.Linq.XDocument xDoc = System.Xml.Linq.XDocument.Parse(result);

Now you have an XDocument to work with using LINQ.
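As an illustration of what such a query might look like for the table in the question, the sketch below pulls the flight, origin, and status cells out of each row. `FlightQuery` and its methods are hypothetical names. The code assumes the converted XML keeps the `tr`/`td` element names and `class` attributes with no XML namespace, which is worth checking against the actual output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FlightQuery
{
    // Returns one "flight | origin | status" string per <tr> that has a flight cell.
    public static List<string> Summaries(XDocument xDoc)
    {
        return xDoc.Descendants("tr")
            .Where(tr => CellText(tr, "flight") != null)
            .Select(tr => CellText(tr, "flight") + " | " +
                          CellText(tr, "origin") + " | " +
                          CellText(tr, "status"))
            .ToList();
    }

    // Returns the trimmed text of the first <td> with the given class,
    // or null when the row has no such cell.
    private static string CellText(XElement tr, string cssClass)
    {
        return tr.Elements("td")
            .Where(td => (string)td.Attribute("class") == cssClass)
            .Select(td => td.Value.Trim())
            .FirstOrDefault();
    }
}
```

Calling `FlightQuery.Summaries(xDoc)` yields a list of strings that can serve as the list box's item source.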