Ale*_*lex 1 vb.net textinput web
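I am working on a Windows Forms application. I have a text box named "tbPhoneNumber" that contains a phone number.
I want to go to the website http://canada411.com, enter the number from my text box into the site's text input with ID "c411PeopleReverseWhat", and then somehow click "Find" (an input of class "c411ButtonImg").
After that, I want to retrieve the content between the asterisks in the following HTML: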
<div id="contact" class="vcard">
<span><h1 class="fn c411ListedName">**Full Name**</h1></span>
<span class="c411Phone">**(###)-###-####**</span>
<span class="c411Address">**Address**</span>
<span class="adr">
<span class="locality">**City**</span>
<span class="region">**Province**</span>
<span class="postal-code">**L#L#L#**</span>
</span>
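So basically I am trying to send data to an input box, click the input button, and store the retrieved values in variables. To do this, do I need something like HttpWebRequest, or should I use a WebBrowser object? I just don't want the user to see the application browsing the website.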
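I do a lot of website scraping, so I will show you how I do it. Feel free to skip ahead if I am being too specific, but this is a common topic and it deserves a detailed answer.
The library I use is htmlagilitypack (it is a DLL; create a new project and add a reference to it). The first thing to check is whether we need to take any special steps to reach the page for a given phone number. I searched for John Smith and found quite a few. I opened two of the results and noticed that the URL format is very simple. Those results were...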
http://www.canada411.ca/res/7056736767/John-Smith/138223109.html
http://www.canada411.ca/res/7052355273/John-Smith/172439951.html
I tested whether I could remove the values I would not know in advance from the URL and leave only the phone number. It turns out I can...
http://www.canada411.ca/search/re/1/7056736767/-
http://www.canada411.ca/search/re/1/7052355273/-
You can see from the URLs that there are some static parts around the phone number. From here, build a string for the URL.
Dim phoneNumber As String = "7056736767" 'this could be TextBox1.Text or whatever
Dim URL As String = "http://www.canada411.ca/search/re/1/" & phoneNumber & "/-"
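The question shows the phone number in the form (###)-###-####, while the URLs above contain bare digits. The following is a minimal sketch, assuming the text box may contain punctuation or spaces; the helper name BuildLookupUrl is hypothetical and not part of the original answer.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UrlSketch
    ' Hypothetical helper: drops every character that is not a digit,
    ' then builds the reverse-lookup URL in the format observed above.
    Function BuildLookupUrl(ByVal rawPhone As String) As String
        Dim digits As String = Regex.Replace(rawPhone, "[^0-9]", "")
        Return "http://www.canada411.ca/search/re/1/" & digits & "/-"
    End Function

    Sub Main()
        ' In the form this would be BuildLookupUrl(tbPhoneNumber.Text)
        Console.WriteLine(BuildLookupUrl("(705)-673-6767"))
    End Sub
End Module
```

Normalizing the input first means the same URL is produced whether the user types the number with or without brackets and dashes.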
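Now that we have the page dialed in, let's examine the HTML you provided above. You need six values from the page, so we will declare variables for them now...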
Dim FullName As String
Dim Phone As String
Dim Address As String
Dim Locality As String
Dim Region As String
Dim PostalCode As String
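As mentioned above, we will use htmlagilitypack, which uses XPath. The nice thing about this is that once we can find some unique identifier in the HTML, we can use XPath to find our values. I know this may be confusing, but it will become clearer.
All of the values you need sit inside tags that have class names. Let's use those class names in XPath to find them.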
Dim FullNameXPath As String = "//*[@class='fn c411ListedName']"
Dim PhoneXPath As String = "//*[@class='c411Phone']"
Dim AddressXPath As String = "//*[@class='c411Address']"
Dim LocalityXPath As String = "//*[@class='locality']"
Dim RegionXPath As String = "//*[@class='region']"
Dim PostalCodeXPath As String = "//*[@class='postal-code']"
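Basically, each of these is a string that tells htmlagilitypack what to look for: in our case, the text contained in the elements with the classes we named. There is a lot to XPath, and it would take a while to explain all of it. One side note, though: if you use Google Chrome and highlight a value on the page, you can right-click it and choose Inspect Element. In the code that appears below the page, you can right-click the value and copy it as XPath! Very useful.
Now all that is left is to connect to the page and populate those variables.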
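One caveat with the XPath strings above: `@class='locality'` compares the whole attribute value, so it stops matching if the site ever adds a second class to the element. The sketch below shows a common XPath 1.0 idiom that matches a single class token instead. The sample markup is adapted from the question, and the extra "highlight" class is an assumption added only to show the difference.

```vbnet
Imports System
Imports HtmlAgilityPack

Module XPathSketch
    Sub Main()
        ' Placeholder markup; "highlight" is a made-up second class.
        Dim html As String = "<div id=""contact"" class=""vcard"">" &
            "<span class=""locality highlight"">City</span></div>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Exact comparison: the attribute is "locality highlight", so this finds nothing.
        Dim exact As HtmlNode = doc.DocumentNode.SelectSingleNode("//*[@class='locality']")

        ' Token comparison: pads the attribute with spaces and looks for " locality ".
        Dim tokenXPath As String =
            "//*[contains(concat(' ', normalize-space(@class), ' '), ' locality ')]"
        Dim token As HtmlNode = doc.DocumentNode.SelectSingleNode(tokenXPath)

        Console.WriteLine(exact Is Nothing)
        Console.WriteLine(token.InnerText)
    End Sub
End Module
```

For the page as shown in the question, the simpler exact-match form works; the token form is only a more defensive alternative.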
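Dim Web As New HtmlAgilityPack.HtmlWeb
Dim Doc As New HtmlAgilityPack.HtmlDocument
Doc = Web.Load(URL)
' SelectNodes returns Nothing when no node matches, so check before looping
Dim nameNodes As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes(FullNameXPath)
If nameNodes IsNot Nothing Then
    For Each nameResult As HtmlAgilityPack.HtmlNode In nameNodes
        MsgBox(nameResult.InnerText)
    Next
End If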
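In the example above, we create an HtmlWeb object named Web. This is the actual crawler for our project. We then define an HtmlDocument, which will hold the page source in a converted, searchable form. All of that is done behind the scenes. We then send Web off to fetch the page source and assign it to the Doc object we created. Doc is reusable, which thankfully means we only have to connect to the page once.
The For Each loop looks in our Doc for any nodes that match FullNameXPath, which was defined earlier as the XPath value for finding the name. When a node is found, it is assigned to the nameResult variable, and inside the loop a message box is called to display the node's inner text. Keep in mind that SelectNodes returns Nothing when no node matches, so a number with no listing needs a check before the loop.
So when we put it all together: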
Dim phoneNumber As String = "7056736767" 'this could be TextBox1.Text or whatever
Dim URL As String = "http://www.canada411.ca/search/re/1/" & phoneNumber & "/-"
Dim FullName As String
Dim Phone As String
Dim Address As String
Dim Locality As String
Dim Region As String
Dim PostalCode As String
Dim FullNameXPath As String = "//*[@class='fn c411ListedName']"
Dim PhoneXPath As String = "//*[@class='c411Phone']"
Dim AddressXPath As String = "//*[@class='c411Address']"
Dim LocalityXPath As String = "//*[@class='locality']"
Dim RegionXPath As String = "//*[@class='region']"
Dim PostalCodeXPath As String = "//*[@class='postal-code']"
Dim Web As New HtmlAgilityPack.HtmlWeb
Dim Doc As New HtmlAgilityPack.HtmlDocument
Doc = Web.Load(URL)
' SelectNodes returns Nothing when no node matches, so each result is checked before looping
Dim Nodes As HtmlAgilityPack.HtmlNodeCollection
Nodes = Doc.DocumentNode.SelectNodes(FullNameXPath)
If Nodes IsNot Nothing Then
    For Each nameResult As HtmlAgilityPack.HtmlNode In Nodes
        FullName = nameResult.InnerText
        MsgBox(FullName)
    Next
End If
Nodes = Doc.DocumentNode.SelectNodes(PhoneXPath)
If Nodes IsNot Nothing Then
    For Each PhoneResult As HtmlAgilityPack.HtmlNode In Nodes
        Phone = PhoneResult.InnerText
        MsgBox(Phone)
    Next
End If
Nodes = Doc.DocumentNode.SelectNodes(AddressXPath)
If Nodes IsNot Nothing Then
    For Each ADDRResult As HtmlAgilityPack.HtmlNode In Nodes
        Address = ADDRResult.InnerText
        MsgBox(Address)
    Next
End If
Nodes = Doc.DocumentNode.SelectNodes(LocalityXPath)
If Nodes IsNot Nothing Then
    For Each LocalResult As HtmlAgilityPack.HtmlNode In Nodes
        Locality = LocalResult.InnerText
        MsgBox(Locality)
    Next
End If
Nodes = Doc.DocumentNode.SelectNodes(RegionXPath)
If Nodes IsNot Nothing Then
    For Each RegionResult As HtmlAgilityPack.HtmlNode In Nodes
        Region = RegionResult.InnerText
        MsgBox(Region)
    Next
End If
Nodes = Doc.DocumentNode.SelectNodes(PostalCodeXPath)
If Nodes IsNot Nothing Then
    For Each postalCodeResult As HtmlAgilityPack.HtmlNode In Nodes
        PostalCode = postalCodeResult.InnerText
        MsgBox(PostalCode)
    Next
End If
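Since each field appears once on the page, the six loops can be collapsed into one small function. This is a sketch under stated assumptions, not part of the original answer: the helper name FirstText is hypothetical, and the document is loaded from the placeholder markup in the question rather than from the live site, so the example runs without a network connection.

```vbnet
Imports System
Imports HtmlAgilityPack

Module ScrapeSketch
    ' Hypothetical helper: returns the trimmed, entity-decoded text of the
    ' first node matching xpath, or an empty string when nothing matches.
    Function FirstText(ByVal doc As HtmlDocument, ByVal xpath As String) As String
        Dim node As HtmlNode = doc.DocumentNode.SelectSingleNode(xpath)
        If node Is Nothing Then Return String.Empty
        Return HtmlEntity.DeEntitize(node.InnerText).Trim()
    End Function

    Sub Main()
        ' Placeholder markup from the question; with the live site this
        ' would be: Dim doc As HtmlDocument = New HtmlWeb().Load(URL)
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<div id=""contact"" class=""vcard"">" &
            "<span><h1 class=""fn c411ListedName"">Full Name</h1></span>" &
            "<span class=""c411Phone"">(###)-###-####</span>" &
            "<span class=""c411Address"">Address</span>" &
            "<span class=""adr""><span class=""locality"">City</span>" &
            "<span class=""region"">Province</span>" &
            "<span class=""postal-code"">L#L#L#</span></span></div>")

        Dim FullName As String = FirstText(doc, "//*[@class='fn c411ListedName']")
        Dim Phone As String = FirstText(doc, "//*[@class='c411Phone']")
        Dim Address As String = FirstText(doc, "//*[@class='c411Address']")
        Dim Locality As String = FirstText(doc, "//*[@class='locality']")
        Dim Region As String = FirstText(doc, "//*[@class='region']")
        Dim PostalCode As String = FirstText(doc, "//*[@class='postal-code']")

        Console.WriteLine(String.Join(" | ", FullName, Phone, Address, Locality, Region, PostalCode))
    End Sub
End Module
```

Returning an empty string for a missing field keeps the calling code free of Nothing checks; whether that or an explicit error is preferable depends on how the form should react to a number with no listing.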