jia*_*ren 7 html python string
Target page: http://www.immi.gov.au/skilled/general-skilled-migration/estimated-allocation-times.htm
The part I want to extract:
<tr>
<td>Skilled – Independent (Residence) subclass 885<br />online</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>15 May 2011</td>
<td>N/A</td>
</tr>
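Once the code finds this section by searching for the keyword "subclass 885 online", it should print the date inside the 5th <td> tag, which is "15 May 2011" as shown above.
This is just a small monitor for my own use, to keep track of the progress of my immigration application.
You might want to use this as a starting point: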
Python 2.6.7 (r267:88850, Jun 13 2011, 22:03:32)
[GCC 4.6.1 20110608 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2, re
>>> from BeautifulSoup import BeautifulSoup
>>> urllib2.urlopen('http://www.immi.gov.au/skilled/general-skilled-migration/estimated-allocation-times.htm')
<addinfourl at 139158380 whose fp = <socket._fileobject object at 0x84aa2ac>>
>>> html = _.read()
>>> soup = BeautifulSoup(html)
>>> soup.find(text = re.compile('\\bsubclass 885\\b')).parent.parent.find('td', text = re.compile(' [0-9]{4}$'))
u'15 May 2011'
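To see why the two regular expressions in that one-liner pick out the right row and the right cell, here is a small sketch that applies them to plain strings. It assumes the cell texts are exactly those in the snippet from the question, and that the first cell's text up to the `<br />` ends in "subclass 885"; the variable names are made up for illustration.

```python
import re

# Cell texts from the row in the question, in document order
# (assumption: first cell is the text before the <br />).
cells = [
    "Skilled – Independent (Residence) subclass 885",
    "N/A", "N/A", "N/A", "15 May 2011", "N/A",
]

row_pattern = re.compile(r"\bsubclass 885\b")  # locates the row by keyword
date_pattern = re.compile(r" [0-9]{4}$")       # a space, then a 4-digit year at the end

has_keyword = any(row_pattern.search(c) for c in cells)
dates = [c for c in cells if date_pattern.search(c)]
print(has_keyword, dates)  # prints: True ['15 May 2011']
```

The first cell ends in " 885", which is only three digits, so the year pattern skips it and matches the date cell alone.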
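Soo--oop of the e--e--evening,
- Lewis Carroll, Alice's Adventures in Wonderland
I think that's exactly what he had in mind!
The Mock Turtle would probably do something like this: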
>>> from BeautifulSoup import BeautifulSoup
>>> import urllib2
>>> url = 'http://www.immi.gov.au/skilled/general-skilled-migration/estimated-allocation-times.htm'
>>> page = urllib2.urlopen(url)
>>> soup = BeautifulSoup(page)
>>> for row in soup.html.body.findAll('tr'):
...     data = row.findAll('td')
...     if data and 'subclass 885online' in data[0].text:
...         print data[4].text
...
15 May 2011
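The session above is Python 2 with the old BeautifulSoup 3 package. For anyone on Python 3 who wants to try the same row-by-row idea without installing anything, here is a rough sketch using only the standard library's `html.parser`. It is not the answer's method, just the same logic swapped onto a different parser. `RowCollector`, `find_date` and `SAMPLE` are hypothetical names, and the HTML is the snippet from the question rather than the live page.

```python
from html.parser import HTMLParser

# Assumption: the row from the question, wrapped in a <table> for the demo.
SAMPLE = """<table>
<tr>
<td>Skilled – Independent (Residence) subclass 885<br />online</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>15 May 2011</td>
<td>N/A</td>
</tr>
</table>"""


class RowCollector(HTMLParser):
    """Collect the text of every <td>, grouped by <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell strings
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True
        elif tag == "br" and self._in_cell:
            self.rows[-1][-1] += " "   # treat <br> inside a cell as a space

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def find_date(html, keyword="subclass 885 online", column=4):
    """Return the text of cell `column` in the first row whose first cell contains `keyword`."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    for cells in parser.rows:
        if len(cells) > column and keyword in cells[0]:
            return cells[column].strip()
    return None


print(find_date(SAMPLE))  # prints: 15 May 2011
```

Because this sketch turns the `<br />` into a space, the keyword here is "subclass 885 online" rather than the run-together "subclass 885online" used in the session above.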
But I'm not sure it will help, since that date has already passed!
Good luck!