Print certain HTML with Python Mechanize

Question


I'm writing a small Python script that logs in to a website automatically, but I'm stuck.

I want to print a small part of the HTML to the terminal. It sits inside this tag in one of the site's HTML pages:

<td class=h3 align='right'>&nbsp;&nbsp;John Appleseed</td><td>&nbsp;<a href="members_myaccount.php"><img border=0 src="../tbs_v7_0/images/myaccount.gif" alt="My Account"></a></td>


How do I extract and print the name, John Appleseed?

By the way, I'm using Python's Mechanize on a Mac.

Answer 1

Rab*_*ski 7

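Mechanize is only meant for fetching the HTML. Once you want to extract information from that HTML, use a parser such as BeautifulSoup. (See also my answer to a similar question: Web mining or scraping or crawling? What tool/library should I use?)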
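BeautifulSoup is one option. For a fragment as small as the one in the question, Python 3's standard-library html.parser module can also handle the extraction step without installing anything. The sketch below assumes the target is the first <td> whose class list contains h3. The names H3CellParser and first_h3_text are hypothetical helpers, not part of Mechanize or BeautifulSoup.

```python
from html.parser import HTMLParser


class H3CellParser(HTMLParser):
    """Hypothetical helper: collects the text of every <td class="h3"> cell."""

    def __init__(self):
        # convert_charrefs=True (the default) turns &nbsp; into '\xa0' before handle_data
        super().__init__()
        self.cells = []        # text of each matching cell, in document order
        self._in_cell = False  # True while inside a matching <td>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # attrs is a list of (name, value) pairs; value may be None
            classes = (dict(attrs).get("class") or "").split()
            if "h3" in classes:
                self._in_cell = True
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            # str.strip() also removes the U+00A0 characters produced by &nbsp;
            self.cells.append("".join(self._buffer).strip())
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._buffer.append(data)


def first_h3_text(html):
    """Hypothetical helper: text of the first <td class="h3">, or None if absent."""
    parser = H3CellParser()
    parser.feed(html)
    parser.close()
    return parser.cells[0] if parser.cells else None


# The markup from the question
snippet = (
    "<td class=h3 align='right'>&nbsp;&nbsp;John Appleseed</td>"
    '<td>&nbsp;<a href="members_myaccount.php">'
    '<img border=0 src="../tbs_v7_0/images/myaccount.gif" alt="My Account"></a></td>'
)
print(first_h3_text(snippet))  # John Appleseed
```

This parser does not handle nested tables inside the matched cell. For real pages, a full parser such as BeautifulSoup is the more robust choice.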

Depending on where the <td> sits in the HTML (your question doesn't make that clear), you could use the following code:

from bs4 import BeautifulSoup

html = ...  # this is the HTML you've fetched (e.g. with Mechanize)

soup = BeautifulSoup(html, "html.parser")
# use this (gets all <td> elements)
cols = soup.find_all("td")
# or this (gets only <td> elements with class="h3")
cols = soup.find_all("td", class_="h3")
# print the text of the first matching <td>; strip=True trims the &nbsp; padding
print(cols[0].get_text(strip=True))
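The name in the matched cell is padded with two &nbsp; entities, so printing the cell's raw inner HTML shows them as well. Once decoded, &nbsp; becomes U+00A0 (no-break space), which Python 3's str.strip() treats as whitespace. This small standard-library sketch shows the difference. The variable names are illustrative only.

```python
import html

# Raw inner HTML of the <td class=h3> cell from the question
raw = "&nbsp;&nbsp;John Appleseed"

text = html.unescape(raw)  # &nbsp; decodes to U+00A0 (no-break space)
print(repr(text))          # '\xa0\xa0John Appleseed'
print(text.strip())        # John Appleseed
```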


Archived: 14 years, 3 months ago
Views: 6290
Last activity: 14 years, 3 months ago