735*_*sla 5 html python beautifulsoup html-parsing python-2.7
I'm using BeautifulSoup to parse HTML. So far, I have the following code:
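import urllib
import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://routerpasswords.com"
data = {"findpass": "1", "router": "Belkin", "findpassword": "Find Password"}
post_data = urllib.urlencode(data)  # encode the form fields for the POST body
req = urllib2.urlopen(url, post_data)  # supplying data makes this a POST request
html_str = req.read()
parser = BeautifulSoup(html_str)  # Python has no `new` keyword
table = parser.find("table")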
Is there a way to get a list of all the cells in a column? Here is an example. If I have this table:
<table cellpadding="0" cellspacing="0" width="100%">
<thead>
<tr>
<th>Manufacturer</th>
<th>Model</th>
<th width="80">Protocol</th>
<th width="80">Username</th>
<th width="80">Password</th>
</tr>
</thead>
<tbody>
<tr>
<td><b>BELKIN</b></td>
<td>F5D6130</td>
<td>SNMP</td>
<td>(none)</td>
<td>MiniAP</td>
</tr>
<tr>
<td><b>BELKIN</b></td>
<td>F5D7150<i> Rev. FB</i></td>
<td>MULTI</td>
<td>n/a</td>
<td>admin</td>
</tr>
<tr>
<td><b>BELKIN</b></td>
<td>F5D8233-4</td>
<td>HTTP</td>
<td>(blank)</td>
<td>(blank)</td>
</tr>
<tr>
<td><b>BELKIN</b></td>
<td>F5D7231</td>
<td>HTTP</td>
<td>admin</td>
<td>(blank)</td>
</tr>
</tbody>
</table>
How can I get a list of all the items in the Username column? I'd also like them to be strings.
小智 4
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(open("file.html", 'r').read())
# Header texts, in column order
cols = [header.string for header in soup.find('thead').findAll('th')]
col_idx = cols.index('Username')
# For each body row, take the cell at the Username column's index
col_values = [tds[col_idx].string
              for tds in [tr.findAll('td')
                          for tr in soup.find('tbody').findAll('tr')]]
print(col_values)
The result is:
[u'(none)', u'n/a', u'(blank)', u'admin']
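For readers on Python 3, the same header-index approach can be sketched with the newer bs4 package. This is only an illustration under stated assumptions: `column_values` and the trimmed-down `HTML` sample are hypothetical names introduced here, and the sketch assumes the table has both a `thead` and a `tbody`, as in the question. It uses `get_text()` rather than `.string`, because `.string` is `None` for a cell that mixes text with a nested tag, such as the Model cell `F5D7150<i> Rev. FB</i>`.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's table (hypothetical sample data)
HTML = """
<table>
  <thead>
    <tr><th>Manufacturer</th><th>Model</th><th>Username</th></tr>
  </thead>
  <tbody>
    <tr><td><b>BELKIN</b></td><td>F5D6130</td><td>(none)</td></tr>
    <tr><td><b>BELKIN</b></td><td>F5D7150<i> Rev. FB</i></td><td>n/a</td></tr>
  </tbody>
</table>
"""


def column_values(html, header):
    """Return the text of every body cell under the given header."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    headers = [th.get_text().strip()
               for th in table.find("thead").find_all("th")]
    idx = headers.index(header)  # raises ValueError if the header is absent
    return [row.find_all("td")[idx].get_text().strip()
            for row in table.find("tbody").find_all("tr")]


print(column_values(HTML, "Username"))  # ['(none)', 'n/a']
print(column_values(HTML, "Model"))     # ['F5D6130', 'F5D7150 Rev. FB']
```

On Python 3 the values are ordinary `str` objects, so no extra conversion is needed to get strings.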