
Python: getting the onClick values

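I'm using Python and BeautifulSoup to scrape a web page for a small project of mine. The page has multiple entries, and each entry is a separate table row in the HTML. My code partly works, but a lot of the output is blank. It doesn't pick up all the results from the page, and it doesn't even collect the values from one entry into the same row.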

<html>
<head>
<title>Sample Website</title>
</head>
<body>

<table>
<tr><td class=channel>Artist</td><td class=channel>Title</td><td class=channel>Date</td><td class=channel>Time</td></tr>
<tr><td>35</td><td>Lorem Ipsum</td><td><a href="#" onClick="searchDB('LoremIpsum','FooWorld')">FooWorld</a></td><td>12/10/2014</td><td>2:53:17 PM</td></tr>
</table>
</body>
</html>
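Because part of the problem is that values from one entry don't end up together, here is a minimal sketch of grouping cell text per `<tr>`. It is written for Python 3 and uses the standard-library `html.parser` instead of BeautifulSoup only so that it runs without extra packages. `RowCollector` and the trimmed `SAMPLE` markup are hypothetical names made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup, trimmed from the sample page
SAMPLE = """<table>
<tr><td>Artist</td><td>Title</td></tr>
<tr><td>35</td><td>Lorem Ipsum</td><td><a href="#">FooWorld</a></td></tr>
</table>"""


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by its <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per table row
        self._cell = None  # text fragments of the <td> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:
                # tolerate a <td> that appears before any <tr>
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        # text nested inside <a> within a cell is collected too
        if self._cell is not None:
            self._cell.append(data)


parser = RowCollector()
parser.feed(SAMPLE)
parser.close()
for row in parser.rows:
    print(row)
```

Each printed list holds the cells of one row, so the link text `FooWorld` stays next to `35` and `Lorem Ipsum` from the same entry.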


I only want to extract the arguments of the `searchDB` onClick handler. For example, "LoremIpsum" and "FooWorld" are the only two results I want.
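A minimal sketch of pulling out just the `searchDB` arguments, again in Python 3 with the standard-library `html.parser` swapped in for BeautifulSoup so it is self-contained. `html.parser` lowercases attribute names, so `onClick` arrives as `onclick`. The class name `SearchDBArgs` and both regular expressions are assumptions, and the patterns assume an argument never contains `)` or an escaped single quote.

```python
import re
from html.parser import HTMLParser

SAMPLE = """<table>
<tr><td>35</td><td>Lorem Ipsum</td><td><a href="#" onClick="searchDB('LoremIpsum','FooWorld')">FooWorld</a></td><td>12/10/2014</td><td>2:53:17 PM</td></tr>
</table>"""

# The text between searchDB( and ), then every single-quoted value inside it
CALL_RE = re.compile(r"searchDB\(([^)]*)\)")
ARG_RE = re.compile(r"'([^']*)'")


class SearchDBArgs(HTMLParser):
    """Collect the quoted arguments of every searchDB(...) onclick handler."""

    def __init__(self):
        super().__init__()
        self.calls = []  # one list of argument strings per matching <a>

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attribute names are already lowercased; a bare attribute has value None
        onclick = dict(attrs).get("onclick") or ""
        match = CALL_RE.search(onclick)
        if match:
            self.calls.append(ARG_RE.findall(match.group(1)))


parser = SearchDBArgs()
parser.feed(SAMPLE)
parser.close()
print(parser.calls)
```

Keeping each handler's arguments in their own list also keeps the values of one entry together.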

Here is the code I wrote. So far it extracts some of the right values, but sometimes the results are empty.

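import re
import urllib2  # Python 2 module
import bs4

response = urllib2.urlopen(url)  # url is assumed to be defined earlier
html = response.read()

soup = bs4.BeautifulSoup(html, "html.parser")

# Every <a> tag that has an onclick attribute (the parser lowercases onClick)
properties = soup.findAll('a', onclick=True)

for eachproperty in properties:
    # Print each single-quoted, purely alphanumeric argument in the handler
    print re.findall("'([a-zA-Z0-9]*)'", eachproperty['onclick'])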


What am I doing wrong?
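One likely cause, offered as a hypothesis rather than a confirmed diagnosis: the pattern `'([a-zA-Z0-9]*)'` only matches quoted arguments made up entirely of ASCII letters and digits. An argument containing a space, hyphen, ampersand or accented letter produces no match, so a row can print one value, or `[]`, instead of two. The comparison below runs that pattern and a looser `'([^']*)'` alternative on hypothetical onclick strings (only the first mirrors the sample page). The looser pattern still assumes no escaped quotes inside an argument.

```python
import re

# The pattern from the question: only ASCII letters and digits between quotes
STRICT = re.compile(r"'([a-zA-Z0-9]*)'")
# A looser alternative: anything except a single quote between quotes
LOOSE = re.compile(r"'([^']*)'")

# Hypothetical onclick values; only the first mirrors the sample page
samples = [
    "searchDB('LoremIpsum','FooWorld')",
    "searchDB('Lorem Ipsum','FooWorld')",
    "searchDB('Foo-Bar','Baz Qux')",
]

for onclick in samples:
    print(STRICT.findall(onclick), LOOSE.findall(onclick))
```

With these inputs the strict pattern keeps only `FooWorld` from the second string and finds nothing in the third, while the looser one returns both arguments each time.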

python beautifulsoup web-scraping

Fai*_*ony

2015-09-18
