Webscraper not working

Dam*_*cir 0 python regex web-scraping

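I followed a tutorial almost to the letter. I want my scraper to collect all the links to the specific pages that contain the information about each police station, but it returns almost the entire site.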

from urllib import urlopen
import re

f = urlopen("http://www.emergencyassistanceuk.co.uk/list-of-uk-police-stations.html").read()

b = re.compile('<span class="listlink-police"><a href="(.*)">')
a = re.findall(b, f)

listiterator = []
listiterator[:] = range(0,16)

for i in listiterator:
    print a 
    print "\n"

f.close()

Kur*_*tal 7

Use BeautifulSoup:

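from bs4 import BeautifulSoup
from urllib2 import urlopen

# Download the page that lists the police stations
f = urlopen("http://www.emergencyassistanceuk.co.uk/list-of-uk-police-stations.html").read()

# Name the parser explicitly so bs4 does not have to guess one
bs = BeautifulSoup(f, "html.parser")

# Each station link sits inside <span class="listlink-police">
for tag in bs.find_all('span', {'class': 'listlink-police'}):
    print tag.a['href']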
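As for why the regex version returns nearly the whole page: `(.*)` is greedy, so when several links share one line it runs on to the last `">` on that line instead of stopping at the end of the first `href`. The loop in the question also prints the whole list `a` on every pass rather than one element. The sketch below shows the greedy behaviour on a made-up two-link snippet. The sample markup and the `/station-a.html` paths are assumptions for illustration, not the real page, and it is written for Python 3 using only the standard library.

```python
import re

# Hypothetical sample markup (not the real page): two links on a single line.
html = (
    '<span class="listlink-police"><a href="/station-a.html">A</a></span>'
    '<span class="listlink-police"><a href="/station-b.html">B</a></span>'
)

# Pattern from the question: .* is greedy and runs to the LAST '">' on the line.
greedy = re.compile(r'<span class="listlink-police"><a href="(.*)">')

# Restricting the capture to non-quote characters stops at the end of the href.
bounded = re.compile(r'<span class="listlink-police"><a href="([^"]*)">')

greedy_matches = greedy.findall(html)
bounded_matches = bounded.findall(html)

print(len(greedy_matches))   # one oversized match that swallows both links
for href in bounded_matches:
    print(href)              # one href per link
```

A bounded pattern like this may be enough for a quick script, but it still depends on the exact attribute order and quoting in the markup, which is why an HTML parser such as BeautifulSoup tends to be the more robust choice.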

  • "Thanks, did what I needed." is best expressed ["by clicking on the check box outline to the left of the answer"](http://stackoverflow.com/faq#howtoask). (3 upvotes)