How do I extract numeric data when web scraping?

Question

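I'm completely new to web scraping, so any reference sites would be great. I'm a bit confused about how to get at the actual data. When I print(theText), I get a bunch of HTML (which should be correct). How do I extract the values from it? Do I have to use regular expressions to get the actual numeric data?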

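import urllib.request

def getData():
    # Download the 5-day forecast page and print its HTML as text.
    request = urllib.request.Request("http://www.weather.com/weather/5day/l/USGA0028:1:US")
    response = urllib.request.urlopen(request)
    the_page = response.read()    # raw bytes from the server
    theText = the_page.decode()   # bytes -> str (UTF-8 by default)
    print(theText)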


Answer 1

Law*_*son 5

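Take a look at BeautifulSoup. It lets you get elements by ID or by tag, and it is very handy for basic scraping.
You pass the response text (the HTML page) to BeautifulSoup, and then you can call its methods on the resulting object.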
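As a rough illustration of that approach, here is a minimal sketch using the third-party bs4 package with the standard-library "html.parser" backend. The SAMPLE_HTML string, the "temp" class, the "today-high" / "today-low" IDs, and the helper names extract_temps and extract_by_id are all made up for this example; the real weather.com markup is different, so the selectors would need to be adapted after inspecting the actual page.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page (assumption:
# the real site uses different tags, classes and IDs).
SAMPLE_HTML = """
<div id="forecast">
  <span class="temp" id="today-high">72&deg;</span>
  <span class="temp" id="today-low">55&deg;</span>
</div>
"""


def extract_temps(html):
    """Return the integers found inside every <span class="temp">."""
    soup = BeautifulSoup(html, "html.parser")
    temps = []
    for span in soup.find_all("span", class_="temp"):
        text = span.get_text(strip=True)      # visible text of the element
        match = re.search(r"-?\d+", text)     # first integer in that text
        if match:
            temps.append(int(match.group()))
    return temps


def extract_by_id(html, element_id):
    """Return the text of the element with the given id, or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id=element_id)
    return element.get_text(strip=True) if element else None


if __name__ == "__main__":
    print(extract_temps(SAMPLE_HTML))
    print(extract_by_id(SAMPLE_HTML, "today-low"))
```

In the question's code, theText would be passed in place of SAMPLE_HTML. The parser locates the element, and only the final step of turning its text into a number needs any pattern matching.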

This should help for Python: https://docs.python.org/2/library/re.html and this for regular expressions in general: http://regexone.com/ (2 upvotes)
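If you do go the regular-expression route mentioned in the comment, a small sketch with the standard-library re module might look like the following. SAMPLE_TEXT, the NUMBER pattern, and find_numbers are illustrative assumptions, not taken from the real page. Running a regex over raw HTML tends to be brittle, so this is probably best applied to text already pulled out of a specific element.

```python
import re

# Hypothetical snippet of page text (assumption: not real site content).
SAMPLE_TEXT = "High <b>72&deg;</b> Low <b>-3&deg;</b> Chance of rain: 40.5%"

# Optional minus sign, digits, optional decimal part.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def find_numbers(text):
    """Return every number in the text as a float, in order of appearance."""
    return [float(token) for token in NUMBER.findall(text)]


if __name__ == "__main__":
    print(find_numbers(SAMPLE_TEXT))
```

Note that a pattern like this will also pick up digits inside attribute values or URLs if it is run over a whole page, which is one reason to narrow the input first.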
