Web scraping data with Python?

Question

Web scraping data with Python?

use*_*092 5 html python beautifulsoup web-scraping

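I have just started learning web scraping with Python, and I have already run into some problems.

My goal is to scrape the names of the different tuna species from fishbase.org (http://www.fishbase.org/ComNames/CommonNameSearchList.php?CommonName=salmon)

Problem: I can't extract all of the species names.

This is what I have so far: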

import urllib2
from bs4 import BeautifulSoup

fish_url = 'http://www.fishbase.org/ComNames/CommonNameSearchList.php?CommonName=Tuna'
page = urllib2.urlopen(fish_url)
html_doc = page.read()  # raw HTML of the results page

soup = BeautifulSoup(html_doc, 'html.parser')

spans = soup.find_all(

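From here, I don't know how to extract the species names. I have thought about using a regular expression (i.e. soup.find_all("a", text=re.compile("\d+\s+\d+"))) to capture the text inside the tags...

Any input would be highly appreciated!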

Answer 1

jco*_*ado 2

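Looking at the web page, I'm not sure exactly what information you want to extract. However, note that you can easily get the text inside a tag using its text attribute: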

>>> from bs4 import BeautifulSoup
>>> html = '<a>some text</a>'
>>> soup = BeautifulSoup(html)
>>> [tag.text for tag in soup.find_all('a')]
[u'some text']
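
The same idea can be extended to a results table. What follows is only a rough sketch: the markup in SAMPLE_HTML is hypothetical (the real FishBase page may be laid out quite differently), the "/summary/" link pattern is an assumption made for illustration, and species_names is a helper name invented here. It filters the links by their href with a regular expression and then reads the text of each match:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; inspect the real page
# before relying on any of these tag or attribute patterns.
SAMPLE_HTML = """
<table>
  <tr><th>Common name</th><th>Species</th></tr>
  <tr><td>Albacore tuna</td><td><a href="/summary/1"><i>Thunnus alalunga</i></a></td></tr>
  <tr><td>Bigeye tuna</td><td><a href="/summary/2"><i>Thunnus obesus</i></a></td></tr>
  <tr><td>Search again</td><td><a href="/search.php">Back to search</a></td></tr>
</table>
"""


def species_names(html):
    """Return the text of every link whose href starts with /summary/ (assumed pattern)."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    # Passing a compiled regex as an attribute filter keeps only matching links.
    for link in soup.find_all("a", href=re.compile(r"^/summary/")):
        # get_text() also collects text from nested tags such as <i>.
        names.append(link.get_text(strip=True))
    return names


print(species_names(SAMPLE_HTML))
```

Filtering on an attribute such as href tends to be more robust than matching on the link text itself, but which attribute is useful depends on how the actual page is built.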

Archived:	13 years, 11 months ago
Views:	2174
Last activity:	13 years, 11 months ago