I'm trying to scrape images and news URLs from this website. The tags I defined are:
root_tag = ["div", {"class": "ngp_col ngp_col-bottom-gutter-2 ngp_col-md-6 ngp_col-lg-4"}]
image_tag = ["div", {"class": "low-rez-image"}, "url"]
news_tag = ["a", {"": ""}, "href"]
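Each tag definition follows a [tag name, attrs dict, attribute to read] pattern. A minimal sketch of applying such triples, assuming a made-up SAMPLE_HTML string in place of the live page (the real markup may differ), the hypothetical helper name extract, and Python's built-in "html.parser" instead of lxml so no extra parser is needed:

```python
from bs4 import BeautifulSoup

# Hypothetical specs in the same [name, attrs, attribute] shape as above.
ROOT_TAG = ["div", {"class": "ngp_col"}]          # matches any div with this class
IMAGE_TAG = ["div", {"class": "low-rez-image"}, "url"]
NEWS_TAG = ["a", {}, "href"]                      # empty attrs: any <a> in the card

# Made-up markup for illustration only.
SAMPLE_HTML = """
<div class="ngp_col ngp_col-md-6">
  <div class="low-rez-image" url="https://example.com/a.jpg"></div>
  <a href="https://example.com/story-a">Story A</a>
</div>
<div class="ngp_col ngp_col-md-6">
  <a href="https://example.com/story-b">Story B</a>
</div>
"""


def extract(parent, spec):
    """Return attribute spec[2] of the first spec[0]/spec[1] match, or None."""
    element = parent.find(spec[0], spec[1])
    # find() returns None when nothing matches, so guard before calling .get().
    return element.get(spec[2]) if element is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = []
for card in soup.find_all(ROOT_TAG[0], ROOT_TAG[1]):
    results.append({
        "image": extract(card, IMAGE_TAG),
        "news_url": extract(card, NEWS_TAG),
    })
print(results)
```

Checking for None explicitly, rather than wrapping every lookup in try/except, keeps a missing image in one card from hiding other errors.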
url holds the page URL. The code I use to scrape the site is:
import requests
from bs4 import BeautifulSoup

ua1 = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
ua2 = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit 537.36 (KHTML, like Gecko) Chrome'
headers = {'User-Agent': ua2,
           'Accept': 'text/html,application/xhtml+xml,application/xml;'
                     'q=0.9,image/webp,*/*;q=0.8'}
session = requests.Session()
response = session.get(url, headers=headers)
webContent = response.content
bs = BeautifulSoup(webContent, 'lxml')
# Every card container on the page
all_tab_data = bs.find_all(root_tag[0], root_tag[1])
result = []
for div in all_tab_data:
    try:
        news_url = div.find(news_tag[0], news_tag[1]).get(news_tag[2])
    except AttributeError:
        # find() returned None: this card has no matching link
        news_url = None
    try: …

def powerof(num):
    return num**2
number = [1, 2, 3, 4, 5, 6, 7, 8]
s = list(map(powerof, number))
print(s)
Error: 'list' object is not callable
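Run on its own in a fresh interpreter, the powerof snippet above prints [1, 4, 9, 16, 25, 36, 49, 64]. One common cause of this TypeError is that the name list was rebound to a list object somewhere earlier (for example in a previous notebook cell), so list(...) no longer refers to the built-in type. A minimal sketch reproducing that, assuming such an earlier assignment (the ["oops"] value and the helper reproduce_error are hypothetical):

```python
def powerof(num):
    """Square num (the exponent is fixed at 2 despite the name)."""
    return num ** 2


number = [1, 2, 3, 4, 5, 6, 7, 8]


def reproduce_error():
    """Show the failure when the name `list` has been rebound to a list object."""
    list = ["oops"]  # hypothetical earlier assignment shadowing the built-in
    try:
        return list(map(powerof, number))
    except TypeError as exc:
        return str(exc)


error_message = reproduce_error()
print(error_message)  # 'list' object is not callable

# Without any rebinding, `list` is the built-in type and the original code works.
s = list(map(powerof, number))
print(s)  # [1, 4, 9, 16, 25, 36, 49, 64]
```

If that is what happened in an interactive session, running del list removes the global binding so the built-in is visible again; renaming the variable avoids the problem altogether.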