Posts by Kam*_*ish

Trying to extract an "href" from a class with BeautifulSoup and Python 3

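I can't seem to get it to work. My script goes to a site and scrapes the data into my info variable, but when I try to extract the "href" from a specific class I get None, and trying various other combinations doesn't work either. Where am I messing up? When I scrape the data into my info variable, it contains a class='business-name' and an "href" inside it.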

import requests
from bs4 import BeautifulSoup

count = 0
search_terms = "Bars"
location = "New Orleans, LA"
url = "https://www.yellowpages.com/search"
q = {'search_terms': search_terms, 'geo_location_terms': location}
page = requests.get(url, params=q)
url_link = page.url
page_num = str(count)
searched_page = url_link + '&page=' + str(count)
page = requests.get(searched_page)
soup = BeautifulSoup(page.text, 'html.parser')
info = soup.findAll('div', {'class': 'info'})
for each_business in info:
    # This is the spot that is broken. I can't make it work! 
    yp_bus_url …

python beautifulsoup

Score: 5 · Answers: 1 · Views: 4880
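A minimal sketch of one way to pull the link, assuming (as the question describes) that each `div.info` contains an `<a>` element with `class='business-name'` that carries the `href`. The sample markup and the helper name `extract_business_urls` are hypothetical, and the live Yellow Pages HTML may be structured differently. The idea is to search *inside* each `div` for the anchor and then read the attribute from the anchor; calling `.get('href')` on the `div` itself returns `None` because the `href` lives on the nested `<a>`, which is one likely cause of the `None` described above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question's description:
# each div.info holds an <a class="business-name" href="...">.
SAMPLE_HTML = """
<div class="info">
  <a class="business-name" href="/new-orleans-la/mip/bar-one-123">Bar One</a>
</div>
<div class="info">
  <a class="business-name" href="/new-orleans-la/mip/bar-two-456">Bar Two</a>
</div>
<div class="info"><span>No link here</span></div>
"""


def extract_business_urls(html):
    """Return the href of every a.business-name found inside a div.info."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for each_business in soup.find_all("div", class_="info"):
        # Look for the anchor *within* this div, not on the div itself.
        link = each_business.find("a", class_="business-name")
        if link is None:
            continue  # this result has no business-name link
        href = link.get("href")  # None if the attribute is absent
        if href:
            urls.append(href)
    return urls


print(extract_business_urls(SAMPLE_HTML))
```

An equivalent version with CSS selectors would be `[a.get("href") for a in soup.select("div.info a.business-name")]`. If the scraped links turn out to be relative, `urllib.parse.urljoin(page.url, href)` would turn them into absolute URLs.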

Beautifulsoup: run code only if a class exists

Is there a way to have BeautifulSoup look for a class and, if it exists, run the script? I'm trying this:

if soup.find_all("div", {"class": "info"}) == True:
    print("Tag Found")

I also tried this, but it didn't work and gave an error about too many attributes:

if soup.has_attr("div", {"class": "info"}):
    print("Tag Found")

python if-statement beautifulsoup

Score: 3 · Answers: 1 · Views: 9465
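The first attempt never prints because `find_all()` returns a `ResultSet` (a subclass of `list`), and a list never compares equal to `True`; an empty one is simply falsy. The second fails because `has_attr()` is a method on a single `Tag` that takes one attribute name, for example `tag.has_attr("class")`, not a tag name plus an attribute dict. A minimal sketch of a presence check follows; the sample HTML and the helper name `has_div_with_class` are made up for illustration.

```python
from bs4 import BeautifulSoup


def has_div_with_class(soup, class_name):
    """True if at least one <div> with the given class exists."""
    # find() returns the first matching Tag, or None when there is no match.
    return soup.find("div", class_=class_name) is not None


html = '<div class="info">Bar listing</div><div class="other">x</div>'
soup = BeautifulSoup(html, "html.parser")

if has_div_with_class(soup, "info"):
    print("Tag Found")

# Equivalent check with find_all(): an empty ResultSet is falsy.
if not soup.find_all("div", class_="missing"):
    print("No div.missing on the page")

# has_attr() works on a single Tag and takes just the attribute name.
first_div = soup.find("div")
print(first_div.has_attr("class"))
```

Comparing `find()` against `None` explicitly states the intent ("was anything found?") rather than relying on how a `Tag` behaves in a boolean context.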

Python 3.5 urllib.request 403 Forbidden error

import urllib.request
import urllib
from bs4 import BeautifulSoup


url = "https://www.brightscope.com/ratings"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "html.parser")

print(soup.title)

I'm trying to fetch the site above, but the code keeps throwing a 403 Forbidden error.

Any ideas?

C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\python.exe "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py"
Traceback (most recent call last):
  File "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py", line 7
    page = urllib.request.urlopen(url)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\…

urllib beautifulsoup http-status-code-403 python-3.x

Score: 2 · Answers: 1 · Views: 7062
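A 403 in this situation often means the server is rejecting the request based on its headers: by default `urllib` identifies itself with a `User-Agent` of `Python-urllib/3.x`, which some sites block. A minimal sketch, assuming that is the cause here, sends a browser-like `User-Agent` through `urllib.request.Request` and catches `HTTPError` so the status code is visible. The header value and the helper names `build_request` and `fetch_title` are illustrative, and a site may still refuse automated clients regardless of headers.

```python
import urllib.error
import urllib.request

from bs4 import BeautifulSoup

# Illustrative browser-like User-Agent; the exact string is an assumption.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def build_request(url):
    """Create a Request that overrides urllib's default Python-urllib User-Agent."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_title(url):
    """Fetch a page and return its <title> tag, or None if the request fails."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as page:
            soup = BeautifulSoup(page, "html.parser")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403.
        print("HTTP error:", err.code, err.reason)
        return None
    except urllib.error.URLError as err:
        # No HTTP response at all (DNS failure, refused connection, ...).
        print("Connection failed:", err.reason)
        return None
    return soup.title
```

Usage would be `print(fetch_title("https://www.brightscope.com/ratings"))`. If the site still answers 403 with a browser-like header, the block is probably not header-based, and changing the `User-Agent` alone will not help.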