Tag: python-beautifultable

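Beautiful Soup cannot find the part of the HTML I want

I have been using BeautifulSoup for web scraping for a while, and this is the first time I have run into a problem like this. I am trying to select the number 101,172 from the markup below, but whether I use .find or .select, the output only ever contains tags, never the number. I have done similar data collection before without any problems.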

<div class="legend-block legend-block--pageviews">
      <h5>Pageviews</h5><hr>
      <div class="legend-block--body">
        <div class="linear-legend--counts">
          Pageviews:
          <span class="pull-right">
            101,172
          </span>
        </div>
        <div class="linear-legend--counts">
          Daily average:
          <span class="pull-right">
            4,818
          </span>
        </div></div></div>


I used:

import bs4
import requests

# wiki_page holds the URL of the page being scraped (not shown in the question)
res = requests.get(wiki_page, timeout=None)
soup = bs4.BeautifulSoup(res.text, 'html.parser')
ab = soup.select('span[class="pull-right"]')
print(ab)


Output:

[<span class="pull-right">\n<label class="logarithmic-scale">\n<input class="logarithmic-scale-option" type="checkbox"/>\n        Logarithmic scale      </label>\n</span>,
 <span class="pull-right">\n<label class="begin-at-zero">\n<input class="begin-at-zero-option" type="checkbox"/>\n        Begin at zero      </label>\n</span>,
 <span class="pull-right">\n<label class="show-labels">\n<input class="show-labels-option" type="checkbox"/>\n        Show values      </label>\n</span>]


Also, the number I am looking for is dynamic, so I am not sure whether JavaScript affects what BeautifulSoup sees.
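For reference, here is a minimal sketch of how the two counts could be pulled out when the markup quoted in the question really is present in the text handed to BeautifulSoup. The HTML is inlined from the question so the example runs without a network call; the function name `extract_counts` and the scoped selector are illustrative choices, not part of the original code. Note that `requests` does not execute JavaScript, so if the page fills in these numbers after it loads, `res.text` will not contain them and the same function would simply return an empty dict.

```python
from bs4 import BeautifulSoup

# Markup copied from the question, inlined so the sketch runs offline.
HTML = """
<div class="legend-block legend-block--pageviews">
  <h5>Pageviews</h5><hr>
  <div class="legend-block--body">
    <div class="linear-legend--counts">
      Pageviews:
      <span class="pull-right">
        101,172
      </span>
    </div>
    <div class="linear-legend--counts">
      Daily average:
      <span class="pull-right">
        4,818
      </span>
    </div></div></div>
"""


def extract_counts(html):
    """Return {label: int} for each count row inside the pageviews legend block."""
    soup = BeautifulSoup(html, "html.parser")
    counts = {}
    # Scope the selector to the legend block so unrelated "pull-right" spans
    # (such as the checkbox labels in the question's output) are ignored.
    for row in soup.select("div.legend-block--pageviews div.linear-legend--counts"):
        value = row.select_one("span.pull-right")
        if value is None:
            continue
        # Row text looks like "Pageviews: 101,172"; keep the part before the colon.
        label = row.get_text(" ", strip=True).partition(":")[0]
        counts[label] = int(value.get_text(strip=True).replace(",", ""))
    return counts


counts = extract_counts(HTML)
print(counts)  # {'Pageviews': 101172, 'Daily average': 4818}
```

A quick way to tell which case applies is to check whether the string "101,172" appears anywhere in `res.text`; if it does not, the value is being added in the browser and no selector will find it in the downloaded HTML.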

html python beautifulsoup web-scraping python-beautifultable

Author

2018-08-23

5
Score

1
Answers

2144
Views

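Scraping Google search result titles and URLs with Python

I am working on a project in Python (3.7) in which I need to scrape the titles and URLs of the first few Google results. I tried using BeautifulSoup, but it does not work:

Here is what I tried: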

import requests
from my_fake_useragent import UserAgent
from bs4 import BeautifulSoup

ua = UserAgent()

google_url = "https://www.google.com/search?q=python" + "&num=" + str(5)
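# Send the User-Agent as a request header; a bare second positional argument is treated as query parameters.
response = requests.get(google_url, headers={"User-Agent": ua.random})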
soup = BeautifulSoup(response.text, "html.parser")

result_div = soup.find_all('div', attrs={'class': 'g'})

links = []
titles = []
descriptions = []
for r in result_div:
    # Checks if each element is present, else, raise exception
    try:
        link = r.find('a', href=True)
        title = r.find('h3', attrs={'class': 'r'}).get_text()
        description = r.find('span', attrs={'class': 'st'}).get_text() …

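The extraction loop above is cut off, so as a rough illustration only, here is one way the per-result parsing could be written with explicit `None` checks instead of a `try` block. The sample HTML is hypothetical and merely shaped to match the class names used in the question (`g`, `r`, `st`); Google's real markup changes over time and may not use these classes at all. The function name `parse_results` is also an assumption made for this sketch.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like what the question's selectors expect.
# It is NOT real Google output.
SAMPLE_HTML = """
<div class="g">
  <a href="https://www.python.org/"><h3 class="r">Welcome to Python.org</h3></a>
  <span class="st">The official home of the Python Programming Language.</span>
</div>
<div class="g">
  <a href="https://docs.python.org/3/"><h3 class="r">Python 3 Documentation</h3></a>
</div>
<div class="g">
  <span class="st">A block with no link or title.</span>
</div>
"""


def parse_results(html):
    """Return a list of dicts with url, title and description for each result block."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for block in soup.find_all("div", class_="g"):
        link = block.find("a", href=True)
        title = block.find("h3", class_="r")
        description = block.find("span", class_="st")
        # Skip blocks that lack a link or a title rather than raising.
        if link is None or title is None:
            continue
        results.append({
            "url": link["href"],
            "title": title.get_text(strip=True),
            # A missing description is tolerated and stored as an empty string.
            "description": description.get_text(strip=True) if description else "",
        })
    return results


results = parse_results(SAMPLE_HTML)
for item in results:
    print(item["title"], item["url"])
```

If `find_all` returns an empty list on a live response, printing a slice of `response.text` shows what was actually served, which may differ from what a browser displays.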

html python beautifulsoup web-scraping python-beautifultable

Abd*_*man

2019-05-31

3
Score

1
Answers

4474
Views