Tag: beautifulsoup

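Replacing a pair of HTML tags with another pair using Python / BeautifulSoup

I need to replace a matched pair of HTML tags with a different tag. BeautifulSoup (4) might suit the task, but I have never used it before and cannot find a suitable example anywhere. Can someone give me a hint?

For example, this HTML code: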

<font color="red">this text is red</font>


should be changed to:

<span style="color: red;">this text is red</span>


The opening and closing HTML tags may not be on the same line.
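One possible approach, offered as a minimal sketch rather than a definitive answer: because BeautifulSoup parses the markup into a tree, renaming a tag object changes both its opening and closing tag, so it does not matter whether they sit on the same line. The sample HTML string and the variable names below are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Sample input (assumed for illustration); the tag pair spans two lines.
html = '<p>before <font color="red">this text\nis red</font> after</p>'
soup = BeautifulSoup(html, "html.parser")

for font in soup.find_all("font"):
    # Remove the old attribute from the tag's attribute dict, remembering its value.
    color = font.attrs.pop("color", None)
    # Renaming the tag object renames both the opening and the closing tag.
    font.name = "span"
    if color:
        font["style"] = "color: {};".format(color)

result = str(soup)
print(result)
```

Other attributes on the original tag, if any, are left untouched by this sketch.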

html python tags replace beautifulsoup

use*_*609

lucky-day

1
votes

1
answers

3411
views

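Searching for multiple tag types at once with Beautiful Soup 4

I am trying to use find_all() with bs4 to get every instance of several tag types (I don't care about classes).

I would like to do something like this: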

soup.find_all('p','a','span','b')


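In this case, if I have two p tags followed by a b tag, I want the call to return those three tags in order, even though there are no a or span tags. Is this possible?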
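A minimal sketch of one way this can work: find_all() accepts a list of tag names as its first argument and returns matches in document order. Passing the names as separate positional arguments, as in the call above, would instead be read as other parameters. The sample HTML is an assumption made for illustration.

```python
from bs4 import BeautifulSoup

# Sample input (assumed): two p tags, then a b tag, and no a or span tags.
html = "<div><p>one</p><p>two</p><b>three</b><i>skip</i></div>"
soup = BeautifulSoup(html, "html.parser")

# A list of names matches any tag whose name is in the list.
matches = soup.find_all(["p", "a", "span", "b"])
names = [tag.name for tag in matches]
print(names)
```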

html python parsing beautifulsoup

Author

2013 07-09

1
votes

1
answers

284
views

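Why doesn't BeautifulSoup find a specific table class?

I am using Beautiful Soup to try to scrape the commodities table from Oil-Price.net. I can find the first div, the table, the table body, and the rows of the table body. But there is a column in one of the rows that I cannot find with Beautiful Soup. When I tell Python to print all the tables in that particular row, it does not show the one I want. Here is my code: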

from urllib2 import urlopen
from bs4 import BeautifulSoup

html = urlopen('http://oil-price.net').read()
soup = BeautifulSoup(html)

div = soup.find("div",{"id":"cntPos"})
table1 = div.find("table",{"class":"cntTb"})
tb1_body = table1.find("tbody")
tb1_rows = tb1_body.find_all("tr")
tb1_row = tb1_rows[1]
td = tb1_row.find("td",{"class":"cntBoxGreyLnk"})
print td


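All it prints is None. I even tried printing every row to see whether I could find the column manually, but had no luck. It shows the other columns, but not the one I want.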

python beautifulsoup web-scraping

use*_*023

2017 02-18

1
votes

2
answers

10k
views

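When I use pip install beautifulsoup, why is there only egg-info and no actual module?

I installed beautifulsoup with the command pip install beautifulsoup4. However, after my attempt to import it failed, I found something interesting: there is only an egg-info folder and no script folder. Can someone tell me why, and how to fix it? I know I could get the script and move it into the site-packages folder. I did that and it works, but I feel it is a bad idea.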

python egg pip beautifulsoup

lit*_*hen

2016 02-11

1
votes

1
answers

2445
views

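Accessing the title attribute of tags in a web page using Python and BeautifulSoup

I am new to Python and I am trying to retrieve all the titles from a particular URL, but I have been unable to do so. The code runs without any errors, but I still get no output.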

import requests
import sys
from bs4 import BeautifulSoup

def test_function(num):
    url = "https://www.zomato.com/chennai/restaurants?buffet=1&page=" + str(num)
    source_code = requests.get(url) 
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    for link in soup.findAll('title'):
        print(link)
test_function(1)
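For context, a hedged sketch of the difference between the page's title tag and title attributes on other tags, which the question title and the code seem to conflate. The HTML string, its link text, and the restaurant name are hypothetical stand-ins for a downloaded page, not content from the site in question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
html = (
    "<html><head><title>Buffet restaurants</title></head><body>"
    '<a href="/r/1" title="Hypothetical Diner">link</a>'
    '<a href="/r/2">no title attribute</a>'
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# The <title> element of the document.
page_title = soup.title.string

# Every tag that carries a title attribute, whatever its value.
with_title = [tag["title"] for tag in soup.find_all(attrs={"title": True})]

print(page_title)
print(with_title)
```

If the real page returns nothing at all, the cause may lie in the HTTP response rather than in the parsing, which this sketch does not cover.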


python beautifulsoup bs4

RDP*_*DPD

2015 04-17

1
votes

1
answers

2190
views

How do I pull <p> tags that have no attributes with Beautiful Soup?

Say a web page contains the following:

<p style="display: none;"><input id="ak_js" name="ak_js" type="hidden" value="68"/></p>

<p><b>Lack of sales.. ANY sales.</b></p>


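I am trying to write code that pulls only the second tag, that is, every paragraph tag that carries no attributes. I tried the following two pieces of code, but they did not give me the result I want.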

text = BeautifulSoup(requests.get(url).text)

for tag in text.find_all("p", attrs = False):
    .....

for tag in text.find_all(re.compile("^<p>$")):
    ....


What is the best way to solve this?
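One way this could be approached, as a sketch rather than the definitive answer: find_all() also accepts a function that receives each tag and returns True for the ones to keep. The helper name is_bare_paragraph is made up for this example; the HTML is the snippet from the question.

```python
from bs4 import BeautifulSoup

html = """<p style="display: none;"><input id="ak_js" name="ak_js" type="hidden" value="68"/></p>

<p><b>Lack of sales.. ANY sales.</b></p>"""
soup = BeautifulSoup(html, "html.parser")

def is_bare_paragraph(tag):
    # Keep only <p> tags whose attribute dict is empty.
    return tag.name == "p" and not tag.attrs

bare = soup.find_all(is_bare_paragraph)
texts = [p.get_text() for p in bare]
print(texts)
```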

python beautifulsoup

Mik*_*ler

lucky-day

1
votes

1
answers

448
views

<urlopen error [Errno 1] _ssl.c:510: error:14077417:SSL

Does anyone know why I am getting this error?

SSLError: [Errno 1] _ssl.c:510: error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1


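I get the error when using requests or urllib2, and I am running the code on Kodi. When I run the code in Visual Studio on my PC, it works fine.

I am trying to scrape a website that is blocked by my ISP, so I am using a proxy version of the site.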

import requests

url = 'https://kickass.unblocked.pe/'
r = requests.get(url)


python ssl beautifulsoup web-scraping kodi

Mic*_*ael

2015 12-30

1
votes

1
answers

4532
views

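Python Scraping - unable to get the required data from Flipkart

I am trying to scrape customer reviews from the Flipkart website. The link is given below. Here is my scraping code, but it always returns an empty list.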

>>> from bs4 import BeautifulSoup
>>> import requests

>>> r = requests.get('https://www.flipkart.com/samsung-galaxy-j5-6-new-2016-edition-white-16-gb/product-reviews/itmegmrnzqjcpfg9?pid=MOBEG4XWJG7F9A6Z')
>>> soup = BeautifulSoup(r.content, 'lxml') # Tried with 'html.parser' also
>>> soup.find_all('div', '_3DCdKt')
[]
>>> soup.find_all('div', {'class': '_3DCdKt'})
[]
>>> soup.find_all('div', {'class': 'row _3wYu6I _3BRC7L'})
[]
>>> soup.find_all('div', {'class': '_1GRhLX hFPo14'})
[]


So I tried to get the whole section, but I only get the following:

>>> soup.find_all('div', {'class': 'col-9-12'})
[<div class="col-9-12" data-reactid="96"><div class="row _2_xtR5" data-reactid="97"></div><div class="row _3wYu6I _1KVtzT" data-reactid="98"></div></div>]


I do not get the rest of the content. So next I tried Selenium, and even then it returned None. Here is my Selenium code:

>>> driver = webdriver.Firefox()
>>> driver.get('https://www.flipkart.com/samsung-galaxy-j5-6-new-2016-edition-white-16-gb/product-reviews/itmegmrnzqjcpfg9?pid=MOBEG4XWJG7F9A6Z')
>>> a = driver.find_elements_by_class_name("_3DCdKt")
>>> len(a)
10
>>> for …


python selenium beautifulsoup

Jer*_*ril

lucky-day

1
votes

1
answers

683
views

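Web scraping with a "Next" button in Python

I am collecting reviews from a web page and need to scan every page until there are no more reviews. The reviews span multiple pages, and my first thought was to use a while loop; however, I am not sure where to start. The HTML code of the page looks similar to this.

HTML code of the last page:

Any help is appreciated.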
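Since the page's HTML is not shown, the following is only a sketch of the while-loop idea under stated assumptions: the PAGES dictionary, the fetch() helper, the "review" class, and the "next" link class are all hypothetical stand-ins. In real use, fetch() would perform an HTTP request, and the selectors would have to match the actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical pages standing in for real HTTP responses.
PAGES = {
    "/reviews?page=1": '<div class="review">Good</div>'
                       '<a class="next" href="/reviews?page=2">Next</a>',
    "/reviews?page=2": '<div class="review">Bad</div><div class="review">OK</div>',
}

def fetch(url):
    # Hypothetical helper; a real version would download the page.
    return PAGES[url]

def collect_reviews(start_url):
    reviews = []
    url = start_url
    # Keep going until a page has no "Next" link.
    while url:
        soup = BeautifulSoup(fetch(url), "html.parser")
        for div in soup.find_all("div", class_="review"):
            reviews.append(div.get_text(strip=True))
        next_link = soup.find("a", class_="next")
        url = next_link["href"] if next_link else None
    return reviews

all_reviews = collect_reviews("/reviews?page=1")
print(all_reviews)
```

The loop ends on the last page because that page has no "Next" link, so url becomes None.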

python selenium beautifulsoup python-2.7

Pyt*_*234

2016 12-25

1
votes

1
answers

1006
views

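BeautifulSoup: scraping itemprop="name" in Python

I have some Python 3.5 code that I want to use to scrape part of a web page, but instead of printing "Thick and Chewy Peanut Butter Chocolate Chip Bars" it prints "None". Do you know why? Thanks.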

import requests, bs4
import tkinter as tk
from tkinter import *
import pymysql
import pymysql.cursors

res = requests.get("http://www.foodnetwork.co.uk/article/traybake-recipes/thick-and-chewy-peanut-butter-chocolate-chip-bars/list-page-2.html")
res.raise_for_status()
recipeSoup = bs4.BeautifulSoup(res.text, "html.parser")
type(recipeSoup)
instructions = recipeSoup.find("div", itemprop="name")
try:
    method = str.replace(instructions.get_text(strip=True),". ",".")
    method = str.replace(method, ". ", ".")
    method = (str.replace(method, ".",".\n"))
except AttributeError:
    print(instructions)


Link to the page being scraped

python beautifulsoup web-scraping python-3.x

Har*_*rry

lucky-day

1
votes

1
answers

4658
views