Related questions and solutions (0)

How to get rid of the BeautifulSoup UserWarning?

After installing BeautifulSoup, this warning appears every time I run my Python script in cmd:

D:\Application\python\lib\site-packages\beautifulsoup4-4.4.1-py3.4.egg\bs4\__init__.py:166:
UserWarning: No parser was explicitly specified, so I'm using the best
available HTML parser for this system ("html.parser"). This usually isn't a
problem, but if you run this code on another system, or in a different
virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "html.parser")


I don't understand why it appears or how to fix it.
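The warning message itself names the fix: pass the parser explicitly instead of letting bs4 pick one. The sketch below assumes bs4 is installed; the sample markup and the helper names `parse_quietly` and `parser_warnings` are made up for illustration. It records the UserWarnings raised while parsing, once with no parser named and once with "html.parser", so the difference can be checked directly.

```python
import warnings

from bs4 import BeautifulSoup

# Sample markup, illustrative only.
MARKUP = "<html><body><p class='greeting'>Hello</p></body></html>"


def parse_quietly(markup: str) -> BeautifulSoup:
    """Parse markup with an explicitly named parser, so bs4 does not have to guess."""
    return BeautifulSoup(markup, "html.parser")


def parser_warnings(markup: str, features=None) -> list:
    """Return the UserWarnings bs4 emits while parsing `markup` with `features`."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" makes sure a repeated warning is recorded, not deduplicated.
        warnings.simplefilter("always")
        BeautifulSoup(markup, features)
    return [w for w in caught if issubclass(w.category, UserWarning)]


if __name__ == "__main__":
    print("no parser named:", len(parser_warnings(MARKUP)), "warning(s)")
    print("html.parser named:", len(parser_warnings(MARKUP, "html.parser")), "warning(s)")
    print(parse_quietly(MARKUP).p.text)
```

Naming the parser also keeps every machine on the same parser, which is the portability concern the warning describes; "lxml" or "html5lib" can be named instead if they are installed.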

python beautifulsoup user-warning

jel*_*ang

2016-04-04

41 votes · 4 answers · 50K views

RoboBrowser and local files

I am a beginner using Python 3.6.4 and RoboBrowser 0.5.3. I have saved some HTML pages locally and am trying to extract information from them.

Quite possibly mistakenly, I took inspiration from a similar question about BeautifulSoup. The BeautifulSoup solution works for me (BeautifulSoup 4.6.0).

In contrast, the following RoboBrowser-based code does not seem to work:

 from robobrowser import RoboBrowser
 br = RoboBrowser(parser='html.parser') 
 br.open(open("my_file.html"))


Error:

MissingSchema: Invalid URL "<_io.TextIOWrapper name='my_file.html' mode='r' encoding='UTF-8'>": No schema supplied. Perhaps you meant http://<_io.TextIOWrapper name='my_file.html' mode='r' encoding='UTF-8'>?

I understand that the code expects an "http"-based URL. I tried prefixing the file's absolute path with "file://", but it didn't help.

Is there a way to tell the library that this is a local file, or is this functionality simply not part of RoboBrowser?
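The error type `MissingSchema` comes from requests, which RoboBrowser uses to fetch pages, so `open()` appears to treat its argument as a URL to download. requests only mounts adapters for http:// and https://, which would also explain why the file:// prefix did not help. Since BeautifulSoup already works here, one workaround is to skip RoboBrowser's `open()` for saved pages and parse the file contents directly. A minimal sketch, assuming UTF-8 files and bs4 installed; `parse_saved_page` is a hypothetical helper, and the temporary file stands in for my_file.html.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def parse_saved_page(path, parser: str = "html.parser") -> BeautifulSoup:
    """Read a saved HTML page from disk and parse it; no HTTP request is made."""
    # Pass the file's text (not the file object or its path) and name the parser.
    # Assumption: the saved pages are UTF-8 encoded.
    html = Path(path).read_text(encoding="utf-8")
    return BeautifulSoup(html, parser)


if __name__ == "__main__":
    # Stand-in for "my_file.html": write a tiny page into a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "my_file.html"
        page.write_text(
            "<html><head><title>Saved page</title></head>"
            "<body><a href='/next'>Next</a></body></html>",
            encoding="utf-8",
        )
        soup = parse_saved_page(page)
        print(soup.title.string)
        print([a["href"] for a in soup.find_all("a")])
```

This only covers reading the saved HTML; it does not provide RoboBrowser's navigation or form handling for the local file.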

python-3.x robobrowser

Nof*_*ofP

lucky-day

5 votes · 0 answers · 170 views

Reading page sources saved in text files and extracting text

I have several text files that store page sources from a website, so each text file is one page source.

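I need to extract the text from a div with a specific class in the page sources stored in these text files, using the following code: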

from bs4 import BeautifulSoup
soup = BeautifulSoup(open("zing.internet.accelerator.plus.txt"))
txt = soup.find('div' , attrs = { 'class' : 'id-app-orig-desc' }).text
print txt


I checked the type of my soup object to make sure it is not using the string find method when looking for the div class. The soup object's type:

print type(soup)
<class 'bs4.BeautifulSoup'>


Following an earlier post, I put the open() call directly inside the BeautifulSoup constructor.

Error:

Traceback (most recent call last):
  File "html_desc_cleaning.py", line 13, in <module>
    txt2 = soup.find('div' , attrs = { 'class' : 'id-app-orig-desc' }).text
AttributeError: 'NoneType' object has no attribute 'text'

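This traceback means `soup.find(...)` returned None, i.e. no matching div was found in that particular file; calling `.text` on None then raises the AttributeError. With many saved pages, checking for None per file shows which pages lack the element. A minimal sketch, assuming UTF-8 files, the built-in "html.parser" backend and bs4 installed; the helper names and sample file names are hypothetical, and only the class name id-app-orig-desc comes from the question. It also uses Python 3 `print()`, whereas the question's `print txt` is Python 2 syntax.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional

from bs4 import BeautifulSoup

TARGET_CLASS = "id-app-orig-desc"  # class name taken from the question


def extract_description(path: Path) -> Optional[str]:
    """Return the text of the first <div class="id-app-orig-desc"> in a saved page, or None."""
    # A with-block closes the file; assumption: the saved pages are UTF-8.
    with open(path, encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    div = soup.find("div", class_=TARGET_CLASS)
    # find() returns None when nothing matches; guard before touching .text.
    if div is None:
        return None
    return div.get_text(strip=True)


def extract_all(directory: Path) -> Dict[str, Optional[str]]:
    """Map each *.txt file in `directory` to its description (None if the div is missing)."""
    return {p.name: extract_description(p) for p in sorted(Path(directory).glob("*.txt"))}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Two hypothetical saved pages: one with the div, one without.
        Path(tmp, "has_desc.txt").write_text(
            '<html><body><div class="id-app-orig-desc"> A fast browser </div></body></html>',
            encoding="utf-8",
        )
        Path(tmp, "no_desc.txt").write_text(
            "<html><body><p>Nothing here</p></body></html>", encoding="utf-8"
        )
        for name, text in extract_all(Path(tmp)).items():
            print(name, "->", text)
```

If a file that should contain the div still yields None, the next thing to check is whether the saved source actually contains that class name, since the parser can only find what is in the file.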

Source of the page: