Related questions (0)

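Converting XML/HTML entities into a Unicode string in Python

I'm doing some web scraping, and sites frequently use HTML entities to represent non-ASCII characters. Does Python have a utility that takes a string with HTML entities and returns a unicode type?

For example:

I get back: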

&#x01ce;


This represents an "ǎ" with a tone mark. As a 16-bit value it is 01ce in hexadecimal. I want to convert the HTML entity into the value u'\u01ce'.
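One possible approach, assuming Python 3.4 or later (which is newer than this question), is the standard library's html.unescape. The helper name decode_entities below is only illustrative; the sketch shows a hexadecimal character reference being turned into the matching Unicode character.

```python
import html

def decode_entities(text):
    """Replace HTML character references in text with the characters they name."""
    return html.unescape(text)

decoded = decode_entities("&#x01ce;")
# The hexadecimal reference 0x01CE maps to the code point U+01CE.
print(decoded == "\u01ce")
```

In Python 3 every str is already Unicode, so no separate conversion to a unicode type is needed after decoding.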

html python entities

Cri*_*ian

2010 12-16

69
votes

7
answers

60k
views

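How do I unescape HTML entities in a string in Python 3.1?

I have looked all around and only found solutions for Python 2.6 and earlier, with nothing on how to do this in Python 3.X. (I only have access to a Win7 box.)

I have to be able to do this in 3.1, preferably without external libraries. Currently I have httplib2 installed and access to curl from the command prompt (that is how I get the page source). Unfortunately, curl does not decode HTML entities, and as far as I can tell the documentation has no command for decoding them.

Yes, I have tried to get Beautiful Soup to work, many times, without success on 3.X. If you can provide EXPLICIT instructions on how to get it working in Python 3 in an MS Windows environment, I would be very grateful.

So, to be clear, I need to turn a string like this: Suzy &amp; John into a string like this: "Suzy & John".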
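For a setup without external libraries, one hedged option is a small regex-based decoder built on html.entities.name2codepoint from the Python 3 standard library. The function name unescape_entities and the regular expression are illustrative choices, not an established API. Unknown names are left untouched, and numeric references outside the Unicode range are not handled.

```python
import re
from html.entities import name2codepoint

# Matches &#xHEX;, &#DECIMAL; and &name; references.
_ENTITY_RE = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|\w+);")

def unescape_entities(text):
    """Decode numeric and named HTML entities in a single pass."""
    def replace(match):
        body = match.group(1)
        if body.startswith(("#x", "#X")):
            return chr(int(body[2:], 16))   # hexadecimal reference
        if body.startswith("#"):
            return chr(int(body[1:]))       # decimal reference
        codepoint = name2codepoint.get(body)
        # Leave unknown entity names exactly as they were.
        return chr(codepoint) if codepoint is not None else match.group(0)
    return _ENTITY_RE.sub(replace, text)

print(unescape_entities("Suzy &amp; John"))
```

Because the substitution runs once over the input, a doubly escaped string such as "&amp;lt;" is decoded by one level only.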

html python curl entities python-3.x

Vol*_*Rig

2010 03-02

59
votes

3
answers

60k
views

How do I unescape apostrophes and such in Python?

I have a string with symbols like this:

&#39;


That is apparently an apostrophe.

I tried saxutils.unescape() without any luck, and I also tried urllib.unquote().

How can I decode this? Thanks!
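A short sketch of why the first attempt likely had no effect: by default xml.sax.saxutils.unescape only handles &amp;amp;, &amp;lt; and &amp;gt;, but it accepts an optional dictionary of extra replacements. The sample string is made up for illustration. urllib's unquote targets percent-encoding such as %27, which is a different scheme from HTML entities.

```python
from xml.sax.saxutils import unescape

raw = "It&#39;s &lt;here&gt;"  # illustrative input, not from the question

# Default behaviour: only &amp;, &lt; and &gt; are replaced.
default_result = unescape(raw)

# Extra entities can be supplied as a mapping of reference -> replacement.
extended_result = unescape(raw, {"&#39;": "'"})

print(default_result)
print(extended_result)
```

The mapping has to list each reference explicitly, so this suits input with a small, known set of entities.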

html python django html-entities

ric*_*ick

2009 05-03

8
votes

1
answer

7569
views

Unescaping HTML in Django

I have HTML-encoded text that looks like this:

RT <a href="http://twitter.com/freuter">@freuter</a>...


I want this displayed as HTML, but I'm not sure whether there is a filter I can apply to convert the HTML-encoded text back to HTML...

Can someone help?

html python django encoding

dem*_*mos

2018 02-20

7
votes

2
answers

10k
views

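Inserting unescaped HTML into a Django-generated RSS feed

I'm trying to create a podcast RSS feed with Django, using feedgenerator.Rss201rev2Feed as the feed generator. It works somewhat the opposite way to BeautifulSoup: it puts information into the appropriate XML tags.

It works well, but I don't want all of the HTML to be escaped.

In particular, I want the <itunes:summary> value of the RSS feed to appear like this: <itunes:summary><![CDATA[Link to <a href="http://www.website.com">the website</a>]]></itunes:summary> as per the Apple spec.

If I were rendering HTML in a normal view, I could use the |safe filter in the HTML template. I need something similar now, to selectively prevent < from being escaped in the RSS feed.

That is, I need the RSS to appear with <![CDATA[...]]> rather than the escaped &lt;![CDATA[...]]&gt;

However, it seems that Django "autoescapes special characters in RSS feeds (or any XML for that matter) no matter what, regardless of whether you pass it through the safe filter or not" (see this 2009 question).

So far, no luck:

Attempts to use mark_safe have so far proved useless.

I'm also not sure how to interpret one idea, to pass "autoescape=False to the render() calls in django.contrib.syndication.feeds".

The suggestion to add , escape=False to the addQuickElement call returned an error: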

 handler.addQuickElement(u'itunes:summary',item['summary'], escape=False)
 TypeError: addQuickElement() got an unexpected keyword argument 'escape' …

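Setting Django aside, the target markup can be sketched with the standard library alone. This is not how Django's feedgenerator works and does not answer how to hook into it. It only illustrates producing an element whose body is a CDATA section via xml.dom.minidom. The function name itunes_summary_xml is hypothetical, and the itunes namespace declaration is omitted for brevity.

```python
from xml.dom.minidom import Document

def itunes_summary_xml(html_fragment):
    """Serialize an <itunes:summary> element whose body is a CDATA section."""
    doc = Document()
    elem = doc.createElement("itunes:summary")
    # A CDATA section is written out verbatim, so "<" and ">" stay unescaped.
    elem.appendChild(doc.createCDATASection(html_fragment))
    return elem.toxml()

print(itunes_summary_xml('Link to <a href="http://www.website.com">the website</a>'))
```

The fragment must not itself contain "]]>", since that sequence would end the CDATA section early.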

xml django rss escaping feed

Mar*_*ark

2017 05-23

5
votes

1
answer

2487
views

Best way to "clean up" HTML text

I have the following text:

"It's the show your only friend and pastor have been talking about! 
<i>Wonder Showzen</i> is a hilarious glimpse into the black 
heart of childhood innocence! Get ready as the complete first season of MTV2's<i> Wonder Showzen</i> tackles valuable life lessons like birth, 
nature, diversity, and history &#8211; all inside the prison of 
your mind! Where else can you..."


What I want to do is remove the HTML tags and encode the text as unicode. I'm currently doing:

import re
TAG_RE = re.compile(r'<[^>]+>')  # assumed pattern; the original definition is not shown

def remove_tags(text):
    return TAG_RE.sub('', text)


This only strips the tags. How do I correctly encode the text above for database storage?
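As an alternative to a regex, a minimal sketch using html.parser.HTMLParser (assuming Python 3.4 or later for the convert_charrefs option) can drop the tags and decode character references in one pass. The names TextExtractor and strip_tags are illustrative only.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text content, ignoring all tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode references such as &#8211;
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(text):
    parser = TextExtractor()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)

sample = "MTV2's<i> Wonder Showzen</i> tackles history &#8211; all inside"
print(strip_tags(sample))
```

The result is an ordinary Python 3 str. If the database layer expects bytes, calling .encode("utf-8") on it is one common choice, depending on the database configuration.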

python

Dav*_*542

lucky-day

5
votes

1
answer

1000
views

Escaping HTML in Python?

I have an <img src=__string__> but the string might contain ", so what should I do to escape it?

Example:

__string__ = test".jpg
<img src="test".jpg">


This doesn't work.
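One way to handle the quote, sketched here under the assumption of Python 3.2 or later, is html.escape with quote=True, which turns " into &amp;quot; so the attribute value stays intact. The helper name img_tag is hypothetical.

```python
import html

def img_tag(src):
    """Build an <img> tag with the src value escaped for use inside double quotes."""
    return '<img src="{}">'.format(html.escape(src, quote=True))

print(img_tag('test".jpg'))
```

This only covers HTML attribute escaping. If the value is meant to be a URL, percent-encoding it with urllib.parse.quote may also be needed.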

html python escaping

Tim*_*mmy

2010 06-24

4
votes

3
answers

10k
views

Python: decoding text to ASCII

How do I decode a unicode string like this:

What%2527s%2bthe%2btime%252C%2bnow%253F

into ASCII like this:

What's+the+time+now
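The sample looks percent-encoded twice (%25 is an encoded "%"), so a plausible sketch is to apply urllib.parse.unquote two times. The input below assumes the string begins with the English word "What". Note that full decoding also restores the comma and the question mark.

```python
from urllib.parse import unquote

encoded = "What%2527s%2bthe%2btime%252C%2bnow%253F"

once = unquote(encoded)   # first pass turns %25 into "%" and %2b into "+"
twice = unquote(once)     # second pass decodes the remaining %27, %2C and %3F

print(once)
print(twice)
```

If the "+" characters stand for spaces, urllib.parse.unquote_plus on the final pass would convert them, though that depends on how the string was originally encoded.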

python unicode decode

tim*_*tim

2011 09-23

-2
votes

1
answer

670
views

Tag statistics

python ×7

html ×5

django ×3

entities ×2

escaping ×2

curl ×1

decode ×1

encoding ×1

feed ×1

html-entities ×1

python-3.x ×1

rss ×1

unicode ×1

xml ×1
