BeautifulSoup replaceWith() method adds escaped HTML, but I want it unescaped

43T*_*cts 7 python django beautifulsoup

I have a Python method (thanks to this snippet) that takes some HTML and wraps <a> tags around any unformatted links, using BeautifulSoup and Django's urlize:

from django.utils.html import urlize
from bs4 import BeautifulSoup

def html_urlize(self, text):
    soup = BeautifulSoup(text, "html.parser")

    print(soup)

    textNodes = soup.findAll(text=True)
    for textNode in textNodes:
        if textNode.parent and getattr(textNode.parent, 'name') == 'a':
            continue  # skip already formatted links
        urlizedText = urlize(textNode)
        textNode.replaceWith(urlizedText)

    print(soup)

    return str(soup)

The sample input text (as output by the first print statement) looks like this:

this is a formatted link <a href="http://google.ca">http://google.ca</a>, this one is unformatted and should become formatted: http://google.ca

The resulting return text (as output by the second print statement) looks like this:

this is a formatted link <a href="http://google.ca">http://google.ca</a>, this one is unformatted and should become formatted: &lt;a href="http://google.ca"&gt;http://google.ca&lt;/a&gt;

As you can see, it does format the link, but it does so with escaped HTML, so when I print it in a template with {{ my.html|safe }} it is not rendered as HTML.

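So how can I get the tags added by urlize to come out unescaped and render correctly as HTML? I suspect this has something to do with my using urlize as a method rather than as a template filter. I can't actually find documentation for this method; it doesn't appear under django.utils.html.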

Edit: It seems the escaping actually happens on this line: textNode.replaceWith(urlizedText).
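A minimal reproduction of that line, as a sketch without Django: the markup string below is hard-coded as an assumption about what urlize returns, and replace_with is the current spelling of replaceWith. It suggests that a plain str passed to replace_with is inserted as text, so its angle brackets are escaped on output.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>see http://google.ca</p>", "html.parser")
text_node = soup.p.string

# Assumed stand-in for the string that urlize would return for this node.
markup = 'see <a href="http://google.ca">http://google.ca</a>'

# A plain str is inserted as a text node, not parsed as markup,
# so "<" and ">" are written out as entities when the soup is serialised.
text_node.replace_with(markup)
escaped = str(soup)
print(escaped)
```

No <a> tag exists in the tree after this call; the link is only present as escaped text inside the <p>.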

Oli*_*Oli 6

You can turn your urlizedText string into a new BeautifulSoup object, so that it is inserted as a tag rather than as text (text gets escaped, as you would expect):

from django.utils.html import urlize
from bs4 import BeautifulSoup

def html_urlize(self, text):
    soup = BeautifulSoup(text, "html.parser")

    print(soup)

    textNodes = soup.findAll(text=True)
    for textNode in textNodes:
        if textNode.parent and getattr(textNode.parent, 'name') == 'a':
            continue  # skip already formatted links
        urlizedText = urlize(textNode)
        textNode.replaceWith(BeautifulSoup(urlizedText, "html.parser"))

    print(soup)

    return str(soup)
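For trying this out without a Django project, here is a self-contained sketch of the same idea. The helper simple_urlize and the pattern URL_RE are hypothetical stand-ins for django.utils.html.urlize and are much cruder than the real function. The sketch also makes two choices that are not in the code above: it uses the newer find_all(string=True) and replace_with spellings, and it leaves text nodes without a URL untouched instead of re-parsing them.

```python
import html
import re

from bs4 import BeautifulSoup

# Hypothetical, deliberately crude URL pattern (not Django's logic):
# it stops at whitespace, quotes and angle brackets, and drops trailing punctuation.
URL_RE = re.compile(r"https?://[^\s<>\"']*[^\s<>\"'.,;:!?]")


def simple_urlize(text):
    """Stand-in for urlize: escape the text, then wrap bare URLs in <a> tags."""
    escaped = html.escape(text, quote=False)
    return URL_RE.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), escaped
    )


def html_urlize(text):
    soup = BeautifulSoup(text, "html.parser")
    for text_node in soup.find_all(string=True):
        if text_node.parent is not None and text_node.parent.name == "a":
            continue  # skip already formatted links
        if URL_RE.search(str(text_node)) is None:
            continue  # nothing to link, so leave the node exactly as it is
        # Parse the generated markup so it is inserted as tags, not as text.
        fragment = BeautifulSoup(simple_urlize(str(text_node)), "html.parser")
        text_node.replace_with(fragment)
    return str(soup)


sample = (
    'this is a formatted link <a href="http://google.ca">http://google.ca</a>, '
    "this one is unformatted and should become formatted: http://google.ca"
)
result = html_urlize(sample)
print(result)
```

Escaping the text before adding the links means a stray "<" or "&" in the original text node is not reinterpreted as markup when the fragment is parsed. With the real urlize, the autoescape=True argument should serve a similar purpose, though that is worth checking against the Django version in use.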