Search and replace in HTML with BeautifulSoup

Question


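I want to use BeautifulSoup to search for </a> and replace it with </a><br>. I know how to open a page with urllib2 and then parse it to extract all the <a> tags. What I want to do is find each closing tag and replace it with the closing tag followed by a line break. Any help is much appreciated.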

Edit

I assume it would be something like:

soup.findAll('a').


In the documentation, there is:

find(text="ahh").replaceWith('Hooray')


So I assume it would be:

soup.findAll(tag = '</a>').replaceWith(tag = '</a><br>')


But this doesn't work, and Python's help() doesn't give much to go on.

Answer 1

int*_*jay 17

This will insert a <br> tag after the end of each <a>...</a> element:

from BeautifulSoup import BeautifulSoup, Tag

# ....

soup = BeautifulSoup(data)
for a in soup.findAll('a'):
    a.parent.insert(a.parent.index(a)+1, Tag(soup, 'br'))


You can't use soup.findAll(tag = '</a>') because BeautifulSoup does not operate on closing tags separately; they are treated as part of the same element.
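
For readers on the newer bs4 package, the same idea can be sketched as below. This is only an illustration under stated assumptions: bs4 is installed, the standard-library "html.parser" backend is used, and the function name add_break_after_links and the sample markup are hypothetical rather than taken from the answer.

```python
from bs4 import BeautifulSoup  # assumption: the bs4 package, not BeautifulSoup 3


def add_break_after_links(html):
    """Return html with a <br> element inserted right after every <a> element."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list built up front, so modifying the tree in the loop is safe
    for a in soup.find_all("a"):
        # A fresh tag is needed for each link; one tag object can sit in only one place
        a.insert_after(soup.new_tag("br"))
    return str(soup)


if __name__ == "__main__":
    # Hypothetical sample markup
    print(add_break_after_links('<p><a href="/x">one</a> and <a href="/y">two</a></p>'))
```

The element is treated as a whole here as well: the <br> is added as a sibling after the <a> element, and the closing tag is never addressed on its own.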

If you want to put the <a> elements inside <p> elements, as you asked in a comment, you can use:

for a in soup.findAll('a'):
    p = Tag(soup, 'p') #create a P element
    a.replaceWith(p)   #Put it where the A element is
    p.insert(0, a)     #put the A element inside the P (between <p> and </p>)


Again, you don't create the <p> and </p> separately, because they are part of the same element.
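
With bs4, the wrapping step might be written more compactly using Tag.wrap. The sketch below assumes bs4 and the "html.parser" backend; the function name wrap_links_in_paragraphs and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup  # assumption: the bs4 package, not BeautifulSoup 3


def wrap_links_in_paragraphs(html):
    """Return html with every <a> element wrapped in its own <p> element."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a"):
        # wrap() puts the new <p> where the <a> was and moves the <a> inside it
        a.wrap(soup.new_tag("p"))
    return str(soup)


if __name__ == "__main__":
    # Hypothetical sample markup
    print(wrap_links_in_paragraphs('<div><a href="/x">one</a></div>'))
```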

Answer 2

Ach*_*hok 5

Suppose you have an element that you know contains "br" tags. One way to remove the "br" tags and replace them with another string is:

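from bs4 import BeautifulSoup  # assumption: bs4; the answer does not show its import

# Read the file contents; passing the file name itself would parse the name as markup
with open("your_html_file.html") as f:
    originalSoup = BeautifulSoup(f.read(), "html.parser")
replaceString = ", " # replace each <br/> tag with ", "
# Ex. <p>Hello<br/>World</p> to <p>Hello, World</p>
cleanSoup = BeautifulSoup(str(originalSoup).replace("<br/>", replaceString), "html.parser")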

