mad*_*ops 1 html python regex html-parsing
Given a string
"<p> >this line starts with an arrow <br /> this line does not </p>"
or
"<p> >this line starts with an arrow </p> <p> this line does not </p>"
how can I find the lines that start with an arrow and wrap them in a div,
so that it becomes:
"<p> <div> >this line starts with an arrow </div> <br /> this line does not </p>"
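Since it is HTML you are parsing, use the right tool for the job: an HTML parser such as BeautifulSoup.
Use find_all() to find every text node that starts with > and wrap() each one in a new div tag: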
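from bs4 import BeautifulSoup

data = "<p> >this line starts with an arrow <br /> this line does not </p>"
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")

# Match text nodes whose stripped content starts with '>' (string= is the newer name for text=)
for item in soup.find_all(string=lambda x: x.strip().startswith('>')):
    item.wrap(soup.new_tag('div'))

print(soup.prettify())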
Prints:
<p>
<div>
>this line starts with an arrow
</div>
<br/>
this line does not
</p>
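The same approach should also cover the second input from the question, where the two lines sit in separate p tags. Here is a minimal sketch, assuming the built-in "html.parser" backend and a reasonably recent bs4; the helper name wrap_arrow_lines is hypothetical and not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup


def wrap_arrow_lines(html):
    """Wrap every text node that starts with '>' in a <div>.

    wrap_arrow_lines is a hypothetical helper name used for illustration.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns a list, so modifying the tree inside the loop is safe
    for node in soup.find_all(string=lambda s: s.strip().startswith(">")):
        node.wrap(soup.new_tag("div"))
    return str(soup)


first = "<p> >this line starts with an arrow <br /> this line does not </p>"
second = "<p> >this line starts with an arrow </p> <p> this line does not </p>"

for html in (first, second):
    print(wrap_arrow_lines(html))
```

In both cases only the text node beginning with the arrow is wrapped; the whitespace-only node between the two p tags and the second line are left untouched.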