Div*_*vya 0 python beautifulsoup
我使用BeautifulSoup替换html文件中的所有逗号‚.这是我的代码:
f = open(sys.argv[1],"r")
data = f.read()
soup = BeautifulSoup(data)
comma = re.compile(',')
for t in soup.findAll(text=comma):
t.replaceWith(t.replace(',', '‚'))
Run Code Online (Sandbox Code Playgroud)
此代码有效,除非html文件中包含一些javascript.在这种情况下,它甚至用javascript代码替换逗号(,).这不是必需的.我只想替换html文件的所有文本内容.
soup.findall 可以赎罪:
tags_to_skip = set(["script", "style"])
# Add to this list as needed
def valid_tags(tag):
"""Filter tags on the basis of their tag names
If the tag name is found in ``tags_to_skip`` then
the tag is dropped. Otherwise, it is kept.
"""
if tag.source.name.lower() not in tags_to_skip:
return True
else:
return False
for t in soup.findAll(valid_tags):
t.replaceWith(t.replace(',', '‚'))
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
4154 次 |
| 最近记录: |