Kri*_*ish 28 python markdown parsing
I need to convert Markdown text to plain text so I can display a summary on my website. I'd like to do this in Python.
Jas*_*oon 41
This module will help you do what you describe:
http://www.freewisdom.org/projects/python-markdown/Using_as_a_Module
Once you have converted the Markdown to HTML, you can use an HTML parser to extract the plain text.
Your code might look like this:
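from bs4 import BeautifulSoup  # pip install beautifulsoup4
from markdown import markdown  # pip install markdown

# some_markdown_string: your Markdown input
html = markdown(some_markdown_string)
# Join every text node in the rendered HTML
text = ''.join(BeautifulSoup(html, 'html.parser').find_all(string=True))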
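If you would rather not depend on BeautifulSoup, the HTML-stripping step can also be done with the standard library alone. The following is a minimal sketch that swaps in `html.parser.HTMLParser`: it collects only the text passed to `handle_data`, so HTML comments (which the parser reports through `handle_comment`, left as the default no-op here) are skipped. `_TextExtractor` and `html_to_text` are hypothetical names, and the sample HTML is hand-written to roughly resemble what `markdown()` produces, not captured from it.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collect text nodes, ignore tags and comments."""

    def __init__(self):
        # convert_charrefs=True is the default, so "&amp;" arrives as "&"
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    # handle_comment is not overridden, so comment text is dropped


def html_to_text(html_string):
    parser = _TextExtractor()
    parser.feed(html_string)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


# Hand-written HTML, roughly what markdown() emits for
# '**A** [B](http://example.com) <!-- C -->'
sample_html = '<p><strong>A</strong> <a href="http://example.com">B</a> <!-- C --></p>'
print(html_to_text(sample_html))
```

In practice you would pass it the output of `markdown(some_markdown_string)` in place of the BeautifulSoup call.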
Even though this is a very old question, I'd like to suggest a solution I came up with recently. It uses neither BeautifulSoup nor the overhead of converting to HTML and back.
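The core class `Markdown` of the markdown module has an attribute `output_formats`. It is not configurable, but like almost everything in Python it can be patched. This attribute is a dictionary that maps output format names to rendering functions. By default it has two output formats, 'html' and 'xhtml'. With a little help, it can also have a plain-text rendering function, which is easy to write: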
from markdown import Markdown
from io import StringIO


def unmark_element(element, stream=None):
    if stream is None:
        stream = StringIO()
    if element.text:
        stream.write(element.text)
    for sub in element:
        unmark_element(sub, stream)
    if element.tail:
        stream.write(element.tail)
    return stream.getvalue()


# patching Markdown
Markdown.output_formats["plain"] = unmark_element
__md = Markdown(output_format="plain")
__md.stripTopLevelTags = False


def unmark(text):
    return __md.convert(text)
The `unmark` function takes Markdown text as input and returns it with all Markdown syntax stripped out.
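To see why `unmark_element` writes both `text` and `tail`, here is the same recursive walk run on a small tree built by hand with `xml.etree.ElementTree`. This is a sketch that assumes the serializer receives an ElementTree element of roughly this shape; the tree below is hand-written, not produced by Markdown.

```python
import xml.etree.ElementTree as ET
from io import StringIO


def unmark_element(element, stream=None):
    # Same walk as above: the element's own text, then its children, then its tail
    if stream is None:
        stream = StringIO()
    if element.text:
        stream.write(element.text)
    for sub in element:
        unmark_element(sub, stream)
    if element.tail:
        stream.write(element.tail)
    return stream.getvalue()


# Hand-written tree standing in for "**A** and [B](http://example.com)"
root = ET.fromstring(
    '<div><p><strong>A</strong> and <a href="http://example.com">B</a></p></div>'
)
print(unmark_element(root))
```

In ElementTree, text that follows a child element (here " and " after `<strong>`) is stored on that child's `tail`, so without the tail branch it would be lost.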
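This is similar to Jason's answer, but it handles HTML comments correctly: `get_text()` skips comment nodes, whereas joining every string returned by `find_all` also picks them up.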
import markdown  # pip install markdown
from bs4 import BeautifulSoup  # pip install beautifulsoup4


def md_to_text(md):
    html = markdown.markdown(md)
    soup = BeautifulSoup(html, features='html.parser')
    return soup.get_text()


def example():
    md = '**A** [B](http://example.com) <!-- C -->'
    text = md_to_text(md)
    print(text)
    # Output: A B


if __name__ == '__main__':
    example()
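Whichever approach above produces the plain text, the question's end goal is a short summary. A minimal sketch using the standard library's `textwrap.shorten`, which collapses whitespace and cuts at a word boundary; `summarize` and its default width of 150 are hypothetical choices, and `plain_text` is assumed to come from something like `md_to_text` or `unmark`.

```python
import textwrap


def summarize(plain_text, width=150):
    # Hypothetical helper: collapse runs of whitespace, then drop whole words
    # from the end until the result (plus "...") fits within `width`.
    # "..." is only appended when something was actually cut.
    return textwrap.shorten(plain_text, width=width, placeholder="...")


text = "Despite the fact that this is a very old question, here is a plain-text summary."
print(summarize(text, width=40))
```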
Views: 15202