Mat*_*nti 39 html python beautifulsoup innerhtml
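Suppose I have a page with a div. I can easily get that div with soup.find().
Now that I have the result, I'd like to print the whole innerhtml of that div: I mean, I need a string with all the HTML tags and text together, exactly like the string I'd get in JavaScript with obj.innerHTML. Is this possible?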
Chr*_*isD 49
With BeautifulSoup 4, use element.encode_contents() if you want a UTF-8 encoded bytestring, or element.decode_contents() if you want a Python Unicode string. For example, the DOM's innerHTML method might look something like this:

    def innerHTML(element):
        """Returns the inner HTML of an element as a UTF-8 encoded bytestring"""
        return element.encode_contents()

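As a usage sketch of the two calls above: the sample markup and the inner_html_text helper name are made up for illustration, and "html.parser" is simply the parser that ships with Python. Under Python 3 the two methods differ only in return type.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, not taken from the question.
html = '<html><body><div id="main"><p>Hello <b>world</b></p></div></body></html>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", id="main")


def inner_html_text(element):
    """Return the inner HTML of an element as a Python str (hypothetical helper)."""
    return element.decode_contents()


inner_bytes = div.encode_contents()  # bytes, UTF-8 encoded by default
inner_text = inner_html_text(div)    # str

print(type(inner_bytes).__name__, inner_bytes)
print(type(inner_text).__name__, inner_text)
```

If the result is going to be printed or concatenated with other strings, the str variant is usually the more convenient one.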
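These functions aren't currently in the online documentation, so I'll quote the current function definitions and the doc strings from the code.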
encode_contents - since 4.0.4

    def encode_contents(
        self, indent_level=None, encoding=DEFAULT_OUTPUT_ENCODING,
        formatter="minimal"):
        """Renders the contents of this tag as a bytestring.

        :param indent_level: Each line of the rendering will be
           indented this many spaces.

        :param encoding: The bytestring will be in this encoding.

        :param formatter: The output formatter responsible for converting
           entities to Unicode characters.
        """

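See also the documentation on formatters; you'll most likely use either formatter="minimal" (the default) or formatter="html" (for HTML entities), unless you want to process the text manually in some way.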
encode_contents returns an encoded bytestring. If you want a Python Unicode string, use decode_contents instead.
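To make the formatter choice concrete, here is a small sketch. The French sample sentence is borrowed from the style of example used in the Beautiful Soup documentation on output formatters, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Sample markup containing both escaped angle brackets and a named entity.
soup = BeautifulSoup("<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>", "html.parser")
p = soup.p

# "minimal" escapes only what is needed for valid markup (&, <, >),
# so the accented character is written out as UTF-8 bytes.
minimal_bytes = p.encode_contents(formatter="minimal")

# "html" additionally converts characters to named HTML entities where one exists.
html_bytes = p.encode_contents(formatter="html")

print(minimal_bytes)
print(html_bytes)
```

The same formatter argument is accepted by decode_contents if a str is wanted instead of bytes.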
decode_contents - since 4.0.1

decode_contents does the same thing as encode_contents, but returns a Python Unicode string instead of an encoded bytestring.

    def decode_contents(self, indent_level=None,
                        eventual_encoding=DEFAULT_OUTPUT_ENCODING,
                        formatter="minimal"):
        """Renders the contents of this tag as a Unicode string.

        :param indent_level: Each line of the rendering will be
           indented this many spaces.

        :param eventual_encoding: The tag is destined to be
           encoded into this encoding. This method is _not_
           responsible for performing that encoding. This information
           is passed in so that it can be substituted in if the
           document contains a <META> tag that mentions the document's
           encoding.

        :param formatter: The output formatter responsible for converting
           entities to Unicode characters.
        """

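The relationship between the two methods can be sketched as follows, assuming a document with no <meta> charset declaration (so eventual_encoding has nothing to substitute). The sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>café</p></div>", "html.parser")
div = soup.div

text = div.decode_contents()                      # str
utf8 = div.encode_contents()                      # bytes, UTF-8 by default
latin1 = div.encode_contents(encoding="latin-1")  # bytes in another encoding

# For this simple document, encoding the str by hand gives the same bytes.
assert utf8 == text.encode("utf-8")
assert latin1 == text.encode("latin-1")

print(repr(text))
print(utf8)
print(latin1)
```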
BeautifulSoup 3 doesn't have the above functions; instead it has renderContents:

    def renderContents(self, encoding=DEFAULT_OUTPUT_ENCODING,
                       prettyPrint=False, indentLevel=0):
        """Renders the contents of this tag as a string in the given
        encoding. If encoding is None, returns a Unicode string.."""

This function was added back to BeautifulSoup 4 (in 4.0.4) for compatibility with BS3.
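A quick way to check the compatibility name against the BS4 one is sketched below. It assumes a BeautifulSoup 4 install that still ships renderContents; recent releases may emit a deprecation warning for the old name, so encode_contents is the safer choice in new code.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<div id="outer"><div id="inner">foobar</div></div>', "html.parser")
outer = soup.find("div", id="outer")

legacy = outer.renderContents()   # BS3-style name kept for compatibility
modern = outer.encode_contents()  # BS4 name

print(legacy == modern)
```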
Pik*_*er2 17
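Given a BS4 soup element like <div id="outer"><div id="inner">foobar</div></div>, here are some different methods and attributes that can be used to retrieve its HTML and text in different ways, along with an example of what each returns.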
InnerHTML:

    inner_html = element.encode_contents()
    # bytes on Python 3; use element.decode_contents() for a str
    b'<div id="inner">foobar</div>'

OuterHTML:

    outer_html = str(element)
    '<div id="outer"><div id="inner">foobar</div></div>'

OuterHTML (prettified):

    pretty_outer_html = element.prettify()
    '''<div id="outer">
     <div id="inner">
      foobar
     </div>
    </div>'''

Text only (using .text):

    element_text = element.text
    'foobar'

Text only (using .string):

    element_string = element.string
    'foobar'

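The two text-only attributes agree in the example above only because the outer div has a single child. A short sketch with a second child (the extra <p> is added here purely for illustration) shows where they diverge:

```python
from bs4 import BeautifulSoup

html = '<div id="outer"><div id="inner">foo</div><p>bar</p></div>'
soup = BeautifulSoup(html, "html.parser")
outer = soup.find("div", id="outer")
inner = soup.find("div", id="inner")

inner_string = inner.string  # single text child, so the string itself
outer_string = outer.string  # more than one child, so None
outer_text = outer.text      # all descendant text concatenated

print(inner_string)  # foo
print(outer_string)  # None
print(outer_text)    # foobar
```

So .text is the safer choice when the element may contain several children, while .string is useful for detecting that a tag wraps exactly one piece of text.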
pee*_*why 11
One option could be to use something like this:

    innerhtml = "".join([str(x) for x in div_element.contents])

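One caveat with the join approach, sketched under the assumption that str() on a text node returns its raw, unescaped text: special characters in text children are not re-escaped, whereas decode_contents() runs them through the output formatter. The sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div>a &amp; b <b>c</b></div>", "html.parser")
div_element = soup.div

# Join approach: tags are serialized, text nodes are taken as plain strings.
joined = "".join(str(x) for x in div_element.contents)

# decode_contents() applies the default "minimal" formatter to text nodes.
decoded = div_element.decode_contents()

print(joined)
print(decoded)
```

For markup without characters such as & or < in its text, the two results should match; when they can occur, decode_contents() is the one that yields markup that parses back to the same tree.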