Handling a colon in a BeautifulSoup CSS selector

ale*_*cxe 6 html python beautifulsoup css-selectors html-parsing

Input HTML:

<div style="display: flex">
    <div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
    <div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
    <div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>

Desired output: all div elements directly under <div style="display: flex">.

I am trying to locate the parent div with a CSS selector:

div[style="display: flex"]

This raises an error:

>>> soup.select('div[style="display: flex"]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1400, in select
    'Only the following pseudo-classes are implemented: nth-of-type.')
NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.

It looks like BeautifulSoup tries to interpret the colon as pseudo-class syntax.

I tried following the advice given in "Handling a colon in an element ID in a CSS selector", but it still throws errors:

>>> soup.select('div[style="display\: flex"]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1400, in select
    'Only the following pseudo-classes are implemented: nth-of-type.')
NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.
>>> soup.select('div[style="display\3A flex"]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1426, in select
    'Unsupported or invalid CSS selector: "%s"' % token)
ValueError: Unsupported or invalid CSS selector: "div[style="displayA"
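
A side note on the second attempt: the "displayA" in the error message is consistent with Python itself consuming the backslash before BeautifulSoup sees the selector. In a regular (non-raw) string literal, \3 is an octal escape for the control character U+0003, so the CSS escape never reaches the selector parser. The sketch below only illustrates the string-literal behavior; the variable names are made up for illustration, and it makes no claim about how any particular BeautifulSoup version treats CSS escapes.

```python
# Regular string literal: Python reads \3 as an octal escape (U+0003),
# so the backslash is gone before the selector engine ever sees it.
plain = 'div[style="display\3A flex"]'

# Raw string literal: the backslash is preserved, so the CSS escape
# sequence \3A is passed through unchanged.
raw = r'div[style="display\3A flex"]'

print(repr(plain))
print(repr(raw))
```

Using a raw string (or doubling the backslash) is the usual way to pass a CSS escape through Python untouched.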

Question:

What is the correct way to use or escape a colon in BeautifulSoup CSS selectors?


Note that I can work around it with a partial attribute match:

soup.select("div[style$=flex]")

Or, with find_all():

soup.find_all("div", style="display: flex")
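
To get from the parent to the desired output (its direct child div elements), the find_all() workaround can be extended roughly as follows. This is a minimal sketch, assuming the HTML from the question and the built-in html.parser; the variable names (parent, children, fruits) are illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<div style="display: flex">
    <div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
    <div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
    <div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match the parent by exact attribute value; no CSS selector is parsed,
# so the colon needs no special handling.
parent = soup.find("div", style="display: flex")

# recursive=False restricts the search to direct children of the parent.
children = parent.find_all("div", recursive=False)
fruits = [child.get_text(strip=True) for child in children]
print(fruits)
```

Because the attribute filter compares the whole string, this only matches when the style attribute is exactly "display: flex".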

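Also note that I understand locating elements by style is far from a good technique, but the question itself is generic and the HTML provided is just an example.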

ale*_*cxe 2

Update: this issue is now fixed in BeautifulSoup 4.5.0; upgrade if needed:

pip install --upgrade beautifulsoup4
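
After upgrading, the original selector can be checked with something like the sketch below. It assumes a recent beautifulsoup4 release is installed and reuses the HTML from the question; the child combinator (> div) is added here to reach the desired output and was not part of the original selector.

```python
import bs4
from bs4 import BeautifulSoup

html = """
<div style="display: flex">
    <div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
    <div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
    <div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>
"""

# Print the installed version to confirm the upgrade took effect.
print(bs4.__version__)

soup = BeautifulSoup(html, "html.parser")

# The colon sits inside a quoted attribute value, so no escaping is used.
matches = soup.select('div[style="display: flex"] > div')
texts = [tag.get_text(strip=True) for tag in matches]
print(texts)
```

If this still raises NotImplementedError, the interpreter is most likely importing an older copy of the package than the one that was upgraded.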

Old answer:

I created an issue on the BeautifulSoup issue tracker.

I will update this answer if there is any news on the Launchpad issue.