How to find all <li> inside a <ul> with a specific class?

use*_*287 3 python beautifulsoup python-2.7

Environment:

Beautiful Soup 4

Python 2.7.5

Logic:

'find_all' the <li> instances within a <ul> of class my_class, for example:

<ul class='my_class'>
<li>thing one</li>
<li>thing two</li>
</ul>

Clarification: I only need the "text" between the <li> tags.

Python code:

(The find_all below is not correct; I'm only including it for context.)

from bs4 import BeautifulSoup, Comment
import re

# open original file
fo = open('file.php', 'r')
# convert to string
fo_string = fo.read()
# close original file
fo.close()
# create beautiful soup object from fo_string
bs_fo_string = BeautifulSoup(fo_string, "lxml")
# get rid of html comments
my_comments = bs_fo_string.findAll(text=lambda text:isinstance(text, Comment))
[my_comment.extract() for my_comment in my_comments]

my_li_list = bs_fo_string.find_all('ul', 'my_class')

print my_li_list
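For reference, the question's find_all('ul', 'my_class') already matches the right <ul> tags (Beautiful Soup treats a string in that position as a CSS class), but it returns the <ul> elements themselves; the missing step is searching for <li> inside each one. A minimal Python 3 sketch of the same script with that step added, assuming bs4 is installed and the built-in "html.parser" replaces lxml. The file contents are replaced by a made-up in-memory string (including a hypothetical <ul class='other'>) so the example runs on its own:

```python
from bs4 import BeautifulSoup, Comment

# Stand-in for the contents of file.php (hypothetical markup).
# With a real file: with open("file.php") as fo: fo_string = fo.read()
fo_string = """<!-- navigation -->
<ul class='my_class'>
<li>thing one</li>
<!-- <li>commented out</li> -->
<li>thing two</li>
</ul>
<ul class='other'><li>not wanted</li></ul>"""

bs_fo_string = BeautifulSoup(fo_string, "html.parser")

# Remove HTML comments; a plain loop, since extract() is called for its side effect.
for comment in bs_fo_string.find_all(string=lambda s: isinstance(s, Comment)):
    comment.extract()

# First the <ul class="my_class"> tags, then the text of every <li> inside each one.
my_li_list = [
    li.get_text(strip=True)
    for ul in bs_fo_string.find_all("ul", class_="my_class")
    for li in ul.find_all("li")
]
print(my_li_list)
```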

Ter*_*ryA 10

Something like this?

>>> html = """<ul class='my_class'>
... <li>thing one</li>
... <li>thing two</li>
... </ul>"""
>>> from bs4 import BeautifulSoup as BS
>>> soup = BS(html)
>>> for ultag in soup.find_all('ul', {'class': 'my_class'}):
...     for litag in ultag.find_all('li'):
...             print litag.text
... 
thing one
thing two

Explanation:

soup.find_all('ul', {'class': 'my_class'}) finds all ul tags that have the class my_class.

Then, for each of those ul tags, we find all the li tags inside it and print each tag's text.
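The two steps can also be collapsed into a single CSS selector with select(): "ul.my_class li" matches every <li> that is a descendant of a <ul> whose class list contains my_class. A minimal Python 3 sketch, assuming a bs4 version whose select() supports descendant and class selectors, and using the built-in "html.parser"; the second and third lists are made up to show that a <ul> with several classes still matches while other classes do not:

```python
from bs4 import BeautifulSoup

html = """<ul class='my_class'>
<li>thing one</li>
<li>thing two</li>
</ul>
<ul class='my_class extra'>
<li>thing three</li>
</ul>
<ul class='other'>
<li>not wanted</li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")

# Descendant selector: <li> anywhere under a <ul> carrying the class my_class.
texts = [li.get_text(strip=True) for li in soup.select("ul.my_class li")]
print(texts)
```

The nested find_all loop from the answer treats the multi-class <ul class='my_class extra'> the same way, because Beautiful Soup matches a class search against each of a tag's classes.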