BS4: Getting text in a tag

Question

Mil*_*ano 12 html python parsing beautifulsoup html-parsing

I am using Beautiful Soup. I have a tag like this:

<li><a href="example"> s.r.o., <small>small</small></a></li>

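I want to get the text inside the <a> tag that is not inside the <small> tag. So I want " s.r.o., " as the output.

I tried going through <small>, but it did not work. Is there a command in BS4 that can do this?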

Thanks

Answer 1

ale*_*cxe 16

One option is to take the first element from the contents of the a element:

>>> from bs4 import BeautifulSoup
>>> data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
>>> soup = BeautifulSoup(data, 'html.parser')
>>> print(soup.find('a').contents[0])
 s.r.o.,
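
The same idea can be sketched as a standalone Python 3 script. This assumes the text you want is always the first child of the a element; the helper name first_text is hypothetical, and the None check is an addition to cover markup that has no a tag.

```python
from bs4 import BeautifulSoup

def first_text(html):
    """Return the stripped first child of the first <a>, or None (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    link = soup.find('a')
    if link is None or not link.contents:
        return None
    # Assumption: contents[0] is a text node. If it were a tag, str() would return markup.
    return str(link.contents[0]).strip()

data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
print(first_text(data))  # s.r.o.,
```

The strip() call drops the leading and trailing spaces that the raw text node carries.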

Another is to find the small tag and get its previous sibling:

>>> print(soup.find('small').previous_sibling)
 s.r.o.,
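
Note that previous_sibling can be None, or another tag, when no text comes directly before small. A cautious sketch, using a hypothetical helper named text_before, checks the node type first:

```python
from bs4 import BeautifulSoup, NavigableString

def text_before(tag):
    """Return the stripped text node directly before tag, or None (hypothetical helper)."""
    sibling = tag.previous_sibling
    # previous_sibling is None for a first child, and a Tag when an element precedes it
    if isinstance(sibling, NavigableString):
        return str(sibling).strip()
    return None

data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(data, 'html.parser')
print(text_before(soup.find('small')))  # s.r.o.,
```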

Well, there are also various alternative/crazy options:

>>> print(next(soup.find('a').descendants))
 s.r.o., 
>>> print(next(iter(soup.find('a'))))
 s.r.o.,
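
All of the options above pick up only the first text node. If text may also follow the small tag, one possible sketch is to collect every text node that is a direct child of a. The helper name direct_text and the extended sample markup are assumptions for illustration; the string= argument needs a reasonably recent bs4 release (older ones used text= instead).

```python
from bs4 import BeautifulSoup

def direct_text(tag):
    """Join the text nodes that are direct children of tag (hypothetical helper)."""
    # string=True matches text nodes; recursive=False limits the search to direct children
    parts = tag.find_all(string=True, recursive=False)
    return ' '.join(p.strip() for p in parts if p.strip())

# Sample markup extended with trailing text after <small> (an assumption, not from the question)
html = '<li><a href="example"> s.r.o., <small>small</small> Ltd.</a></li>'
soup = BeautifulSoup(html, 'html.parser')
print(direct_text(soup.find('a')))  # s.r.o., Ltd.
```

The text inside small is skipped because it is a grandchild of a, not a direct child.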
