Using BeautifulSoup, how do I get only the text of a specific selector, without the text of its children?

Question

Bit*_*Bit 2 python beautifulsoup html-parsing web-scraping

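I can't figure out how to write BeautifulSoup code that gives me only the text of the selected tag. I get more than that, such as the text of its child(ren)!

For example: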

from bs4 import BeautifulSoup
soup = BeautifulSoup('<div id="left"><ul><li>"I want this text"<a href="someurl.com"> I don\'t want this text</a><p>I don\'t want this either</li><li>"Good"<a href="someurl.com"> Not Good</a><p> Not Good either</li></ul></div>', "html5lib") 
x = soup.select('ul > li')
for i in x:
    print(i.text)


Output:

"I want this text" I don't want this textI don't want this either

"Good" Not Good Not Good either

Desired output:

"I want this text"

"Good"

Answer 1

ale*_*cxe 5

One option is to take the first element of the contents list:

for i in x:
    print(i.contents[0])


Another is to find the first text node:

for i in x:
    print(i.find(text=True))


Both print:

"I want this text"
"Good"
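
Both snippets return only the first text node, which is enough for the markup in the question. If the wanted text could also appear after or between child tags, a possible variation is to join every text node that is a direct child of the tag. The following is a minimal sketch, not part of the original answer: the helper name own_text is hypothetical, the HTML is the question's sample with the p tags closed, and the built-in "html.parser" is assumed in place of "html5lib" so that no extra package is needed.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# The question's sample markup, with the <p> tags closed explicitly.
html = (
    '<div id="left"><ul>'
    '<li>"I want this text"<a href="someurl.com"> I don\'t want this text</a>'
    "<p>I don't want this either</p></li>"
    '<li>"Good"<a href="someurl.com"> Not Good</a><p> Not Good either</p></li>'
    "</ul></div>"
)


def own_text(tag):
    """Hypothetical helper: join the text nodes that are direct children of tag."""
    pieces = [
        child
        for child in tag.children
        # Keep plain strings only; skip child tags and HTML comments.
        if isinstance(child, NavigableString) and not isinstance(child, Comment)
    ]
    return "".join(pieces).strip()


# "html.parser" is an assumption made here; the question uses "html5lib".
soup = BeautifulSoup(html, "html.parser")
results = [own_text(li) for li in soup.select("ul > li")]

for line in results:
    print(line)
```

For this sample it prints the same two lines as the snippets above. Separately, since Beautiful Soup 4.4.0 the text argument used in find(text=True) is named string, so find(string=True) is the newer spelling of the same call.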


Archived:	9 years, 4 months ago
Views:	1,053
Last activity:	9 years, 1 month ago