Get an element's text, but separated by spaces

Jim*_*sla 3 beautifulsoup

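I am scraping text data from a web page, and when I use .text it joins the text of all the nested elements together with nothing in between. However, I want some of those pieces to be separated by a space.

For example, I have markup like this: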

data=['<span class="sub-title title-block"><span class="nowrap">1.2</span><span class="nowrap">TEKNA</span></span>',
'<span class="sub-title title-block"><span class="nowrap">Amr</span><span class="nowrap">V12 5.2</span></span>',
'<span class="sub-title title-block"></span>']

When I run the following:

from bs4 import BeautifulSoup
for i in data:
    soup = BeautifulSoup(i, 'lxml')
    for d in soup:
        print(d.text)

I get:

1.2TEKNA
AmrV12 5.2

But the output I want is:

1.2 TEKNA
Amr V12 5.2

That is, the text of each inner element should be separated from the next by a space.

小智 7

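You can use the get_text(<sep>) method and pass a custom separator, which is placed between the text pieces of the nested elements, as follows: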

from bs4 import BeautifulSoup

data=['<span class="sub-title title-block"><span class="nowrap">1.2</span><span class="nowrap">TEKNA</span></span>',
'<span class="sub-title title-block"><span class="nowrap">Amr</span><span class="nowrap">V12 5.2</span></span>',
'<span class="sub-title title-block"></span>']

for i in data:
    soup = BeautifulSoup(i, 'lxml')
    for d in soup:
        print(d.get_text(" "))

Output:

1.2 TEKNA
Amr V12 5.2
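If the real page has stray whitespace or newlines around the text, the plain separator approach can leave doubled spaces. A possible variant, sketched here under the assumption that you want exactly one space between non-empty text pieces, is to join stripped_strings yourself. The helper name spaced_text is made up for this illustration, and html.parser is used only so the sketch does not depend on lxml being installed:

```python
from bs4 import BeautifulSoup

data = [
    '<span class="sub-title title-block"><span class="nowrap">1.2</span><span class="nowrap">TEKNA</span></span>',
    '<span class="sub-title title-block"><span class="nowrap">Amr</span><span class="nowrap">V12 5.2</span></span>',
    '<span class="sub-title title-block"></span>',
]


def spaced_text(html):
    """Hypothetical helper: text of an HTML fragment, one space between text nodes."""
    soup = BeautifulSoup(html, "html.parser")
    # stripped_strings yields every non-empty text node with its
    # leading and trailing whitespace removed.
    return " ".join(soup.stripped_strings)


results = [spaced_text(fragment) for fragment in data]
for line in results:
    print(line)
```

Calling get_text(" ", strip=True) should behave much the same way, since strip=True also trims each piece and skips the ones that end up empty. The empty third item in data simply yields an empty string.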