Parsing <br> tags with BeautifulSoup

Question

ksh*_*ava 1 html tags beautifulsoup web-crawler web-scraping

I am crawling a website whose markup is structured like this:

<div class="content">
    <p>
        "C Space"
        <br>
        "802 white avenue"
        <br>
        "xyz 123"
        <br>
        "Lima"
    </p>
</div>


When I extract the text with BeautifulSoup using the following code:

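from urllib.request import urlopen

from bs4 import BeautifulSoup

html = urlopen("something")  # placeholder URL from the question
bsObj = BeautifulSoup(html, "html5lib")
templist = bsObj.find("div", {"class": "content"})
print(templist.get_text())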


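I get the following output: C Space802 white avenuexyz 123Lima

whereas I want the output to be: C Space 802 white avenue xyz 123 Lima.

How can I add a space between the pieces of text that are separated by br tags?

Thanks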

Answer 1

ale*_*cxe 5

You can pass the strip and separator arguments to .get_text():

In [4]: elm = soup.select_one(".content")

In [5]: print(elm.get_text(strip=True, separator=" "))
"C Space" "802 white avenue" "xyz 123" "Lima"

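For anyone who wants to try this without fetching a page, here is a minimal self-contained sketch. It makes a few assumptions that are not in the original posts: the markup is inlined as a string, the literal quote marks are dropped from the text, and the built-in "html.parser" is used instead of html5lib so that no extra parser needs to be installed. The variable names HTML, address, and parts are only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup mirroring the structure in the question
# (quote marks omitted so the output matches the desired result).
HTML = """
<div class="content">
    <p>
        C Space
        <br>
        802 white avenue
        <br>
        xyz 123
        <br>
        Lima
    </p>
</div>
"""

# "html.parser" is an assumption made here to avoid the html5lib dependency.
soup = BeautifulSoup(HTML, "html.parser")
elm = soup.select_one(".content")

# separator is inserted between adjacent text nodes; strip=True trims each
# node and skips the ones that contain only whitespace.
address = elm.get_text(separator=" ", strip=True)
print(address)

# stripped_strings yields the same trimmed pieces individually, which is
# handy if each address line should be kept as a separate item.
parts = list(elm.stripped_strings)
print(parts)
```

Because the br tags split the paragraph into separate text nodes, the separator ends up between each address line. If the lines are needed individually, the parts list is probably easier to work with than splitting the joined string again.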

Archived:	8 years, 6 months ago
Views:	2,453
Last activity:	8 years, 6 months ago
