How do I get the inner text value of an HTML tag with BeautifulSoup bs4?

Question

When using BeautifulSoup bs4, how do I get the text from inside an HTML tag? When I run this line:

oname = soup.find("title")

I get the title tag like this:

<title>page name</title>

Now I want only its inner text, page name, without the tags. How can I do that?

Answer 1

Pad*_*ham 9

Use .text to get the text from the tag.

oname = soup.find("title")
oname.text
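
As a minimal sketch of the lines above, the snippet below parses a small inline HTML string and reads the title text with `.text`. The sample markup and the names `html` and `inner_text` are assumptions made up for illustration, and the built-in `html.parser` is passed explicitly so the example does not depend on which parser happens to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for the page from the question.
html = "<html><head><title>page name</title></head><body></body></html>"

soup = BeautifulSoup(html, "html.parser")

oname = soup.find("title")   # the whole Tag object: <title>page name</title>
inner_text = oname.text      # only the text inside the tag, as a string

print(inner_text)
```

Note that `soup.find("title")` returns `None` when the page has no title tag, so real code may want to check for that before touching `.text`.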

Or just use soup.title.text:

In [4]: from bs4 import BeautifulSoup
In [5]: import requests
In [6]: r = requests.get("http://stackoverflow.com/questions/27934387/how-to-retrieve-information-inside-a-tag-with-python/27934403#27934387")
In [7]: BeautifulSoup(r.content, "html.parser").title.text
Out[7]: u'html - How to Retrieve information inside a tag with python - Stack Overflow'

To open a file using the text as its name, use it like any other string:

with open(oname.text, 'w') as f:
    ...  # write to the file here
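
A slightly fuller sketch of using the title text as a file name follows. Everything beyond `open(oname.text, 'w')` is an assumption added for illustration: the sample markup, the `.txt` extension, the temporary directory, and the hypothetical helper `safe_filename`, which replaces characters that some file systems reject in names (real page titles often contain `/`, `:` or `?`).

```python
import os
import re
import tempfile

from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for the page from the question.
html = "<html><head><title>page name</title></head></html>"
soup = BeautifulSoup(html, "html.parser")
oname = soup.find("title")


def safe_filename(text):
    """Hypothetical helper: replace characters that are unsafe in file names."""
    return re.sub(r'[\\/:*?"<>|]', "_", text).strip()


# Write into a temporary directory so the sketch leaves no stray files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, safe_filename(oname.text) + ".txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("contents for " + oname.text)
    written = os.path.basename(path)
    # Read the file back to confirm the write succeeded.
    with open(path, encoding="utf-8") as f:
        contents = f.read()
```

For a title as simple as the one in the question the helper changes nothing, so the plain `open(oname.text, 'w')` form works just as well there.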

Strangely, the [documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) does not mention the "text" attribute. The closest thing I found is the [get_text()](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text) method. (2 upvotes)
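
For comparison, here is a small sketch of `.text` next to `get_text()`. As far as I can tell, `.text` behaves like `get_text()` called with its default arguments, while `get_text()` also accepts `separator` and `strip`. The sample markup and variable names are assumptions made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with two paragraphs and some stray whitespace.
html = "<div><p> first </p><p>second</p></div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

plain = div.text        # all text under the tag, concatenated as-is
same = div.get_text()   # same result as .text here

# Strip each piece of text and join the pieces with a separator.
joined = div.get_text(separator="|", strip=True)  # "first|second"

print(repr(plain), repr(joined))
```

So `.text` is enough for a simple tag like the title, and `get_text()` is the one to reach for when the whitespace or the joining of nested pieces needs controlling.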

Archived: 11 years ago
Views: 6512
Last activity: 9 years, 10 months ago