BeautifulSoup: extracting the content inside a tag

Question


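I'd like to extract the content "Hello world". Note that the page also contains multiple <table> elements and similar <td colspan="2"> cells.

Here is the relevant HTML: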

<table border="0" cellspacing="2" width="800">
  <tr>
    <td colspan="2"><b>Name: </b>Hello world</td>
  </tr>
  <tr>
...


I tried the following snippet, but it returned nothing:

hello = soup.find(text='Name: ')
hello.findPreviousSiblings


In addition, I'm having trouble extracting "My home address" from the following:

<td><b>Address:</b></td>

<td>My home address</td>


I'm using the same approach to search for text="Address:", but how do I navigate to the next cell and extract the content of the <td>?

Answer 1

sol*_*les 25

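The contents attribute works well for extracting the text from <tag>text</tag>. Note that it returns a list of the tag's children rather than a single string.

<td>My home address</td> example: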

from bs4 import BeautifulSoup

s = '<td>My home address</td>'
soup = BeautifulSoup(s, 'html.parser')
td = soup.find('td')  # <td>My home address</td>
td.contents  # ['My home address'] (a list of child nodes)


<td><b>Address:</b></td> example:

from bs4 import BeautifulSoup

s = '<td><b>Address:</b></td>'
soup = BeautifulSoup(s, 'html.parser')
b = soup.find('td').find('b')  # <b>Address:</b>
b.contents  # ['Address:'] (a list of child nodes)

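Neither example above covers the second half of the question, which is getting from the "Address:" label to the cell next to it. One possible approach, sketched here under the assumption that both cells sit in the same <tr> (the surrounding table and row are made up for illustration), is to find the <b> label, climb to its enclosing <td>, and then take the next <td> sibling:

```python
from bs4 import BeautifulSoup

# Hypothetical row built from the two cells shown in the question.
html = "<table><tr><td><b>Address:</b></td>\n<td>My home address</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

label = soup.find("b", string="Address:")         # <b>Address:</b>
label_cell = label.find_parent("td")               # the <td> wrapping the label
value_cell = label_cell.find_next_sibling("td")    # next <td> in the same row
address = value_cell.get_text(strip=True)
print(address)
```

find_next_sibling("td") skips over whitespace text nodes between the two cells, so the line break in the original markup should not get in the way.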

Answer 2

Dra*_*ric 14

Use next instead:

>>> s = '<table border="0" cellspacing="2" width="800"><tr><td colspan="2"><b>Name: </b>Hello world</td></tr><tr>'
>>> soup = BeautifulSoup(s)
>>> hello = soup.find(text='Name: ')
>>> hello.next
u'Hello world'


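next and previous let you move through the document's elements in the order the parser processed them, while the sibling methods work on the parse tree.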
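To make that distinction concrete, here is a small sketch using the bs4 spellings next_element and next_sibling (the session above uses the older next name). It assumes the "Name: " cell from the question, wrapped in a minimal table, and compares the two kinds of navigation from the same text node:

```python
from bs4 import BeautifulSoup

html = '<table><tr><td colspan="2"><b>Name: </b>Hello world</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

label_text = soup.find(string="Name: ")        # the text node inside <b>
after_in_document = label_text.next_element    # next node in parse order
sibling_in_tree = label_text.next_sibling      # None: the text is <b>'s only child
value = label_text.parent.next_sibling         # the sibling of <b> itself

print(repr(after_in_document), sibling_in_tree, repr(value))
```

The text node "Name: " has no siblings of its own, which is one reason sibling-based lookups from it come back empty; the siblings that matter belong to the enclosing <b> tag.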

Does "Name: " appear elsewhere in the document? (2 upvotes)
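If the label does occur more than once, as the question suggests, one option is to collect every match with find_all instead of stopping at the first one. The sketch below is an illustration only: the second table and its "Second value" text are invented, and it assumes each value is the text node directly following its <b> label.

```python
from bs4 import BeautifulSoup

# Two hypothetical tables that share the same "Name: " label.
html = """
<table><tr><td colspan="2"><b>Name: </b>Hello world</td></tr></table>
<table><tr><td colspan="2"><b>Name: </b>Second value</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

names = []
for label in soup.find_all("b", string="Name: "):
    value = label.next_sibling        # node right after <b> inside the same <td>
    if isinstance(value, str):        # keep text nodes, skip None and tags
        names.append(value.strip())

print(names)
```

From there, the right entry can be chosen by position, or the search can be narrowed by calling find_all on one specific table rather than on the whole soup.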

Answer 3

Bab*_*pha 6

Use the code below to extract the text content of an HTML tag with Python and BeautifulSoup:

from bs4 import BeautifulSoup

s = '<td>Example information</td>'  # your raw HTML
soup = BeautifulSoup(s, 'html.parser')  # parse the HTML with BeautifulSoup
td = soup.find('td')  # tag of interest: <td>Example information</td>
td.text  # 'Example information' (text with the markup stripped)

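The answers use both contents and text, and the two behave differently once a tag holds more than plain text. As a rough comparison, assuming the "Name: " cell from the question, contents gives the list of child nodes while get_text() (what text returns) joins all nested text into one string:

```python
from bs4 import BeautifulSoup

html = '<td colspan="2"><b>Name: </b>Hello world</td>'
soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")

children = td.contents         # two children: the <b> tag and a text node
all_text = td.get_text()       # text of the whole cell, label included
value_only = td.contents[-1]   # last child: the text after </b>

print(children, repr(all_text), repr(value_only))
```

For this cell, text alone would include the label, so picking the trailing child (or navigating from the <b> tag) is closer to what the question asks for.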

Asked:	14 years, 6 months ago
Viewed:	51156 times
Last activity:	6 years, 3 months ago