How to use BeautifulSoup to search for a list of tags, with one item in the list having an attribute?

Question

Sam*_*dle 5 html python beautifulsoup web-scraping

Does anyone know how to search for multiple tags with bs4 in Python when one of them must have an attribute?

For example, to find every occurrence of a single tag that has an attribute, I know I can do this:

tr_list = soup_object.find_all('tr', id=True)

And I know I can also do this:

tag_list = soup_object.find_all(['a', 'b', 'p', 'li'])

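But I can't figure out how to combine the two statements. In theory, the combination would give me a list of all these HTML tags in the order they occur, where every 'tr' tag in the list has an id.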

The HTML snippet looks like this:

  <tr id="uniqueID">
   <td nowrap="" valign="baseline" width="8%">
    <b>
     A_time_as_text
    </b>
   </td>
   <td class="storyTitle">
    <a href="a_link.com" target="_new">
     some_text
    </a>
    <b>
     a_headline_as_text
    </b>
    a_number_as_text
   </td>
  </tr>
  <tr>
   <td>
    <br/>
   </td>
   <td class="st-Art">
    <ul>
     <li>
      more_text_text_text
      <strong>
       more_text_text_text
       <font color="228822">
        more_text_text_text
       </font>
      </strong>
      more_text_text_text
     </li>
     <li>
      more_text_text_text
      <ul>
       <li>
        more_text_text_text
       </li>
      </ul>
     </li>
    </ul>
   </td>
  </tr>
  <tr>
  </tr>


Thanks in advance for any help!

Answer 1

Mar*_*ans 2

I suggest adding tr to the list of tags you want, then checking inside the loop whether the id attribute is present:

from bs4 import BeautifulSoup

# `html` holds the HTML snippet from the question
soup = BeautifulSoup(html, "html.parser")

for tag in soup.find_all(['a', 'b', 'p', 'li', 'tr']):
    # Keep every non-tr tag; keep a tr only if it has a non-empty id
    if tag.name != 'tr' or tag.get('id'):
        print(tag.name)


For your HTML, this displays:

tr
b
a
b
li
li
li

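The same filter can also be handed to find_all directly, since find_all accepts a function that takes a tag and returns True or False. The following is only a sketch: the shortened `html` string, the `WANTED` set, and the `wanted_tag` helper are made up for illustration and are not part of the original answer. It uses `has_attr('id')`, which mirrors `id=True` (attribute present, even if empty).

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's HTML (illustrative only)
html = """
<table>
<tr id="uniqueID"><td><b>A_time_as_text</b></td>
<td class="storyTitle"><a href="a_link.com">some_text</a><b>a_headline_as_text</b></td></tr>
<tr><td class="st-Art"><ul><li>more_text</li></ul></td></tr>
<tr></tr>
</table>
"""

WANTED = {"a", "b", "p", "li"}  # tags accepted without any attribute check


def wanted_tag(tag):
    """Match any tag in WANTED, or a <tr> that carries an id attribute."""
    return tag.name in WANTED or (tag.name == "tr" and tag.has_attr("id"))


soup = BeautifulSoup(html, "html.parser")

# find_all calls wanted_tag on every tag and keeps document order
names = [tag.name for tag in soup.find_all(wanted_tag)]
print(names)  # ['tr', 'b', 'a', 'b', 'li']
```

This keeps the selection logic in one place, so the loop body only has to deal with tags that already passed the filter.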

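Note that if you are actually trying to get the a, b, p and li tags contained in tr tags that have an id present, then the following would be more suitable: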

# Only look inside <tr> tags that carry an id attribute
for tr in soup.find_all('tr', id=True):
    for tag in tr.find_all(['a', 'b', 'p', 'li']):
        print(tag.name, tag.get_text(strip=True))


This would give you:

b A_time_as_text
a some_text
b a_headline_as_text

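If you only need the tags nested inside a tr that has an id, a CSS selector is another way to express it. This is a sketch rather than part of the original answer: it assumes a bs4 version where `select()` is backed by soupsieve (4.7 or later), and the shortened `html` string and the `selector` and `pairs` names are illustrative.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's HTML (illustrative only)
html = """
<table>
<tr id="uniqueID"><td><b>A_time_as_text</b></td>
<td class="storyTitle"><a href="a_link.com">some_text</a><b>a_headline_as_text</b></td></tr>
<tr><td class="st-Art"><ul><li>more_text</li></ul></td></tr>
<tr></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Builds "tr[id] a, tr[id] b, tr[id] p, tr[id] li":
# each part matches a tag that is a descendant of a <tr> with an id attribute
selector = ", ".join(f"tr[id] {name}" for name in ("a", "b", "p", "li"))

# select() returns the matches in document order
pairs = [(tag.name, tag.get_text(strip=True)) for tag in soup.select(selector)]
for name, text in pairs:
    print(name, text)
```

The li inside the second tr is skipped because that row has no id.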
