How does join work in Python BeautifulSoup?

Question

I am learning Python and BeautifulSoup, and I came across this code online:

from BeautifulSoup import BeautifulSoup, SoupStrainer
import re

html = ['<html><body><p align="center"><b><font size="2">Table 1</font></b><table><tr><td>1. row 1, cell 1</td><td>1. row 1, cell 2</td></tr><tr><td>1. row 2, cell 1</td><td>1. row 2, cell 2</td></tr></table><p align="center"><b><font size="2">Table 2</font></b><table><tr><td>2. row 1, cell 1</td><td>2. row 1, cell 2</td></tr><tr><td>2. row 2, cell 1</td><td>2. row 2, cell 2</td></tr></table></html>']
soup = BeautifulSoup(''.join(html))
searchtext = re.compile(r'Table\s+1',re.IGNORECASE)
foundtext = soup.find('p',text=searchtext) # Find the first <p> tag with the search text
table = foundtext.findNext('table') # Find the first <table> tag that follows it
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        try:
            text = ''.join(td.find(text=True))
        except Exception:
            text = ""
        print text+"|",
    print


Everything else is clear to me, but I cannot understand how the join works.

    text = ''.join(td.find(text=True))


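I searched the BS documentation for join but found nothing, and I could not find any help online on how join is used in BS either.

Please let me know how this line works. Thanks!

PS: The code above comes from another Stack Overflow page ("How can I find a table after a text string using BeautifulSoup in Python?"); it is not my homework :)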

Answer 1

Mar*_*ers 5

''.join() is a Python string method, not anything BS-specific. It lets you join a sequence of strings, using the string you call it on as the separator:

>>> '-'.join(map(str, range(3)))
'0-1-2'
>>> ' and '.join(('bangers', 'mash'))
'bangers and mash'


'' is simply the empty string, so joining on it is an easy way to combine a whole set of strings into one large string:

>>> ''.join(('5', '4', 'apple', 'pie'))
'54applepie'

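The first example wraps its numbers in map(str, ...) for a reason: join only accepts strings. Here is a minimal sketch of what happens otherwise, written for Python 3 (the code in the question is Python 2); the variable names are made up for illustration:

```python
# str.join accepts any iterable, but every item must already be a string.
numbers = [5, 4]
words = ["apple", "pie"]

try:
    "".join(numbers + words)  # the integers are not strings
except TypeError as exc:
    error_message = str(exc)  # join refuses the non-string items

# Convert each item with str() first, as map(str, ...) does in the first example.
joined = "".join(str(item) for item in numbers + words)
print(joined)  # 54applepie
```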

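In the specific case of your example, the statement finds all the text contained in the <td> element, including the text of any nested HTML elements (such as <b>, <i> or <a href="">), and puts it all together in one long string. So td.find(text=True) finds a series of Python strings, and ''.join() then concatenates them into one long string.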
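
One caveat worth checking against the BeautifulSoup docs: as far as I know, find() returns only the first match, while findAll() returns a list of every match, so the description above fits td.findAll(text=True) more closely. With a single string, join still works, because a string is itself a sequence of one-character strings. The sketch below uses plain Python 3 strings as stand-ins for the BeautifulSoup results; the cell contents are assumed for illustration, not captured output:

```python
# Stand-ins for what the BeautifulSoup calls might return for
# <td>1. <b>row 1</b>, cell 1</td>; these values are assumed, not captured output.
first_text_node = "1. "                        # like td.find(text=True)
all_text_nodes = ["1. ", "row 1", ", cell 1"]  # like td.findAll(text=True)

# A string is itself an iterable of one-character strings, so joining it
# with '' rebuilds the same text.
from_single = "".join(first_text_node)

# Joining a list of fragments concatenates them in order.
from_list = "".join(all_text_nodes)

print(repr(from_single))  # '1. '
print(repr(from_list))    # '1. row 1, cell 1'
```

In the question's HTML every cell holds a single piece of text, so both calls would produce the same output there.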
