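Posts by Eri*_*ins

BeautifulSoup: how to get the index of an element in a table header

I'm trying to extract the index of an element in a table header so that I can use the result to select the appropriate column in the table body. The number of columns varies, but the columns I need always have the same header text.

So I'd like to find out, for example, that 'third' is at index [2] in the header <th>first</th> <th>second</th> <th>third</th>, so that I can then pick out the relevant <td> in the following rows by its index number.

Here is my attempt: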

# Trial to get indexes from table headers
from bs4 import BeautifulSoup

html = ('<table><thead><tr class="myClass"><th>A</th>'
        '<th>B</th><th>C</th><th>D</th></tr></thead></table>')
soup = BeautifulSoup(html, 'html.parser')

table = soup.find('table')

for hRow in table.find_all('th'):
    # Tag.index() looks for a child element of the tag, not for text,
    # so this call is the one that raises the ValueError
    hRow = hRow.index('A')
    print(hRow)


This gives:

ValueError: Tag.index: element not in tag

Any ideas?
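
One possible way to get the header index, offered as a minimal sketch rather than a definitive answer: collect the header texts into a plain Python list and call `list.index` on it, then reuse that position on each body row. The `<tbody>` row and the names `headers`, `col`, and `values` are assumptions added for illustration; they are not part of the original post.

```python
from bs4 import BeautifulSoup

# The header row comes from the question; the body row is a made-up example.
html = (
    '<table><thead><tr class="myClass"><th>A</th>'
    '<th>B</th><th>C</th><th>D</th></tr></thead>'
    '<tbody><tr><td>1</td><td>2</td><td>3</td><td>4</td></tr></tbody></table>'
)
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table')

# Build a list of header strings; list.index compares text,
# whereas Tag.index searches a tag's children for an element.
headers = [th.get_text(strip=True) for th in table.find_all('th')]
col = headers.index('C')

# Take the cell at the same position from every row that has <td> cells.
values = [
    row.find_all('td')[col].get_text(strip=True)
    for row in table.find_all('tr')
    if row.find_all('td')
]
print(col, values)
```

Because the lookup is by header text, it should keep working when the number of columns changes, as long as the header label itself stays the same.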

python beautifulsoup html-parsing

Eri*_*ins

2016 12-20

2
votes

1
answer

5622
views

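BeautifulSoup: replace colspan=2 with a single column

I'm trying to parse data from rows that occasionally contain a cell with colspan=2, which breaks my ability to extract the target data. What I'd like to do is remove "colspan=2" from the table element every time it occurs: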

#replace
<td colspan="2" class="time">10:00 AM</td>
#with
<td>635</td>

Is this possible? Can I turn it into an if/then/else conditional?

Here is a more detailed example:

<table>
<tr class="playerRow even">
<td class="pos">1</td>
<td><span class="rank"></span> -</td>
<td class="player"><p class="playerName">John doe</p></td>
<td class="background">X</td>
<td>345</td> #THIS ELEMENT FREQUENT
<td></td>
<td></td>
<td></td>
<td></td>
<td style=""></td>
</tr>

<tr class="playerRow odd">
<td class="pos">1</td>
<td><span class="rank"></span> -</td>
<td class="player"><p class="playerName">John doe</p></td>
<td class="background">X</td>
<td colspan="2" class="myClass" style="">3:15 PM</td> #THIS ELEMENT OCCASIONAL
<td></td>
<td></td>
<td></td>
<td></td>
<td style=""></td>
</tr>

<tr class="playerRow odd">
<td class="pos">1</td>
<td><span class="rank"></span> -</td>
<td …

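
A hedged sketch of how the conditional removal might look with bs4: loop over the `<td>` cells and delete the `colspan` attribute only where it equals `"2"`. The two shortened rows below are a cut-down, hypothetical version of the table in the question, and `cells` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the rows shown in the question.
html = (
    '<table>'
    '<tr class="playerRow even"><td class="pos">1</td>'
    '<td class="background">X</td><td>345</td><td></td></tr>'
    '<tr class="playerRow odd"><td class="pos">1</td>'
    '<td class="background">X</td>'
    '<td colspan="2" class="myClass" style="">3:15 PM</td><td></td></tr>'
    '</table>'
)
soup = BeautifulSoup(html, 'html.parser')

for td in soup.find_all('td'):
    if td.get('colspan') == '2':
        # Drop only the colspan attribute; class, style and text are kept.
        del td['colspan']
    # Otherwise the cell is left untouched.

# Extract the text of every cell, row by row.
cells = [
    [td.get_text(strip=True) for td in tr.find_all('td')]
    for tr in soup.find_all('tr')
]
print(cells)
```

This only edits the parsed tree in memory. If the spanning cell should instead count as two columns, an empty `<td>` could be inserted after it, but that goes beyond what the question asks for.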

beautifulsoup

Eri*_*ins

2014 07-31

1
vote

1
answer

3166
views