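How can I find a table after a text string using BeautifulSoup in Python?

I am trying to extract data from several web pages that are inconsistent in how they display tables. I need to write code that searches for a text string and then goes straight to the table that follows that string, so that I can extract the table's contents. This is what I have so far: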

from bs4 import BeautifulSoup  # BeautifulSoup 4 (the bs4 package)
import re

html = ['<html><body><p align="center"><b><font size="2">Table 1</font></b><table><tr><td>1. row 1, cell 1</td><td>1. row 1, cell 2</td></tr><tr><td>1. row 2, cell 1</td><td>1. row 2, cell 2</td></tr></table><p align="center"><b><font size="2">Table 2</font></b><table><tr><td>2. row 1, cell 1</td><td>2. row 1, cell 2</td></tr><tr><td>2. row 2, cell 1</td><td>2. row 2, cell 2</td></tr></table></html>']
soup = BeautifulSoup(''.join(html), 'html.parser')
searchtext = re.compile('Table 1', re.IGNORECASE) # Also need to figure out how to ignore space
# find() returns a single matching text node (find_all() would return a list, which has no navigation methods)
foundtext = soup.find(string=searchtext)
# find_next() walks forward in document order to the first <table> after that text
table = foundtext.find_next('table') # find the next table …
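The task described above (find a text string, then read the table that follows it) could be sketched roughly as below. This is only one possible approach, under some assumptions: it uses the bs4 package (BeautifulSoup 4) rather than the older BeautifulSoup 3 import, the helper name table_after_text is hypothetical, and the pattern Table\s*1\b is just one guess at what "ignoring space" should mean here.

```python
import re
from bs4 import BeautifulSoup

html = (
    '<html><body>'
    '<p align="center"><b><font size="2">Table 1</font></b>'
    '<table><tr><td>1. row 1, cell 1</td><td>1. row 1, cell 2</td></tr>'
    '<tr><td>1. row 2, cell 1</td><td>1. row 2, cell 2</td></tr></table>'
    '<p align="center"><b><font size="2">Table 2</font></b>'
    '<table><tr><td>2. row 1, cell 1</td><td>2. row 1, cell 2</td></tr>'
    '<tr><td>2. row 2, cell 1</td><td>2. row 2, cell 2</td></tr></table>'
    '</html>'
)


def table_after_text(soup, pattern):
    """Hypothetical helper: rows of the first <table> after text matching pattern.

    Returns a list of rows (each a list of cell strings), or None when
    either the text or a following table cannot be found.
    """
    # find(string=...) returns the first text node the regex matches
    label = soup.find(string=pattern)
    if label is None:
        return None
    # find_next() searches forward in document order from that text node
    table = label.find_next("table")
    if table is None:
        return None
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        for row in table.find_all("tr")
    ]


soup = BeautifulSoup(html, "html.parser")

# \s* tolerates any amount of whitespace (including none) between the words;
# \b keeps "Table 1" from also matching "Table 10". Both are assumptions.
pattern = re.compile(r"Table\s*1\b", re.IGNORECASE)
rows = table_after_text(soup, pattern)

for row in rows or []:
    print(row)
```

Because the search starts from the matched text node rather than from its enclosing p tag, it should not matter whether the parser nests the table inside the paragraph or places it after it.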

python beautifulsoup

Score: 4 · Answers: 1 · Views: 6765
