Ria*_*ani 7 beautifulsoup python-2.7
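I have changed my Python 2.7 routine to accept the file path as a parameter, so that I don't have to duplicate code by hard-coding several file paths inside the method.
When my method is called, I get the following error: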
looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.
'"%s" looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.' % markup)
My method implementation is:
def extract_data_from_report3(filename):
    html_report_part1 = open(filename,'r').read()
    soup = BeautifulSoup(filename, "html.parser")
    th = soup.find_all('th')
    td = soup.find_all('td')
    headers = [header.get_text(strip=True) for header in soup.find_all("th")]
    rows = [dict(zip(headers, [td.get_text(strip=True) for td in row.find_all("td")]))
            for row in soup.find_all("tr")[1:-1]]
    print(rows)
    return rows
The method is called as follows:
rows_part1 = report.extract_data_from_report3(r"E:\test_runners\selenium_regression_test_5_1_1\TestReport\SeleniumTestReport_part1.html")
print "part1 = "
print rows_part1
How can I pass the filename as a parameter?
Pad*_*ham 11
If you want to pass a file handle, do not call read; pass open(filename), or the handle you already have, directly:
def extract_data_from_report3(filename):
    html_report_part1 = open(filename,'r')
    soup = BeautifulSoup(html_report_part1, "html.parser")
Or:
def extract_data_from_report3(filename):
    soup = BeautifulSoup(open(filename), "html.parser")
You could also pass html_report_part1 after calling read, as suggested, but that is not necessary: BeautifulSoup accepts a file object.
You should pass the actual contents of the file you have read, not the filename, to BeautifulSoup:
html_report_part1 = open(filename,'r').read()
soup = BeautifulSoup(html_report_part1, "html.parser")
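Both approaches above leave the file open until it is garbage collected. A minimal sketch of the same idea with a with-block, so the handle is closed once parsing is done. The function name extract_rows and the encoding parameter are hypothetical, not from the original post, and the UTF-8 default is an assumption about the report file:

```python
import io

from bs4 import BeautifulSoup


def extract_rows(path, encoding="utf-8"):
    """Parse the HTML table in the file at `path` into a list of dicts.

    `extract_rows` and `encoding` are illustrative names; UTF-8 is assumed.
    """
    # io.open works the same way on Python 2.7 and Python 3.
    # The with-block closes the file as soon as BeautifulSoup has read it.
    with io.open(path, "r", encoding=encoding) as handle:
        # Pass the open handle (not the path string) as the markup argument.
        soup = BeautifulSoup(handle, "html.parser")

    headers = [th.get_text(strip=True) for th in soup.find_all("th")]
    rows = []
    # Skip the first row (the headers) and the last row, as the question's code does.
    for tr in soup.find_all("tr")[1:-1]:
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append(dict(zip(headers, cells)))
    return rows
```

The caller still passes only a path, for example extract_rows(r"E:\some\report.html") with a placeholder path, so the call site from the question would not need to change beyond the function name.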