Python: "looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup"

Question

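I have changed my Python 2.7 routine to accept a file path as a parameter, so that I don't have to duplicate code by hard-coding several file paths inside the method.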

When my method is called, I get the following error:

looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.
  '"%s" looks like a filename, not markup. You should probably open this file and pass the filehandle into Beautiful Soup.' % markup)

My method implementation is:

def extract_data_from_report3(filename):
    html_report_part1 = open(filename,'r').read()
    soup = BeautifulSoup(filename, "html.parser")
    th = soup.find_all('th')
    td = soup.find_all('td')

    headers = [header.get_text(strip=True) for header in soup.find_all("th")]
    rows = [dict(zip(headers, [td.get_text(strip=True) for td in row.find_all("td")]))
        for row in soup.find_all("tr")[1:-1]]
    print(rows)
    return rows

The method is called like this:

rows_part1 =  report.extract_data_from_report3(r"E:\test_runners\selenium_regression_test_5_1_1\TestReport\SeleniumTestReport_part1.html")
print "part1 = "
print rows_part1

How do I pass the filename as a parameter?

Answer 1

Pad*_*ham 11

If you want to pass a file handle, don't call read; just pass the file object returned by open(filename):

def extract_data_from_report3(filename):
    html_report_part1 = open(filename,'r')
    soup = BeautifulSoup( html_report_part1, "html.parser")

Or:

def extract_data_from_report3(filename):
    soup = BeautifulSoup(open(filename), "html.parser")

You can also call read and pass html_report_part1, as suggested, but there is no need: BeautifulSoup can take a file object.
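
A minimal sketch of the file-handle approach, using a with block so the file is closed once parsing is done. The helper name load_soup is made up for illustration and is not part of the question's code; the sketch assumes the bs4 package is installed.

```python
from bs4 import BeautifulSoup


def load_soup(filename):
    # Hypothetical helper: open the report and hand the file object
    # (not the path string) to BeautifulSoup.
    with open(filename, "r") as fh:
        # BeautifulSoup reads the file object inside the constructor,
        # so the file can safely close when the with block exits.
        return BeautifulSoup(fh, "html.parser")
```

The returned soup can then be queried as in the question, for example with soup.find_all("th").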

Answer 2

har*_*r07 5

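Your code reads the file into html_report_part1 but then passes filename to the constructor. You should pass the actual contents of the file, which you have already read, to BeautifulSoup: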

html_report_part1 = open(filename,'r').read()
soup = BeautifulSoup(html_report_part1, "html.parser")
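
Put together with the rest of the question's method, the fix might look like the sketch below. It keeps the asker's [1:-1] slice, which skips the first (header) row and the last row of the table, and it assumes the bs4 package is installed. The with block is an addition for tidiness and is not in the original code.

```python
from bs4 import BeautifulSoup


def extract_data_from_report3(filename):
    # Read the file contents first; passing the path string itself
    # is what triggers the "looks like a filename" message.
    with open(filename, "r") as fh:
        html_report_part1 = fh.read()
    soup = BeautifulSoup(html_report_part1, "html.parser")

    # Column names come from every <th> cell in the document.
    headers = [th.get_text(strip=True) for th in soup.find_all("th")]
    # As in the question, skip the first (header) row and the last row.
    rows = [
        dict(zip(headers, [td.get_text(strip=True) for td in tr.find_all("td")]))
        for tr in soup.find_all("tr")[1:-1]
    ]
    return rows
```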
