Using requests_html in a Flask application

Jer*_*hez 5 multithreading screen-scraping flask python-3.x

I am trying to run the html.render() method from the requests_html Python module inside a Flask application. However, whenever my application code calls the function, I get this error: RuntimeError: There is no current event loop in thread 'Thread-1'.

Here is the function that uses html.render():

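import bs4
from urllib.parse import urljoin
from requests_html import HTMLSession

# privacy_regex and rank_url are used below but are not shown in this snippet.

def extractor(url):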
    session = HTMLSession()
    r = session.get(url)
    soup = bs4.BeautifulSoup(r.text)
    found = soup.find_all("a", href=privacy_regex)
    if found:
        print("Using Default Web Scraping bs4+regex")
        found = [tag['href'] for tag in found]
        uri = sorted(found, key=rank_url)[-1]
        return urljoin(url, uri)
    else:
        print('Using HTML Rendering')
        r.html.render()
        links = r.html.absolute_links
        privacy_links = [x for x in links if privacy_regex.search(x)]
        uri = sorted(privacy_links, key=rank_url)[-1]
        return urljoin(url, uri)

Here is my application code:

@app.route('/api', methods=['POST', 'GET'])
def text_output():
    url = request.form['url_text']
    print(url)
    text, domain = url_input_parser(url)
    print(text, domain)

Any help is appreciated! Thank you very much!
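
The error message itself can be reproduced without Flask or requests_html, which may help narrow the problem down. The sketch below uses only the standard library: it calls asyncio.get_event_loop() from a worker thread, once as-is and once after giving the thread its own loop. The helper name probe_event_loop is made up for this illustration. It assumes, without confirming it here, that html.render() asks asyncio for the current event loop in whichever thread handles the request.

```python
import asyncio
import threading


def probe_event_loop(create_loop: bool) -> str:
    """Call asyncio.get_event_loop() in a worker thread and report the outcome.

    probe_event_loop is a hypothetical helper for this illustration only.
    """
    outcome = []

    def worker():
        loop = None
        if create_loop:
            # Give this thread its own event loop before anything asks for one.
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
        try:
            asyncio.get_event_loop()
            outcome.append("ok")
        except RuntimeError as exc:
            outcome.append(f"RuntimeError: {exc}")
        finally:
            if loop is not None:
                loop.close()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome[0]


if __name__ == "__main__":
    # Without a loop the worker thread reports the RuntimeError.
    print(probe_event_loop(create_loop=False))
    # With a loop created and registered first, the lookup succeeds.
    print(probe_event_loop(create_loop=True))
```

If that assumption holds, calling asyncio.set_event_loop(asyncio.new_event_loop()) at the top of extractor(), before HTMLSession() is used, would be the equivalent change. This is untested against requests_html, and launching the headless browser outside the main thread may run into further restrictions, so running the rendering step in a separate process is another option to consider.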
