xCh*_*apx 6 python multithreading event-loop flask python-requests-html
I have a simple Flask API in which one endpoint calls a method from another file that uses requests-html to render some JavaScript from a site.
@app.route('/renderJavascript')
def get_attributes():
    return get_item_attributes('https://www.site.com.mx/site.html')
The code of that method looks like this:
from requests_html import HTMLSession
from bs4 import BeautifulSoup

def get_item_attributes(url):
    #Connecting to site.
    session = HTMLSession()
    resp = session.get(url)
    resp.html.render()
    resp.session.close()
    soup = BeautifulSoup(resp.html.html,'lxml')
    ................................
    #Rest of the code is handling the data with bs4 and returning a json.
After calling the endpoint, I get this error:
Traceback (most recent call last):
  File "C:\Python37\lib\site-packages\flask\app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Python37\lib\site-packages\flask\app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Python37\lib\site-packages\flask\app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "C:\Python37\lib\site-packages\flask\_compat.py", line 39, in reraise
    raise value
  File "C:\Python37\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Python37\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "API.py", line 35, in get_attributes
    return get_item_attributes('https://www.shein.com.mx/Floral-Print-Raglan-Sleeve-Curved-Hem-Tee-p-858258-cat-1738.html')
  File "C:\Users\xChapx\Desktop\Deving\API\request.py", line 25, in get_item_attributes
    resp.html.render()
  File "C:\Python37\lib\site-packages\requests_html.py", line 586, in render
    self.browser = self.session.browser # Automatically create a event loop and browser
  File "C:\Python37\lib\site-packages\requests_html.py", line 727, in browser
    self.loop = asyncio.get_event_loop()
  File "C:\Python37\lib\asyncio\events.py", line 644, in get_event_loop
    % threading.current_thread().name)
RuntimeError: There is no current event loop in thread 'Thread-1'.
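I read online that HTMLSession does not work properly when it is used outside the main thread. Since Flask handles the request on its own thread, that may be what is causing the error.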
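The failure at the bottom of the traceback can be reproduced without Flask or requests-html. The sketch below uses only the standard library: `asyncio.get_event_loop()` raises in a worker thread that has no loop, and stops raising once that thread creates and installs its own loop. The names `loop_lookup_error`, `run_in_thread_with_loop` and `answer` are made up for this illustration. Whether installing a loop this way is enough for `render()` is an untested assumption, since the browser launch may run into other main-thread-only restrictions.

```python
import asyncio
import threading


def loop_lookup_error():
    """Call asyncio.get_event_loop() in a worker thread; return the error text, or None."""
    outcome = {"error": None}

    def worker():
        try:
            asyncio.get_event_loop()
        except RuntimeError as exc:
            outcome["error"] = str(exc)

    thread = threading.Thread(target=worker, name="Thread-demo")
    thread.start()
    thread.join()
    return outcome["error"]


def run_in_thread_with_loop(coro_factory):
    """Run a coroutine in a worker thread that creates and installs its own loop."""
    outcome = {}

    def worker():
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)  # get_event_loop() now succeeds in this thread
        try:
            outcome["same_loop"] = asyncio.get_event_loop() is loop
            outcome["value"] = loop.run_until_complete(coro_factory())
        finally:
            asyncio.set_event_loop(None)
            loop.close()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome


async def answer():
    # Hypothetical stand-in for real asynchronous work.
    await asyncio.sleep(0)
    return 42


if __name__ == "__main__":
    print(loop_lookup_error())
    print(run_in_thread_with_loop(answer))
```

The first helper hits the same `RuntimeError` as the traceback. The second shows that the lookup is per thread, which is why code that works in a plain script can fail inside a threaded server.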
小智 0
The error is caused by pyppeteer sending exit signals to the blocked Flask thread. This workaround stops it from sending those signals in the first place.
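import asyncio
import pyppeteer
# Assumption: the original omits its imports; WsgiToAsgi is taken from asgiref, serve and Config from hypercorn.
from asgiref.wsgi import WsgiToAsgi
from bs4 import BeautifulSoup
from flask import Flask
from hypercorn.asyncio import serve
from hypercorn.config import Config
from requests_html import AsyncHTMLSession

class AsyncHTMLSessionFixed(AsyncHTMLSession):
    def __init__(self, **kwargs):
        super(AsyncHTMLSessionFixed, self).__init__(**kwargs)
        self.__browser_args = kwargs.get("browser_args", ["--no-sandbox"])

    @property
    async def browser(self):
        # Launch the browser with pyppeteer's signal handling switched off.
        if not hasattr(self, "_browser"):
            self._browser = await pyppeteer.launch(
                ignoreHTTPSErrors=not self.verify, headless=True,
                handleSIGINT=False, handleSIGTERM=False, handleSIGHUP=False,
                args=self.__browser_args)
        return self._browser

async def get_item_attributes(url):
    #Connecting to site.
    session = AsyncHTMLSessionFixed()  # use the patched session, not the stock one
    resp = await session.get(url)      # the async session's get() must be awaited
    await resp.html.arender()
    await session.close()              # close() is a coroutine on the async session
    soup = BeautifulSoup(resp.html.html,'lxml')

app = Flask(__name__)

if __name__ == "__main__":
    asgi_app = WsgiToAsgi(app)
    asyncio.run(serve(asgi_app, Config()))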
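Some background on why the `handleSIGINT=False`, `handleSIGTERM=False` and `handleSIGHUP=False` arguments can matter off the main thread: Python only allows the main thread to install signal handlers, so any call to `signal.signal()` from a worker thread fails. The sketch below demonstrates just that restriction with the standard library and does not involve pyppeteer. That these flags control handler registration inside pyppeteer is an assumption about its internals, and `install_handler_in_worker` is a made-up name.

```python
import signal
import threading


def install_handler_in_worker():
    """Try to install a SIGINT handler from a worker thread; return the error text, or None."""
    outcome = {"error": None}

    def worker():
        try:
            # Only the main thread may call signal.signal().
            signal.signal(signal.SIGINT, signal.default_int_handler)
        except ValueError as exc:
            outcome["error"] = str(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome["error"]


if __name__ == "__main__":
    print(install_handler_in_worker())
```

A library that registers handlers while starting up inside a request thread would hit this `ValueError`, which is the kind of failure that skipping the registration avoids.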
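I found a comment saying that app.run(threaded=False) also works, but I could not reproduce it myself, and I don't see the point of giving up threading at the cost of performance.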