tes*_*ohn 4 scrapy playwright playwright-python
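I am integrating scrapy with playwright, but I am having trouble adding a timer after a click. As a result, when I take a screenshot of the page after clicking, it is still stuck on the login page.
How can I add a timer so that the page waits a few seconds until it has loaded?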
In the code below, the selector .onetrust-close-btn-handler.onetrust-close-btn-ui.banner-close-button.onetrust-lg.ot-close-icon is shortened to .onetrust-close-btn-handler.

import scrapy
from scrapy_playwright.page import PageCoroutine


class DoorSpider(scrapy.Spider):
    name = 'door'
    start_urls = ['https://nextdoor.co.uk/login/']

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                meta=dict(
                    playwright=True,
                    playwright_include_page=True,
                    # Page actions run in order before the response is returned
                    playwright_page_coroutines=[
                        PageCoroutine("click",
                                      selector=".onetrust-close-btn-handler"),
                        PageCoroutine("fill", "#id_email", 'my_email'),
                        PageCoroutine("fill", "#id_password",
                                      'my_password'),
                        PageCoroutine('waitForNavigation'),
                        PageCoroutine("click", selector="#signin_button"),
                        PageCoroutine("screenshot", path="cookies.png",
                                      full_page=True),
                    ]
                )
            )

    def parse(self, response):
        yield {
            'data': response.body
        }
小智 6
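There are several waiting methods you can use, depending on your specific use case. Here is a sample; the documentation covers more:
wait_for_event(event, **kwargs)
wait_for_selector(selector, **kwargs)
wait_for_load_state(**kwargs)
wait_for_url(url, **kwargs)
wait_for_timeout(timeout)
For your problem, if you need to wait until the page has loaded, you can use the coroutine below and insert it at the appropriate place in your list: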
...
PageCoroutine("wait_for_load_state", "load"),
...
Or
...
PageCoroutine("wait_for_load_state", "domcontentloaded"),
...
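As a rough sketch of what "the appropriate place" could look like for the list in the question: the wait goes after the click on the sign-in button and before the screenshot. The helper name login_steps is hypothetical, the waitForNavigation step from the question is left out here, and the small stand-in class exists only so the snippet runs without scrapy-playwright installed. Whether "load" fires at the right moment for this particular site is an assumption.

```python
try:
    # Real class, imported the same way as in the question
    # (requires a scrapy-playwright version that still ships it).
    from scrapy_playwright.page import PageCoroutine
except ImportError:
    class PageCoroutine:
        """Hypothetical stand-in: only records the method name and arguments."""

        def __init__(self, method, *args, **kwargs):
            self.method = method
            self.args = args
            self.kwargs = kwargs


def login_steps(email, password):
    """Build the ordered page actions (hypothetical helper, not a library API)."""
    return [
        PageCoroutine("click", selector=".onetrust-close-btn-handler"),
        PageCoroutine("fill", "#id_email", email),
        PageCoroutine("fill", "#id_password", password),
        PageCoroutine("click", selector="#signin_button"),
        # Wait for the page reached by the click before taking the screenshot.
        PageCoroutine("wait_for_load_state", "load"),
        PageCoroutine("screenshot", path="cookies.png", full_page=True),
    ]


if __name__ == "__main__":
    for step in login_steps("my_email", "my_password"):
        print(step.method, step.args, step.kwargs)
```

The returned list would be passed as playwright_page_coroutines in the request meta, exactly where the inline list sits in the question.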
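If neither of the two options above works, you can try any of the other wait methods, or use an explicit timeout value such as 3 seconds. (This is not recommended: it fails more often and is not the best choice for web scraping.)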
...
PageCoroutine("wait_for_timeout", 3000),
...
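To illustrate why a fixed timeout tends to fail more often than a condition-based wait, here is a library-independent sketch using only asyncio. FakePage, fixed_wait and wait_until are hypothetical names made up for this example; they are not part of Playwright or scrapy-playwright, and the timings are arbitrary.

```python
import asyncio
import time


class FakePage:
    """Hypothetical stand-in for a page that finishes loading after a delay."""

    def __init__(self, load_seconds):
        self._ready_at = time.monotonic() + load_seconds

    def loaded(self):
        return time.monotonic() >= self._ready_at


async def fixed_wait(page, seconds):
    """Sleep a fixed time, then report whether the page happened to be ready."""
    await asyncio.sleep(seconds)
    return page.loaded()


async def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        await asyncio.sleep(interval)
    return predicate()


async def main():
    # The page needs 0.3 s, so a fixed 0.1 s wait gives up too early.
    too_short = await fixed_wait(FakePage(0.3), 0.1)
    # The condition-based wait returns as soon as the page is ready (2 s cap).
    waited = await wait_until(FakePage(0.3).loaded, timeout=2.0)
    return too_short, waited


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A fixed delay has to be guessed for the slowest case and is wasted time in every faster one, which is the trade-off the answer warns about.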