Get the href link using Python Playwright

gus*_*teo 2 python xpath web-scraping playwright

I am trying to extract the link inside the href, but all I get back is the text inside the element.

The website code is as follows:

<div class="item-info-container ">
   <a href="/imovel/32600863/" role="heading" aria-level="2" class="item-link xh-highlight"
   title="Apartamento T3 na avenida da Liberdade, São José de São Lázaro e São João do Souto, Braga">
   Apartamento T3 na avenida da Liberdade, São José de São Lázaro e São João do Souto, Braga
   </a>

The code I am using is:

element_handle = page.locator('//div[@class="item-info-container "]//a').all_inner_texts()

Whether or not I specify //a[@href], my output is always the title text:

Apartamento T3 na avenida da Liberdade, São José de São Lázaro e São João do Souto, Braga

What I actually want to get is:

/imovel/32600863/

Any ideas on where my logic is failing? Thanks in advance.

can*_*dre 7

Use get_attribute:

link = page.locator('.item-info-container ').get_by_role('link').get_attribute('href')

**Edit:** For multiple locators:

link_locators = page.locator('.item-info-container ').get_by_role('link').all()
for link_locator in link_locators:
    print(link_locator.get_attribute('href'))
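
As for why the original attempt returned text: all_inner_texts() always reads the text of the matched elements, and narrowing the XPath to //a[@href] only changes which elements match, not what is read from them. The sketch below puts the pieces together under a few assumptions. The markup is a trimmed copy of the snippet from the question with the div closed, BASE_URL is a hypothetical placeholder because the question does not name the site, and the names absolutize and collect_hrefs are made up for illustration. It selects the anchor by its item-link class instead of by role, because the anchor in the question carries role="heading" and a role-based lookup for link may not match it.

```python
from urllib.parse import urljoin

# Trimmed copy of the markup from the question, with the div closed.
SAMPLE_HTML = """
<div class="item-info-container ">
  <a href="/imovel/32600863/" role="heading" aria-level="2" class="item-link"
     title="Apartamento T3 na avenida da Liberdade">
    Apartamento T3 na avenida da Liberdade
  </a>
</div>
"""

# Hypothetical base URL (assumption): the question does not name the site.
BASE_URL = "https://example.com"


def absolutize(hrefs, base_url=BASE_URL):
    """Resolve site-relative hrefs against base_url, skipping missing ones."""
    return [urljoin(base_url, href) for href in hrefs if href]


def collect_hrefs(html=SAMPLE_HTML):
    """Load html into a headless browser and read each listing link's href."""
    # Imported here so absolutize() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(html)
        # Select by class rather than by role, since the anchor has role="heading".
        links = page.locator(".item-info-container a.item-link").all()
        # get_attribute returns None when the attribute is absent.
        hrefs = [link.get_attribute("href") for link in links]
        browser.close()
    return hrefs
```

Calling absolutize(collect_hrefs()) needs Playwright and a Chromium build installed. The helper alone can be checked without a browser: with the placeholder base URL, absolutize(["/imovel/32600863/"]) gives ["https://example.com/imovel/32600863/"].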