Posts by har*_*ded

Single-page PDF in PuppeteerSharp

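I have tried converting a web page to a single-page PDF, but this is not supported. Is there a workaround to achieve this?

I have already tried setting the PDF page size from the size of the HTML content, but this does not work as expected for every web page. I obtained the HTML content size with EvaluateExpressionAsync. Below are the code snippets I used; they do not work for all web pages (mainly responsive ones).

int height = await page.EvaluateExpressionAsync<int>("document.body.clientHeight");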

and

dynamic metrics = await Client.SendAsync("Page.getLayoutMetrics").ConfigureAwait(false); 

var width = Convert.ToInt32(Math.Ceiling(Convert.ToDecimal(metrics.contentSize.width.Value))); 
var height = Convert.ToInt32(Math.Ceiling(Convert.ToDecimal(metrics.contentSize.height.Value)));

I set the height and width above as the PDF page size, as in the screenshot implementation. This works fine for screenshots, but not for PDFs on all web pages. Can you help me achieve this?
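The size calculation described above can be sketched on its own, independent of the browser library. This is a minimal illustration, not the asker's code: the function names and the optional margin parameter are hypothetical, and it assumes the measured values are CSS pixels (CSS defines 1in as 96px). Puppeteer renders PDFs with print CSS media, so a size measured under screen media may differ from what is printed on responsive pages; that is a possible cause, not a confirmed diagnosis.

```python
import math

CSS_PX_PER_INCH = 96  # CSS defines 1in as exactly 96px


def single_page_size(content_width_px, content_height_px, margin_px=0):
    """Return (width, height) as px strings, rounded up to whole pixels.

    margin_px is a hypothetical extra margin added on every side.
    """
    width = math.ceil(content_width_px) + 2 * margin_px
    height = math.ceil(content_height_px) + 2 * margin_px
    return f"{width}px", f"{height}px"


def px_to_inches(px):
    """Convert CSS pixels to inches for APIs that expect inch values."""
    return px / CSS_PX_PER_INCH
```

Rounding up rather than down keeps a fractional content height from spilling a sliver onto a second page.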

c# webautomation google-chrome-headless puppeteer-sharp

joh*_*ohn

2022-08-05

Score: 4 · Answers: 1 · Views: 4348

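How to generate web page images in a high-performance environment?

I am trying to generate an image of a web page in under one second in a server-side environment. Requests can arrive from the network concurrently. For this I am using the Puppeteer-Sharp library, which works well. Under the hood it uses Chromium to load the page and then takes a screenshot of it.

The problem is that it takes a while to get started. For example, note the timings (from my PC) on the readme.md sample code: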

var options = new LaunchOptions { Headless = true, ExecutablePath = @"c:\foo\chrome.exe" };
var browser = await Puppeteer.LaunchAsync(options);            //  ~500ms
var page = await browser.NewPageAsync();                       //  ~215ms
var webPage = await page.GoToAsync("http://www.google.com");   //  ~500ms
await page.ScreenshotAsync(outputFile);                        //  ~300ms

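As you can see, it easily exceeds one second. I do not know how Chromium works internally, so I have a few questions about the solutions I am considering.

Is the PuppeteerSharp.Browser object thread-safe and/or reentrant? Can I use the same browser object from different threads? I do not think so, since it is tied to a specific Chromium instance in memory.
Removing .LaunchAsync and .NewPageAsync from each request would speed things up significantly. Would a pool of PuppeteerSharp.Browser objects work? For example, I could pre-allocate five of them and call .NewPageAsync on each. Incoming requests would then use objects from the pool. Is this a viable approach?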
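The pooling idea in the second question can be sketched generically. The example below is an illustration in Python's asyncio rather than PuppeteerSharp; `ResourcePool`, `handle_request`, and `demo` are hypothetical names, the pooled "pages" are plain strings, and the sleep stands in for navigating and taking a screenshot. It shows the pattern only and makes no claim about whether Browser or Page objects are safe to share.

```python
import asyncio
from contextlib import asynccontextmanager


class ResourcePool:
    """Fixed-size pool of pre-created resources shared by concurrent requests."""

    def __init__(self, resources):
        self._queue = asyncio.Queue()
        for resource in resources:
            self._queue.put_nowait(resource)

    @asynccontextmanager
    async def acquire(self):
        resource = await self._queue.get()  # waits while all resources are in use
        try:
            yield resource
        finally:
            self._queue.put_nowait(resource)  # hand it back for the next request


async def handle_request(pool, request_id):
    async with pool.acquire() as page:
        await asyncio.sleep(0.01)  # stand-in for navigate + screenshot
        return f"request {request_id} served by {page}"


async def demo():
    # Five pre-allocated placeholders, twelve concurrent requests.
    pool = ResourcePool([f"page-{i}" for i in range(5)])
    return await asyncio.gather(*(handle_request(pool, i) for i in range(12)))
```

Each resource is used by one request at a time, so the pool size caps concurrency; requests beyond that wait in the queue instead of paying the launch cost.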

.net c# webautomation chromium puppeteer-sharp

Ang*_*ker

2020-11-06

Score: 4 · Answers: 1 · Views: 1055

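How to find document.activeElement in Puppeteer

I want to fill in a form automatically with Puppeteer. I fill in the first input, then click a button, which creates a new input field that has focus.

How can I select this input? Can I use document.activeElement, and if so, how?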

let newActivity = 'button.new_activity'
await page.waitForSelector(newActivity)
await page.click(newActivity)

// find active/focused input
await page.type(focusedInput, 'message')


javascript autofill webautomation node.js puppeteer

tri*_*fik

2020-09-29

Score: 4 · Answers: 1 · Views: 1508

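Using Playwright for Python, how do I select (or find) an element?

I am trying to learn the Python version of Playwright.

I want to learn how to locate an element so that I can do things with it, such as printing its inner HTML, clicking on it, and so on.

The example below loads a page and prints the HTML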

from playwright import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.newPage()
    page.goto('http://whatsmyuseragent.org/')
    print(page.innerHTML("*"))
    browser.close()

This page contains an element

<div class="user-agent">
    <p class="intro-text">Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4238.0 Safari/537.36</p>
</div>

With Selenium, I could find the element and print its contents like this

elem = driver.find_element_by_class_name("user-agent")
print(elem)
print(elem.get_attribute("innerHTML"))

How can I do the same in Playwright?
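For comparison with the Selenium snippet, here is a minimal sketch assuming a recent Playwright for Python release, where the sync API uses snake_case names such as `page.query_selector` and `ElementHandle.inner_html` (the code in the question uses the older camelCase names). The helper name `read_user_agent_block` is hypothetical, and `page` is assumed to be a sync-API Page that has already navigated to the site.

```python
def read_user_agent_block(page):
    """Return the inner HTML of the first .user-agent element, or None if absent.

    `page` is expected to be a Playwright sync-API Page.
    """
    element = page.query_selector(".user-agent")  # CSS class selector
    if element is None:
        return None
    return element.inner_html()
```

The returned handle can also be used for other actions, for example `element.click()`.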

python webautomation playwright playwright-python

576*_*76i

2020-10-11

Score: 4 · Answers: 2 · Views: 2897

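Unable to capture response.json() in Playwright

I am trying to capture a JSON response with Playwright, but I keep getting a pending Promise. However, in headless: false mode I can see the data being received and rendered in the browser. I have only just started using Playwright and am not very familiar with Promises either.

What I have tried is below: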

(async () => {
        let browser = await firefox.launch({headless: true, userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'});
        let page = await browser.newPage();
        page.waitForResponse(async(response) => {
            if (response.url().includes('/abcd') && response.status() == 200) {
                let resp = await response.json();
                console.log(resp);
            }
        });
        await page.goto('https://myurl.com', {waitUntil: 'networkidle', timeout: 30000});
        await page.waitForTimeout(20000);
        await browser.close();
})

What am I doing wrong? I have tried increasing the timeout, but it did not help.

javascript webautomation playwright

fir*_*sto

2021-04-09

Score: 4 · Answers: 1 · Views: 9533

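Playwright "Element is not attached to the DOM"

I am trying to scrape a website with Playwright (.NET). The site looks like it was written in the early 2000s (it runs in quirks mode and so on), and I have hit a problem I cannot seem to find a solution for.

My goal is to check a checkbox. I can select the input element using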

var input = await page.QuerySelectorAsync("inputSelector")

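The element is selected successfully, but when I run await input.CheckAsync() I get the error Element is not attached to the DOM. I have not noticed anything unusual about the element that would cause this. Why does this error occur?

Update

I got it working by running await page.ClickAsync("inputSelector") to check the box. This works for my purposes, but it does not explain why doing it the other way fails, so I would still like to know why the error occurs.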

c# webautomation playwright playwright-sharp playwright-dotnet

ste*_*r42

2022-04-19

Score: 4 · Answers: 1 · Views: 7775

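Playwright force click on a hidden element does not work

I am using Playwright for end-to-end testing. One scenario involves checking the content of a PDF displayed in a PDF viewer window whose download button is hidden from the end user. Checking the PDF content requires downloading it, so I used the force option mentioned in the documentation: https://playwright.dev/docs/api/class-page#page-click

The implementation used is as follows: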

innerFrameContent.click("//button[contains(@id, 'secondaryDownload')]", { force: true })

(The XPath is correct; I checked it in Chrome and managed to click the element through the console.)

Unfortunately, I get the following exception log from Playwright:

frame.click: Element is not visible
=========================== logs ===========================
waiting for selector "//button[contains(@id, 'secondaryDownload')]"
  selector resolved to hidden <button tabindex="54" title="Download" id="secondaryDown…>…</button>
attempting click action
  waiting for element to be visible, enabled and stable
    forcing action
  element is visible, enabled and stable
  scrolling into view if needed
============================================================
...

javascript node.js playwright

Aug*_*Bar

2022-01-31

Score: 4 · Answers: 1 · Views: 20k

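Puppeteer: wait for page/DOM updates and respond to new items added after the initial load

I want to use Puppeteer to respond to page updates. The page displays items, and while I leave the page open, new items may appear over time, for example one new item every 10 seconds.

I can use the following to wait for the items on the initial page load: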

await page.waitFor(".item");
console.log("the initial items have been loaded")

How can I wait for or catch future items? I would like to implement something like this (pseudocode):

await page.goto('http://mysite');
await page.waitFor(".item");
// check items (=these initial items)

// event when receiving new items:
// check item(s) (= the additional [or all] items)


javascript webautomation node.js puppeteer

wiv*_*vku

2021-03-22

Score: 3 · Answers: 1 · Views: 4239

Puppeteer querySelector: not a valid selector

I have code like this:

page.click('div.button-table div:contains(Who) div.square-button:nth-child(1)')

When Puppeteer runs this code, it throws an error:

Short

Failed to execute 'querySelector' on 'Document': 'div.button-table div:contains(Who) div.square-button:nth-child(1)' is not a valid selector.

Full

 Error: Evaluation failed: DOMException: Failed to execute 'querySelector' on 'Document': 'div.button-table div:contains(Who) div.square-button:nth-child(1)' is not a valid selector.
at __puppeteer_evaluation_script__:1:33
  at ExecutionContext.evaluateHandle (node_modules/puppeteer/lib/ExecutionContext.js:124:13)
  at <anonymous>
-- ASYNC --
  at ExecutionContext.<anonymous> (node_modules/puppeteer/lib/helper.js:144:27)
  at ElementHandle.$ (node_modules/puppeteer/lib/ExecutionContext.js:529:50)
  at ElementHandle.<anonymous> (node_modules/puppeteer/lib/helper.js:145:23)
  at Frame.$ (node_modules/puppeteer/lib/FrameManager.js:456:34)
  at <anonymous>
-- ASYNC --
  at Frame.<anonymous> (node_modules/puppeteer/lib/helper.js:144:27)
  at Frame.click (node_modules/puppeteer/lib/FrameManager.js:735:31)
  at Frame.<anonymous> (node_modules/puppeteer/lib/helper.js:145:23)
  at Page.click (node_modules/puppeteer/lib/Page.js:973:29) …


javascript webautomation node.js puppeteer

mCY*_*mCY

2020-10-11

Score: 3 · Answers: 1 · Views: 6113

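How to get the outer HTML from a Python Playwright locator object?

I cannot find any method that returns the outer HTML from a Python Playwright page.locator(selector, **kwargs). Am I missing something? locator.inner_html(**kwargs) does exist. However, I am trying to use pandas.read_html, and it fails on the table locator's inner HTML because the inner HTML does not include the table tag itself.

What I am currently doing is using bs4 to parse page.content(). Something like: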

soup = BeautifulSoup(page.content(), 'lxml')
df = pd.read_html(str(soup.select('table.selector')))
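One possible alternative to parsing the whole page is to read the element's `outerHTML` DOM property through `Locator.evaluate`, which runs a JavaScript expression against the matched element. This is a minimal sketch; the helper name `outer_html` is hypothetical, and `locator` is assumed to be a Playwright sync-API Locator that matches exactly one element.

```python
def outer_html(locator):
    """Return the matched element's outer HTML, including its own tag."""
    # The arrow function receives the matched element as its first argument.
    return locator.evaluate("el => el.outerHTML")
```

The resulting string starts with the `<table>` tag, so it could then be passed to `pandas.read_html` in place of the BeautifulSoup output.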


python playwright playwright-python

Rah*_*hul

2022-01-28

Score: 3 · Answers: 1 · Views: 2736

Tag statistics

webautomation ×8

javascript ×5

playwright ×5

node.js ×4

c# ×3

puppeteer ×3

playwright-python ×2

puppeteer-sharp ×2

python ×2

.net ×1

autofill ×1

chromium ×1

google-chrome-headless ×1

playwright-dotnet ×1

playwright-sharp ×1
