Getting JavaScript-based dynamic content with HtmlUnit

Irs*_*had 4 javascript java htmlunit

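I am stuck trying to fetch JavaScript-based dynamic content with HtmlUnit. I expect to get the Sign in and Register HTML content from the page, but with the following code I only get the static content.

I am new to HtmlUnit. Any help would be greatly appreciated.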

String strURL = "https://www.checkmytrip.com" ;
java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.OFF);
java.util.logging.Logger.getLogger("org.apache.http").setLevel(java.util.logging.Level.OFF);

final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_31);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getCookieManager().setCookiesEnabled(true);
webClient.waitForBackgroundJavaScript(60 * 1000);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());

HtmlPage myPage = ((HtmlPage) webClient.getPage(strURL));

String theContent = myPage.getWebResponse().getContentAsString();
System.out.println(theContent);      

Ahm*_*our 5

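Two points:

  1. You need to call waitForBackgroundJavaScript() after getting the page, not before.
  2. You should use myPage.asText() or .asXml(), because getWebResponse() returns the original response content, without any JavaScript having been executed.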

    String strURL = "https://www.checkmytrip.com" ;
    // Silence HtmlUnit and Apache HttpClient logging
    java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.OFF);
    java.util.logging.Logger.getLogger("org.apache.http").setLevel(java.util.logging.Level.OFF);

    // try-with-resources closes the WebClient when done
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_31)) {
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());

        HtmlPage myPage = ((HtmlPage) webClient.getPage(strURL));
        // Wait (up to 10 seconds) for background JavaScript AFTER the page is loaded
        webClient.waitForBackgroundJavaScript(10 * 1000);

        // asXml() serializes the current DOM, including changes made by JavaScript
        String theContent = myPage.asXml();
        System.out.println(theContent);
    }
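A fixed 10-second wait can turn out too short or needlessly long, so it can help to poll until the expected content actually shows up. The following is only a minimal sketch of that idea, using the Java standard library alone: the class `PollUntil` and the method `waitUntil` are hypothetical names that are not part of HtmlUnit, and the timing condition in `main` is just a stand-in for a real check against the page.

```java
import java.util.function.BooleanSupplier;

public class PollUntil {

    /**
     * Hypothetical helper (not an HtmlUnit API): re-checks the condition every
     * pollMillis until it holds or timeoutMillis has elapsed.
     * Returns true if the condition became true, false on timeout or interrupt.
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for content that appears asynchronously after about 200 ms
        final long readyAt = System.currentTimeMillis() + 200;
        boolean appeared = waitUntil(() -> System.currentTimeMillis() >= readyAt, 2000, 50);
        System.out.println("content appeared: " + appeared);
    }
}
```

With HtmlUnit, the condition could be something like `() -> myPage.asXml().contains("Sign in")`, where the marker text "Sign in" is an assumption about what the rendered page contains.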
    