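I have the HTML of web pages stored in a database.
I would like to use HtmlUnit's ability to find and reference DOM elements.
Is it possible to load an HtmlPage object from a string (read from a database column)?

An HTML page has pagination links: one set at the top of the page and another at the bottom.
With HtmlUnit, I am currently calling getAnchorByText("1") on the page to get the HtmlAnchor.
Some of the links at the top have problems, so I want to reference the bottom links using XPath.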
nextPageAnchor = (HtmlAnchor) page.getByXPath("");
How can I use XPath to reference the second matching link on the page?
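
One way to approach this is an indexed XPath such as (//a[text()='1'])[2], which takes the second anchor in document order whose text is "1". The sketch below is only an illustration and swaps in the JDK's built-in javax.xml.xpath instead of HtmlUnit, so it runs against a small well-formed sample; the SAMPLE markup, the class name and the method name are hypothetical. The assumption is that the same XPath 1.0 expression can be passed to HtmlUnit's getByXPath, which has not been verified here against a real page.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class SecondPagerLink {
    // Hypothetical, well-formed stand-in for a page with a pager at the top and at the bottom.
    static final String SAMPLE =
        "<html><body>"
        + "<div id='top'><a href='javascript:a()'>1</a><a href='javascript:b()'>2</a></div>"
        + "<p>content</p>"
        + "<div id='bottom'><a href='javascript:c()'>1</a><a href='javascript:d()'>2</a></div>"
        + "</body></html>";

    // Returns the href of the n-th (1-based) anchor whose text equals `text`,
    // or null when fewer than n anchors match.
    // Note: `text` is spliced into the expression, so it must not contain a single quote.
    static String hrefOfNthLinkWithText(String markup, String text, int n) {
        // The parentheses matter: the index applies to the whole match list,
        // not to each anchor's position inside its own parent element.
        String expression = "(//a[text()='" + text + "'])[" + n + "]";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Element anchor = (Element) xpath.evaluate(
                expression, new InputSource(new StringReader(markup)), XPathConstants.NODE);
            return anchor == null ? null : anchor.getAttribute("href");
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hrefOfNthLinkWithText(SAMPLE, "1", 1)); // javascript:a()
        System.out.println(hrefOfNthLinkWithText(SAMPLE, "1", 2)); // javascript:c()
    }
}
```

Because the predicate matches on the anchor text only, the href never has to be known in advance. Replacing the index with last() would pick the final match instead of the second one.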

I need to reference a link by its anchor text, for a link like this:
<a href="....">33</a>
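The href contains random text and is a JavaScript function, so I don't know what it will be.
Is this possible with XPath?

I'm playing with a Grails application that has a context menu (on right-click). The context menu is built with Chris Domigan's jQuery contextmenu plugin.
The context menu itself works, but I want to write automated tests for it and I can't figure out how.

I want to select only the atomic value inside a node. For example, the "here" text in the following: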
<a href="">here</a>
When I use XPath in Java, it returns some kind of object/array, e.g.
[DomNode[<a href="">here</a>]]
I only want the text.
Is this possible, and if so, how? Thanks!
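
A possible approach is to ask XPath for a string rather than a node, for example with the string() function. The sketch below is illustrative only and uses the JDK's javax.xml.xpath, not HtmlUnit, on a well-formed copy of the snippet above; the class and method names are hypothetical. With HtmlUnit itself, the DomNode objects that come back also implement the standard DOM Node interface, so calling getTextContent() on each result should be an alternative, though that is an assumption not tested here.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class AnchorText {
    // Returns the text of the first <a> element in the markup,
    // or an empty string when there is no <a> at all.
    static String firstAnchorText(String markup) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // string(...) converts the first matching node to its text value,
            // so the result is a plain String instead of a node list.
            return xpath.evaluate("string(//a)", new InputSource(new StringReader(markup)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstAnchorText("<p><a href=\"\">here</a></p>")); // here
    }
}
```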
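
I'm trying to learn more about HtmlUnit and am running some tests. I'm trying to get basic information, such as the page title and text, from this site:
https://....com (full URL removed; the important part is that it is https)
The code I'm using is below, and it works fine on other sites: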
final WebClient webClient = new WebClient();
final HtmlPage page;
page = (HtmlPage)webClient.getPage("https://medeczane.sgk.gov.tr/eczane/login.jsp");
System.out.println(page.getTitleText());
System.out.println(page.asText());
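Why can't I get this basic information? If it is because of security measures, what exactly are they, and can I get around them? Thanks.
Edit: Hmm, the code stops working after webClient.getPage(); "test2" is never printed. So I can't check whether the page is null.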
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_2);
final HtmlPage page;
System.out.println("test1");
try {
    page = (HtmlPage)webClient.getPage("https://medeczane.sgk.gov.tr/eczane/login.jsp");
    System.out.println("test2");

I'm trying to search Amazon. I want to select a category, e.g. Books, type in some search criteria, e.g. java, and click the Go button. My problem is clicking the Go button. I get this exception:
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:571)
at java.util.ArrayList.get(ArrayList.java:349)
at Bot.clickSubmitButton(Bot.java:77)
at Bot.main(Bot.java:111)
Here is my code:
/**
* @author ivan.bisevac
*/
import java.io.IOException;
import java.net.MalformedURLException;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlImageInput;
import com.gargoylesoftware.htmlunit.html.HtmlInput;
import com.gargoylesoftware.htmlunit.html.HtmlOption;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSelect;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
public class Bot {
    private HtmlPage currentPage;

    public HtmlPage getCurrentPage() {
        return currentPage;
    }

    public Bot() {
    }

    /**
     * Bot constructor
     *
     * @param pageAddress
     *            Address to go.
     * @throws IOException
     * @throws MalformedURLException
     * @throws FailingHttpStatusCodeException
     */
    public Bot(String pageAddress) throws FailingHttpStatusCodeException,
            MalformedURLException, IOException {
        this();
        this.goToAddress(pageAddress);
    } …

I'm trying to get the HTTP status of a given page. But when the page returns a 404, the call does not give me the status code; it throws an error instead.
int status= webClient.getPage("website").getWebResponse().getStatusCode();
System.out.println( status);
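Any ideas?
I want to detect when sites time out, but for testing purposes I deliberately mistyped the URL of the site I want, to see whether I could even get a 404.

I can't understand what this HtmlUnit exception means. It happens when I call click() on a link on a web page.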
Exception class=[net.sourceforge.htmlunit.corejs.javascript.WrappedException]
com.gargoylesoftware.htmlunit.ScriptException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot read property "offsetWidth" from null (http://webapps6.doc.state.nc.us/opi/scripts/DHTMLmessages.js#95) (javascript url#297)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:534)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:538)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:432)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:407)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:965)
at com.gargoylesoftware.htmlunit.html.HtmlAnchor.doClickAction(HtmlAnchor.java:87)
at com.gargoylesoftware.htmlunit.html.HtmlAnchor.doClickAction(HtmlAnchor.java:121)
at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1329)
at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1288)
at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1257)
at testapp.TestApp.main(TestApp.java:61)
Caused by: net.sourceforge.htmlunit.corejs.javascript.WrappedException: Wrapped com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot read property "offsetWidth" from null (http://webapps6.doc.state.nc.us.js#95) (javascript url#297)
at net.sourceforge.htmlunit.corejs.javascript.Context.throwAsScriptRuntimeEx(Context.java:1802)
at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:196)
at net.sourceforge.htmlunit.corejs.javascript.FunctionObject.call(FunctionObject.java:479)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1701)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:854)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:164)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:429)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:267)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3183)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:175)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$5.doRun(JavaScriptEngine.java:423)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:528)
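... …

I'm trying to scrape data from an ASP page whose links call a __doPostBack function when clicked. When I click() a link with HtmlUnit, it returns the page I started from. What do I need to do to complete the postback and get the next page?
Code: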
import java.util.List;
import com.gargoylesoftware.htmlunit.ScriptResult;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
public class ScrapperApp {
    private static void go() throws Exception {
        /* turn off annoying htmlunit warnings */
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);
        HtmlPage nextPage;
        ScriptResult onClick;
        String url = "http://media.ethics.ga.gov/search/Campaign/Campaign_Name.aspx?NameID=5751&FilerID=C2009000085&Type=candidate";
        final WebClient webclient = new WebClient(BrowserVersion.CHROME_16);
        final HtmlPage page = webclient.getPage(url);
        System.out.println("PULLING LINKS:");
        List<HtmlAnchor> articles = (List<HtmlAnchor>) page.getByXPath("//table[@id='ctl00_ContentPlaceHolder1_Name_Reports1_TabContainer1_TabPanel1_dgReports']/tbody/tr/td/a[@class='lblentrylink']");
        for(int x=0; x<articles.size(); x++) {
            System.out.println("Clicking "+x+": "+articles.get(x).asText());
            nextPage = articles.get(x).click();
            System.out.println(nextPage.getUrl());
        }
    }

    public static void main(String[] args) throws Exception …

I'm writing a Java program to log in to the website my school uses to post grades.
This is the URL of the login form: https://ma-andover.myfollett.com/aspen/logon.do
This is the HTML of the login form:
<form name="logonForm" method="post" action="/aspen/logon.do" autocomplete="off"><div><input type="hidden" name="org.apache.struts.taglib.html.TOKEN" value="30883f4c7e25a014d0446b5251aebd9a"></div>
<input type="hidden" id="userEvent" name="userEvent" value="930">
<input type="hidden" id="userParam" name="userParam" value="">
<input type="hidden" id="operationId" name="operationId" value="">
<input type="hidden" id="deploymentId" name="deploymentId" value="ma-andover">
<input type="hidden" id="scrollX" name="scrollX" value="0">
<input type="hidden" id="scrollY" name="scrollY" value="0">
<input type="hidden" id="formFocusField" name="formFocusField" value="username">
<input type="hidden" name="mobile" value="false">
<input type="hidden" name="SSOLoginDone" value="">
<center>
<img src="images/spacer.gif" height="15" width="1">
<script language="JavaScript">
document.forms[0].elements['deploymentId'].value = 'ma-andover';
</script>
<script language="JavaScript">
$(function()
{
$('form').attr('autocomplete', 'off');
var name = $('#username');
var password …