ian*_*ell 4 java url uri httpclient mediawiki-api
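"Long-time reader, first-time poster."
I'm writing a bot for a Spanish-language wiki that I administer. I wanted to build it from scratch, since one of my goals is to practice Java. However, I've run into trouble when using HttpClient to make a GET request to a URI that contains non-ASCII characters such as á, é, í, ó or ú.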
String url = "http://es.metroid.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categoría:Mejoras de las Botas";
method = new GetMethod(url);
client.executeMethod(method);
When I run the code above, GetMethod complains about the URI:
Exception in thread "main" java.lang.IllegalArgumentException: Invalid uri 'http://es.pruebaloca.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categoría:Mejoras%20de%20las%20Botas&cmlimit=500&format=xml': Invalid query
at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:222)
at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:89)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:69)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:120)
at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:38)
at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
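Note that in the URI shown in the stack trace, the spaces are encoded as %20 while the í is left as is. This exact same URI works perfectly in a browser, but I can't get GetMethod to accept it.
I have also tried the following: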
URI uri = new URI(url, false);
method = new GetMethod(uri.getEscapedURI());
client.executeMethod(method);
This way the URI class escapes the í, but it double-escapes the spaces (%2520):
http://es.metroid.wikia.com/api.php?action=query&list=categorymembers&cmtitle=Categor%C3%ADa:Mejoras%2520de%2520las%2520Botas&cmlimit=500&format=xml
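The %2520 above is what results when an already-escaped %20 is escaped a second time, because the % sign itself becomes %25. The following is a minimal sketch that reproduces the effect with only the JDK's `java.net.URLEncoder`. It does not use the Commons HttpClient `URI` class, and the class name `DoubleEscapeDemo` and the helper `escape` are hypothetical names chosen for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class DoubleEscapeDemo {

    // Hypothetical helper: percent-encode one value as UTF-8 and
    // write spaces as %20 instead of the '+' that URLEncoder emits.
    static String escape(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "Mejoras de las Botas";
        String once = escape(raw);   // spaces become %20
        String twice = escape(once); // the '%' of each %20 becomes %25
        System.out.println(once);    // Mejoras%20de%20las%20Botas
        System.out.println(twice);   // Mejoras%2520de%2520las%2520Botas
    }
}
```

The second call treats the string as raw text, so it cannot know that %20 was already an escape sequence.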
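Now, if I don't use any spaces in the query, there is no double escaping and I get the desired output. So if there were no chance of running into non-ASCII characters, I wouldn't need the URI class and wouldn't get the double escaping. To avoid the first escaping of the spaces, I tried this: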
URI uri = new URI(url, true);
method = new GetMethod(uri.getEscapedURI());
client.executeMethod(method);
But the URI class didn't like that:
org.apache.commons.httpclient.URIException: Invalid query
at org.apache.commons.httpclient.URI.parseUriReference(URI.java:2049)
at org.apache.commons.httpclient.URI.<init>(URI.java:167)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:66)
at net.metroidover.categorybot.http.HttpRequest.request(HttpRequest.java:121)
at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:38)
at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at net.metroidover.categorybot.http.Action.getCategoryMembers(Action.java:39)
at net.metroidover.categorybot.bot.BotComponent.<init>(BotComponent.java:58)
at net.metroidover.categorybot.bot.BotComponent.main(BotComponent.java:80)
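Any input on how to avoid this double escaping would be greatly appreciated. I've searched all over with no luck at all.
Thanks!
Edit: The solution that worked best for me was parsifal's. As an addition, though, I'd like to point out that setting the path with method.setPath(url) made HttpMethod reject a cookie that I needed to save: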
Aug 26, 2011 4:07:08 PM org.apache.commons.httpclient.HttpMethodBase processCookieHeaders
WARNING: Cookie rejected: "wikicities_session=900beded4191ff880e09944c7c0aaf5a". Illegal path attribute "/". Path of origin: "http://es.metroid.wikia.com/api.php"
However, if I pass the URI to the constructor and forget about setPath(url), the cookie is saved without any problem.
String url = "http://es.metroid.wikia.com/api.php";
NameValuePair[] query = { new NameValuePair("action", "query"), new NameValuePair("list", "categorymembers"),
new NameValuePair("cmtitle", "Categoría:Mejoras de las Botas"), new NameValuePair("cmlimit", "500"),
new NameValuePair("format", "xml") };
HttpMethod method = null;
...
method = new GetMethod(url); // Or PostMethod(url)
method.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY); // It had been like this the whole time
method.setQueryString(query);
client.executeMethod(method);
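The idea behind the snippet above is to keep the base URL and the raw parameter values separate, and to let something else do the escaping exactly once. As a rough illustration of that idea using only the JDK, here is a sketch that assembles the same request URL from name/value pairs. It is not how Commons HttpClient implements `setQueryString`; the class `QueryStringSketch` and its methods are hypothetical, and the accented letter is written as `\u00ed` so the example does not depend on the source file's encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryStringSketch {

    // Form-encode one name or value as UTF-8 (spaces become '+').
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Append each pair to the base URL, escaping names and values once.
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            sb.append(separator)
              .append(encode(p.getKey()))
              .append('=')
              .append(encode(p.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> query = new LinkedHashMap<>();
        query.put("action", "query");
        query.put("list", "categorymembers");
        query.put("cmtitle", "Categor\u00eda:Mejoras de las Botas");
        query.put("cmlimit", "500");
        query.put("format", "xml");
        System.out.println(buildUrl("http://es.metroid.wikia.com/api.php", query));
    }
}
```

Because the separators `?`, `&` and `=` are added by the builder and never pass through the encoder, they stay intact while the values are escaped.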
I'd suggest using URLEncoder to encode your query string values (not the entire query string).
URLEncoder.encode("Categoría:Mejoras de las Botas", "UTF-8");
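To make the "values, not the whole query string" point concrete, here is a small sketch using `java.net.URLEncoder` that compares the two. The class name `EncodeValuesOnly` and the helper `enc` are hypothetical, and the accented letter is written as `\u00ed` to avoid depending on the source file's encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EncodeValuesOnly {

    // Hypothetical wrapper that hides the checked exception.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String title = "Categor\u00eda:Mejoras de las Botas";

        // Encoding only the value keeps '=' and '&' as separators.
        String good = "cmtitle=" + enc(title) + "&cmlimit=500";
        System.out.println(good);
        // cmtitle=Categor%C3%ADa%3AMejoras+de+las+Botas&cmlimit=500

        // Encoding the whole query string escapes the separators too,
        // so the server would see a single oddly named parameter.
        String bad = enc("cmtitle=" + title + "&cmlimit=500");
        System.out.println(bad);
        // cmtitle%3DCategor%C3%ADa%3AMejoras+de+las+Botas%26cmlimit%3D500
    }
}
```

Note that URLEncoder produces form encoding, so spaces come out as `+` rather than `%20`, and the colon is escaped as `%3A`.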