I'm using Java to parse this request,
which results in this JSON file (truncated for brevity):
{"responseData":{"results":
<...>
"visibleUrl":"www.coolcook.net",
"cacheUrl":"http://www.google.com/search?q\u003dcache:p4Ke5q6zpnUJ:www.coolcook.net",
"title":"???? ????? - ???? ?????? ??????? ????? ?????",
"titleNoFormatting":"???? ????? - ???? ?????? ??????? ????? ?????","\u003drz+img+news+recordid+border"}},
<...>
"responseDetails": null, "responseStatus": 200}
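My problem is with the Arabic characters that are returned (it could be any non-Unicode text). I tried to convert them back to Unicode using the following: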
JSONArray ja = json.getJSONObject("responseData").getJSONArray("results");
JSONObject j = ja.getJSONObject(i);
str = j.getString("titleNoFormatting");
logger.log("before: " + str); // this is just my version of println
enc_str = new String (str.getBytes(), "UTF8");
logger.log("after: " + enc_str);
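However, both the 'before' and 'after' results are the same: a string of ????, whether I output them to the server log file or to an HTML page. Is there another way to get the Arabic characters back and output them to a web page?
Does JSON have any support for this kind of problem, perhaps a way to read non-UTF characters directly from a JSONObject?
The problem you're having is most likely caused by the character encoding being set incorrectly when you read the HTTP response from Google. Can you post the code that actually fetches the URL and parses it into a JSON object?
As an example, run the following code: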
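import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringWriter;
import java.net.HttpURLConnection;
import java.net.URL;
public class Test1 {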
public static void main(String [] args) throws Exception {
// just testing that the console can output the correct chars
System.out.println("\"title\":\"???? ????? - ???? ?????? ??????? ????? ?????");
URL url = new URL("http://ajax.googleapis.com/ajax/services/search/web?start=0&rsz=large&v=1.0&q=rz+img+news+recordid+border");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
InputStream is = connection.getInputStream();
// the important bit is here..........................\/\/\/
InputStreamReader reader = new InputStreamReader(is, "utf-8");
StringWriter sw = new StringWriter();
char [] buffer = new char[1024 * 8];
int count ;
while( (count = reader.read(buffer)) != -1){
sw.write(buffer, 0, count);
}
System.out.println(sw.toString());
}
}
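This is the rather ugly standard approach using URL.openConnection(), which has been around since the dawn of time. If you use something like Apache httpclient, you can do this quite easily.
For some background reading on encodings, and perhaps an explanation of why new String (str.getBytes(), "UTF8"); will never work, read Joel's article on Unicode.
First try this: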
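To illustrate why re-encoding an already-decoded string cannot bring the characters back, here is a minimal sketch. The class name EncodingDemo, its helper methods, and the sample Arabic word are made up for illustration, and US-ASCII merely stands in for whatever wrong charset was in effect when the response was read. The sketch also passes UTF-8 explicitly to getBytes, whereas the code in the question relies on the platform default.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Decodes the bytes with the wrong charset, as can happen when the
    // HTTP response is read without specifying UTF-8 (US-ASCII is an assumption).
    static String decodeWrong(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.US_ASCII);
    }

    // Decodes the bytes with the charset they were actually written in.
    static String decodeRight(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    // The attempted fix from the question, applied to an already-decoded string.
    static String reencode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u0645\u0631\u062D\u0628\u0627"; // sample Arabic word
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);

        String wrong = decodeWrong(wire);
        String right = decodeRight(wire);

        System.out.println(right.equals(original));           // true
        System.out.println(wrong.equals(original));           // false
        System.out.println(reencode(wrong).equals(original)); // false: the data is already lost
    }
}
```

Once the bytes have been decoded with the wrong charset, the original characters have been replaced and no later conversion of the String can restore them; the charset has to be right at the point where bytes become characters.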
str = j.getString("titleNoFormatting");
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("c:/test.txt"), "UTF-8"));
writer.write(str);
writer.close();
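Then open the file in Notepad. If it looks fine, then the problem is that your logger or console is not configured to use UTF-8. Otherwise the problem most likely lies in the JSON API you're using not being configured to use UTF-8.
Edit: if the problem is actually in the JSON API used and you don't know which one to choose, then I'd recommend Gson. It really eases converting a Json string to an easy-to-use javabean. Here's a basic example: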
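As a complementary check that does not depend on Notepad or the console encoding, one could print the code points of the string. This is only a sketch; the class name CodePointDump and the method dump are hypothetical. If the output shows values in the Arabic block (U+0600 to U+06FF), the string is intact and only the output side is misconfigured; if it shows U+003F, the string really contains literal question marks and the data was lost earlier.

```java
public class CodePointDump {

    // Returns the string as space-separated U+XXXX code points.
    // The result is plain ASCII, so it prints correctly on any console.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump("\u0645\u0631")); // prints U+0645 U+0631
        System.out.println(dump("??"));           // prints U+003F U+003F
    }
}
```

In the question's code this would be called as dump(str) right after j.getString("titleNoFormatting").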
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.List;
import com.google.gson.Gson;
public class Test {
public static void main(String[] args) throws Exception {
URL url = new URL("http://ajax.googleapis.com/ajax/services/search/web"
+ "?start=0&rsz=large&v=1.0&q=rz+img+news+recordid+border");
BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
GoogleResults results = new Gson().fromJson(reader, GoogleResults.class);
// Show all results.
System.out.println(results);
// Show title of 1st result (is arabic).
System.out.println(results.getResponseData().getResults().get(0).getTitle());
}
}
class GoogleResults {
ResponseData responseData;
public ResponseData getResponseData() { return responseData; }
public void setResponseData(ResponseData responseData) { this.responseData = responseData; }
public String toString() { return "ResponseData[" + responseData + "]"; }
static class ResponseData {
List<Result> results;
public List<Result> getResults() { return results; }
public void setResults(List<Result> results) { this.results = results; }
public String toString() { return "Results[" + results + "]"; }
}
static class Result {
private String url;
private String title;
public String getUrl() { return url; }
public String getTitle() { return title; }
public void setUrl(String url) { this.url = url; }
public void setTitle(String title) { this.title = title; }
public String toString() { return "Result[url:" + url +",title:" + title + "]"; }
}
}
It outputs the results nicely. Hope this helps.