Getting Jena results in UTF-8 format

joh*_*ohn 2 java utf-8 sparql jena

How can I get Jena (Java) query results in UTF-8 format? My code:

Query query = QueryFactory.create(queryString);
QueryExecution qexec = QueryExecutionFactory.sparqlService("http://lod.openlinksw.com/sparql", queryString);
ResultSet results = qexec.execSelect();
List<QuerySolution> list = ResultSetFormatter.toList(results);
// print the "churchname" binding of every solution
for (int i = 0; i < list.size(); i++) {
    System.out.println(list.get(i).get("churchname"));
}

use*_*512 5

I assume this is related to the question "UTF-8 formatting in SPARQL".

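Having had a look, this is what happened:

  • The importer took the input 'Chodovská tvrz', encoded in utf-8.
  • In utf-8 that is: '43 68 6f 64 6f 76 73 6b c3 a1 20 74 76 72 7a' (c3 a1 is 'á' in utf-8).
  • The importer read those bytes as unicode characters, one character per byte, instead of decoding them as utf-8.
  • So instead of 'á' you get two characters, c3 and a1, which are 'Ã' and '¡'.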
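
The steps above can be reproduced in a few lines of Java. This is only a sketch: the class name MojibakeDemo and the method garble are made up for illustration, and ISO-8859-1 is assumed as the wrongly applied charset because it maps every byte to the code point with the same value, which matches the 'Ã' and '¡' seen here.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Hypothetical helper: encode as UTF-8, then decode those bytes as ISO-8859-1,
    // which is the mistake described in the list above.
    static String garble(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "Chodovsk\u00E1 tvrz";   // 'á' written as an escape
        String garbled = garble(original);
        // 'á' became two chars, so the garbled string is one char longer
        System.out.println(garbled.length() - original.length());
        // the two chars are U+00C3 ('Ã') and U+00A1 ('¡')
        System.out.println(garbled.equals("Chodovsk\u00C3\u00A1 tvrz"));
    }
}
```

Writing the characters as \u escapes keeps the demo independent of the encoding of the source file itself.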

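You can reverse this by converting the string's characters to a byte array and then making a new string from those bytes. I'm sure there must be a simpler way, but here is an example: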

import java.nio.charset.StandardCharsets;

public class Convert
{
    public static void main(String... args) {
        // garbled input: the utf-8 bytes c3 a1 were each read as a separate char
        String in = "Chodovsk\u00C3\u00A1 tvrz";
        char[] chars = in.toCharArray();
        // make a new string by treating chars as bytes
        String out = new String(fix(chars), StandardCharsets.UTF_8);
        System.err.println("Got: " + out); // Chodovská tvrz
    }

    // narrow each char to one byte; only meaningful while every char is <= 0xFF
    public static byte[] fix(char[] a) {
        byte[] b = new byte[a.length];
        for (int i = 0; i < a.length; i++) b[i] = (byte) a[i];
        return b;
    }
}

Applying this to list.get(i).get("churchname").toString() (which is what you are printing) will fix those names.

Edit:

Or just use:

String churchname = list.get(i).get("churchname").toString();
String out2 = new String(churchname.getBytes("iso-8859-1"), "utf-8");

Which is simpler.
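
One caveat with the one-liner is that it also rewrites strings that were never garbled. A slightly more defensive variant might look like the sketch below. The class MojibakeRepair and the method repair are hypothetical names, and the rule "only convert when the result is valid UTF-8" is a heuristic I am assuming here, not something Jena provides.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {
    // Hypothetical helper: undo "UTF-8 bytes read as ISO-8859-1" damage,
    // returning the input unchanged when it does not look garbled that way.
    static String repair(String s) {
        // A char above U+00FF cannot have come from a single byte: leave it alone.
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(s)) {
            return s;
        }
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        try {
            // REPORT makes invalid UTF-8 throw instead of yielding U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return s; // bytes are not valid UTF-8, so keep the original string
        }
    }

    public static void main(String[] args) {
        // garbled input is repaired
        System.out.println(repair("Chodovsk\u00C3\u00A1 tvrz").equals("Chodovsk\u00E1 tvrz"));
        // an already-correct string comes back unchanged
        System.out.println(repair("Chodovsk\u00E1 tvrz").equals("Chodovsk\u00E1 tvrz"));
    }
}
```

It would be called as repair(list.get(i).get("churchname").toString()). Being a heuristic, it can still misfire on a correct string that happens to form valid UTF-8 when its chars are viewed as bytes, so fixing the encoding where the data is imported remains the cleaner solution.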