Posts by Hih*_*tje

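JSoup character encoding issue

I am using JSoup to parse content from http://www.latijnengrieks.com/vertaling.php?id=5368. This is a third-party website that does not specify the correct encoding. I load the data with the following code: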

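import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Loader {

    public static void main(String[] args) {
        String url = "http://www.latijnengrieks.com/vertaling.php?id=5368";

        try {
            // Fetch and parse the page; jsoup takes the charset from the HTTP header or a meta tag.
            Document doc = Jsoup.connect(url).timeout(5000).get();
            Element content = doc.select("div.kader").first();
            // The header table is the grandparent of the first "kopje" element.
            Element contenttableElement = content.getElementsByClass("kopje").first().parent().parent();

            String contenttext = content.html();
            String tabletext = contenttableElement.html();

            // Mark <br> tags with a placeholder in BOTH strings so that line breaks
            // survive text() and the two lengths stay comparable.
            contenttext = Jsoup.parse(contenttext.replaceAll("(?i)<br[^>]*>", "br2n")).text();
            contenttext = contenttext.replace("br2n", "\n");
            tabletext = Jsoup.parse(tabletext.replaceAll("(?i)<br[^>]*>", "br2n")).text();
            tabletext = tabletext.replace("br2n", "\n");

            // Drop the leading table text; what remains is the translation itself.
            String text = contenttext.substring(tabletext.length());
            System.out.println(text);
        } catch (IOException e) {
            // Network or HTTP failure while fetching the page.
            e.printStackTrace();
        }
    }
}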

This produces the following output:

Aeneas dwaalt rond in Troje …
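
For context on the "does not specify the correct encoding" part: the sketch below uses only the JDK to show how the same bytes produce different text depending on the charset used to decode them. The sample string, the class name `CharsetDemo`, and the choice of ISO-8859-1 are assumptions made for illustration; the post does not say which encoding the site actually uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Decode raw bytes with the given charset, as any HTTP client must do
    // before HTML parsing can start.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Hypothetical sample text with one accented character (U+00E9).
        String original = "Aen\u00e9as";

        // Simulate a server that sends ISO-8859-1 bytes without declaring it.
        byte[] raw = original.getBytes(StandardCharsets.ISO_8859_1);

        // Wrong guess: the lone 0xE9 byte is malformed UTF-8 and gets replaced.
        String wrong = decode(raw, StandardCharsets.UTF_8);
        // Matching charset: the text round-trips unchanged.
        String right = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 decode matches original:      " + wrong.equals(original));
        System.out.println("ISO-8859-1 decode matches original: " + right.equals(original));
    }
}
```

If the page's real charset is known, jsoup offers a `Jsoup.parse(InputStream in, String charsetName, String baseUri)` overload that accepts it explicitly, which may be one way to apply the same idea to the loader above.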

java jsoup

21 votes, 3 answers, 40k views
