How do I load a local HTML file into Jsoup?

Question

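I can't seem to load a local HTML file using the Jsoup library, or at least Jsoup doesn't seem to recognize it. I hardcoded the exact HTML from the local file (as the variable `html`), and when I switch to that instead of the file input, the code works perfectly. In both cases, though, the file is reported as readable.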

import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;


public class FileHtmlParser{

public String input;


//constructor
public FileHtmlParser(String inputFile){input = inputFile;}


//methods
public FileHtmlParser execute(){

    File file = new File(input);
    System.out.println("The file can be read: " + file.canRead());

    String html = "<html><head><title>First parse</title><meta>106</meta> <meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" /></head>"
              + "<body><p>Parsed HTML into a doc.</p>" +
              "" +
              "<div id=\"navbar\">this is the div</div></body></html>";
            Document doc = Jsoup.parseBodyFragment(input);




    Elements content = doc.getElementsByTag("div");
    if(content.hasText()){System.out.println("result is " + content.outerHtml());}
    else System.out.println("nothing!");


    return this;
}

}/*endOfClass*/

Result with:
Document doc = Jsoup.parseBodyFragment(html)

The file can be read: true
result is <div id="navbar">
this is the div
</div>

Result with:
Document doc = Jsoup.parseBodyFragment(input)

The file can be read: true
nothing!

Answer 1

hol*_*eek 11

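Your mistake is in assuming that Jsoup.parseBodyFragment() can tell whether you are passing it a file name that points to HTML markup or a String that contains HTML markup.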

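Jsoup.parseBodyFragment(input) expects input to be a String containing HTML markup, not a file name. Given a file name, it parses the path text itself as the body, so the resulting document contains no div and the code prints "nothing!".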

To have it parse from a file, use the Jsoup.parse(File in, String charsetName) method instead:

File in = new File(input);
Document doc = Jsoup.parse(in, null);
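
As an alternative to the File overload above, the file's content can be read into a String first and that String passed to the parser. The following is only a minimal sketch using the JDK (Java 11 or later): the class name HtmlFileReader, the helper readHtml, and the temporary sample file are made up for illustration, UTF-8 is an assumed encoding, and the Jsoup call appears only in a comment so the sketch runs without the library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class HtmlFileReader {

    // Hypothetical helper: reads the whole file into a String, decoding it as UTF-8 (assumed).
    static String readHtml(String fileName) throws IOException {
        return Files.readString(Path.of(fileName), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Create a small temporary HTML file so the sketch runs on its own.
        Path tmp = Files.createTempFile("sample", ".html");
        Files.writeString(tmp, "<div id=\"navbar\">this is the div</div>", StandardCharsets.UTF_8);

        String input = tmp.toString();   // the file NAME, which is what the question passed to the parser
        String html = readHtml(input);   // the file CONTENT, which is what a String-based parser expects

        System.out.println("name contains markup:    " + input.contains("<div"));
        System.out.println("content contains markup: " + html.contains("<div"));

        // With Jsoup on the classpath, `html` could now be passed to
        // Jsoup.parseBodyFragment(html) in place of `input`.

        Files.delete(tmp);
    }
}
```

Letting Jsoup.parse(File, String) read the file is usually simpler, since it also handles the charset; reading the String yourself mainly helps when you want to keep the existing parseBodyFragment call.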

