Read an entire HTML file into a String?

Question

Is there a better way to read an entire HTML file into a single String variable than this:

    String content = "";
    try {
        BufferedReader in = new BufferedReader(new FileReader("mypage.html"));
        String str;
        while ((str = in.readLine()) != null) {
            content +=str;
        }
        in.close();
    } catch (IOException e) {
    }


Answer 1

Joh*_*erg 26

Apache Commons IO provides the IOUtils.toString(..) utility.

If you are using Guava, there are also Files.readLines(..) and Files.toString(..).

The first link is dead. (2 upvotes)

Answer 2

Jea*_*art 24

You should use a StringBuilder:

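// Needs java.io.BufferedReader, java.io.FileReader and java.io.IOException.
StringBuilder contentBuilder = new StringBuilder();
// try-with-resources closes the reader even if readLine() throws.
try (BufferedReader in = new BufferedReader(new FileReader("mypage.html"))) {
    String str;
    // readLine() strips line terminators, so the lines are joined without breaks.
    while ((str = in.readLine()) != null) {
        contentBuilder.append(str);
    }
} catch (IOException e) {
    e.printStackTrace(); // report the failure instead of silently ignoring it
}
String content = contentBuilder.toString();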

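If the line breaks of the HTML file matter (for example inside `<pre>` or inline scripts), the same StringBuilder loop can put them back. The following is only a sketch, not part of the original answer: the class name `HtmlFileReader`, the method name `readFile`, and the choice of UTF-8 are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class; the name is illustrative, not from the answer.
class HtmlFileReader {

    // Reads the whole file into one String, writing a '\n' after every line.
    // Assumes the file is UTF-8 encoded.
    static String readFile(Path path) throws IOException {
        StringBuilder contentBuilder = new StringBuilder();
        // try-with-resources closes the reader even when readLine() throws.
        try (BufferedReader in = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String str;
            while ((str = in.readLine()) != null) {
                // readLine() strips the line terminator, so add one back.
                contentBuilder.append(str).append('\n');
            }
        }
        return contentBuilder.toString();
    }

    public static void main(String[] args) throws IOException {
        // Uses the first argument as the path, falling back to "mypage.html".
        Path path = Paths.get(args.length > 0 ? args[0] : "mypage.html");
        System.out.print(readFile(path));
    }
}
```

Note that this variant normalizes every line ending to `\n` and always ends the result with one, even if the file did not end with a line break.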

Answer 3

SAb*_*deh 5

You can use JSoup.
It is a very powerful HTML parser for Java.

Answer 4

Kat*_*Kat 5

As Jean mentioned, using a StringBuilder instead of += would be better. But if you are looking for something simpler, Guava, IOUtils, and Jsoup are all good options.

Guava example:

// Files here is com.google.common.io.Files, not java.nio.file.Files.
String content = Files.asCharSource(new File("/path/to/mypage.html"), StandardCharsets.UTF_8).read();


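IOUtils example:

String content;
// new URL(...) needs a scheme such as file:, a bare path throws MalformedURLException.
// try-with-resources closes the stream, so closeQuietly() is not needed.
try (InputStream in = new URL("file:///path/to/mypage.html").openStream()) {
    content = IOUtils.toString(in, StandardCharsets.UTF_8);
}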


Jsoup example:

String content = Jsoup.parse(new File("/path/to/mypage.html"), "UTF-8").toString();


or

String content = Jsoup.parse(new File("/path/to/mypage.html"), "UTF-8").outerHtml();


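Notes:

Files.readLines() and Files.toString()

These have been deprecated since Guava release 22.0 (May 22, 2017). Files.asCharSource() should be used instead, as in the example above. (Release 22.0 diffs)

IOUtils.toString(InputStream) and Charsets.UTF_8

Deprecated since Apache Commons-IO version 2.5 (May 6, 2016). IOUtils.toString should now be passed the InputStream and the Charset, as in the example above. StandardCharsets from Java 7 should be used instead of Charsets, as in the example above. (Charsets.UTF_8 deprecation)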
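
To illustrate the StandardCharsets note without any third-party library, here is a minimal JDK-only sketch. It is not taken from the answers above: the class name `ReadWholeFile` and the method name `readAll` are hypothetical, and the file is assumed to be UTF-8 encoded and small enough to fit in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class; the name is illustrative only.
class ReadWholeFile {

    // Reads all bytes of the file and decodes them as UTF-8 (Java 7+).
    // Line breaks are kept exactly as they appear in the file.
    static String readAll(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Uses the first argument as the path, falling back to "mypage.html".
        Path path = Paths.get(args.length > 0 ? args[0] : "mypage.html");
        System.out.print(readAll(path));
    }
}
```

Passing the charset explicitly avoids depending on the platform default encoding, which is the same reason the IOUtils and Guava examples pass StandardCharsets.UTF_8.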
