Encoding errors when reading from and writing to an HTML file in Java

bra*_*age 0 html java encoding

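I'm trying to read some text from an HTML file, modify it in a specific way, and write the result to a new HTML file. The problem is that the text is not in English, so some characters are replaced by black-and-white "?" marks. My HTML file contains <meta http-equiv="Content-Type" content="text/html; charset=utf-8">. What am I doing wrong? Maybe I'm not using the right reader and writer?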

StringBuilder sb = new StringBuilder();
BufferedReader br = new BufferedReader(new FileReader("inputFile.html"));
String line;
while ( (line = br.readLine()) != null) {
     sb.append(line);
}
String result = doSomeChanges(sb);
BufferedWriter out = new BufferedWriter(new FileWriter("outputFile.html")); 
out.write(result); 
out.close(); 

Mic*_*rdt 5

Maybe I'm not using the right reader and writer?

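Exactly. FileReader and FileWriter are garbage; forget they exist. They implicitly use the platform default encoding and, before Java 11 added constructors that take a Charset, gave you no way to override that default.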

Instead, use this:

BufferedReader br = new BufferedReader(
    new InputStreamReader(new FileInputStream("inputFile.html"), "UTF-8"));

BufferedWriter out = new BufferedWriter(
    new OutputStreamWriter(new FileOutputStream("outputFile.html"), "UTF-8"));
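For reference, here is a minimal sketch of the whole read-modify-write cycle with the encoding stated explicitly on both sides. It assumes Java 7 or later and uses java.nio.file.Files instead of the stream wrappers above. The class name Utf8Copy and the method rewrite are made up for illustration, and doSomeChanges is only a placeholder for the question's method, returning the text unchanged.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8Copy {

    // Placeholder for the question's doSomeChanges(); assumed here to return the text as-is.
    static String doSomeChanges(String text) {
        return text;
    }

    // Reads 'in' as UTF-8, applies doSomeChanges, and writes the result to 'out' as UTF-8.
    static void rewrite(Path in, Path out) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = Files.newBufferedReader(in, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                // readLine() strips the line terminator, so add one back
                sb.append(line).append('\n');
            }
        }
        String result = doSomeChanges(sb.toString());
        try (BufferedWriter bw = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            bw.write(result);
        }
    }

    public static void main(String[] args) throws IOException {
        rewrite(Paths.get("inputFile.html"), Paths.get("outputFile.html"));
    }
}
```

Unlike the loop in the question, this version re-appends a newline after each line, so the output keeps its line structure (normalized to "\n"). Note that Files.newBufferedReader throws an exception on bytes that are not valid UTF-8 rather than silently substituting a replacement character, which tends to surface a mis-encoded input file early.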