Why don't French characters work with UTF-8 in Java?

Question

Mel*_*ier 5 html java file-io character utf-8

I have an HTML file that contains some French characters. I need to replace some strings in that file, so I do the following:

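import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public static void replaceStringInFile(String filePath, String oldText, String newText)
{
    try
    {
        Path path = Paths.get(filePath);
        // Assumes the file is UTF-8 encoded
        Charset charset = StandardCharsets.UTF_8;
        // Decode the whole file into a String using that charset
        String content = new String(Files.readAllBytes(path), charset);
        content = content.replace(oldText, newText);
        // Re-encode with the same charset and overwrite the file
        Files.write(path, content.getBytes(charset));
    }
    catch(Exception e)
    {
        e.printStackTrace();
    }
}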

My strings are replaced, but the French characters are gone: each one is replaced with ï¿½

If I replace UTF_8 with ISO_8859_1, it works.
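
A likely reason the ISO_8859_1 version "works": ISO-8859-1 maps each of the 256 byte values to exactly one character (U+0000 to U+00FF), so decoding and re-encoding with it returns the original bytes no matter which encoding the file really uses. UTF-8 has no such guarantee, because some byte sequences are malformed. The minimal sketch below assumes a hypothetical file that stores "é" as the single byte 0xE9 and compares the two round trips; the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Decode bytes as ISO-8859-1, then encode the result back with the same charset.
    // Every byte 0x00-0xFF maps to one char U+0000-U+00FF, so nothing is lost.
    static byte[] latin1RoundTrip(byte[] original) {
        String text = new String(original, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // The same round trip with UTF-8: malformed sequences become U+FFFD on decode,
    // and U+FFFD is encoded as the three bytes EF BF BD on the way back.
    static byte[] utf8RoundTrip(byte[] original) {
        String text = new String(original, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical "café" with é stored as the single byte 0xE9 (ISO-8859-1 / Windows-1252 style)
        byte[] fileBytes = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(Arrays.equals(fileBytes, latin1RoundTrip(fileBytes))); // true
        System.out.println(Arrays.equals(fileBytes, utf8RoundTrip(fileBytes)));   // false
    }
}
```

This only explains the symptom. The Latin-1 round trip preserves the bytes even if the file is really Windows-1252 or CP850, and ASCII search strings still match; it does not confirm the file is ISO-8859-1, and non-ASCII characters in oldText or newText come out right only if the charset really matches.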

I thought UTF_8 was universal? Shouldn't it work with French? I tried declaring utf-8 in the head of the HTML file:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta charset="utf-8"/>
....
</style>

I'd like to understand why UTF_8 doesn't preserve my French characters...

Answer 1

Tom*_*get 6

Before reading a text file, you have to know its encoding. Apparently the file started out as an HTML file with no meta charset.
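
Knowing the encoding can at least be checked in one direction. new String(bytes, UTF_8), as used in the question, never fails: it silently substitutes U+FFFD for malformed input. A CharsetDecoder set to CodingErrorAction.REPORT throws instead. The sketch below (class and method names are hypothetical) uses that to test whether a byte array is well-formed UTF-8. Passing the check does not prove the file is UTF-8, since plain ASCII passes too; failing it does prove the file is not UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {
    // Returns true only if the bytes are well-formed UTF-8.
    // REPORT makes the decoder throw instead of substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);       // 63 61 66 C3 A9
        byte[] latin1Bytes = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1); // 63 61 66 E9
        System.out.println(isValidUtf8(utf8Bytes));   // true
        System.out.println(isValidUtf8(latin1Bytes)); // false
    }
}
```

To check the question's file, one could pass Files.readAllBytes(path) to isValidUtf8 before choosing a charset.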

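You guessed UTF-8. It isn't UTF-8: when the file was read, byte sequences that are not valid UTF-8 were detected and replaced with the Unicode replacement character U+FFFD (�). You then presumably displayed that character using the wrong encoding, which turned � into the mojibake "ï¿½".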
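
The chain described in the answer can be reproduced step by step. The sketch below assumes, hypothetically, that the original file stored "é" as the single byte 0xE9 and that the viewer used ISO-8859-1 (Windows-1252 gives the same result for these three bytes). Output is printed as U+XXXX code units so it does not depend on the console's encoding; the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeChain {
    // Step 1: read the file as UTF-8. A lone 0xE9 is malformed UTF-8,
    // so new String(...) silently substitutes U+FFFD.
    static String readAsUtf8(byte[] fileBytes) {
        return new String(fileBytes, StandardCharsets.UTF_8);
    }

    // Step 2: write the string back as UTF-8; U+FFFD is stored as EF BF BD.
    static byte[] writeAsUtf8(String content) {
        return content.getBytes(StandardCharsets.UTF_8);
    }

    // Step 3: a viewer assuming ISO-8859-1 shows each of the three bytes
    // as its own character: U+00EF U+00BF U+00BD, i.e. "ï¿½".
    static String viewAsLatin1(byte[] stored) {
        return new String(stored, StandardCharsets.ISO_8859_1);
    }

    // Formats each char as U+XXXX, separated by spaces.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] original = {(byte) 0xE9}; // hypothetical "é" in ISO-8859-1 / Windows-1252
        String decoded = readAsUtf8(original);
        byte[] written = writeAsUtf8(decoded);
        String viewed = viewAsLatin1(written);
        System.out.println(codeUnits(decoded)); // U+FFFD
        System.out.println(codeUnits(viewed));  // U+00EF U+00BF U+00BD
    }
}
```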

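So you have to go back to whoever sent or wrote the file to find out what the encoding is. Then you can write a program that reads it correctly.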
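
Once the encoding is known, the question's method only needs to take it as a parameter instead of hard-coding UTF-8. The sketch below is one hedged way to do that; the extra Charset parameter and the choice to propagate IOException rather than print it are illustrative changes, not something the answer prescribes. Java exposes ISO-8859-1 as StandardCharsets.ISO_8859_1 and Windows-1252 as Charset.forName("windows-1252").

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReplaceWithKnownCharset {
    // Variant of the question's method: the caller supplies the file's actual
    // encoding, and I/O errors propagate instead of being printed and ignored.
    static void replaceStringInFile(Path path, Charset charset, String oldText, String newText)
            throws IOException {
        String content = new String(Files.readAllBytes(path), charset);
        content = content.replace(oldText, newText);
        // Re-encode with the same charset; characters it cannot represent are
        // replaced by the charset's replacement byte (usually '?').
        Files.write(path, content.getBytes(charset));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".html");
        // Hypothetical content "Prénom : NAME", stored as ISO-8859-1 (é = 0xE9)
        Files.write(file, "Pr\u00e9nom : NAME".getBytes(StandardCharsets.ISO_8859_1));
        replaceStringInFile(file, StandardCharsets.ISO_8859_1, "NAME", "Marie");
        String result = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        System.out.println(result.equals("Pr\u00e9nom : Marie")); // true
        Files.delete(file);
    }
}
```

Reading and writing with the same, correct charset is what keeps the untouched French characters intact; the charset value itself still has to come from whoever produced the file.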

If you want to keep guessing, ISO 8859-1, Windows-1252, and CP850 might be good candidates. (3 upvotes)
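
One hedged way to follow that suggestion is to decode a suspicious stretch of bytes with each candidate and see which result reads as French. The sketch below does this for the three encodings named; "IBM850" is Java's canonical name for CP850, which lives in the JDK's extended charsets and may be missing from a trimmed runtime, hence the Charset.isSupported check. The sample bytes and all names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class CandidateDecodings {
    // Candidate encodings from the comment; "IBM850" is Java's name for CP850.
    static final String[] CANDIDATES = {"ISO-8859-1", "windows-1252", "IBM850"};

    // Decodes the same bytes with every supported candidate. This is only a
    // guessing aid: ISO-8859-1 accepts any byte, so "no error" proves nothing.
    static Map<String, String> decodeWithEach(byte[] bytes) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String name : CANDIDATES) {
            if (Charset.isSupported(name)) {
                results.put(name, new String(bytes, Charset.forName(name)));
            }
        }
        return results;
    }

    // Formats each char as U+XXXX so output does not depend on the console encoding.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Hypothetical sample: two bytes copied from a spot where an accent should be
        byte[] sample = {(byte) 0xE9, (byte) 0x82};
        decodeWithEach(sample).forEach((name, text) ->
                System.out.println(name + " -> " + codeUnits(text)));
    }
}
```

For this sample, ISO-8859-1 yields é followed by the invisible control character U+0082, and Windows-1252 yields é followed by the low quotation mark U+201A. Which candidate is right depends on the surrounding text, so this narrows the guess rather than settling it.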

Archived:	7 years, 9 months ago
Views:	15255
Last activity:	7 years ago