HashMap destroys encoding?

lot*_*eck 3 java encoding utf-8

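I admit I'm not really an expert on encoding matters. I have the following problem: my program has to read a text file that contains not only standard ASCII but also "special characters and languages", such as "..?????????? ?????? ????????..". So let's assume this is the content of the file: ?????????? ?????? ????????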

Now I want to split the whole file content into single words and create another file that lists all of those words, like:

  • ??????????
  • ??????
  • ????????

My problem is: if I put these words into a HashMap and read the values back out of it, the encoding gets lost. Here is my code:

    final StringBuffer fileData = new StringBuffer(1000);
    final BufferedReader reader = new BufferedReader(
            new FileReader("fileIn.txt"));

    char[] buf = new char[1024];
    int numRead = 0;
    while ((numRead = reader.read(buf)) != -1)
    {
        final String readData = String.valueOf(buf, 0, numRead);
        fileData.append(readData);
        buf = new char[1024];
    }
    reader.close();
    String mergedContent = fileData.toString();


    mergedContent = mergedContent.replaceAll("\\<.*?>", " ");
    mergedContent = mergedContent.replaceAll("\\r\\n|\\r|\\n", " ");

    final BufferedWriter out = new BufferedWriter(
            new OutputStreamWriter(
                    new FileOutputStream("fileOut.txt")));

    final HashMap<String, String> wordsMap = new HashMap<String, String>();

    final String test[] = mergedContent.split(" ");


    for (final String string : test)
    {

        wordsMap.put(string, string);
    }

    for (final String string : wordsMap.values())
    {
        out.write(string + "\n");
    }


    out.close();

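This snippet destroys the encoding. The funny thing is: if I don't put the values into the HashMap, but instead write them straight to the output file, like: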

...
        for (final String string : test)
        {
                        out.write(string + "\n");
            //wordsMap.put(string, string);
        }

        //for (final String string : wordsMap.values())
        //{
        //  out.write(string + "\n");
        //}


        out.close();

...then it works as I expect.

What am I doing wrong?

Boz*_*zho 9

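Try using new InputStreamReader(new FileInputStream(file), "UTF-8") for the input, and then the same thing for the output, i.e. pass "UTF-8" to the OutputStreamWriter as well. Also make sure your file is actually encoded in UTF-8.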

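A HashMap cannot possibly do anything to the encoding. It only stores the String objects it is given. Bytes are converted to characters when the file is read, and back to bytes when it is written. FileReader and the one-argument OutputStreamWriter constructor both use the platform default charset for that conversion.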
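
As a rough sketch of the suggestion above, assuming the input really is UTF-8: the class name Utf8WordList and its helper methods are made up for illustration, the file names are the ones from the question, and a LinkedHashSet is swapped in for the HashMap<String, String> simply because only the distinct words are needed. The collection type has no effect on the encoding either way.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper class, not part of the original question.
public class Utf8WordList {

    // Reads the whole file, decoding its bytes as UTF-8 rather than
    // with the platform default charset (which is what FileReader uses).
    static String readUtf8(String path) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            char[] buf = new char[1024];
            int numRead;
            while ((numRead = reader.read(buf)) != -1) {
                sb.append(buf, 0, numRead);
            }
        }
        return sb.toString();
    }

    // Splits on runs of whitespace and keeps each distinct word once,
    // in order of first appearance.
    static Set<String> uniqueWords(String content) {
        Set<String> words = new LinkedHashSet<String>();
        for (String word : content.split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Writes one word per line, encoding the characters as UTF-8.
    static void writeUtf8(String path, Iterable<String> words) throws IOException {
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8))) {
            for (String word : words) {
                out.write(word);
                out.write("\n");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // File names taken from the question; overridable via arguments.
        String in = args.length > 0 ? args[0] : "fileIn.txt";
        String out = args.length > 1 ? args[1] : "fileOut.txt";
        if (!new File(in).isFile()) {
            System.err.println("Input file not found: " + in);
            return;
        }
        writeUtf8(out, uniqueWords(readUtf8(in)));
    }
}
```

If the output still looks wrong after this, the program used to view fileOut.txt may be assuming a different encoding, or the input file may not be UTF-8 in the first place.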