Converting a ByteArray to a String and back produces a different string

Question

Ant*_*nko 0 java byte utf-8 character-encoding kotlin

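I have to store huge lists of booleans, and I chose to store them as byte arrays converted to strings. But I can't understand why converting to a string and back yields different values:

Helper methods: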

  // Renders the bytes as bit groups, most significant byte first.
  fun ByteArray.string(): String =
    reversed().joinToString(" ") { intToString(it, 4) }

  // Renders one byte as 8 bits, split into groups of groupSize bits.
  fun intToString(number: Byte, groupSize: Int): String {
    val result = StringBuilder()

    for (i in 7 downTo 0) {
      val mask = 1 shl i
      result.append(if (number.toInt() and mask != 0) "1" else "0")

      // Separate groups with a space, but not after the last bit.
      if (i % groupSize == 0 && i != 0)
        result.append(" ")
    }

    return result.toString()
  }

First example:

Given the selected indices [0, 14], my code converts them to the bytes [1, 64]. .string() produces:

0100 0000 0000 0001

Converting it to a string and back:

array.toString(Charsets.UTF_8).toByteArray(Charsets.UTF_8)

Result: [1, 64], for which .string() produces:

0100 0000 0000 0001

Second example:

Given the selected indices [0, 15], my code converts them to the bytes [1, -128]. .string() produces:

1000 0000 0000 0001

This looks legitimate. Converting it to a string and back, however, yields a 4-byte array: [1, -17, -65, -67], for which .string() produces:

1011 1101 1011 1111 1110 1111 0000 0001

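To me, this doesn't look like the indices [0, 15] or the bytes [1, -128] :)

How can this happen? I suspect that the last "1" in "1000 0000 0000 0001" may be causing the problem, but I still don't know the answer.

Thanks.

P.S. I added the java tag to the question because I think the answer is the same for Kotlin and Java.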

Answer 1

tha*_*guy 5

Here is an MCVE for your problem (in Java):

import java.nio.charset.*;

class Test {
  public static void main(String[] args) {
    byte[] array = { -128 };
    byte[] convertedArray = new String(array, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    for(int i=0; i<convertedArray.length; i++) {
      System.out.println(convertedArray[i]);
    }
  }
}

Expected output:

-128

Actual output:

-17
-65
-67

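This happens because the byte -128 (0x80) is not valid UTF-8 on its own: it is a continuation byte with no lead byte before it. The decoder therefore replaces it with the Unicode replacement character U+FFFD "�", and encoding that character back to UTF-8 gives the three bytes 0xEF 0xBF 0xBD, i.e. -17, -65, -67.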
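
To make the substitution visible, here is a minimal sketch that decodes the single byte and inspects the character that comes out. The class and method names (`ReplacementCharDemo`, `utf8RoundTrip`) are made up for illustration and are not part of the MCVE above; it only relies on the JDK's standard charsets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative names only; this is not part of the original MCVE.
class ReplacementCharDemo {
  // Decodes the bytes as UTF-8, then encodes the resulting string as UTF-8 again.
  static byte[] utf8RoundTrip(byte[] input) {
    String decoded = new String(input, StandardCharsets.UTF_8);
    return decoded.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] array = { (byte) 0x80 }; // -128: a lone UTF-8 continuation byte
    String decoded = new String(array, StandardCharsets.UTF_8);

    // The malformed byte was decoded as a single U+FFFD character.
    System.out.println(Integer.toHexString(decoded.charAt(0))); // fffd

    // U+FFFD is encoded in UTF-8 as the three bytes EF BF BD.
    System.out.println(Arrays.toString(utf8RoundTrip(array))); // [-17, -65, -67]
  }
}
```

Bytes in the range 0 to 127 are plain ASCII and survive the round trip, which is why the [1, 64] example in the question appeared to work.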

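You can instead encode and decode the string as ISO-8859-1, a.k.a. Latin-1, because every byte sequence is valid ISO-8859-1. ISO-8859-1 has the convenient property that each byte value maps directly to the Unicode code point with the same value, so 0x80 decodes to U+0080, 0xFF to U+00FF, and so on.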
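
As a rough sketch of that suggestion, the round trip through ISO-8859-1 could look like the following. The class name `Latin1RoundTrip` and the helpers `bytesToString` and `stringToBytes` are hypothetical names chosen for this example; only `StandardCharsets.ISO_8859_1` and the `String` constructor and `getBytes` overloads come from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, shown only to illustrate the ISO-8859-1 round trip.
class Latin1RoundTrip {
  // Maps each byte to the char with the same numeric value (0..255).
  static String bytesToString(byte[] bytes) {
    return new String(bytes, StandardCharsets.ISO_8859_1);
  }

  // Maps each char in 0..255 back to the byte with the same value.
  static byte[] stringToBytes(String text) {
    return text.getBytes(StandardCharsets.ISO_8859_1);
  }

  public static void main(String[] args) {
    byte[] array = { 1, -128 };
    String text = bytesToString(array);

    // The byte 0x80 became the code point U+0080 (decimal 128).
    System.out.println((int) text.charAt(1)); // 128

    // Converting back restores the original bytes.
    System.out.println(Arrays.toString(stringToBytes(text))); // [1, -128]
  }
}
```

In Kotlin, the same idea would presumably be `array.toString(Charsets.ISO_8859_1).toByteArray(Charsets.ISO_8859_1)`. Note that the resulting string only round-trips if whatever stores it preserves every char in the range U+0000 to U+00FF unchanged.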
