保存哈夫曼代码时出现问题?

zah*_*tTZ 2 c c++ huffman-code

我想将霍夫曼代码保存到文件中。我怎样才能做到这一点?我将霍夫曼代码保存到字符串中,但生成的文件的大小比原始文件大。

650*_*502 5

一种非常简单的方法是一次写一点,如下所示:

unsigned char acc; // Accumulator of bit waiting to be written
int bitcount;      // How many bits are aready present in the accumulator

// write a single bit (0/1)
void writebit(int bit)
{
    acc |= (bit << bitcount);
    if (++bitcount == 8)
    {
        writebyte(acc);
        acc = 0;
        bitcount = 0;
    }
}
Run Code Online (Sandbox Code Playgroud)

读回一位的过程是对称的

unsigned char acc;   // bits waiting to be extracted
int bitcount;        // how many bits are still available in acc

int readbit()
{
   if (bitcount == 0)
   {
       bitcount = 8;
       acc = readbyte();
   }
   --bitcount;
   return (acc >> (7 - bitcount)) & 1;
}
Run Code Online (Sandbox Code Playgroud)

当然,这只是最简单的方法,但我会等到您第一次能够正确保存和加载编码数据后再担心代码速度。

例子:

假设您有以下霍夫曼编码符号

A - 0
B - 10
C - 110
D - 111
Run Code Online (Sandbox Code Playgroud)

并且您想要对序列进行编码

A B A A C D A D B B
Run Code Online (Sandbox Code Playgroud)

然后你会按顺序打电话

writebit(0);                           // A
writebit(1); writebit(0);              // B
writebit(0);                           // A
writebit(0);                           // A
writebit(1); writebit(1); writebit(0); // C
writebit(1); writebit(1); writebit(1); // D
writebit(0);                           // A
writebit(1); writebit(0);              // B
writebit(1); writebit(0);              // B
Run Code Online (Sandbox Code Playgroud)

因此,实际写入的字节为

(01100010) = 0x62
(01010111) = 0x57
Run Code Online (Sandbox Code Playgroud)

(请注意,显示的代码从最低有效位开始,即如果您想识别符号,您应该从右到左读取括号内的位序列)。