霍夫曼树:穿越

Wok*_*sin 1 c# recursion traversal huffman-code

我不确定我将如何攻击我的霍夫曼树的穿越.树是正确的,我只是很难确定如何以一种好的方式遍历它.出于某种原因,我的遍历方法没有结果......

更新:清理代码,使其更加面向对象

节点类:

public class Node
{
    public int frekvens; //Frequency
    public char tegn; //Symbol
    public Node venstre; //Left child
    public Node høyre; //Right child
    public string s; //result string
    public string resultat;
    public Node (char c) // Node constructor containing symbol.
    {
        frekvens = 1;
        tegn = c;
    }

    public Node (int f, Node venstre, Node høyre) // Node Constructor containing frequency and children
        {
            frekvens = f;
            this.venstre = venstre;
            this.høyre = høyre;
        }

    public Node (Node node) // Node constructor containing a node
        {
            frekvens = node.frekvens;
            tegn = node.tegn;
            this.venstre = venstre;
            this.høyre = høyre;
        }

    public void ØkMed1() // Inkrement frequency by one
    {
        frekvens = frekvens + 1;
    }


    public char getVenstreTegn ()
    {
        return venstre.tegn;
    }

    public char getHøyreTegn ()
    {
        return venstre.tegn;
    }

    public int getVenstreFrekvens ()
    {
        return venstre.frekvens;
    }

    public int getHøyreFrekvens ()
    {
        return høyre.frekvens;
    }

    public int getFrekvens()
    {
        return frekvens;
    }


    public bool ErTegn(char c)
    {
        if ( c == tegn)
        {
            return false;
        }
        else
        {
            return true;
        }
    }

    //Pretty sure this does not work as intended
    public string traverser (Node n) //Traverse the tree
    {
        if (n.tegn != '\0') //If the node containes a symbol --> a leaf
        {
            resultat += s;  
        }
        else
        {
            if (n.getVenstreTegn() == '\0') //If left child does not have a symbol
            {
                s += "0";
                traverser(n.venstre);
            }
            if (n.getHøyreTegn() == '\0') //If right child does not have a symbol
            {
                s += "1";
                traverser(n.høyre);
            }
        }
        return resultat;
    }
    public string Resultat() //Used priviously to check if i got the correct huffman tree
    {
        string resultat;
        resultat = "Tegn: " + Convert.ToString(tegn) +"  frekvens: " + Convert.ToString(frekvens) + "\n";
        return resultat;
    }
}
Run Code Online (Sandbox Code Playgroud)

Huffman_Tree课程:

public class Huffman_Tre
{
    string treString;
    List<Node> noder = new List<Node>();
    public Node rot;
    public void bygg (string input)
    {
        bool funnet; //Found
        char karakter; //character

        for (int i = 0; i < input.Length;i++) //Loops through string and sets character
            //with coresponding freqeuncy in the node list
        {   
            karakter = input[i];
            funnet = false; //default
            for (int j = 0; j< noder.Count; j++)
            {
                if (noder[j].ErTegn(karakter) == false) //if the character already exists
                {
                    noder[j].ØkMed1(); //inkrement frequency by one
                    funnet = true; 
                    break;
                }
            }
            if (!funnet) //if the character does not exist 
            {
                noder.Add(new Node(karakter)); //add the character to list
            }
        }
        //Sorting node list acending by frequency
        var sortertListe = noder.OrderBy(c => c.frekvens).ToList();

        noder = sortertListe; 

        do
        {
            noder.Add(new Node((noder[0].frekvens + noder[1].frekvens), noder[0],noder[1]));

            //Remove the leaf nodes
            noder.RemoveAt(0);
            noder.RemoveAt(0); 
        } while(noder.Count >= 2);

    }

    public Node getRot()
    {
        return rot;
    }

    public string visTre()
    {

        foreach (Node node in noder)
        {
            treString += node.Resultat();
        }
        return treString;
    }
    public bool erNull()
    {
        if (noder[0].tegn == '\0')
        {
            return true;
        }
        else
            return false;
    }
}
Run Code Online (Sandbox Code Playgroud)

主程序:

private void btnKomprimer_Click(object sender, System.Windows.RoutedEventArgs e)
    {
        string input; //The string input I want to compress
        input = txtInput.Text; //initialize input to text input
        input = input.ToLower(); 
        txtOutput.Text = "";

        Huffman_Tre tre = new Huffman_Tre();

        tre.bygg(input);

        Node rot = new Node(tre.getRot());

        txtOutput.Text += rot.traverser(rot);
    }
}
Run Code Online (Sandbox Code Playgroud)

Rvd*_*V79 9

由于我还有一点时间,我在玩C#6.0的同时制作了一张霍夫曼树的例子.它没有被优化(甚至没有优化!),但它作为一个例子工作得很好.它将帮助您了解可能出现的"挑战".因为我的英语远比我的斯堪的纳维亚知识好,所以我使用英文命名,我希望你不介意.

首先,让我们从保持频率的类开始.

public sealed class HuffmanFrequencyTable
{       
    #region Properties
    /// <summary>
    /// Holds the characters and their corresponding frequencies
    /// </summary>
    public Dictionary<char, int> FrequencyTable  { get; set; } = new Dictionary<char, int>();

    #endregion

    #region Methods

    /// <summary>
    /// Clears the internal frequency table
    /// </summary>
    public void Clear()
    {
        FrequencyTable?.Clear();            
    }

    /// <summary>
    /// Accepts and parses a new line (string) which is then 
    /// merged with the existing dictionary or frequency table
    /// </summary>
    /// <param name="line">The line to parse</param>
    public void Accept(string line)
    {
        if (!string.IsNullOrEmpty(line))
        {
            line.GroupBy(ch => ch).
                 ToDictionary(g => g.Key, g => g.Count()).
                 ToList().
                 ForEach(x => FrequencyTable[x.Key] = x.Value);
        }
    }

    /// <summary>
    /// Performs a dump of the frequency table, ordering all characters, lowest frequency first.
    /// </summary>
    /// <returns>The frequency table in the format 'character [frequency]'</returns>
    public override string ToString()
    {            
        return FrequencyTable?.PrintFrequencies();
    }
    #endregion
}
Run Code Online (Sandbox Code Playgroud)

请注意,ToString()方法使用一种扩展方法,该方法能够"转储"所使用的Dictionary的内容.扩展位于名为Helpers的静态类中,如下所示:

/// <summary>
/// Extension method that helps to write the contents of a generic Dictionary to a string, ordered by it's values and 
/// printing the key and it's value between brackets.
/// </summary>
/// <typeparam name="TKey">Generic key</typeparam>
/// <typeparam name="TValue">Generic value type</typeparam>
/// <param name="dictionary">The dictionary</param>
/// <exception cref="ArgumentNullException">Throws an argument null exception if the provided dictionary is null</exception>
/// <returns></returns>
public static string PrintFrequencies<TKey, TValue>(this IDictionary<TKey, TValue> dictionary)
{
    if (dictionary == null)
        throw new ArgumentNullException("dictionary");

    var items = from kvp in dictionary
                orderby kvp.Value
                select kvp.Key + " [" + kvp.Value + "]";

    return string.Join(Environment.NewLine, items);
}
Run Code Online (Sandbox Code Playgroud)

现在,有了这个FrequencyTable,我们就可以开始研究如何构建节点了.Huffman使用二叉树,因此最好生成一个具有左右子节点的Node类.我也冒昧地在这里执行遍历算法.该类构建如下:

public sealed class HuffmanNode
{
    #region Properties

    /// <summary>
    /// Holds the left node, if applicable, otherwise null
    /// </summary>
    public HuffmanNode Left { get; set; } = null;

    /// <summary>
    /// Holds the right node, if applicable, otherwise null
    /// </summary>
    public HuffmanNode Right { get; set; } = null;

    /// <summary>
    /// Holds the Character (or null) for this particular node
    /// </summary>
    public char? Character { get; set; } = null;

    /// <summary>
    /// Holds the frequency for this particular node, defaulted to 0
    /// </summary>
    public int Frequency { get; set; } = default(int);

    #endregion

    #region Methods

    /// <summary>
    /// Traverses all nodes recursively returning the binary 
    /// path for the corresponding character that has been found.
    /// </summary>
    /// <param name="character">The character to find</param>
    /// <param name="data">The datapath (containing '1's and '0's)</param>
    /// <returns>The complete binary path for a character within a node</returns>
    public List<bool> Traverse(char? character, List<bool> data)
    {
        //Check the leafs for existing characters
        if (null == Left && null == Right)
        {
            //We're at an endpoint of our 'tree', so return it's data or nothing when the symbol
            //characters do not match
            return (bool)character?.Equals(Character) ? data : null;
        }
        else
        {
            List<bool> left = null;
            List<bool> right = null;

            //TODO: If possible refactor with proper C# 6.0 features
            if (null != Left)
            {
                List<bool> leftPath = new List<bool>(data);                    
                leftPath.Add(false); //Add a '0'
                left = Left.Traverse(character, leftPath); //Recursive traversal for child nodes within this left node.
            }

            if (null != Right)
            {
                List<bool> rightPath = new List<bool>(data);                    
                rightPath.Add(true); //Add a '1'
                right = Right.Traverse(character, rightPath); //Recursive traversal for childnodes within this right node
            }

            return (null != left) ? left : right;
        }
    }        

    #endregion
}
Run Code Online (Sandbox Code Playgroud)

我在HuffmanTree类中使用Node类.从逻辑上讲,树是从节点构建的.相应的HuffmanTree是这样编写的:

public sealed class HuffmanTree
{
    #region Fields
    /// <summary>
    /// Field for keeping the Huffman nodes in. Internally used.
    /// </summary>
    private List<HuffmanNode> nodes = new List<HuffmanNode>();        
    #endregion

    #region Properties

    /// <summary>
    /// Holds the Huffman tree
    /// </summary>
    public HuffmanNode Root   {  get; set; } = null;

    /// <summary>
    /// Holds the frequency table for all parsed characters
    /// </summary>
    public HuffmanFrequencyTable Frequencies { get; private set; } = new HuffmanFrequencyTable()

    /// <summary>
    /// Holds the amount of bits after encoding the tree.
    /// Primary usable for decoding.
    /// </summary>
    public int BitCountForTree  { get;  private set;  } = default(int);

    #endregion

    #region Methods

    /// <summary>
    /// Builds the Huffman tree
    /// </summary>
    /// <param name="source">The source to build the Hufftree from</param>
    /// <exception cref="ArgumentNullException">Thrown when source is null or empty</exception>
    public void BuildTree(string source)
    {
        nodes.Clear(); //As we build a new tree, first make sure it's clean :)

        if (string.IsNullOrEmpty(source))
            throw new ArgumentNullException("source");
        else
        {
            Frequencies.Accept(source);

            foreach (KeyValuePair<char, int> symbol in Frequencies.FrequencyTable)
            {
                nodes.Add(new HuffmanNode() { Character = symbol.Key, Frequency = symbol.Value });
            }

            while (nodes.Count > 1)
            {
                List<HuffmanNode> orderedNodes = nodes.OrderBy(node => node.Frequency).ToList();

                if (orderedNodes.Count >= 2)
                {
                    List<HuffmanNode> takenNodes = orderedNodes.Take(2).ToList();

                    HuffmanNode parent = new HuffmanNode()
                    {
                        Character = null,
                        Frequency = takenNodes[0].Frequency + takenNodes[1].Frequency,
                        Left = takenNodes[0],
                        Right = takenNodes[1]
                    };

                    //Remove the childnodes from the original node list and add the new parent node
                    nodes.Remove(takenNodes[0]);
                    nodes.Remove(takenNodes[1]);
                    nodes.Add(parent);
                }
            }

            Root = nodes.FirstOrDefault();
        }
    }

    /// <summary>
    /// Encodes a given string to the corresponding huffman encoding path
    /// </summary>
    /// <param name="source">The source to encode</param>
    /// <returns>The binary huffman representation of the source</returns>
    public BitArray Encode(string source)
    {
        if (!string.IsNullOrEmpty(source))
        {
            List<bool> encodedSource = new List<bool>();
            //Traverse the tree for each character in the passed source (string) and add the binary path to the encoded source
            encodedSource.AddRange(source.SelectMany(character =>
                                        Root.Traverse(character, new List<bool>())
                                    ).ToList()
                                  );
            //For decoding, we might need the amount of bits to skip trailing bits.
            BitCountForTree = encodedSource.Count;
            return new BitArray(encodedSource.ToArray());
        }
        else return null;
    }

    /// <summary>
    /// Decodes a given binary path to represent it's string value
    /// </summary>
    /// <param name="bits">BitArray for traversing the tree</param>
    /// <returns></returns>
    public string Decode(BitArray bits)
    {
        HuffmanNode current = Root;
        string decodedString = string.Empty;

        foreach (bool bit in bits)
        {
            //Find the correct current node depending on the bit set or not set.
            current = (bit ? current.Right ?? current : current.Left ?? current);               

            if (current.IsLeaf())
            {
                decodedString += current.Character;
                current = Root;
            }
        }

        return decodedString;
    }

    #endregion
}
Run Code Online (Sandbox Code Playgroud)

这段代码中有趣的是,我决定使用BitArrays它来保存树的二进制路径.public BitArray Encode(string source)这里的方法包含脏黑客.我跟踪用于编码的总位数并将其存储在BitCountForTree属性中.执行解码时,我将使用此属性删除可能出现的任何尾随位.有一种更好的方式来执行此操作,但我会留下让您了解的方法.

此外,该类还使用为HuffmanNode编写的扩展方法.虽然这很简单:

  /// <summary>
  /// Determines whether a given Huffman node is a leaf or not.
  /// A node is considered to be a leaf when it has no childnodes
  /// </summary>
  /// <param name="node">A huffman node</param>
  /// <returns>True if no children are left, false otherwise</returns>
  public static bool IsLeaf(this HuffmanNode node)
  {
      return (null == node.Left && null == node.Right);
  }
Run Code Online (Sandbox Code Playgroud)

该扩展方法便于确定给定节点是否实际上是叶子节点.叶子是没有子节点的节点,因此是二叉树的末尾(或者更好的是该树的分支).

现在是有趣的部分,我如何让事情在这里工作.我已经构建了一个包含3个文本框的Windows窗体应用程序.一个用于实际输入,一个用于二进制(编码)输出,另一个用于显示压缩结果.我还放置了两个简单的按钮,一个用于执行霍夫曼编码,另一个用于霍夫曼解码.

霍夫曼编码方法编写如下(仅在编码按钮的事件处理程序中):

 string input = tbInput.Text;
 Tree.BuildTree(input); //Build the huffman tree

 BitArray encoded = Tree.Encode(input); //Encode the tree

 //First show the generated binary output
 tbBinaryOutput.Text = string.Join(string.Empty, encoded.Cast<bool>().Select(bit => bit ? "1" : "0"));

  //Next, convert the binary output to the new characterized output string.       
  byte[] bytes = new byte[(encoded.Length / 8) + 1];
  encoded.CopyTo(bytes, 0);

  tbOutput.Text = Encoding.Default.GetString(bytes); //Write the compressed output to the textbox.
Run Code Online (Sandbox Code Playgroud)

请注意,编码的二进制字符串没有任何尾随位.我将把它留给C#的编码机制.这样做的缺点是,我必须在解码时跟踪它.

解码现在也不太难.虽然,对于这个例子,我正在利用上面放置的编码代码生成的压缩输出.此外,我假设已经建立了霍夫曼树(它的频率表!!!).通常,频率表存储在压缩文件中,因此可以重建.

 //First convert the compressed output to a bit array again again and skip trailing bits.            
 bool[] boolAr = new BitArray(Encoding.Default.GetBytes(tbOutput.Text)).Cast<bool>().Take(Tree.BitCountForTree).ToArray();
 BitArray encoded = new BitArray( boolAr );

 string decoded = Tree.Decode(encoded);
 MessageBox.Show(decoded, "Decoded result: ", MessageBoxButtons.OK, MessageBoxIcon.Information);
Run Code Online (Sandbox Code Playgroud)

请注意我创建的脏黑客,因为Encoding.Default.GetBytes(tbOutput.Text)肯定会生成一个字节数组,它可能包含不需要解码的尾随位.因此,我只根据重建树获取实际需要的位数.

所以在运行时,我的例子提供了以下输出,当使用'世界名声'时,"快速的棕色狐狸跳过懒惰的程序员":

按"Huff encode"按钮后:

HuffEncode

按下"Huff decode"按钮后:

HuffDecode

现在这段代码可以真正使用一些优化,因为您可能会考虑使用Arrays而不是Dictionaries.还有更多,但这取决于你的考虑.