Tob*_*ann 8 compression huffman-code
我正在尝试有效地为一组给定字符构造一个二进制后缀代码及其概率(即一组单词,其中没有一个是任何其他单词的后缀).
我的基本思想是使用霍夫曼算法的实现来构造前缀代码.通过反转代码字,我得到一个无后缀的代码.虽然这个解决方案正在工作,但它似乎不是最佳的,因为我必须反转可变长度的代码字(因此我需要一个结合了位移的查找表).
有没有办法修改霍夫曼算法,以便更有效地创建后缀代码?
我将 HuffmanNode 实现为
class HuffmanNode implements Comparable<HuffmanNode>
{
// data
private String text;
private double frequency;
// linkage
private HuffmanNode left;
private HuffmanNode right;
private HuffmanNode parent;
public HuffmanNode(String text, double frequency)
{
this.text = text;
this.frequency = frequency;
}
public HuffmanNode(HuffmanNode n0, HuffmanNode n1)
{
if(n0.frequency < n1.frequency)
{
left = n0;
right = n1;
}else if(n0.frequency > n1.frequency)
{
left = n1;
right = n0;
}else
{
if(n0.text.compareTo(n1.text) < 0)
{
left = n0;
right = n1;
}else
{
left = n1;
right = n0;
}
}
left.parent = this;
right.parent = this;
text = left.text + right.text;
frequency = left.frequency + right.frequency;
}
public HuffmanNode getParent() {
return parent;
}
public HuffmanNode getLeft() {
return left;
}
public HuffmanNode getRight() {
return right;
}
public String getText()
{
return text;
}
@Override
public int compareTo(HuffmanNode o) {
if(frequency < o.frequency)
return -1;
else if(frequency > o.frequency)
return 1;
else
return text.compareTo(o.text);
}
public Collection<HuffmanNode> leaves()
{
if(left == null && right == null)
{
Set<HuffmanNode> retval = new HashSet<>();
retval.add(this);
return retval;
}
else if(left == null || right == null)
{
Set<HuffmanNode> retval = new HashSet<>();
if(left != null)
retval.addAll(left.leaves());
if(right != null)
retval.addAll(right.leaves());
retval.add(this);
return retval;
}
else
{
Set<HuffmanNode> retval = new HashSet<>();
retval.addAll(left.leaves());
retval.addAll(right.leaves());
return retval;
}
}
public String toString()
{
return "{" + text + " -> " + frequency + "}";
}
}
Run Code Online (Sandbox Code Playgroud)
此类表示霍夫曼树中的单个节点。
它具有从(子)树获取所有叶子的便捷方法。
然后您可以轻松构建树:
private Map<String,String> buildTree(String text)
{
List<HuffmanNode> nodes = new ArrayList<>();
for(Map.Entry<String,Double> en : frequency(text).entrySet())
{
nodes.add(new HuffmanNode(en.getKey(), en.getValue()));
}
java.util.Collections.sort(nodes);
while(nodes.size() != 1)
{
HuffmanNode n0 = nodes.get(0);
HuffmanNode n1 = nodes.get(1);
// build merged node
HuffmanNode newNode = new HuffmanNode(nodes.get(0), nodes.get(1));
nodes.remove(n0);
nodes.remove(n1);
// calculate insertion point
int insertionPoint = - java.util.Collections.binarySearch(nodes, newNode) - 1;
// insert
nodes.add(insertionPoint, newNode);
}
// build lookup table
Map<String, String> lookupTable = new HashMap<>();
for(HuffmanNode leaf : nodes.iterator().next().leaves())
{
String code = "";
HuffmanNode tmp = leaf;
while(tmp.getParent() != null)
{
if(tmp.getParent().getLeft() == tmp)
code = "0" + code;
else
code = "1" + code;
tmp = tmp.getParent();
}
lookupTable.put(leaf.getText(), code);
}
return lookupTable;
}
Run Code Online (Sandbox Code Playgroud)
通过更改构建代码的方法(例如预先放置下一个数字而不是附加它),您可以更改正在生成的代码。