moe*_*nad 17 python compression algorithm
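I'm looking for a way to compress an ASCII-based string. Any help?
I also need to be able to decompress it. I tried zlib, but it didn't help.
What can I do to compress the string to a shorter length?
Code: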
def compress(request):
    if request.POST:
        data = request.POST.get('input')
        if is_ascii(data):
            result = zlib.compress(data)
            return render_to_response('index.html', {'result': result, 'input': data}, context_instance=RequestContext(request))
        else:
            result = "Error, the string is not ascii-based"
            return render_to_response('index.html', {'result': result}, context_instance=RequestContext(request))
    else:
        return render_to_response('index.html', {}, context_instance=RequestContext(request))
Rol*_*ith 26
Compression does not always reduce the length of a string!
Consider the following code:
import zlib
import bz2

def comptest(s):
    print 'original length:', len(s)
    print 'zlib compressed length:', len(zlib.compress(s))
    print 'bz2 compressed length:', len(bz2.compress(s))
Let's try it on an empty string:
In [15]: comptest('')
original length: 0
zlib compressed length: 8
bz2 compressed length: 14
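So zlib adds 8 extra bytes and bz2 adds 14. Compression methods usually put a "header" in front of the compressed data for the decompressor to use. This header increases the length of the output.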
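To make the overhead visible, here is a minimal Python 3 sketch (the session in this answer is Python 2). It compares the standard zlib output with a raw DEFLATE stream, which omits the zlib wrapper. The helper names `raw_deflate` and `overhead` are made up for illustration.

```python
import bz2
import zlib


def raw_deflate(data: bytes) -> bytes:
    """Compress with DEFLATE only, without the zlib header and checksum."""
    # A negative wbits value asks zlib for a raw DEFLATE stream.
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()


def overhead(data: bytes) -> dict:
    """Return output size minus input size per format (positive means growth)."""
    return {
        "zlib": len(zlib.compress(data)) - len(data),
        "bz2": len(bz2.compress(data)) - len(data),
        "raw deflate": len(raw_deflate(data)) - len(data),
    }


print(overhead(b""))
print(overhead(b"test"))
```

Even the raw stream is not free for tiny inputs, but the difference between the "zlib" and "raw deflate" numbers shows how much of the growth comes from the wrapper alone.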
Let's test a single word:
In [16]: comptest('test')
original length: 4
zlib compressed length: 12
bz2 compressed length: 40
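Even if you subtract the length of the header, compression has not made the word any shorter. That is because there is very little to compress in this case: most of the characters in the string occur only once. Now a short sentence: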
In [17]: comptest('This is a compression test of a short sentence.')
original length: 47
zlib compressed length: 52
bz2 compressed length: 73
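Again the compressed output is larger than the input text. Because the text is so short, it contains little repetition, so it does not compress well.
You need a fairly long block of text before compression really pays off: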
In [22]: rings = '''
....: Three Rings for the Elven-kings under the sky,
....: Seven for the Dwarf-lords in their halls of stone,
....: Nine for Mortal Men doomed to die,
....: One for the Dark Lord on his dark throne
....: In the Land of Mordor where the Shadows lie.
....: One Ring to rule them all, One Ring to find them,
....: One Ring to bring them all and in the darkness bind them
....: In the Land of Mordor where the Shadows lie.'''
In [23]: comptest(rings)
original length: 410
zlib compressed length: 205
bz2 compressed length: 248
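Since the question also asks about decompression, here is a minimal Python 3 sketch of a full round trip. In Python 3, zlib works on bytes, so the string is encoded first. The names `compress_text` and `decompress_text` and the sample input are made up for illustration, not taken from the question.

```python
import zlib


def compress_text(text: str) -> bytes:
    """Encode an ASCII string to bytes and compress it with zlib."""
    return zlib.compress(text.encode("ascii"))


def decompress_text(blob: bytes) -> str:
    """Reverse compress_text(): decompress and decode back to a string."""
    return zlib.decompress(blob).decode("ascii")


# Hypothetical sample input: one short line repeated many times.
sample = "One Ring to rule them all, One Ring to find them\n" * 20
blob = compress_text(sample)

print("original length:", len(sample))
print("compressed length:", len(blob))
print("round trip ok:", decompress_text(blob) == sample)
```

The heavy repetition in the sample is what lets the output shrink; the same functions applied to a single short word would produce a longer result, as shown earlier.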
The data doesn't even need to be ASCII; you can feed zlib anything:
>>> import zlib
>>> a='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' # + any binary data you want
>>> print zlib.compress(a)
x?KL$
?
>>>
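What you probably want is for the compressed data itself to be an ASCII string. Am I right?
If so, you should know that you then have a very small alphabet with which to encode the compressed data, so you will need more symbols.
For example, you can encode the binary data in base64 (which gives you an ASCII string), but it will take up roughly a third more space, since base64 emits 4 characters for every 3 input bytes.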
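A minimal Python 3 sketch of that idea, assuming the goal is an ASCII-only result that can still be decompressed later. The function names and the sample text are hypothetical.

```python
import base64
import zlib


def compress_to_ascii(text: str) -> str:
    """Compress with zlib, then base64-encode so the result is plain ASCII."""
    compressed = zlib.compress(text.encode("ascii"))
    return base64.b64encode(compressed).decode("ascii")


def decompress_from_ascii(encoded: str) -> str:
    """Undo compress_to_ascii(): base64-decode, then decompress."""
    compressed = base64.b64decode(encoded)
    return zlib.decompress(compressed).decode("ascii")


# Hypothetical sample input with plenty of repetition.
sample = "In the Land of Mordor where the Shadows lie.\n" * 10
encoded = compress_to_ascii(sample)
binary_size = len(zlib.compress(sample.encode("ascii")))

print("original length:", len(sample))
print("compressed (binary) length:", binary_size)
print("compressed (base64) length:", len(encoded))
print("round trip ok:", decompress_from_ascii(encoded) == sample)
```

The base64 step only pays off when the input compresses by more than the encoding adds back, so for short strings the ASCII-safe result can easily be longer than the original.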