Why doesn't Python tarfile gz reduce the file size?

aba*_*rik 0 python compression gzip tar

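I'm trying to compress three text files of 10 MB each into a single tar.gz file, but it doesn't seem to reduce the size of the final tar.gz. The final tar.gz file is still 30 MB.

Can anyone tell me why this happens? I'm using the highest compression level.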

>>> import os
>>> import sys
>>> import tarfile
>>> import tempfile
>>> size_in_mb = 10
>>>
>>> def compress_str_to_tar(tmppath, files_str, tarfileprefix, tarmode="w:gz"):
...     ''' compress string contents in files and tar. finally creates a tar file in tmppath
...     @param tmppath: (str) pathdirectory where temp files to be compressed will be created
...     @param files_str: (dict) {filename: filecontent_in_str} these will be compressed
...     @param tarfileprefix: (str) output filename (without suffix) of tar
...     @param tarmode: (str) w:gz or w:bz2
...     '''
...     tar = tarfile.open(os.path.join(tmppath, tarfileprefix+'.tar.'+tarmode.split(':')[1]), tarmode, compresslevel=9)
...     for filename in files_str:
...         with open(os.path.join(tmppath, filename), 'wb') as tmpf:
...             tmpf.write(files_str[filename])
...         tar.add(os.path.join(tmppath, filename), arcname=filename)
...     tar.close()
...
...
>>> mail_size = 0
>>> files_str = {}
>>> for i in range(3):
...     d = os.urandom(1*size_in_mb*(10**6))
...     files_str['attachment'+str(i)+'.txt'] = d
...     mail_size += sys.getsizeof(d)
...
...
>>> print('mail_size', float(mail_size)/10**6)
('mail_size', 30.000111)
>>>
>>> tmppath = tempfile.mkdtemp()
>>> print('tar-tmppath', tmppath)
('tar-tmppath', '/tmp/tmpndifyt')
>>> tarfileprefix = 'tmpfoobar'
>>> compress_str_to_tar(tmppath, files_str, tarfileprefix, 'w:gz')
>>> print('mail_size', float(sys.getsizeof(open(os.path.join(tmppath, tarfileprefix+'.tar.gz')).read()))/10**6)
('mail_size', 30.009782)

Jea*_*bre 5

You are trying to compress data produced by os.urandom, which is random.

If the random function is any good, random data compresses very poorly.

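Compression works by identifying repeated patterns. The better the random algorithm, the fewer repeated patterns there are to find.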

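I suggest you try with real files, or with random text generated from a given word list (not random letters); you will get much better compression.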
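
To see the difference for yourself, here is a minimal sketch (Python 3, unlike the Python 2 session in the question) that gzips a block of os.urandom bytes and a block of text built from a word list. The word list, the fixed seed and the 1 MB sample size are assumptions made up for illustration, and it uses gzip.compress directly rather than tarfile to keep things short:

```python
import gzip
import os
import random

size = 10**6  # 1 MB per sample (assumption: smaller than the question's 10 MB, just to run quickly)

# Random bytes, as in the question
random_data = os.urandom(size)

# Text built by picking words at random from a small, made-up word list
words = [b"alpha", b"bravo", b"charlie", b"delta", b"echo"]
rng = random.Random(0)  # fixed seed so the text sample is reproducible
parts = []
total = 0
while total < size:
    word = rng.choice(words)
    parts.append(word)
    total += len(word) + 1  # +1 for the separating space
text_data = b" ".join(parts)[:size]

# Same compression level as in the question
random_gz = gzip.compress(random_data, compresslevel=9)
text_gz = gzip.compress(text_data, compresslevel=9)

print("random:", len(random_data), "->", len(random_gz))
print("text:  ", len(text_data), "->", len(text_gz))
```

The random sample should come out at about its original size (usually a few bytes larger, because of the gzip overhead), while the word-list sample should shrink to a small fraction of its size.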