How do I protect myself from a gzip or bzip2 bomb?

Joa*_*ner 18 python security gzip bzip2

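This is related to the question about zip bombs, but with gzip or bzip2 compression in mind, e.g. a web service that accepts .tar.gz files.

Python provides a handy tarfile module that is convenient to use, but it does not seem to offer any protection against zip bombs.

In Python code that uses the tarfile module, what is the most elegant way to detect zip bombs, preferably without duplicating too much logic from the tarfile module (e.g. its transparent decompression support)?

One further constraint: no real files are involved. The input is a file-like object (provided by the web framework, representing the file a user uploaded).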

jfs*_*jfs 10

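You could use the resource module to limit the resources available to your process and its children.

If you need to decompress in memory, you could set resource.RLIMIT_AS (or RLIMIT_DATA, RLIMIT_STACK), e.g. using a context manager that automatically restores the previous value: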

import contextlib
import resource

@contextlib.contextmanager
def limit(limit, type=resource.RLIMIT_AS):
    soft_limit, hard_limit = resource.getrlimit(type)
    resource.setrlimit(type, (limit, hard_limit)) # set soft limit
    try:
        yield
    finally:
        resource.setrlimit(type, (soft_limit, hard_limit)) # restore

with limit(1 << 30):  # 1 GiB
    pass  # do the thing that might try to consume all memory

If the limit is reached, MemoryError is raised.


Mar*_*ler 5

This will determine the uncompressed size of the gzip stream, while using limited memory:

#!/usr/bin/env python3
import sys
import zlib

f = open(sys.argv[1], "rb")
z = zlib.decompressobj(15+16)  # 15+16: expect a gzip header and trailer
total = 0
while True:
    # Feed leftover input first; read more from the file only when none remains.
    buf = z.unconsumed_tail
    if buf == b"":
        buf = f.read(1024)
        if buf == b"":
            break
    got = z.decompress(buf, 4096)  # return at most 4096 uncompressed bytes
    if got == b"":
        break
    total += len(got)
print(total)
if z.unused_data != b"" or f.read(1024) != b"":
    print("warning: more input after end of gzip stream")

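It returns a slight overestimate of the space required for all of the files in the tar file when it is extracted. The length includes those files as well as the tar directory information.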

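The gzip.py code does not control the amount of data decompressed, except by virtue of the size of the input data. gzip.py reads 1024 compressed bytes at a time. So you can use gzip.py if you are OK with up to about 1056768 bytes of memory usage for the uncompressed data (1032 * 1024, where 1032:1 is the maximum compression ratio of deflate). The solution here passes a second argument (max_length) to the decompress method of the zlib decompression object, which limits the amount of uncompressed data returned. gzip.py does not.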
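
The arithmetic above can be checked with a short experiment. This is only a sketch: the 10 MB all-zero input and compression level 9 are choices made for the illustration, and the ratio actually measured stays a little below the theoretical 1032:1 because of stream and block overhead.

```python
import zlib

# Assumption for illustration: 10 MB of zeros, close to the best case for deflate.
raw = b"\0" * (10 * 1024 * 1024)
packed = zlib.compress(raw, 9)
ratio = len(raw) / len(packed)

# Worst-case output for a single 1024-byte read at the theoretical 1032:1 ratio.
worst_case = 1032 * 1024

print("compressed %d bytes to %d bytes (about %d:1)" % (len(raw), len(packed), ratio))
print("worst-case output per 1024-byte read: %d bytes" % worst_case)
```

The measured ratio depends on the zlib build, so treat the printed figure as indicative rather than exact.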

This will determine exactly the total size of the extracted tar entries by decoding the tar format:

#!/usr/bin/env python3

import sys
import zlib

def decompn(f, z, n):
    """Return n uncompressed bytes, or fewer if at the end of the compressed
       stream.  This only decompresses as much as necessary, in order to
       avoid excessive memory usage for highly compressed input.
    """
    blk = b""
    while len(blk) < n:
        buf = z.unconsumed_tail
        if buf == b"":
            buf = f.read(1024)
        got = z.decompress(buf, n - len(blk))
        blk += got
        if got == b"":
            break
    return blk

f = open(sys.argv[1], "rb")
z = zlib.decompressobj(15+16)  # 15+16: expect a gzip header and trailer
total = 0
left = 0  # 512-byte data blocks still to skip for the current entry
while True:
    blk = decompn(f, z, 512)
    if len(blk) < 512:
        break
    if left == 0:
        # This block is a tar header (or end-of-archive zero padding).
        if blk == b"\0"*512:
            continue
        # Type flags 1-6 (links, devices, directories, FIFOs) carry no data.
        if blk[156:157] in [b"1", b"2", b"3", b"4", b"5", b"6"]:
            continue
        if blk[124] == 0x80:
            # Base-256 size encoding, used for very large entries.
            size = 0
            for i in range(125, 136):
                size <<= 8
                size += blk[i]
        else:
            # Ordinary size field: octal digits, space or NUL terminated.
            size = int(blk[124:136].split()[0].split(b"\0")[0], 8)
        # Extended headers and long-name records are not extracted as files.
        if blk[156:157] not in [b"x", b"g", b"X", b"L", b"K"]:
            total += size
        left = (size + 511) // 512
    else:
        left -= 1
print(total)
if blk != b"":
    print("warning: partial final block")
if left != 0:
    print("warning: tar file ended in the middle of an entry")
if z.unused_data != b"" or f.read(1024) != b"":
    print("warning: more input after end of gzip stream")

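You could use a variant of this to scan the tar file for bombs. This has the advantage of finding a large size in the header information before you even have to decompress that data.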
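
One possible variant, sketched here under assumptions, wraps the uploaded file-like object in a reader that gunzips with a bounded output size and hands the result to tarfile in its uncompressed stream mode ("r|"), so the tar parsing is not duplicated. The names BoundedGzipReader, OutputLimitExceeded and total_member_size are hypothetical helpers invented for this illustration, not part of any library, and the demo archive is built in memory only to keep the example self-contained.

```python
import io
import tarfile
import zlib


class OutputLimitExceeded(Exception):
    """Raised once the decompressed stream grows past the configured cap."""


class BoundedGzipReader:
    """Hypothetical read-only wrapper: gunzips lazily and enforces a size cap."""

    def __init__(self, fileobj, limit, chunk=1024):
        self._f = fileobj
        self._z = zlib.decompressobj(15 + 16)  # 15+16: expect gzip framing
        self._limit = limit
        self._chunk = chunk
        self._produced = 0

    def read(self, size):
        out = b""
        while len(out) < size and not self._z.eof:
            buf = self._z.unconsumed_tail or self._f.read(self._chunk)
            # max_length keeps zlib from returning more than was asked for.
            got = self._z.decompress(buf, size - len(out))
            out += got
            if not buf and not got:
                break  # input exhausted and nothing left to flush
        self._produced += len(out)
        if self._produced > self._limit:
            raise OutputLimitExceeded(
                "more than %d bytes after decompression" % self._limit)
        return out


def total_member_size(fileobj, limit):
    """Sum the sizes of all tar members, aborting if the stream exceeds limit."""
    reader = BoundedGzipReader(fileobj, limit)
    # "r|" is tarfile's uncompressed stream mode; it only calls reader.read(size).
    with tarfile.open(fileobj=reader, mode="r|") as tar:
        return sum(member.size for member in tar)


# Demo input (an assumption for illustration): 1 MB of zeros in a .tar.gz.
archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w:gz") as tar:
    info = tarfile.TarInfo("zeros.bin")
    info.size = 1000000
    tar.addfile(info, io.BytesIO(b"\0" * info.size))

archive.seek(0)
print(total_member_size(archive, limit=10000000))

archive.seek(0)
try:
    total_member_size(archive, limit=100000)
except OutputLimitExceeded as exc:
    print("rejected:", exc)
```

The cap is checked against bytes actually decompressed rather than against the sizes declared in the tar headers, so a header that lies about its size cannot bypass it. The wrapper only supports the read(size) calls that stream mode makes; seeking is deliberately left out.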

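For .tar.bz2 archives, the Python bz2 library (at least as of 3.3) is unavoidably unsafe against bz2 bombs that consume too much memory. The bz2.decompress function does not offer a limiting second argument like the decompress method of a zlib decompression object does. This is made even worse by the fact that the bz2 format has a much, much higher maximum compression ratio than zlib because of run-length coding. bzip2 compresses 1 GB of zeros to 722 bytes. So you cannot meter the output of bz2.decompress by metering the input, as can be done with zlib's decompress even without the second argument. The lack of a limit on the decompressed output size is a fundamental flaw in the Python interface.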

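I looked in _bz2module.c from 3.3 to see if there is an undocumented way to use it to avoid this problem. There is no way around it. The decompress function in there just keeps growing the result buffer until it can decompress all of the provided input. _bz2module.c needs to be fixed.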
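
For interpreters newer than the 3.3 examined above: from Python 3.5 onward, BZ2Decompressor.decompress accepts a max_length argument and the object exposes needs_input, which allows the same metering approach as with zlib. The following is a minimal sketch under that assumption. The function name bounded_bunzip2_size, its parameters, and the all-zero test input are made up for the illustration, and it handles only a single bzip2 stream.

```python
import bz2
import io


def bounded_bunzip2_size(fileobj, limit, chunk=1024, step=64 * 1024):
    """Return the decompressed size of one bzip2 stream, or raise ValueError
    as soon as the output exceeds `limit` bytes (hypothetical helper)."""
    decomp = bz2.BZ2Decompressor()
    total = 0
    while not decomp.eof:
        if decomp.needs_input:
            data = fileobj.read(chunk)
            if not data:
                raise EOFError("bzip2 stream ended prematurely")
        else:
            data = b""  # drain output still buffered inside the decompressor
        # max_length (Python 3.5+) caps the bytes returned by this call.
        total += len(decomp.decompress(data, max_length=step))
        if total > limit:
            raise ValueError("decompressed size exceeds %d bytes" % limit)
    return total


# Demo input (an assumption for illustration): 10 MB of zeros.
bomb = bz2.compress(b"\0" * 10000000)
print(len(bomb), "compressed bytes")
print(bounded_bunzip2_size(io.BytesIO(bomb), limit=20000000))
```

Each call returns at most `step` bytes, so the Python-side buffer stays small no matter how extreme the compression ratio of the input is.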


Joa*_*ner 3

I guess the answer is: there is no easy, ready-made solution. Here is what I use now:

class SafeUncompressor(object):
    """Small proxy class that enables external file object
    support for uncompressed, bzip2 and gzip files. Works transparently, and
    supports a maximum size to avoid zipbombs.
    """
    blocksize = 16 * 1024

    class FileTooLarge(Exception):
        pass

    def __init__(self, fileobj, maxsize=10*1024*1024):
        self.fileobj = fileobj
        self.name = getattr(self.fileobj, "name", None)
        self.maxsize = maxsize
        self.init()

    def init(self):
        import bz2
        import gzip
        self.pos = 0
        self.fileobj.seek(0)
        self.buf = b""
        self.format = "plain"

        # Sniff the compression format from the first two magic bytes.
        magic = self.fileobj.read(2)
        if magic == b'\037\213':
            self.format = "gzip"
            self.gzipobj = gzip.GzipFile(fileobj=self.fileobj, mode='rb')
        elif magic == b'BZ':
            raise IOError("bzip2 support in SafeUncompressor disabled, as self.bz2obj.decompress is not safe")
            # Unreachable until bzip2 support is re-enabled.
            self.format = "bz2"
            self.bz2obj = bz2.BZ2Decompressor()
        self.fileobj.seek(0)


    def read(self, size):
        b = [self.buf]
        x = len(self.buf)
        while x < size:
            if self.format == 'gzip':
                data = self.gzipobj.read(self.blocksize)
                if not data:
                    break
            elif self.format == 'bz2':
                raw = self.fileobj.read(self.blocksize)
                if not raw:
                    break
                # this can already bomb here, to some extent.
                # so disable bzip support until resolved.
                # Also monitor http://stackoverflow.com/questions/13622706/how-to-protect-myself-from-a-gzip-or-bzip2-bomb for ideas
                data = self.bz2obj.decompress(raw)
            else:
                data = self.fileobj.read(self.blocksize)
                if not data:
                    break
            b.append(data)
            x += len(data)

            # Abort as soon as the uncompressed total exceeds maxsize.
            if self.pos + x > self.maxsize:
                self.buf = b""
                self.pos = 0
                raise SafeUncompressor.FileTooLarge("Uncompressed data exceeds maxsize")
        self.buf = b"".join(b)

        buf = self.buf[:size]
        self.buf = self.buf[size:]
        self.pos += len(buf)
        return buf

    def seek(self, pos, whence=0):
        if whence != 0:
            raise IOError("SafeUncompressor only supports whence=0")
        if pos < self.pos:
            # Seeking backwards: restart from the beginning of the stream.
            self.init()
        self.read(pos - self.pos)

    def tell(self):
        return self.pos

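It does not work well for bzip2, so that part of the code is disabled. The reason is that bz2.BZ2Decompressor.decompress can already produce an unwanted large chunk of data.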