Dropzone JS with Flask: chunked parallel uploads

swa*_*her 5 python upload chunked flask dropzone.js

I am trying to use Dropzone JS with Flask as the backend. The Dropzone configuration:

<form method="POST" action='/process_chunk' class="dropzone dz-clickable"
     id="dropper" enctype="multipart/form-data">
</form>

<script type="application/javascript">
   Dropzone.options.dropper = {
       {# https://gitlab.com/meno/dropzone/wikis/faq#chunked-uploads #}
       paramName: 'file',
       acceptedFiles: '.csv',
       chunking: true,
       forceChunking: true,
       chunkSize: 100000, // bytes
       parallelChunkUploads: true,
       maxFilesize: 1025, // megabytes
   };
</script>

My Flask backend looks like this:

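# Assumed setup, not shown in the original post;
# app.config['DATA_DIR'] must also be set to the upload directory.
import logging
import os

from flask import Flask, make_response, request

app = Flask(__name__)
log = logging.getLogger(__name__)

@app.route('/process_chunk', methods=['POST'])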
def process_chunk():
    current_chunk = int(request.form['dzchunkindex'])

    file = request.files['file']
    save_path = os.path.join(app.config['DATA_DIR'], file.filename)

    try:
        with open(save_path, 'ab+') as f:
            # Goto the offset, aka after the chunks we already wrote
            f.seek(int(request.form['dzchunkbyteoffset']))
            f.write(file.stream.read())
    except OSError:
        # log.exception will include the traceback so we can see what's wrong
        log.exception('Could not write to file')
        return make_response(("Couldn't write the file to disk", 500))

    total_chunks = int(request.form['dztotalchunkcount'])

    if current_chunk + 1 == total_chunks:
        # This was the last chunk, the file should be complete and the size we expect
        if os.path.getsize(save_path) != int(request.form['dztotalfilesize']):
            log.error(f"File {file.filename} was completed, "
                      f"but has a size mismatch. "
                      f"Was {os.path.getsize(save_path)} but we "
                      f"expected {request.form['dztotalfilesize']}")
            return make_response(('Size mismatch', 500))
        else:
            log.info(f'File {file.filename} has been uploaded successfully')
    else:
        log.debug(f'Chunk {current_chunk + 1} of {total_chunks} '
                  f'for file {file.filename} complete')

    return make_response(("Chunk upload successful", 200))

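If I set parallelChunkUploads to false, it works fine. The chunks are uploaded one after another and the resulting file looks correct. For example, I used a small file (409 bytes) and set the chunk size to 50 bytes: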

[Screenshot: serial chunk upload]

The resulting file is identical to the input file. When I set parallelChunkUploads to true, the chunks are uploaded in parallel:

[Screenshot: parallel chunk upload]

But the resulting file is completely scrambled:

Original file

cod,char,xx
01,aaaa,xx
02,bbbb,xx
03,cccc,xx
04,dddd,xx
05,eeee,xx
06,ffff,xx
07,gggg,xx
08,iiii,xx
09,kkkk,xx
10,llll,xx
11,mmmm,xx
12,gerf,xx
13,flrg,xx
14,erge,xx
15,lkro,xx
16,ergf,xx
17,kiwu,xx
18,erjg,xx
19,hytj,xx
20,utkj,xx
21,rger,xx
22,ehth,xx
23,kmik,xx
24,ergb,xx
25,ergk,xx
26,egeg,xx
27,ejer,xx
28,gtrh,xx
29,thrh,xx
30,rhtr,xx
31,gtrh,xx
32,thrh,xx
33,rhtr,xx

Uploaded file

cod,char,xx
01,aaaa,xx
02,bbbb,xx
03,cccc,xx
0iiii,xx
09,kkkk,xx
10,llll,xx
11,mmmm,xx
12,gerf,xx
13,flrg,xx
14,erge,xx
15,lkro,xx
16,ergf4,dddd,xx
05,eeee,xx
06,ffff,xx
07,gggg,xx
08,x
21,rger,xx
22,ehth,xx
23,kmik,xx
24,ergb,xx
,xx
17,kiwu,xx
18,erjg,xx
19,hytj,xx
20,utkj,x9,thrh,xx
30,rhtr,xx
31,gtrh,xx
32,thrh,xx
33,rhtr,xx

25,ergk,xx
26,egeg,xx
27,ejer,xx
28,gtrh,xx
2

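The last chunk is shown in red because the frontend receives the 'Size mismatch', 500 response: after the last chunk is uploaded, the file size does not match the expected size. Does anyone know how to fix this?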

小智 0

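I do think your problem is that you are writing to the file from multiple processes in an unsafe (non-thread-safe) way. You need to enclose access to the shared resource (the file) in a critical section. You can use a lock file for this, for example with https://py-filelock.readthedocs.io/en/latest/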
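
To illustrate the critical-section idea, here is a minimal sketch rather than a definitive fix. The names `write_chunk` and `assemble_in_parallel` are hypothetical, and a `threading.Lock` stands in for the lock so the example runs with the standard library only. If the chunks are handled by separate processes, a `FileLock` from the filelock package linked above could be passed instead, since it is also used as a context manager. The sketch also assumes the file is opened in `r+b` mode rather than `ab+`: on POSIX systems a file opened in append mode writes at the end of the file regardless of `seek()`, so the chunk offsets would otherwise be ignored.

```python
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def write_chunk(path, offset, data, lock):
    """Write one chunk at its byte offset while holding `lock`."""
    with lock:
        # 'r+b' honours seek(); fall back to 'w+b' only to create the file.
        mode = "r+b" if os.path.exists(path) else "w+b"
        with open(path, mode) as f:
            f.seek(offset)
            f.write(data)


def assemble_in_parallel(path, payload, chunk_size, lock, workers=4):
    """Simulate parallel chunk uploads that arrive out of order."""
    chunks = [(off, payload[off:off + chunk_size])
              for off in range(0, len(payload), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(write_chunk, path, off, data, lock)
                   for off, data in reversed(chunks)]
        for fut in futures:
            fut.result()  # re-raise any exception from a worker


def demo():
    """Return True if the reassembled file matches the 409-byte payload."""
    payload = bytes(i % 251 for i in range(409))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "upload.csv")
        assemble_in_parallel(path, payload, 50, threading.Lock())
        with open(path, "rb") as f:
            return f.read() == payload
```

In the Flask view from the question, the same pattern would wrap the `open`/`seek`/`write` block, with the offset taken from `dzchunkbyteoffset`. Because chunks can finish in any order, the final size check may also need to run once all chunks have arrived rather than when the chunk with the highest index arrives.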