Python multiprocess/multithread to speed up file copying

Question

Spe*_*cer 3 python multithreading shutil

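I have a program that copies large numbers of files from one location to another: more than 100,000 files (at the moment I am copying 314 GB of image sequences). Both locations are on huge, very fast, heavily RAIDed network storage. I am using shutil to copy the files sequentially, which takes a while, so I am looking for the best way to optimize it. I have noticed that some software I use reads files off the network with multiple threads and gains a lot in load time, so I would like to try the same in Python.

I have no experience programming with multithreading or multiprocessing. Does this seem like the right direction? If so, what is the best way to do it? I have looked at a few other SO posts about threaded file copying in Python, and they all seem to say there is no speed gain, but given my hardware I do not think that will be the case. I am nowhere near my I/O cap at the moment, and resource usage sits at around 1% (I have 40 cores and 64 GB of RAM locally).

Spencer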

Answer 1

Spe*_*cer 5

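Update:

I never did get Gevent working (first answer), because I could not install the module without an internet connection, which my workstation does not have. However, I was able to cut file copy times by a factor of 8 using only Python's built-in threading (which I have since learned how to use), and I wanted to post it as an additional answer for anyone interested. My code is below. Note that the 8x speedup will most likely differ from environment to environment, depending on your hardware and network setup.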

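import queue  # this module is named Queue in Python 2
import threading
import os
import shutil

fileQueue = queue.Queue()
destPath = 'path/to/cop'

class ThreadedCopy:
    totalFiles = 0
    copyCount = 0
    lock = threading.Lock()

    def __init__(self):
        with open("filelist.txt", "r") as txt:  # text file with one source path per line
            fileList = txt.read().splitlines()

        # makedirs also creates any missing parent directories
        os.makedirs(destPath, exist_ok=True)

        self.totalFiles = len(fileList)

        print(str(self.totalFiles) + " files to copy.")
        self.threadWorkerCopy(fileList)


    def CopyWorker(self):
        # Each worker pulls file names off the shared queue until the program exits.
        while True:
            fileName = fileQueue.get()
            try:
                shutil.copy(fileName, destPath)
                with self.lock:  # protect the shared counter
                    self.copyCount += 1
                    percent = (self.copyCount * 100) // self.totalFiles
                    print(str(percent) + " percent copied.")
            except OSError as err:
                print("Failed to copy " + fileName + ": " + str(err))
            finally:
                # Mark the item done last, so join() cannot return before the
                # final progress line prints and a failed copy cannot hang it.
                fileQueue.task_done()

    def threadWorkerCopy(self, fileNameList):
        # Start 16 daemon threads, then feed them the file names.
        for i in range(16):
            t = threading.Thread(target=self.CopyWorker)
            t.daemon = True
            t.start()
        for fileName in fileNameList:
            fileQueue.put(fileName)
        fileQueue.join()  # block until every queued file has been processed

ThreadedCopy()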
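
For comparison, the same worker-pool pattern can be sketched with the standard-library `concurrent.futures.ThreadPoolExecutor`, which manages the queue and the worker threads internally. This is a minimal sketch under stated assumptions, not a drop-in replacement: the helper name `copy_files`, its parameters, and its return value are hypothetical, and the default of 16 workers is simply borrowed from the code above.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor, as_completed


def copy_files(file_list, dest_dir, max_workers=16):
    """Copy every path in file_list into dest_dir using a thread pool.

    Hypothetical helper (not from the answer). Returns (copied, failed):
    a list of destination paths and a list of (source_path, exception) pairs.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its source path for error reporting.
        futures = {pool.submit(shutil.copy, src, dest_dir): src
                   for src in file_list}
        for done, future in enumerate(as_completed(futures), start=1):
            src = futures[future]
            try:
                copied.append(future.result())  # shutil.copy returns the new path
            except OSError as err:
                failed.append((src, err))
            print("%d of %d files processed." % (done, len(file_list)))
    return copied, failed
```

Because `as_completed` yields finished futures in the calling thread, the progress counter here needs no lock, and a file that fails to copy is recorded rather than stopping the run.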


Answer 2

Dav*_*iaz 5

How about using a ThreadPool?

import os
import glob
import shutil
from functools import partial
from multiprocessing.pool import ThreadPool

DST_DIR = '../path/to/new/dir'
SRC_DIR = '../path/to/files/to/copy'

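# make sure the destination directory exists before any thread writes to it
os.makedirs(DST_DIR, exist_ok=True)

# copy_to_mydir will copy any file you give it to DST_DIR
copy_to_mydir = partial(shutil.copy, dst=DST_DIR)

# list of files we want to copy; glob also returns subdirectories,
# which shutil.copy cannot copy, so keep regular files only
to_copy = [path for path in glob.glob(os.path.join(SRC_DIR, '*'))
           if os.path.isfile(path)]

with ThreadPool(4) as p:
    p.map(copy_to_mydir, to_copy)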
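
The pool size of 4 is only a starting point; the best value depends on the storage and the network. One hedged way to choose it is to time a small sample of files at several pool sizes. The helpers `time_copy` and `compare_pool_sizes`, the `scratch_parent` parameter, and the candidate sizes below are assumptions for illustration and are not part of the answer.

```python
import shutil
import tempfile
import time
from functools import partial
from multiprocessing.pool import ThreadPool


def time_copy(sample_files, n_threads, scratch_parent=None):
    """Copy sample_files into a throwaway directory; return elapsed seconds.

    scratch_parent (hypothetical parameter) should point at the real
    destination volume so the timing reflects that storage; None falls
    back to the system temp directory.
    """
    with tempfile.TemporaryDirectory(dir=scratch_parent) as scratch:
        copy_to_scratch = partial(shutil.copy, dst=scratch)
        start = time.perf_counter()
        with ThreadPool(n_threads) as pool:
            pool.map(copy_to_scratch, sample_files)  # blocks until all are copied
        return time.perf_counter() - start


def compare_pool_sizes(sample_files, scratch_parent=None, sizes=(1, 4, 16)):
    """Return {pool size: seconds} for each candidate pool size."""
    return {n: time_copy(sample_files, n, scratch_parent) for n in sizes}
```

Operating-system and storage caches can favor later runs, so it is worth repeating the measurement or using a different sample of files for each pool size before trusting the numbers.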


Archived:	8 years, 9 months ago
Views:	3,698
Last activity:	8 years, 7 months ago