I am using the following code to extract a tar file:
import tarfile
tar = tarfile.open("sample.tar.gz")
tar.extractall()
tar.close()
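However, I would like to keep track of the progress, i.e. which files are being extracted at the moment. How can I do that?
Bonus points: is it also possible to get a percentage for the extraction process? I would like to use it to update a tkinter progress bar. Thanks!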
tok*_*and 10
Both per-file progress and overall progress:
import io
import os
import tarfile

# Note: this appears to target Python 2's tarfile, where extraction reads each
# member through TarFile.fileobject and ExFileObject exposes position and size.
# Python 3's tarfile is structured differently, so the per-file hook may not fire there.

def get_file_progress_file_object_class(on_progress):
    # Build an ExFileObject subclass that reports per-file progress on every read
    class FileProgressFileObject(tarfile.ExFileObject):
        def read(self, size, *args):
            on_progress(self.name, self.position, self.size)
            return tarfile.ExFileObject.read(self, size, *args)
    return FileProgressFileObject

class ProgressFileObject(io.FileIO):
    # Wraps the archive file itself to report overall progress
    def __init__(self, path, *args, **kwargs):
        self._total_size = os.path.getsize(path)
        io.FileIO.__init__(self, path, *args, **kwargs)

    def read(self, size=-1):
        print("Overall process: %d of %d" % (self.tell(), self._total_size))
        return io.FileIO.read(self, size)

def on_progress(filename, position, total_size):
    print("%s: %d of %s" % (filename, position, total_size))

tarfile.TarFile.fileobject = get_file_progress_file_object_class(on_progress)
tar = tarfile.open(fileobj=ProgressFileObject("a.tgz"))
tar.extractall()
tar.close()
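For the percentage part of the question, here is a minimal sketch of the same "wrap the archive file" idea, written for Python 3. `PercentReader` and `extract_with_percent` are hypothetical names made up for this illustration, not part of `tarfile`. The assumption is that the share of the (compressed) archive read so far is a good enough proxy for overall progress.

```python
import io
import os
import tarfile


class PercentReader(io.FileIO):
    """Hypothetical wrapper that reports how much of the archive file has been read."""

    def __init__(self, path, on_percent):
        super().__init__(path, "r")
        # Guard against division by zero for an empty file
        self._total_size = os.path.getsize(path) or 1
        self._on_percent = on_percent

    def read(self, size=-1):
        data = super().read(size)
        # tell() is the offset inside the archive file, so this is the share
        # of the archive consumed so far, not the bytes written to disk.
        self._on_percent(100.0 * self.tell() / self._total_size)
        return data


def extract_with_percent(archive_path, dest, on_percent):
    """Extract archive_path into dest, calling on_percent(float) on every read."""
    with PercentReader(archive_path, on_percent) as raw:
        with tarfile.open(fileobj=raw, mode="r:*") as tar:
            tar.extractall(path=dest)
```

Something like `extract_with_percent("sample.tar.gz", ".", print)` would print the percentages as they come in. The values can briefly jump backwards while `tarfile` probes the compression format, so a progress bar may want to keep only the highest value seen.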
You can simply use tqdm() and show progress as the number of files extracted so far:
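import tarfile
from tqdm import tqdm

# Open your .tar.gz file (path is the location of the archive)
with tarfile.open(name=path) as tar:
    # Read the member list once and reuse it
    members = tar.getmembers()
    # Go over each member; the bar advances by one per extracted file
    for member in tqdm(iterable=members, total=len(members)):
        # Extract member
        tar.extract(member=member)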
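Counting files works well when the members are of similar size. If the archive mixes tiny and huge files, weighting by `TarInfo.size` may give a smoother percentage. Below is a rough sketch using only the standard library; `extract_by_bytes` is a hypothetical helper name, and the percentage is based on uncompressed member sizes.

```python
import tarfile


def extract_by_bytes(archive_path, dest="."):
    """Extract members one by one, yielding (name, percent) weighted by size."""
    with tarfile.open(archive_path) as tar:
        members = tar.getmembers()
        # Directories and links have size 0; guard against an all-empty archive
        total = sum(member.size for member in members) or 1
        done = 0
        for member in members:
            tar.extract(member, path=dest)
            done += member.size
            yield member.name, 100.0 * done / total
```

It is a generator, so the caller drives the extraction, for example `for name, percent in extract_by_bytes("sample.tar.gz"): print(name, percent)`. The same loop could feed a tqdm bar created with `total=100`.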
You can pass the members argument to extractall():
import tarfile

# Define the generator before it is used
def track_progress(members):
    for member in members:
        # This will be the current file being extracted
        yield member

with tarfile.open(<path>, 'r') as tarball:
    tarball.extractall(path=<some path>, members=track_progress(tarball))
Each member is a TarInfo object; see the tarfile documentation for all available methods and attributes.
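To tie this back to the percentage asked about in the question, the generator can also count members and hand a percentage to a callback. This is only a sketch: `on_member` and `extract_with_callback` are hypothetical names, and the percentage is by member count, reported just before each member is extracted.

```python
import tarfile


def track_progress(tarball, on_member):
    """Yield each member to extractall(), reporting progress by member count."""
    members = tarball.getmembers()
    total = len(members)
    for index, member in enumerate(members, start=1):
        # Called as the member is handed to extractall(), i.e. just before
        # it is written to disk.
        on_member(member, 100.0 * index / total)
        yield member


def extract_with_callback(archive_path, dest, on_member):
    """Extract archive_path into dest, calling on_member(TarInfo, percent)."""
    with tarfile.open(archive_path, "r") as tarball:
        tarball.extractall(path=dest, members=track_progress(tarball, on_member))
```

For a tkinter `ttk.Progressbar`, the callback could set `bar["value"] = percent` and then call `root.update_idletasks()` so the widget redraws. For large archives it is probably better to run the extraction in a worker thread, so the GUI stays responsive.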