Gar*_*hby 157 python directory
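Before I reinvent this particular wheel, does anybody have a nice routine for calculating the size of a directory using Python? It would be very nice if the routine formatted the size nicely in Mb/Gb etc.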
mon*_*kut 210
This walks all subdirectories, summing the file sizes:
import os

def get_size(start_path='.'):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            # skip if it is a symbolic link
            if not os.path.islink(fp):
                total_size += os.path.getsize(fp)
    return total_size

print(get_size(), 'bytes')
And a one-liner for fun using os.listdir (does not include subdirectories):
import os
sum(os.path.getsize(f) for f in os.listdir('.') if os.path.isfile(f))
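One caveat with the one-liner: os.listdir returns bare names, so os.path.isfile(f) only resolves them correctly when the listed directory is the current working directory. A minimal sketch for an arbitrary directory, joining each name onto the path first (the name flat_size is illustrative, not from the answer):

```python
import os

def flat_size(directory='.'):
    """Sum the sizes of the regular files directly inside `directory`."""
    total = 0
    for name in os.listdir(directory):
        full = os.path.join(directory, name)  # listdir yields bare names
        if os.path.isfile(full):
            total += os.path.getsize(full)
    return total

print(flat_size('.'), 'bytes')
```

Like the one-liner, this does not descend into subdirectories.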
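Reference:
os.path.getsize - gives the size in bytes
Update: the code now uses os.path.getsize, which is clearer than the os.stat().st_size method.
Thanks to ghostdog74 for pointing this out!
os.stat - st_size gives the size in bytes. It can also be used to get the file size and other file-related information.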
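To make the relationship between the two calls concrete, here is a small sketch on a throwaway temporary file (the file and variable names are only for illustration). Both report the same byte count; os.stat additionally exposes the other fields such as st_mtime:

```python
import os
import tempfile

# create a 5-byte scratch file to inspect
with tempfile.NamedTemporaryFile(delete=False) as fh:
    fh.write(b'hello')
    path = fh.name

size_a = os.path.getsize(path)  # convenience wrapper, size only
info = os.stat(path)            # full stat result
size_b = info.st_size

print(size_a, size_b, info.st_mtime)
os.remove(path)
```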
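Update 2018
If you use Python 3.4 or earlier, you may consider the more efficient walk method provided by the third-party scandir package. In Python 3.5 and later, this package has been incorporated into the standard library, and os.walk has received the corresponding performance increase.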
fla*_*ier 36
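Some of the approaches suggested so far implement a recursion; others use a shell or do not produce neatly formatted results. When your code is a one-off for Linux platforms, you can get the usual formatting, recursion included, as a one-liner. Except for the print in the last line, it works with current versions of python2 and python3: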
du.py
-----
#!/usr/bin/python3
import subprocess

def du(path):
    """disk usage in human readable format (e.g. '2,1GB')"""
    return subprocess.check_output(['du','-sh', path]).split()[0].decode('utf-8')

if __name__ == "__main__":
    print(du('.'))
Simple, efficient, and it works for files and multi-level directories:
$ chmod 750 du.py
$ ./du.py
2,9M
A bit late after 5 years, but since this is still near the top of search engine results, it might be of help...
Sam*_*mpa 24
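Here is a recursive function (it recursively sums up the size of all subfolders and their respective files) which returns exactly the same number of bytes as running "du -sb ." on Linux (where "." means "the current folder"):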
import os

def getFolderSize(folder):
    # start with the size of the directory entry itself, as du does
    total_size = os.path.getsize(folder)
    for item in os.listdir(folder):
        itempath = os.path.join(folder, item)
        if os.path.isfile(itempath):
            total_size += os.path.getsize(itempath)
        elif os.path.isdir(itempath):
            total_size += getFolderSize(itempath)
    return total_size

print("Size: " + str(getFolderSize(".")))
bla*_*kev 15
Recursive folder size with Python 3.5+ using os.scandir
import os

def folder_size(path='.'):
    total = 0
    for entry in os.scandir(path):
        if entry.is_file():
            total += entry.stat().st_size
        elif entry.is_dir():
            total += folder_size(entry.path)
    return total
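Note that entry.is_file() and entry.is_dir() follow symbolic links by default, so a symlinked directory gets descended into. A minimal sketch of a variant that ignores symlinks and uses an explicit stack instead of recursion (the name folder_size_nofollow is hypothetical; using os.scandir as a context manager assumes Python 3.6+):

```python
import os

def folder_size_nofollow(path='.'):
    """Total size of regular files under `path`, ignoring symlinks."""
    total = 0
    stack = [path]  # directories still to visit
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
                elif entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return total
```

The explicit stack also sidesteps Python's recursion limit on very deep trees.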
Ter*_*vis 12
Using pathlib
I came up with this one-liner to get the size of a folder:
sum(file.stat().st_size for file in Path(folder).rglob('*'))
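One thing to be aware of: rglob('*') yields directories as well as files, so on filesystems where a directory entry has a nonzero st_size, that size is included in the sum. If only file contents should count, a hedged variant (the function name files_only_size is illustrative) filters with is_file():

```python
from pathlib import Path

def files_only_size(folder):
    """Sum st_size over regular files only, skipping directory entries."""
    return sum(
        p.stat().st_size
        for p in Path(folder).rglob('*')
        if p.is_file()
    )
```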
And this is what I came up with for nicely formatted output:
from pathlib import Path

def get_folder_size(folder):
    return ByteSize(sum(file.stat().st_size for file in Path(folder).rglob('*')))

class ByteSize(int):

    _kB = 1024
    _suffixes = 'B', 'kB', 'MB', 'GB', 'TB'

    def __new__(cls, *args, **kwargs):
        return super().__new__(cls, *args, **kwargs)

    def __init__(self, *args, **kwargs):
        self.bytes = self.B = int(self)
        self.kilobytes = self.kB = self / self._kB**1
        self.megabytes = self.MB = self / self._kB**2
        self.gigabytes = self.GB = self / self._kB**3
        # 1024**4 bytes is a terabyte, not a petabyte
        self.terabytes = self.TB = self / self._kB**4
        *suffixes, last = self._suffixes
        # pick the first unit in which the value drops below 1024
        suffix = next((
            suffix
            for suffix in suffixes
            if getattr(self, suffix) < self._kB
        ), last)
        self.readable = suffix, getattr(self, suffix)

        super().__init__()

    def __str__(self):
        return self.__format__('.2f')

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, super().__repr__())

    def __format__(self, format_spec):
        suffix, val = self.readable
        return '{val:{fmt}} {suf}'.format(val=val, fmt=format_spec, suf=suffix)

    def __sub__(self, other):
        return self.__class__(super().__sub__(other))

    def __add__(self, other):
        return self.__class__(super().__add__(other))

    def __mul__(self, other):
        return self.__class__(super().__mul__(other))

    def __rsub__(self, other):
        # reflected subtraction must compute other - self
        return self.__class__(super().__rsub__(other))

    def __radd__(self, other):
        return self.__class__(super().__add__(other))

    def __rmul__(self, other):
        return self.__class__(super().__rmul__(other))

Usage:
>>> size = get_folder_size("c:/users/tdavis/downloads")
>>> print(size)
5.81 GB
>>> size.GB
5.810891855508089
>>> size.gigabytes
5.810891855508089
>>> size.TB
0.005674699077644618
>>> size.MB
5950.353260040283
>>> size
ByteSize(6239397620)
I also came across this question, which has some more compact and probably more performant strategies for printing file sizes.
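The accepted answer doesn't take hard or soft links into account and would count those files twice. You need to keep track of which inodes you have seen, and not add the size of those files again.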
import os

def get_size(start_path='.'):
    total_size = 0
    seen = {}
    for dirpath, dirnames, filenames in os.walk(start_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            try:
                stat = os.stat(fp)
            except OSError:
                continue

            # count each inode only once
            try:
                seen[stat.st_ino]
            except KeyError:
                seen[stat.st_ino] = True
            else:
                continue

            total_size += stat.st_size

    return total_size

print(get_size())
monknut's answer is good, but it fails on broken symlinks, so you also have to check whether the path really exists:
if os.path.exists(fp):
    total_size += os.stat(fp).st_size
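Why the check helps: os.path.exists follows the link and reports False when the target is gone, whereas os.path.lexists only looks at the link itself. A small sketch on a temporary directory (assuming a POSIX system where creating symlinks needs no special privileges; the file names are made up):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    link = os.path.join(tmp, 'dangling')
    # point the link at a target that was never created
    os.symlink(os.path.join(tmp, 'missing-target'), link)

    is_link = os.path.islink(link)    # the link itself is present
    exists = os.path.exists(link)     # follows the link to the missing target
    lexists = os.path.lexists(link)   # does not follow the link
    try:
        os.stat(link)                 # follows the link, like getsize does
        stat_failed = False
    except OSError:
        stat_failed = True

print(is_link, exists, lexists, stat_failed)
```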
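Chris' answer is good, but it can be made more idiomatic by using a set to track the inodes already seen, which also avoids using exceptions for control flow: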
import os

def directory_size(path):
    total_size = 0
    seen = set()

    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)

            try:
                stat = os.stat(fp)
            except OSError:
                continue

            if stat.st_ino in seen:
                continue

            seen.add(stat.st_ino)

            total_size += stat.st_size

    return total_size  # size in bytes
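One refinement worth considering: inode numbers are only unique within a single filesystem, so if the tree spans several mount points, two unrelated files can share an st_ino. A hedged sketch that keys on the (st_dev, st_ino) pair instead (the name directory_size_dedup is hypothetical):

```python
import os

def directory_size_dedup(path):
    """Sum file sizes under `path`, counting each hard-linked file once."""
    total = 0
    seen = set()
    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            try:
                st = os.stat(fp)
            except OSError:
                continue  # e.g. broken symlink or permission problem
            key = (st.st_dev, st.st_ino)  # device + inode identifies the file
            if key in seen:
                continue
            seen.add(key)
            total += st.st_size
    return total
```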
小智 7
A recursive one-liner:
import os

def getFolderSize(p):
    from functools import partial
    prepend = partial(os.path.join, p)
    return sum([(os.path.getsize(f) if os.path.isfile(f) else getFolderSize(f)) for f in map(prepend, os.listdir(p))])
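A little late to the party, but this can be done in one line provided that you have glob2 and humanize installed. Note that in Python 3, the default iglob has a recursive mode. How to modify the code for Python 3 is left as a trivial exercise for the reader.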
>>> import os
>>> from humanize import naturalsize
>>> from glob2 import iglob
>>> naturalsize(sum(os.path.getsize(x) for x in iglob('/var/**')))
'546.2 MB'
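For reference, one way the Python 3 version might look, using only the standard library (glob.iglob with recursive=True needs Python 3.5+; the name glob_size is made up for this sketch). Note that glob skips names starting with a dot by default, so hidden files are not counted:

```python
import glob
import os

def glob_size(root):
    """Sum the sizes of regular files matched by a recursive '**' glob."""
    # escape the root so special characters in it are taken literally
    pattern = os.path.join(glob.escape(root), '**')
    return sum(
        os.path.getsize(p)
        for p in glob.iglob(pattern, recursive=True)
        if os.path.isfile(p)  # '**' also matches directories
    )
```

The result is a plain byte count, so it can still be handed to humanize's naturalsize if that package is installed.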
Properties of the solution:
counts symlinks the same way du does
uses st.st_blocks for the disk space used, so it works only on Unix-like systems
Code:
import os

def du(path):
    if os.path.islink(path):
        return (os.lstat(path).st_size, 0)
    if os.path.isfile(path):
        st = os.lstat(path)
        return (st.st_size, st.st_blocks * 512)
    apparent_total_bytes = 0
    total_bytes = 0
    have = []
    for dirpath, dirnames, filenames in os.walk(path):
        apparent_total_bytes += os.lstat(dirpath).st_size
        total_bytes += os.lstat(dirpath).st_blocks * 512
        for f in filenames:
            fp = os.path.join(dirpath, f)
            if os.path.islink(fp):
                apparent_total_bytes += os.lstat(fp).st_size
                continue
            st = os.lstat(fp)
            if st.st_ino in have:
                continue  # skip hardlinks which were already counted
            have.append(st.st_ino)
            apparent_total_bytes += st.st_size
            total_bytes += st.st_blocks * 512
        for d in dirnames:
            dp = os.path.join(dirpath, d)
            if os.path.islink(dp):
                apparent_total_bytes += os.lstat(dp).st_size
    return (apparent_total_bytes, total_bytes)
Usage example:
>>> du('/lib')
(236425839, 244363264)
$ du -sb /lib
236425839 /lib
$ du -sB1 /lib
244363264 /lib
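The two numbers differ because st_size is the apparent size while st_blocks counts the 512-byte blocks actually allocated. A sparse file makes the gap visible; this is only a sketch, assumes a Unix-like system (st_blocks is not available on Windows), and how much space ends up allocated depends on the filesystem:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sparse.bin')
    with open(path, 'wb') as fh:
        fh.seek(1024 * 1024)  # leave a 1 MiB hole before the data
        fh.write(b'\0')       # a single real byte at the end
    st = os.lstat(path)
    apparent = st.st_size           # bytes the file claims to contain
    allocated = st.st_blocks * 512  # bytes actually reserved on disk

print(apparent, allocated)
```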
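Properties of the solution: units up to yotta, SI (base 1000) or IEC (base 1024) prefixes, and a custom suffix.
Code: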
def humanized_size(num, suffix='B', si=False):
    if si:
        units = ['','K','M','G','T','P','E','Z']
        last_unit = 'Y'
        div = 1000.0
    else:
        units = ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']
        last_unit = 'Yi'
        div = 1024.0
    for unit in units:
        if abs(num) < div:
            return "%3.1f%s%s" % (num, unit, suffix)
        num /= div
    return "%.1f%s%s" % (num, last_unit, suffix)
Usage example:
>>> humanized_size(236425839)
'225.5MiB'
>>> humanized_size(236425839, si=True)
'236.4MB'
>>> humanized_size(236425839, si=True, suffix='')
'236.4M'
Works with Python 3.5+:
from pathlib import Path

def get_size(path: str) -> int:
    return sum(p.stat().st_size for p in Path(path).rglob('*'))
Usage:
In [6]: get_size('/etc/not-exist-path')
Out[6]: 0
In [7]: get_size('.')
Out[7]: 12038689
In [8]: def filesize(size: int) -> str:
   ...:     for unit in ("B", "K", "M", "G"):
   ...:         if size < 1024:
   ...:             break
   ...:         size /= 1024
   ...:     return f"{size:.1f}{unit}"
   ...:
In [9]: filesize(get_size('.'))
Out[9]: '11.5M'
You can do something like this:
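# Python 2 only: the commands module was removed in Python 3,
# where subprocess.getoutput is the equivalent call
import commands
size = commands.getoutput('du -sh /path/').split()[0]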
In this case I have not tested the result before returning it; if you want, you can check it with commands.getstatusoutput.
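On Python 3 the same idea, including the status check, could be sketched with subprocess.run (this assumes Python 3.7+ for capture_output and text, and a du executable on the PATH; the name du_human is made up here):

```python
import subprocess

def du_human(path):
    """Return du's human-readable total for `path`, or None on failure."""
    try:
        result = subprocess.run(
            ['du', '-sh', path],
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode output to str
        )
    except FileNotFoundError:
        return None  # no `du` executable on this system
    if result.returncode != 0 or not result.stdout:
        return None  # du reported an error, e.g. the path does not exist
    return result.stdout.split()[0]
```

Passing the arguments as a list avoids shell quoting problems with unusual path names.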
For the second part of the question:
def human(size):

    B = "B"
    KB = "KB"
    MB = "MB"
    GB = "GB"
    TB = "TB"
    UNITS = [B, KB, MB, GB, TB]
    HUMANFMT = "%f %s"
    HUMANRADIX = 1024.

    for u in UNITS[:-1]:
        if size < HUMANRADIX:
            return HUMANFMT % (size, u)
        size /= HUMANRADIX

    return HUMANFMT % (size, UNITS[-1])