Filtering os.walk() directories and files

Pau*_*tas 41 python filtering os.walk

I'm looking for a way to include/exclude file patterns and exclude directories from an os.walk() call.

Here's what I'm doing right now:

import fnmatch
import os

includes = ['*.doc', '*.odt']
excludes = ['/home/paulo-freitas/Documents']

def _filter(paths):
    matches = []

    for path in paths:
        append = None

        for include in includes:
            if os.path.isdir(path):
                append = True
                break

            if fnmatch.fnmatch(path, include):
                append = True
                break

        for exclude in excludes:
            if os.path.isdir(path) and path == exclude:
                append = False
                break

            if fnmatch.fnmatch(path, exclude):
                append = False
                break

        if append:
            matches.append(path)

    return matches

for root, dirs, files in os.walk('/home/paulo-freitas'):
    dirs[:] = _filter(map(lambda d: os.path.join(root, d), dirs))
    files[:] = _filter(map(lambda f: os.path.join(root, f), files))

    for filename in files:
        filename = os.path.join(root, filename)

        print(filename)

The question is: is there a better way to do this? How?

Obe*_*nne 50

This solution uses fnmatch.translate to convert glob patterns to regular expressions (it assumes the includes apply to files only):

import fnmatch
import os
import os.path
import re

includes = ['*.doc', '*.odt'] # for files only
excludes = ['/home/paulo-freitas/Documents'] # for dirs and files

# transform glob patterns to regular expressions
includes = r'|'.join([fnmatch.translate(x) for x in includes])
excludes = r'|'.join([fnmatch.translate(x) for x in excludes]) or r'$.'

for root, dirs, files in os.walk('/home/paulo-freitas'):

    # exclude dirs
    dirs[:] = [os.path.join(root, d) for d in dirs]
    dirs[:] = [d for d in dirs if not re.match(excludes, d)]

    # exclude/include files
    files = [os.path.join(root, f) for f in files]
    files = [f for f in files if not re.match(excludes, f)]
    files = [f for f in files if re.match(includes, f)]

    for fname in files:
        print(fname)
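The same approach can be packaged as a reusable generator. This is a minimal sketch, assuming Python 3.6 or later, where `fnmatch.translate` returns each pattern as a self-contained `(?s:...)` group so joining patterns with `|` is safe. The function name `filtered_walk` is hypothetical. Like the answer, it matches both lists against full paths and falls back to the never-matching regex `$.` when there are no excludes.

```python
import fnmatch
import os
import re


def filtered_walk(top, includes, excludes):
    """Yield full paths of files under `top` that match an include pattern
    and no exclude pattern (hypothetical helper)."""
    # Empty includes -> '.*' matches everything; empty excludes -> '$.'
    # can never match, so nothing is excluded.
    include_re = re.compile('|'.join(fnmatch.translate(p) for p in includes) or '.*')
    exclude_re = re.compile('|'.join(fnmatch.translate(p) for p in excludes) or r'$.')

    for root, dirs, files in os.walk(top):
        # Prune excluded directories in place; `dirs` keeps bare names.
        dirs[:] = [d for d in dirs
                   if not exclude_re.match(os.path.join(root, d))]
        for name in files:
            path = os.path.join(root, name)
            if include_re.match(path) and not exclude_re.match(path):
                yield path


if __name__ == '__main__':
    for path in filtered_walk('/home/paulo-freitas',
                              ['*.doc', '*.odt'],
                              ['/home/paulo-freitas/Documents']):
        print(path)
```

One design difference from the answer: `dirs` keeps the bare names that `os.walk` produced, and only the comparison uses the joined path, so the lists `os.walk` hands back stay in their usual form.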

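  • After some googling, it seems the point of the [:] syntax in `dirs[:] = [os.path.join(root, d) for d in dirs]` is to use slice mutation, which changes the list in place instead of creating a new list. This caught me out: without [:], it doesn't work. (7 votes)
  • @pf.me: You're right, I hadn't considered that case. So you can either 1) wrap the exclude list comprehension in `if excludes`, 2) prefix `not re.match(excludes, ...)` with `not excludes or`, or 3) set `excludes` to a never-matching regex if the original excludes list is empty. I've updated my answer using variant 3. (3 votes)
  • @Daniel: Slicing can be used not only to *get* the values of a list but also to *assign* to selected items. Since `[:]` denotes the whole list, assigning to that slice replaces the list's entire previous contents. See http://docs.python.org/2/library/stdtypes.html#mutable-sequence-types. (2 votes)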
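The difference the comments describe can be checked directly. This sketch, assuming Python 3, builds a small temporary tree and walks it twice while trying to prune a directory named `skip`: once by rebinding `dirs` to a new list and once by assigning to `dirs[:]`. The helper `walked_dirs` and the directory names are hypothetical.

```python
import os
import tempfile


def walked_dirs(top, use_slice):
    """Return the relative paths of every directory os.walk visits while
    trying to prune directories named 'skip' (hypothetical helper)."""
    visited = []
    for root, dirs, files in os.walk(top):
        visited.append(os.path.relpath(root, top))
        kept = [d for d in dirs if d != 'skip']
        if use_slice:
            dirs[:] = kept  # mutates the list object os.walk will descend into
        else:
            dirs = kept     # only rebinds the local name; os.walk never sees it
    return sorted(visited)


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'keep', 'inner'))
    os.makedirs(os.path.join(top, 'skip', 'inner'))
    print(walked_dirs(top, use_slice=False))
    print(walked_dirs(top, use_slice=True))
```

With `use_slice=False` the `skip` subtree still shows up in the result; with `use_slice=True` it does not, because only the slice assignment changes the list `os.walk` keeps using.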

koj*_*iro 23

From docs.python.org:

os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])

When topdown is True, the caller can modify the dirnames list in-place ... this can be used to prune the search ...
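For illustration of why the docs tie pruning to `topdown=True`: with `topdown=False`, a directory's subdirectories are walked before the directory itself is yielded, so editing `dirs` at that point comes too late. A small sketch, assuming Python 3; the helper `visited` and the tree layout are hypothetical.

```python
import os
import tempfile


def visited(top, topdown):
    """Walk `top`, clearing `dirs` in place at every step, and return the
    directories that were still visited (hypothetical helper)."""
    seen = []
    for root, dirs, files in os.walk(top, topdown=topdown):
        seen.append(os.path.relpath(root, top))
        dirs[:] = []  # try to prune everything below this directory
    return sorted(seen)


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'a', 'b'))
    print(visited(top, topdown=True))   # only the top directory
    print(visited(top, topdown=False))  # every directory: clearing dirs came too late
```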

for root, dirs, files in os.walk('/home/paulo-freitas', topdown=True):
    # excludes can be done with fnmatch.filter and complementary set,
    # but it's more annoying to read.
    dirs[:] = [d for d in dirs if d not in excludes] 
    for pat in includes:
        for f in fnmatch.filter(files, pat):
            print(os.path.join(root, f))

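I should point out that the code above assumes excludes is a pattern, not a full path. To match the OP's case, you would need to adjust the list comprehension to filter on whether os.path.join(root, d) not in excludes.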
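A sketch of that adjustment, assuming Python 3 and reusing the question's example values for `includes` (file-name patterns) and `excludes` (full directory paths). The function name `walk_matching` is hypothetical. As a small design change from the answer, each file is collected once even if it matches several patterns.

```python
import fnmatch
import os

includes = ['*.doc', '*.odt']                 # file-name patterns
excludes = ['/home/paulo-freitas/Documents']  # full directory paths


def walk_matching(top, includes, excludes):
    """Return full paths of files under `top` whose names match an include
    pattern, skipping any directory whose full path is in `excludes`."""
    matches = []
    for root, dirs, files in os.walk(top, topdown=True):
        # Compare the joined path, not the bare name, against excludes.
        dirs[:] = [d for d in dirs if os.path.join(root, d) not in excludes]
        for f in files:
            if any(fnmatch.fnmatch(f, pat) for pat in includes):
                matches.append(os.path.join(root, f))
    return matches


if __name__ == '__main__':
    for path in walk_matching('/home/paulo-freitas', includes, excludes):
        print(path)
```

Because the check is an exact string comparison, each entry in `excludes` has to be spelled the way `os.path.join(root, d)` produces it (for example, without a trailing slash).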

  • What do `excludes` and `includes` look like here? Is there an example to go with this answer? (3 votes)

kur*_*umi 7

Why fnmatch?

import os
excludes=....
for ROOT,DIR,FILES in os.walk("/path"):
    for file in FILES:
        if file.endswith(('doc','odt')):
            print(file)
    for directory in DIR:
        if directory not in excludes:
            print(directory)

Not exhaustively tested.

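  • The endings should be .doc and .odt, because the code above would also return a file named mydoc [with no file extension]. Also, I think this only satisfies the specific case the OP posted. I'd guess the excludes might also contain files, and the includes might contain directories. (2 votes)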
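Following that comment, a sketch of the same `str.endswith` approach with the dots included, and with excluded directories pruned from the walk instead of merely not printed. It assumes Python 3 and that `excludes` holds bare directory names (the answer leaves `excludes` elided); the function name `find_docs` is hypothetical.

```python
import os


def find_docs(top, excludes=(), extensions=('.doc', '.odt')):
    """Return full paths of files under `top` ending in one of `extensions`,
    without descending into directories whose name is in `excludes`."""
    found = []
    for root, dirs, files in os.walk(top):
        # Prune by bare directory name (assumption: excludes holds names).
        dirs[:] = [d for d in dirs if d not in excludes]
        for name in files:
            # The leading dots mean 'mydoc' (no extension) is not matched.
            if name.endswith(extensions):
                found.append(os.path.join(root, name))
    return found
```

Like the original, the suffix check is case-sensitive, so a file named `REPORT.DOC` would not be matched.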