Python glob, but against a list of strings rather than the filesystem

Jas*_*n S 40 python regex glob python-2.7

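I want to be able to match a pattern in glob format against a list of strings, rather than against actual files in the filesystem. Is there a way to do this, or to convert a glob pattern easily into a regex?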

Mar*_*ers 30

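The `glob` module uses the `fnmatch` module for individual path elements.

That means the path is split into a directory name and a filename, and if the directory name contains metacharacters (any of the characters `[`, `*` or `?`), it is expanded recursively.

If you have a list of strings that are simple filenames, then just using the `fnmatch.filter()` function is enough: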
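The split described above can be illustrated roughly as follows. This is only a sketch: `has_magic` here is a hypothetical stand-in written for this example, not the `glob` module's own code, and `posixpath` is used so the result does not depend on the platform running it.

```python
import posixpath
import re

# The metacharacters named in the text: '[', '*' and '?'.
_MAGIC = re.compile(r'[*?\[]')


def has_magic(segment):
    """Return True if segment contains a glob metacharacter (hypothetical helper)."""
    return _MAGIC.search(segment) is not None


# Split a pattern into its directory part and its filename part.
dirname, basename = posixpath.split('/foo/b*r/baz.txt')

dir_needs_expansion = has_magic(dirname)    # the directory part contains '*'
name_needs_matching = has_magic(basename)   # the filename part is literal
```

Only the directory part contains a wildcard here, so it is that part that would need to be expanded before the literal filename is looked up.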

import fnmatch

matching = fnmatch.filter(filenames, pattern)

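But if they contain full paths, you need to do more work, because the generated regular expression doesn't take path segments into account (wildcards don't exclude the separators, nor are they adjusted for cross-platform path matching).

You can construct a simple trie from the paths, then match your pattern against it: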
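A small illustration of both points, assuming made-up file names: `fnmatch.filter()` works well on bare filenames, but on full paths its `*` happily matches across `/`.

```python
import fnmatch

# Plain filenames: fnmatch.filter() is all that is needed.
names = ['report.txt', 'notes.md', 'data.txt']
simple = fnmatch.filter(names, '*.txt')

# Full paths: '*' is not stopped by the separator, so a pattern meant to
# match exactly one directory level also matches the deeper path.
paths = ['a/b/c/f.txt', 'a/b/c/d/f.txt']
crossing = fnmatch.filter(paths, 'a/b/*/f.txt')

print(simple)
print(crossing)
```

The second result contains both paths, which is why full paths need segment-aware handling.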

import fnmatch
import glob
import os.path
from itertools import product


# Cross-Python dictionary views on the keys 
if hasattr(dict, 'viewkeys'):
    # Python 2
    def _viewkeys(d):
        return d.viewkeys()
else:
    # Python 3
    def _viewkeys(d):
        return d.keys()


def _in_trie(trie, path):
    """Determine if path is completely in trie"""
    current = trie
    for elem in path:
        try:
            current = current[elem]
        except KeyError:
            return False
    return None in current


def find_matching_paths(paths, pattern):
    """Produce a list of paths that match the pattern.

    * paths is a list of strings representing filesystem paths
    * pattern is a glob pattern as supported by the fnmatch module

    """
    if os.altsep:  # normalise
        pattern = pattern.replace(os.altsep, os.sep)
    pattern = pattern.split(os.sep)

    # build a trie out of path elements; efficiently search on prefixes
    path_trie = {}
    for path in paths:
        if os.altsep:  # normalise
            path = path.replace(os.altsep, os.sep)
        _, path = os.path.splitdrive(path)
        elems = path.split(os.sep)
        current = path_trie
        for elem in elems:
            current = current.setdefault(elem, {})
        current.setdefault(None, None)  # sentinel

    matching = []

    current_level = [path_trie]
    for subpattern in pattern:
        if not glob.has_magic(subpattern):
            # plain element, element must be in the trie or there are
            # 0 matches
            if not any(subpattern in d for d in current_level):
                return []
            matching.append([subpattern])
            current_level = [d[subpattern] for d in current_level if subpattern in d]
        else:
            # match all next levels in the trie that match the pattern
            matched_names = fnmatch.filter({k for d in current_level for k in d}, subpattern)
            if not matched_names:
                # nothing found
                return []
            matching.append(matched_names)
            current_level = [d[n] for d in current_level for n in _viewkeys(d) & set(matched_names)]

    return [os.sep.join(p) for p in product(*matching)
            if _in_trie(path_trie, p)]

This mouthful can quickly find matches using globs anywhere along the path:

>>> paths = ['/foo/bar/baz', '/spam/eggs/baz', '/foo/bar/bar']
>>> find_matching_paths(paths, '/foo/bar/*')
['/foo/bar/baz', '/foo/bar/bar']
>>> find_matching_paths(paths, '/*/bar/b*')
['/foo/bar/baz', '/foo/bar/bar']
>>> find_matching_paths(paths, '/*/[be]*/b*')
['/foo/bar/baz', '/foo/bar/bar', '/spam/eggs/baz']
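If the list is small and the trie's prefix sharing is not needed, a much simpler variant may be enough. The following is a minimal sketch, not the approach above: `match_segments` is a hypothetical helper that splits both path and pattern on `/` and applies `fnmatch.fnmatchcase` to each pair of segments, so a wildcard can never cross a separator. It assumes the paths already use forward slashes.

```python
import fnmatch


def match_segments(path, pattern, sep='/'):
    """Return True if path and pattern have the same number of segments
    and every segment matches its glob counterpart (hypothetical helper)."""
    path_parts = path.split(sep)
    pattern_parts = pattern.split(sep)
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatch.fnmatchcase(part, pat)
               for part, pat in zip(path_parts, pattern_parts))


paths = ['/foo/bar/baz', '/spam/eggs/baz', '/foo/bar/bar']
matched = [p for p in paths if match_segments(p, '/*/[be]*/b*')]
too_deep = match_segments('a/b/c/d/f.txt', 'a/b/*/f.txt')

print(matched)
print(too_deep)
```

Unlike the trie version, this checks every path against the pattern one by one and keeps the input order.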


Niz*_*med 15

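Good artists copy; great artists steal.

I stole ;)

`fnmatch.translate` translates the glob wildcards `?` and `*` to the regexes `.` and `.*` respectively. I tweaked it not to: here they become `[^/]` and `[^/]*`, so a wildcard never matches a path separator.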

import re

def glob2re(pat):
    """Translate a shell PATTERN to a regular expression.

    There is no way to quote meta-characters.
    """

    i, n = 0, len(pat)
    res = ''
    while i < n:
        c = pat[i]
        i = i+1
        if c == '*':
            #res = res + '.*'
            res = res + '[^/]*'
        elif c == '?':
            #res = res + '.'
            res = res + '[^/]'
        elif c == '[':
            j = i
            if j < n and pat[j] == '!':
                j = j+1
            if j < n and pat[j] == ']':
                j = j+1
            while j < n and pat[j] != ']':
                j = j+1
            if j >= n:
                res = res + '\\['
            else:
                stuff = pat[i:j].replace('\\','\\\\')
                i = j+1
                if stuff[0] == '!':
                    stuff = '^' + stuff[1:]
                elif stuff[0] == '^':
                    stuff = '\\' + stuff
                res = '%s[%s]' % (res, stuff)
        else:
            res = res + re.escape(c)
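    # Inline flags go first: Python 3.11+ rejects global flags that are
    # not at the start of the pattern. The raw string avoids an invalid
    # '\Z' escape in the string literal.
    return '(?ms)' + res + r'\Z'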

This one works à la `fnmatch.filter`; both `re.match` and `re.search` work.

def glob_filter(names, pat):
    # Translate and compile the pattern once instead of once per name
    regex = re.compile(glob2re(pat))
    return (name for name in names if regex.match(name))

The glob patterns and strings found on this page pass the test.

pat_dict = {
    'a/b/*/f.txt': ['a/b/c/f.txt', 'a/b/q/f.txt', 'a/b/c/d/f.txt', 'a/b/c/d/e/f.txt'],
    '/foo/bar/*': ['/foo/bar/baz', '/spam/eggs/baz', '/foo/bar/bar'],
    '/*/bar/b*': ['/foo/bar/baz', '/foo/bar/bar'],
    '/*/[be]*/b*': ['/foo/bar/baz', '/foo/bar/bar'],
    '/foo*/bar': ['/foolicious/spamfantastic/bar', '/foolicious/bar'],
}
for pat in pat_dict:
    print('pattern :\t{}\nstrings :\t{}'.format(pat,pat_dict[pat]))
    print('matched :\t{}\n'.format(list(glob_filter(pat_dict[pat],pat))))

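  • Great scoop! Yes, translating the pattern into one whose wildcards ignore path separators is a good idea. Note that it doesn't handle `os.sep` or `os.altsep`, though it should be easy to adjust for that. (2 upvotes)
  • I usually just normalize paths to use forward slashes first, before any processing. (2 upvotes)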
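The normalization mentioned in the comments might look like the sketch below. `normalize` is a hypothetical helper, and the regex is written out by hand as the equivalent of the glob `a/*/f.txt` with wildcards that stop at `/`. Note the assumption: on POSIX systems a backslash is a legal filename character, so blindly replacing it is only appropriate when the strings are known to be Windows-style or mixed-separator paths.

```python
import re


def normalize(path):
    """Convert backslash separators to forward slashes (hypothetical helper)."""
    return path.replace('\\', '/')


# Hand-written regex for the glob 'a/*/f.txt'; '*' became '[^/]*'.
pattern = re.compile(r'a/[^/]*/f\.txt\Z')

paths = [r'a\b\f.txt', 'a/c/f.txt', r'a\b\c\f.txt']
matched = [p for p in paths if pattern.match(normalize(p))]

print(matched)
```

The original strings are returned unchanged; only the copy used for matching is normalized.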

Vee*_*rac 10

On Python 3.4+ you can just use `PurePath.match`.

pathlib.PurePath(path_string).match(pattern)

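On Python 3.3 or earlier (including 2.x), get `pathlib` from PyPI.

Note that to get platform-independent results (which will depend on why you're running this) you'd want to explicitly state `PurePosixPath` or `PureWindowsPath`.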
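A brief sketch of what that could look like on a list of strings. `filter_paths` is a hypothetical helper written for this example, not part of `pathlib`; the path class is passed in explicitly so the result does not depend on the operating system.

```python
from pathlib import PurePosixPath, PureWindowsPath


def filter_paths(paths, pattern, flavour=PurePosixPath):
    """Keep the strings that match pattern under the given pure-path class.

    filter_paths is a hypothetical helper, not a pathlib function.
    """
    return [p for p in paths if flavour(p).match(pattern)]


paths = ['/foo/bar/baz', '/spam/eggs/baz', '/foo/bar/bar']

# An absolute pattern has to match the whole path.
absolute = filter_paths(paths, '/foo/bar/*')

# A relative pattern is matched from the right-hand end of the path.
relative = filter_paths(paths, 'bar/b*')

# Case sensitivity follows the chosen path class.
posix_case = PurePosixPath('b.py').match('*.PY')
windows_case = PureWindowsPath('b.py').match('*.PY')
```

Because relative patterns match from the right, a bare filename pattern such as `*.py` is enough to select paths by filename.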

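  • One nice thing about this approach is that it doesn't require you to specify glob syntax (`**/*`) when it isn't needed, for example if you just want to find a path based on its filename. (2 upvotes)
  • @schirrmacher The wildcards in `pathlib.PurePath.match` don't match path separators, and they always match tail-first. Python master supports `**` globs, which would work here, but I don't think that has been released yet. (2 upvotes)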

mu *_*u 無 5

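While `fnmatch.fnmatch` can be used directly to check whether a pattern matches a filename, you can also use the `fnmatch.translate` function to generate a regex from a given `fnmatch` pattern: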

>>> import fnmatch
>>> fnmatch.translate('*.txt')
'.*\\.txt\\Z(?ms)'

From the documentation:

fnmatch.translate(pattern)

Return the shell-style pattern converted to a regular expression.
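As a rough sketch of how the translated pattern might then be applied to a list of strings (the names below are made up for illustration): compile it once and filter with `re.match`. The exact regex text returned by `fnmatch.translate` differs between Python versions, so the example does not print it.

```python
import fnmatch
import re

# Compile the translated glob once, then reuse it for every string.
regex = re.compile(fnmatch.translate('*.txt'))

names = ['a.txt', 'b.py', 'dir/c.txt', 'README.TXT']
matched = [name for name in names if regex.match(name)]

print(matched)
```

Two things to keep in mind: the translated `*` still matches `/`, so `dir/c.txt` is included, and matching with the compiled regex is case-sensitive, so `README.TXT` is not.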