Jas*_*n S 40 python regex glob python-2.7
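I want to be able to match a pattern in glob format against a list of strings, rather than against actual files in the filesystem. Is there a way to do this, or to easily convert a glob pattern into a regex?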
Mar*_*ers 30
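The glob module uses the fnmatch module for individual path elements.
This means the path is split into the directory name and the filename, and if the directory name contains metacharacters (any of the characters [, * or ?), these are expanded recursively.
If you have a list of strings that are simple filenames, then using the fnmatch.filter() function is enough: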
import fnmatch
matching = fnmatch.filter(filenames, pattern)
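But if they contain full paths, you need to do more work, because the generated regular expression does not take path segments into account (wildcards don't exclude the separators, nor are they adjusted for cross-platform path matching).
You can construct a simple trie from the paths, then match your pattern against that: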
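To illustrate both points, here is a minimal sketch with made-up sample names (the lists are assumptions, not data from the question). It shows fnmatch.filter() on plain filenames, and then how a single * happily crosses a path separator when the strings are full paths:

```python
import fnmatch

# Hypothetical sample data, for illustration only.
names = ["report.txt", "notes.md", "data.txt"]
simple = fnmatch.filter(names, "*.txt")
print(simple)

# With full paths, '*' is translated to '.*', so it also matches '/'.
paths = ["a/c.txt", "a/b/c.txt", "b/c.txt"]
crossing = fnmatch.filter(paths, "a/*.txt")
print(crossing)  # includes 'a/b/c.txt', which a shell glob would not match
```

A shell would expand a/*.txt to files directly inside a only, which is why full paths need segment-aware matching.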
import fnmatch
import glob
import os.path
from itertools import product


# Cross-Python dictionary views on the keys
if hasattr(dict, 'viewkeys'):
    # Python 2
    def _viewkeys(d):
        return d.viewkeys()
else:
    # Python 3
    def _viewkeys(d):
        return d.keys()


def _in_trie(trie, path):
    """Determine if path is completely in trie"""
    current = trie
    for elem in path:
        try:
            current = current[elem]
        except KeyError:
            return False
    return None in current


def find_matching_paths(paths, pattern):
    """Produce a list of paths that match the pattern.

    * paths is a list of strings representing filesystem paths
    * pattern is a glob pattern as supported by the fnmatch module

    """
    if os.altsep:  # normalise
        pattern = pattern.replace(os.altsep, os.sep)
    pattern = pattern.split(os.sep)

    # build a trie out of path elements; efficiently search on prefixes
    path_trie = {}
    for path in paths:
        if os.altsep:  # normalise
            path = path.replace(os.altsep, os.sep)
        _, path = os.path.splitdrive(path)
        elems = path.split(os.sep)
        current = path_trie
        for elem in elems:
            current = current.setdefault(elem, {})
        current.setdefault(None, None)  # sentinel

    matching = []

    current_level = [path_trie]
    for subpattern in pattern:
        if not glob.has_magic(subpattern):
            # plain element, element must be in the trie or there are
            # 0 matches
            if not any(subpattern in d for d in current_level):
                return []
            matching.append([subpattern])
            current_level = [d[subpattern] for d in current_level if subpattern in d]
        else:
            # match all next levels in the trie that match the pattern
            matched_names = fnmatch.filter({k for d in current_level for k in d}, subpattern)
            if not matched_names:
                # nothing found
                return []
            matching.append(matched_names)
            current_level = [d[n] for d in current_level for n in _viewkeys(d) & set(matched_names)]

    return [os.sep.join(p) for p in product(*matching)
            if _in_trie(path_trie, p)]
This mouthful can quickly find matches using globs anywhere along the path:
>>> paths = ['/foo/bar/baz', '/spam/eggs/baz', '/foo/bar/bar']
>>> find_matching_paths(paths, '/foo/bar/*')
['/foo/bar/baz', '/foo/bar/bar']
>>> find_matching_paths(paths, '/*/bar/b*')
['/foo/bar/baz', '/foo/bar/bar']
>>> find_matching_paths(paths, '/*/[be]*/b*')
['/foo/bar/baz', '/foo/bar/bar', '/spam/eggs/baz']
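To make the trie in this answer more concrete, here is a stripped-down sketch of just the data structure. The helper names build_trie and in_trie are hypothetical, and the separator is fixed to "/" for simplicity, so this is an illustration rather than a replacement for the function above:

```python
def build_trie(paths, sep="/"):
    """Nest one dict per path element; a None key marks a complete path."""
    trie = {}
    for path in paths:
        node = trie
        for elem in path.split(sep):
            node = node.setdefault(elem, {})
        node.setdefault(None, None)  # sentinel: a full path ends here
    return trie


def in_trie(trie, elems):
    """Return True only if elems spells out a complete stored path."""
    node = trie
    for elem in elems:
        if elem not in node:
            return False
        node = node[elem]
    return None in node


paths = ["/foo/bar/baz", "/spam/eggs/baz", "/foo/bar/bar"]
trie = build_trie(paths)
print(trie)
print(in_trie(trie, ["", "foo", "bar", "baz"]))  # complete path
print(in_trie(trie, ["", "foo", "bar"]))         # only a prefix
```

The leading empty string comes from splitting an absolute path on "/". The sentinel is what lets the final step reject combinations of matched names that are only prefixes of real paths.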
Niz*_*med 15
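Good artists copy; great artists steal.
I stole ;)
fnmatch.translate translates the glob wildcards ? and * to the regex . and .* respectively. I tweaked it so that it doesn't: here they become [^/] and [^/]*, so neither wildcard matches a path separator.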
import re

def glob2re(pat):
    """Translate a shell PATTERN to a regular expression.

    There is no way to quote meta-characters.
    """
    i, n = 0, len(pat)
    res = ''
    while i < n:
        c = pat[i]
        i = i+1
        if c == '*':
            #res = res + '.*'
            res = res + '[^/]*'
        elif c == '?':
            #res = res + '.'
            res = res + '[^/]'
        elif c == '[':
            j = i
            if j < n and pat[j] == '!':
                j = j+1
            if j < n and pat[j] == ']':
                j = j+1
            while j < n and pat[j] != ']':
                j = j+1
            if j >= n:
                res = res + '\\['
            else:
                stuff = pat[i:j].replace('\\','\\\\')
                i = j+1
                if stuff[0] == '!':
                    stuff = '^' + stuff[1:]
                elif stuff[0] == '^':
                    stuff = '\\' + stuff
                res = '%s[%s]' % (res, stuff)
        else:
            res = res + re.escape(c)
    # Inline flags go first: Python 3.11+ rejects global flags that are
    # not at the start of the pattern (the original appended them).
    return '(?ms)' + res + r'\Z'
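The effect of the tweak can be seen by comparing the stock fnmatch.translate output with a segment-aware regex. In this sketch the second pattern is written by hand in the spirit of the tweak (it is an assumption about what such a translation should look like, not captured output), and the sample paths are made up:

```python
import fnmatch
import re

pattern = "a/b/*/f.txt"
deep = "a/b/c/d/f.txt"     # '*' would have to span two segments
shallow = "a/b/c/f.txt"    # '*' spans exactly one segment

# Stock translation: '*' becomes '.*', which also matches '/'.
stock = re.compile(fnmatch.translate(pattern))

# Hand-written segment-aware equivalent: '*' becomes '[^/]*'.
segment_aware = re.compile(r"a/b/[^/]*/f\.txt\Z")

stock_deep = stock.match(deep) is not None
aware_deep = segment_aware.match(deep) is not None
aware_shallow = segment_aware.match(shallow) is not None
print(stock_deep, aware_deep, aware_shallow)
```

Only the segment-aware pattern rejects the deeper path, which is the behaviour a shell glob would show.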
This one works à la fnmatch.filter; both re.match and re.search work.
def glob_filter(names,pat):
    return (name for name in names if re.match(glob2re(pat),name))
The glob patterns and strings found on this page pass the test.
pat_dict = {
    'a/b/*/f.txt': ['a/b/c/f.txt', 'a/b/q/f.txt', 'a/b/c/d/f.txt','a/b/c/d/e/f.txt'],
    '/foo/bar/*': ['/foo/bar/baz', '/spam/eggs/baz', '/foo/bar/bar'],
    '/*/bar/b*': ['/foo/bar/baz', '/foo/bar/bar'],
    '/*/[be]*/b*': ['/foo/bar/baz', '/foo/bar/bar'],
    '/foo*/bar': ['/foolicious/spamfantastic/bar', '/foolicious/bar']
}
for pat in pat_dict:
    print('pattern :\t{}\nstrings :\t{}'.format(pat,pat_dict[pat]))
    print('matched :\t{}\n'.format(list(glob_filter(pat_dict[pat],pat))))
Vee*_*rac 10
On Python 3.4+ you can just use PurePath.match:
pathlib.PurePath(path_string).match(pattern)
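On Python 3.3 or earlier (including 2.x), get pathlib from PyPI.
Note that to get platform-independent results (which will depend on why you're running this) you'd want to explicitly state PurePosixPath or PureWindowsPath.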
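As a rough sketch of how this could be applied to a list of strings, the helper below (match_paths is a hypothetical name, and the sample paths are borrowed from the earlier answer) pins the flavour to PurePosixPath so the result does not depend on the host platform:

```python
from pathlib import PurePosixPath

paths = ["/foo/bar/baz", "/spam/eggs/baz", "/foo/bar/bar"]


def match_paths(paths, pattern):
    """Keep the strings whose POSIX-style path matches the glob pattern."""
    return [p for p in paths if PurePosixPath(p).match(pattern)]


absolute = match_paths(paths, "/*/bar/b*")  # absolute pattern: whole path must match
too_short = match_paths(paths, "/foo/*")    # '*' covers one segment only
relative = match_paths(paths, "eggs/*")     # relative pattern: matched from the right
print(absolute, too_short, relative)
```

Note the two behaviours documented for PurePath.match: an absolute pattern has to match the whole path, while a relative pattern is matched from the right, so "eggs/*" matches "/spam/eggs/baz".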
While fnmatch.fnmatch can be used directly to check whether a pattern matches a filename, you can also use the fnmatch.translate method to generate a regex from the given fnmatch pattern:
>>> import fnmatch
>>> fnmatch.translate('*.txt')
'.*\\.txt\\Z(?ms)'
From the documentation:
fnmatch.translate(pattern)
Return the shell-style pattern converted to a regular expression.
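A short sketch of how the translated pattern might then be used: compile it once and apply it to a list of strings. The names list is made up for illustration. Keep in mind that the stock translation lets * match "/" as well:

```python
import fnmatch
import re

# Hypothetical sample data, for illustration only.
names = ["a.txt", "b.py", "dir/b.txt"]

# Compile once, then reuse the regex for every candidate string.
regex = re.compile(fnmatch.translate("*.txt"))
matched = [name for name in names if regex.match(name)]
print(matched)
```

Compiling once avoids re-translating the glob for every string, which is useful when the list is long.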