How to list an image sequence in an efficient way? Numerical sequence comparison in Python

use*_*686 5 python regex glob

I have a directory of 9 images:

image_0001, image_0002, image_0003
image_0010, image_0011
image_0011-1, image_0011-2, image_0011-3
image_9999

I would like to be able to list them in an efficient way, like this (4 entries for 9 images):

(image_000[1-3], image_00[10-11], image_0011-[1-3], image_9999)

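Is there a way in Python to return the contents of an image directory in a short, clear form (without listing every file)?

So, possibly something like this:

List all the images, sort them numerically, and build a list by counting each image in order from the start. When an image is missing, start a new list, and continue until the original file list is exhausted. I should then have a set of lists that each contain an unbroken sequence.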

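I am trying to make a list of numbers easy to read and describe. If I had a sequence of 1000 consecutive files, it could be listed clearly as file[0001-1000] rather than file['0001', '0002', '0003', etc...]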
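The idea described above can be sketched in a few lines. This is only an illustration under assumptions: the helper names `consecutive_runs` and `describe`, the `file` prefix, and the fixed padding width of 4 are hypothetical and not part of the question.

```python
def consecutive_runs(numbers):
    """Split a sorted list of ints into lists of consecutive values."""
    runs = []
    for n in numbers:
        if runs and n == runs[-1][-1] + 1:
            runs[-1].append(n)   # n continues the current run
        else:
            runs.append([n])     # gap found: start a new run
    return runs


def describe(prefix, run, width=4):
    """Render one run as prefix[first-last], zero-padded to `width` digits."""
    if len(run) == 1:
        return f"{prefix}{run[0]:0{width}d}"
    return f"{prefix}[{run[0]:0{width}d}-{run[-1]:0{width}d}]"


# 1000 consecutive frames plus one stray frame after a gap.
runs = consecutive_runs(list(range(1, 1001)) + [1005])
labels = [describe("file", run) for run in runs]
print(labels)
```

A plain loop is used here instead of groupby so that the "start a new list when an image is missing" step is visible on its own.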

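Edit1 (based on suggestions): given a flat list, how would you derive the glob patterns?

Edit2: I am trying to break the problem down into smaller pieces. Here is an example of part of the solution: data1 works, data2 returns 0010 as 8 and 0100 as 64, and data3 (the real-world data) does not work: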

# Find runs of consecutive numbers using groupby.  The key to the solution
# is differencing with a range so that consecutive numbers all appear in
# same group.
from operator import itemgetter
from itertools import *

data1=[01,02,03,10,11,100,9999]
data2=[0001,0002,0003,0010,0011,0100,9999]
data3=['image_0001','image_0002','image_0003','image_0010','image_0011','image_0011-2','image_0011-3','image_0100','image_9999']

list1 = []
for k, g in groupby(enumerate(data1), lambda (i,x):i-x):
    list1.append(map(itemgetter(1), g))
print 'data1'
print list1

list2 = []
for k, g in groupby(enumerate(data2), lambda (i,x):i-x):
    list2.append(map(itemgetter(1), g))
print '\ndata2'
print list2

Yields:

data1
[[1, 2, 3], [10, 11], [100], [9999]]

data2
[[1, 2, 3], [8, 9], [64], [9999]]
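The 8, 9 and 64 in the data2 output are what you get when Python 2 reads integer literals that start with 0 as octal (Python 3 rejects such literals outright). A minimal sketch of one way around it, assuming the numbers are kept as zero-padded strings and converted with int() only for the comparison; it is written for Python 3, so the lambda takes a single `pair` argument:

```python
from itertools import groupby

# Zero-padded numbers kept as strings, so nothing is parsed as an octal literal.
data2 = ["0001", "0002", "0003", "0010", "0011", "0100", "9999"]

# int(text) parses base 10; base 8 reproduces the values in the output above.
assert int("0010") == 10
assert int("0010", 8) == 8
assert int("0100", 8) == 64

# index - value stays constant within a run of consecutive numbers,
# so groupby puts each run into its own group.
runs = [
    [text for _, text in group]
    for _, group in groupby(enumerate(data2),
                            lambda pair: pair[0] - int(pair[1]))
]
print(runs)
```

Keeping the strings also preserves the leading zeroes, which are needed later to rebuild the file names.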

Fré*_*idi 6

Here is a working implementation of what you want to achieve, using the code you added as a starting point:

#!/usr/bin/env python

import itertools
import re

# This algorithm only works if DATA is sorted.
DATA = ["image_0001", "image_0002", "image_0003",
        "image_0010", "image_0011",
        "image_0011-1", "image_0011-2", "image_0011-3",
        "image_0100", "image_9999"]

def extract_number(name):
    # Match the last number in the name and return it as a string,
    # including leading zeroes (that's important for formatting below).
    return re.findall(r"\d+$", name)[0]

def collapse_group(group):
    if len(group) == 1:
        return group[0][1]  # Unique names collapse to themselves.
    first = extract_number(group[0][1])  # Fetch range
    last = extract_number(group[-1][1])  # of this group.
    # Cheap way to compute the string length of the upper bound,
    # discarding leading zeroes.
    length = len(str(int(last)))
    # Now we have the length of the variable part of the names,
    # the rest is only formatting.
    return "%s[%s-%s]" % (group[0][1][:-length],
        first[-length:], last[-length:])

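def run_key(pair):
    # index - number is constant across a run of consecutive numbers.
    # (Tuple-parameter lambdas are Python 2 only, so unpack explicitly.)
    index, name = pair
    return index - int(extract_number(name))

groups = [collapse_group(tuple(group))
          for key, group in itertools.groupby(enumerate(DATA), run_key)]

print(groups)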

This prints ['image_000[1-3]', 'image_00[10-11]', 'image_0011-[1-3]', 'image_0100', 'image_9999'], which is what you want.
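Since the algorithm depends on sorted input, a real directory listing has to be ordered first. For zero-padded names like the ones here, plain sorted() already gives the right order; the sketch below is only an assumption-laden extra for names whose numbers are not padded. The helper `natural_key` is a hypothetical name, not part of the answer's code.

```python
import re

def natural_key(name):
    """Sort key that compares digit runs as integers and the rest as text."""
    # re.split with a capturing group alternates text and digit tokens,
    # so tokens at the same position always have comparable types.
    return [int(token) if token.isdigit() else token
            for token in re.split(r"(\d+)", name)]


unsorted_names = ["image_9999", "image_0011-2", "image_0003", "image_0011",
                  "image_0100", "image_0001", "image_0011-1", "image_0010",
                  "image_0002", "image_0011-3"]

ordered = sorted(unsorted_names, key=natural_key)
print(ordered)

# For zero-padded names the default string ordering agrees with the key.
assert ordered == sorted(unsorted_names)
```

With a directory on disk, the same key could be passed to sorted(os.listdir(path), key=natural_key) before grouping.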

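History: I initially answered the question backwards, as @Mark Ransom pointed out below. For the sake of history, my original answer was:

You are looking for glob. Try: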

import glob
images = glob.glob("image_[0-9]*")

Or, using your example:

images = [glob.glob(pattern) for pattern in ("image_000[1-3]*",
    "image_00[10-11]*", "image_0011-[1-3]*", "image_9999*")]
images = [image for seq in images for image in seq]  # flatten the list
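One caveat about the patterns in this historical snippet: in glob syntax, square brackets form a character class that matches a single character, so "[10-11]" means "0 or 1" rather than the numbers 10 through 11. The sketch below checks this with fnmatch.fnmatchcase, which applies the same shell-style matching rules to plain strings; the `names` list simply repeats the file names from the question.

```python
import fnmatch

names = ["image_0001", "image_0002", "image_0003", "image_0010", "image_0011",
         "image_0011-1", "image_0011-2", "image_0011-3", "image_9999"]

# "[10-11]" is the character class {"0", "1"}, so this pattern matches any
# name that starts with "image_000" or "image_001".
matched = [n for n in names if fnmatch.fnmatchcase(n, "image_00[10-11]*")]
print(matched)

# A pattern that matches exactly image_0010 and image_0011.
exact = [n for n in names if fnmatch.fnmatchcase(n, "image_001[01]")]
print(exact)
```

So the bracketed strings produced by the working implementation are best read as a human-readable summary and not as patterns to feed back into glob.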