Grouping files with similar file names using Python

Question

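I am trying to write a Python script that finds PDFs with similar file names in a directory and merges them. The files I want to group all start with the same 16 characters but have different dates in the file name.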

All file names follow this format:

xxxxxxxxxxxxxxx_01-01-2019.pdf
xxxxxxxxxxxxxxx_02-01-2019.pdf
xxxxxxxxxxxxxxx_03-01_2019.pdf

yyyyyyyyyyyyyyy_01-01-2019.pdf
yyyyyyyyyyyyyyy_02-01-2019.pdf

Python script

import glob  
filelist = glob.glob(_filepath_) 

dictionary = {}  
for x in filelist:  
    group = dictionary.get(x[125:141],[])  
    group.append(x)  
    dictionary[x[125:141]] = group

This sort of works. However, it only returns one file for each group of similar file names:

['xxxxxxxxxxxxxxx_01-01-2019.pdf','yyyyyyyyyyyyyyy_01-01-2019.pdf']

If I can get the grouping of the files right, merging the PDFs will not be a problem.

Answer 1

Rus*_*iko 5

Here you go:

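import glob
import os

filelist = glob.glob(_filepath_)  # _filepath_ is a placeholder for your glob pattern

dictionary = {}
for x in filelist:
    # The key is the first 16 characters of the file name.
    # basename() strips the directory, so the slice does not depend on the path length.
    key = os.path.basename(x)[:16]
    group = dictionary.get(key, [])
    group.append(x)
    dictionary[key] = group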

Result

{
'yyyyyyyyyyyyyyy_': ['yyyyyyyyyyyyyyy_01-01-2019.pdf', 'yyyyyyyyyyyyyyy_02-01-2019.pdf'],
'xxxxxxxxxxxxxxx_': ['xxxxxxxxxxxxxxx_01-01-2019.pdf', 'xxxxxxxxxxxxxxx_02-01-2019.pdf', 'xxxxxxxxxxxxxxx_03-01_2019.pdf']}
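
The same grouping can also be written with collections.defaultdict, which avoids the get/append/assign steps. This is only a minimal sketch: the helper name group_by_prefix, its prefix_len parameter, and the some/dir/ directory are made up for illustration, and the sample list stands in for whatever glob.glob returns on your machine.

```python
from collections import defaultdict
from pathlib import Path


def group_by_prefix(paths, prefix_len=16):
    """Group paths by the first `prefix_len` characters of their file name."""
    groups = defaultdict(list)
    for path in paths:
        # Path(...).name drops the directory part, so the slice does not
        # depend on how long the directory path is.
        key = Path(path).name[:prefix_len]
        groups[key].append(path)
    # Sort each group by path string so the order is predictable
    # (this is plain string order, not necessarily chronological).
    return {key: sorted(files) for key, files in groups.items()}


# Sample names modelled on the question; the directory is hypothetical.
x_prefix = "x" * 15 + "_"  # 16 characters in total
y_prefix = "y" * 15 + "_"
sample = [
    "some/dir/" + x_prefix + "02-01-2019.pdf",
    "some/dir/" + y_prefix + "01-01-2019.pdf",
    "some/dir/" + x_prefix + "01-01-2019.pdf",
    "some/dir/" + y_prefix + "02-01-2019.pdf",
    "some/dir/" + x_prefix + "03-01_2019.pdf",
]

groups = group_by_prefix(sample)
for key, files in groups.items():
    print(key, len(files))
```

Each value in groups is then a list of paths that could be handed to whatever PDF merging step you already have.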