Python script to recursively search an FTP server for a specific filename pattern and files newer than 24 hours

JRM*_*JRM 2 python ftp recursion ftputil

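Our storage area has run into SMB connection problems, so we are now forced to access files over FTP on a regular basis. Rather than using Bash, I am trying to do this in Python, but I have hit a few problems. The script needs to search an FTP directory recursively and find every file matching "*1700_m30.mp4" that is newer than 24 hours, then copy all of those files locally.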

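This is what I have so far, but I can't seem to get the script to download the files, or to get the stat information that tells me whether they are newer than 24 hours.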

#!/usr/bin/env python
# encoding: utf-8

import sys
import os
import ftplib
import ftputil
import fnmatch
import time

dir_dest = '/Volumes/VoigtKampff/Temp/TEST1/' # Directory where the files needs to be downloaded to
pattern = '*1700_m30.mp4' #filename pattern for what the script is looking for 
print 'Looking for this pattern :', pattern # print pattern


print "logging into GSP" # print 
host = ftputil.FTPHost('xxx.xxx','xxx','xxxxx') # ftp host info
recursive = host.walk("/GSPstor/xxxxx/xxx/xxx/xxx/xxxx",topdown=True,onerror=None) # recursive search 
for root,dirs,files in recursive:
    for name in files:
        print 'Files   :', files # print all files it finds
        video_list = fnmatch.filter(files, pattern)
        print 'Files to be moved :', video_list # print list of files to be moved 
        if host.path.isfile(video_list): # check whether the file is valid 
            host.download(video_list, video_list, 'b') # download file list 



host.close  

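Here is the script as modified after ottomeister's excellent suggestions (thank you!). The one remaining problem is that it now downloads, but it keeps downloading the files again and overwriting the existing copies: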

import sys
import os
import ftplib
import ftputil
import fnmatch
import time
from time import mktime
import datetime
import os.path, time 
from ftplib import FTP


dir_dest = '/Volumes/VoigtKampff/Temp/TEST1/' # Directory where the files needs to be downloaded to
pattern = '*1700_m30.mp4' #filename pattern for what the script is looking for 
print 'Looking for this pattern :', pattern # print pattern
utc_datetime_less24H = datetime.datetime.utcnow()-datetime.timedelta(seconds=86400) #UTC time minus 24 hours in seconds
print 'UTC time less than 24 Hours is: ', utc_datetime_less24H.strftime("%Y-%m-%d %H:%M:%S") # print UTC time minus 24 hours in seconds
print "logging into GSP FTP" # print 


with ftputil.FTPHost('xxxxxxxx','xxxxxx','xxxxxx') as host: # ftp host info
    recursive = host.walk("/GSPstor/xxxx/com/xxxx/xxxx/xxxxxx",topdown=True,onerror=None) # recursive search 
    for root,dirs,files in recursive:
        for name in files:
            print 'Files   :', files # print all files it finds
            video_list = fnmatch.filter(files, pattern) # collect all files that match pattern into variable:video_list
            statinfo = host.stat(root, video_list) # get the stats from files in variable:video_list
            file_mtime = datetime.datetime.utcfromtimestamp(statinfo.st_mtime) 
            print 'Files with pattern: %s and epoch mtime is: %s ' % (video_list, statinfo.st_mtime)
            print 'Last Modified: %s' % datetime.datetime.utcfromtimestamp(statinfo.st_mtime) 
            if file_mtime >= utc_datetime_less24H: 
                for fname in video_list:
                    fpath = host.path.join(root, fname)
                    if host.path.isfile(fpath):
                        host.download_if_newer(fpath, os.path.join(dir_dest, fname), 'b') 

host.close()
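One way to avoid fetching the same file repeatedly is to compare sizes before downloading. The following is only a sketch under stated assumptions, not part of the script above: download_if_missing() is a hypothetical helper, it assumes an ftputil release whose download() takes just a source and a target (with older releases, pass 'b' as the third argument as in the script above), and it assumes that a matching size is a good enough sign that the local copy is complete.

```python
import os


def download_if_missing(host, remote_path, local_path):
    """Download remote_path unless a same-sized local copy already exists.

    host is expected to behave like an ftputil.FTPHost instance.
    Returns True if a download happened, False if it was skipped.
    """
    # stat() takes a single remote path name, not a directory plus a list.
    remote_size = host.stat(remote_path).st_size
    if os.path.isfile(local_path) and os.path.getsize(local_path) == remote_size:
        return False  # local copy looks complete, so skip it
    host.download(remote_path, local_path)
    return True
```

Inside the walk loop this would be called once per matching file, for example download_if_missing(host, fpath, os.path.join(dir_dest, fname)). A size comparison will not notice a remote file that changed without changing length, so it suits files that are written once and never edited.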

ott*_*ter 5

This line:

    video_list = fnmatch.filter(files, pattern)

gets the list of filenames that match your glob pattern. But this line:

    if host.path.isfile(video_list): # check whether the file is valid 

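is bogus, because host.path.isfile() does not want a list of filenames as its argument. It wants a single path name. So you need to iterate over video_list, constructing one path name at a time, pass each of those path names to host.path.isfile(), and then possibly download that particular file. Something like this: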

    import os.path

    for fname in video_list:
        fpath = host.path.join(root, fname)
        if host.path.isfile(fpath):
            host.download(fpath, os.path.join(dir_dest, fname), 'b')

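Note that I use host.path.join() to manage remote path names and os.path.join() for local path names. Also note that this puts all of the downloaded files into a single directory. If you want to put them into a directory hierarchy that mirrors the remote layout (which you will have to do if filenames in different remote directories can clash), then you will need to construct a different destination path, and you will probably also have to create the local destination directory hierarchy.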
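The mirrored layout mentioned above could be built roughly as follows. This is a minimal sketch: local_target() and download_mirrored() are hypothetical helper names, host is assumed to behave like an ftputil.FTPHost whose download() takes a source and a target, and remote paths are assumed to use forward slashes.

```python
import os
import posixpath


def local_target(remote_base, remote_path, local_base):
    """Map a remote file path to a local path that mirrors the remote layout."""
    # Remote FTP paths use forward slashes, so posixpath handles that side.
    relative = posixpath.relpath(remote_path, remote_base)
    if relative == ".." or relative.startswith("../"):
        raise ValueError("%r is not below %r" % (remote_path, remote_base))
    # Rebuild the relative part with the local path separator.
    return os.path.join(local_base, *relative.split("/"))


def download_mirrored(host, remote_base, remote_path, local_base):
    """Download one file, creating the local parent directories if needed."""
    target = local_target(remote_base, remote_path, local_base)
    parent = os.path.dirname(target)
    if not os.path.isdir(parent):
        os.makedirs(parent)
    host.download(remote_path, target)
    return target
```

In the loop above, remote_base would be the directory passed to host.walk() and remote_path would be fpath, so two files with the same name in different remote directories end up in different local directories.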

To get timestamp information, use host.lstat() or host.stat(), depending on how you want to handle symbolic links.
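A 24-hour check built on that stat information might look like the sketch below. is_recent() is a hypothetical helper, host is assumed to behave like an ftputil.FTPHost, and the code assumes the server and client clocks roughly agree. Timestamps obtained over FTP come from parsing a directory listing, so they can be coarse or shifted by a time zone difference.

```python
import time

DAY_SECONDS = 24 * 60 * 60


def is_recent(host, remote_path, max_age=DAY_SECONDS, now=None):
    """Return True if remote_path was modified within the last max_age seconds."""
    if now is None:
        now = time.time()
    # stat() follows symbolic links; switch to host.lstat() to examine
    # the link itself instead of the file it points to.
    mtime = host.stat(remote_path).st_mtime
    return (now - mtime) <= max_age
```

The check is made per file, with one path name per stat() call, so it belongs inside the loop over video_list: for example, call is_recent(host, fpath) before downloading fpath.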

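Yes, that should be host.close(). Without it, the connection will be closed once the host variable goes out of scope and is garbage-collected, but it is better to close it explicitly. Better still, use a with clause to make sure the connection is closed even if an exception causes the code to be abandoned before it reaches the host.close() call, like this: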

    with ftputil.FTPHost('xxx.xxx','xxx','xxxxx') as host: # ftp host info
        recursive = host.walk(...)
        ...