Reading CSV files stored on an FTP server with Python

Har*_*hna 6 python csv ftp ftplib pandas

I have connected to the FTP server, and the connection succeeds.

import ftplib
ftp = ftplib.FTP('***', '****','****')
listoffiles = ftp.dir()
print (listoffiles)

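On this FTP server I have some CSV files, plus some folders that contain more CSVs.

I need to identify the list of folders in this location (the home directory) and navigate into them. I think the cwd command should work.

I also need to read the CSVs stored on this FTP server. How can I do that? Is there a way to load a CSV directly into pandas?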

qua*_*yte 9

Based on the answer here (Python write create file directly in FTP) and my own knowledge of ftplib:

You can do something like this:

from ftplib import FTP
import io, pandas

session = FTP('***', '****','****')

# get filenames on ftp home/root
remoteFilenames = session.nlst()
if ".." in remoteFilenames:
    remoteFilenames.remove("..")
if "." in remoteFilenames:
    remoteFilenames.remove(".")
# iterate over the names and check which ones are folders
for filename in remoteFilenames:
    dirTest = session.nlst(filename)
    # This directory test does not work on certain servers, and it treats
    # empty or single-entry directories as plain files
    if dirTest and len(dirTest) > 1:
        # it's a directory => change into it
        session.cwd(filename)
        # get the names one level deeper
        remoteFilenames2 = session.nlst()
        if ".." in remoteFilenames2:
            remoteFilenames2.remove("..")
        if "." in remoteFilenames2:
            remoteFilenames2.remove(".")
        for filename2 in remoteFilenames2:
            # check again whether the name is a directory; this time skip it
            dirTest = session.nlst(filename2)
            if dirTest and len(dirTest) > 1:
                continue

            # download the file into an in-memory file object
            download_file = io.BytesIO()
            session.retrbinary("RETR {}".format(filename2), download_file.write)
            download_file.seek(0)  # after writing, go back to the start of the buffer
            df = pandas.read_csv(download_file)  # read the buffer into a DataFrame
            ##########
            # do your thing with pandas (df) here
            ##########
            download_file.close()  # close the in-memory file

        # return to the parent directory before testing the next name
        session.cwd("..")

session.quit()  # close the FTP session
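The `nlst`-based directory test in the code above is flagged as unreliable on some servers. One alternative, sketched here, is to attempt `cwd` into the name and treat a permanent error as "not a directory". The helper name `is_directory` is made up for illustration, and the sketch assumes the server rejects `CWD` on a plain file with a 5xx reply, which ftplib raises as `ftplib.error_perm`.

```python
import ftplib


def is_directory(session, name):
    """Return True if `name` can be entered with CWD (hypothetical helper).

    Assumes the server refuses CWD on plain files with a 5xx reply.
    """
    original = session.pwd()  # remember where we are
    try:
        session.cwd(name)
    except ftplib.error_perm:
        # the server refused to enter it, so treat it as a file
        return False
    session.cwd(original)  # go back so the caller's position is unchanged
    return True
```

With a helper like this, `if dirTest and len(dirTest) > 1:` could be replaced by `if is_directory(session, filename):`. It costs extra round trips per name, so it may be slow on large listings.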

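Alternatively, if you know the structure of the FTP server, you can loop over a dictionary describing the folder/file structure and download the files via ftplib or urllib, as in this example: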

# map each folder name to the list of files it contains
structure = {"folder1": ["file1", "file2"], "folder2": ["file1"]}
for folder, files in structure.items():
    for file in files:
        path = "/{}/{}".format(folder, file)
        ##########
        # specific ftp file download stuff here
        ##########
        ##########
        # do your thing with pandas here
        ##########
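One way to fill in the download placeholder is to hand pandas an `ftp://` URL, since `pandas.read_csv` accepts URLs with the `ftp` scheme and fetches them through urllib. The sketch below only illustrates that idea: the helper names (`build_ftp_url`, `iter_csv_urls`, `read_remote_csv`), the host `ftp.example.com`, and the credentials are all placeholders, and it assumes the server allows login with the credentials embedded in the URL.

```python
from urllib.parse import quote


def build_ftp_url(host, user, password, path):
    """Return an ftp:// URL with percent-encoded credentials and path."""
    return "ftp://{}:{}@{}/{}".format(
        quote(user, safe=""),      # encode characters such as '@' or ':'
        quote(password, safe=""),
        host,
        quote(path.lstrip("/")),   # keep '/' as the path separator
    )


def iter_csv_urls(structure, host, user, password):
    """Yield one URL per file in a {folder: [files]} dictionary."""
    for folder, files in structure.items():
        for file in files:
            yield build_ftp_url(host, user, password, "/{}/{}".format(folder, file))


def read_remote_csv(url):
    """Load one remote CSV into a DataFrame."""
    import pandas  # imported here so the URL helpers work without pandas
    return pandas.read_csv(url)
```

Usage would look like `for url in iter_csv_urls(structure, "ftp.example.com", "user", "secret"): df = read_remote_csv(url)`. Each call opens its own connection, so for many files reusing a single ftplib session with `retrbinary` is probably cheaper.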

Both solutions can be improved by making them recursive, or otherwise extending them to support multiple folder levels.
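A rough sketch of the recursive variant follows. It is not a tested production walker: the function names `list_names` and `walk_csv` are invented for this example, directories are detected by attempting `cwd` (which assumes the server rejects `CWD` on a file with a 5xx reply), and it assumes `nlst()` without arguments returns bare names rather than full paths.

```python
import ftplib


def list_names(session):
    """List the current directory, dropping '.' and '..'."""
    try:
        names = session.nlst()
    except ftplib.error_perm:
        # some servers answer 550 for an empty directory
        return []
    return [name for name in names if name not in (".", "..")]


def walk_csv(session):
    """Yield the absolute remote path of every .csv file below the current directory."""
    here = session.pwd()
    for name in list_names(session):
        try:
            session.cwd(name)
        except ftplib.error_perm:
            # could not enter it, so treat it as a file
            if name.lower().endswith(".csv"):
                yield here.rstrip("/") + "/" + name
            continue
        yield from walk_csv(session)  # descend into the subdirectory
        session.cwd(here)             # come back before the next name
```

Each yielded path can then be fetched with `session.retrbinary("RETR " + path, buffer.write)` into an `io.BytesIO` buffer and passed to `pandas.read_csv` after `buffer.seek(0)`, as in the first snippet of this answer.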