Har*_*hna · 6 · tags: python, csv, ftp, ftplib, pandas
I have connected to an FTP server, and the connection works.
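import ftplib
ftp = ftplib.FTP('***', '****', '****')
listoffiles = []
ftp.dir(listoffiles.append)  # dir() prints the listing and returns None, so collect the lines via a callback
print(listoffiles)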
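This FTP server contains some CSV files, plus some folders that contain more CSV files.
I need to get the list of folders in this location (the home directory) and navigate into them. I think the cwd command should work for that.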
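A minimal sketch for the folder part, assuming the server supports the MLSD command (RFC 3659), which Python's `ftplib.FTP.mlsd()` wraps; older servers may reject it. `list_folders` and `visit_folders` are hypothetical helper names.

```python
def list_folders(ftp):
    """Return the names of subfolders in the FTP session's current directory.

    Assumes MLSD support: the "type" fact is "dir" for ordinary folders,
    while "cdir"/"pdir" mark the current and parent directory and are skipped.
    """
    return [name for name, facts in ftp.mlsd()
            if facts.get("type") == "dir"]


def visit_folders(ftp):
    """Enter each subfolder with cwd(), print its file names, then return to the start."""
    start = ftp.pwd()
    for folder in list_folders(ftp):
        ftp.cwd(folder)          # navigate into the folder
        print(folder, ftp.nlst())
        ftp.cwd(start)           # go back before visiting the next folder
```

With the session from the question, `visit_folders(ftp)` would print each folder name followed by the names it contains.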
I also need to read the CSV files stored on this FTP server. How can I do this? Is there a way to load a CSV directly into pandas?
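One way to load a CSV straight into pandas without writing a temporary file is to download it into an in-memory `io.BytesIO` buffer with `retrbinary()` and pass that buffer to `pandas.read_csv()`. This is a minimal sketch: `read_remote_csv` is a hypothetical helper name, and `filename` is assumed to be relative to the session's current directory. `pandas.read_csv()` also accepts `ftp://` URLs directly, which may be simpler when the full path is known.

```python
import io

import pandas as pd


def read_remote_csv(ftp, filename, **read_csv_kwargs):
    """Download `filename` from an open ftplib session into memory and parse it with pandas."""
    buffer = io.BytesIO()
    # RETR streams the file in binary chunks; each chunk is appended to the buffer
    ftp.retrbinary("RETR " + filename, buffer.write)
    buffer.seek(0)  # rewind so pandas reads from the beginning
    return pd.read_csv(buffer, **read_csv_kwargs)
```

Extra keyword arguments (for example `sep=";"`) are passed through to `read_csv()` unchanged.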
Based on the answer here (Python write create file direct in FTP) and my own knowledge of ftplib, you can do the following:
from ftplib import FTP
import io
import pandas
session = FTP('***', '****', '****')
# get filenames on the FTP home/root
remoteFilenames = session.nlst()
if ".." in remoteFilenames:
    remoteFilenames.remove("..")
if "." in remoteFilenames:
    remoteFilenames.remove(".")
# iterate over filenames and check which ones are folders
for filename in remoteFilenames:
    dirTest = session.nlst(filename)
    # This dir test does not work on certain servers
    # (and a folder containing exactly one entry also yields a one-element list)
    if dirTest and len(dirTest) > 1:
        # it's a directory => go to directory
        session.cwd(filename)
        # get filenames on the FTP one level deeper
        remoteFilenames2 = session.nlst()
        if ".." in remoteFilenames2:
            remoteFilenames2.remove("..")
        if "." in remoteFilenames2:
            remoteFilenames2.remove(".")
        for filename2 in remoteFilenames2:
            # check again if the filename is a directory and this time ignore it in this case
            dirTest = session.nlst(filename2)
            if dirTest and len(dirTest) > 1:
                continue
            # download the file, but first create a virtual (in-memory) file object for it
            download_file = io.BytesIO()
            session.retrbinary("RETR {}".format(filename2), download_file.write)
            download_file.seek(0)  # after writing, go back to the start of the virtual file
            df = pandas.read_csv(download_file)  # read virtual file into pandas
            ##########
            # do your thing with pandas here
            ##########
            download_file.close()  # close virtual file
        session.cwd("..")  # go back up to the parent folder before checking the next name
session.quit()  # close the FTP session
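The `nlst()` check above is a heuristic: a folder containing exactly one entry also returns a one-element list, and some servers may answer `NLST` on an empty folder with an error rather than an empty list. A commonly used alternative is to try to `cwd()` into the name and treat a permanent-error reply as "not a folder". This sketch assumes the server rejects `CWD` on a regular file with a 5xx reply, which `ftplib` raises as `ftplib.error_perm`; `is_directory` is a hypothetical name. A folder you are not allowed to enter would also be reported as a file.

```python
import ftplib


def is_directory(ftp, name):
    """Return True if `name` can be entered with CWD, i.e. it behaves like a folder."""
    original = ftp.pwd()
    try:
        ftp.cwd(name)
    except ftplib.error_perm:
        # e.g. "550 Not a directory": treat the name as a regular file
        return False
    ftp.cwd(original)  # restore the previous working directory
    return True
```

In the answer's loop, `if is_directory(session, filename):` could stand in for the `dirTest` check.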
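Alternatively, if you know the structure of the FTP server, you can loop over a dictionary describing the folder/file structure and download the files with ftplib or urllib, as in this example: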
structure = {"folder1": ["file1", "file2"], "folder2": ["file1"]}
for folder, files in structure.items():  # iterate over (folder, list of files) pairs
    for file in files:
        path = "/{}/{}".format(folder, file)
        ##########
        # specific ftp file download stuff here
        ##########
        ##########
        # do your thing with pandas here
        ##########
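A sketch that fills in the two placeholders of the dictionary approach with `ftplib`, assuming `ftp` is an open session and that the server accepts absolute paths such as `/folder1/file1` in `RETR` (resolved from the FTP root). `load_known_files` and the returned `{path: DataFrame}` mapping are illustrative choices, not part of the original answer.

```python
import io

import pandas as pd


def load_known_files(ftp, structure):
    """Download every file listed in `structure` ({folder: [file, ...]}) into pandas.

    Returns a dict mapping each remote path to its DataFrame.
    """
    frames = {}
    for folder, files in structure.items():
        for file in files:
            path = "/{}/{}".format(folder, file)
            buffer = io.BytesIO()
            # download via RETR with an absolute path, writing the chunks to memory
            ftp.retrbinary("RETR " + path, buffer.write)
            buffer.seek(0)  # rewind before parsing
            frames[path] = pd.read_csv(buffer)
    return frames
```

For example, `load_known_files(session, {"folder1": ["file1", "file2"], "folder2": ["file1"]})` would return three DataFrames keyed by their remote paths.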
Both solutions could be improved by using recursion, or more generally by supporting multiple levels of folders.
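A minimal recursive variant, assuming MLSD support so that each entry's `type` fact distinguishes folders (`dir`) from files (`file`), with the `.`/`..` entries (`cdir`/`pdir`) skipped automatically. `walk_csv_files` is a hypothetical name; it only yields paths, so each one would still be downloaded with `RETR` as in the answer's code.

```python
import posixpath


def walk_csv_files(ftp, path="/"):
    """Recursively yield the full paths of all .csv files below `path`.

    Uses MLSD facts to tell folders from files, so it assumes MLSD support.
    """
    for name, facts in ftp.mlsd(path):
        kind = facts.get("type")
        full_path = posixpath.join(path, name)
        if kind == "dir":
            # descend into the subfolder, to any depth
            yield from walk_csv_files(ftp, full_path)
        elif kind == "file" and name.lower().endswith(".csv"):
            yield full_path
```

Because it is a generator, folders are only listed as the caller iterates, so a loop can stop early without walking the whole tree.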