Browse files and subfolders in Python

Question

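I'd like to browse through the current folder and all its subfolders and get all the files with .htm or .html extensions. I have found out that it is possible to check whether an object is a directory or a file like this: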

import os

dirList = os.listdir("./")  # current directory
for dir in dirList:
    if os.path.isdir(dir):
        # I don't know how to get into this dir and do the same thing here
        pass
    else:
        # I got a file and I can regexp if it is .htm|html
        pass

In the end, I would like to have all the files and their paths in an array. Is something like that possible?

Answer 1

Sve*_*ach 116

You can use os.walk() to recursively iterate through a directory and all its subdirectories:

import os

path = "."  # assumed starting directory; replace with your own
for root, dirs, files in os.walk(path):
    for name in files:
        if name.endswith((".html", ".htm")):
            # whatever
            pass

To build a list of these names, you can use a list comprehension:

htmlfiles = [os.path.join(root, name)
             for root, dirs, files in os.walk(path)
             for name in files
             if name.endswith((".html", ".htm"))]
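
As a rough illustration of the list comprehension in use, the sketch below wraps it in a function and runs it against a throwaway directory tree. The name `find_html_files`, the temporary tree, and the `lower()` call (added so that names such as `INDEX.HTML` also match) are assumptions made for this example and are not part of the original answer.

```python
import os
import tempfile


def find_html_files(path):
    """Return paths of all .htm/.html files below path (hypothetical helper)."""
    return [os.path.join(root, name)
            for root, dirs, files in os.walk(path)
            for name in files
            # lower() is an addition here so upper-case extensions match too
            if name.lower().endswith((".html", ".htm"))]


# Build a small throwaway tree so the sketch can run anywhere.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub", "deeper"))
    for rel in ("a.html", "notes.txt",
                os.path.join("sub", "b.htm"),
                os.path.join("sub", "deeper", "C.HTML")):
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write("")
    # Paths are made relative to tmp only to keep the printed list short.
    found = sorted(os.path.relpath(p, tmp) for p in find_html_files(tmp))
    print(found)
```

The result is an ordinary list of path strings, which is probably the closest match to the "array" the question asks for.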

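Some nuances I think are worth mentioning: this will traverse/include hidden files, and it will not resolve links for you. There is also no guarantee that every file/directory enumerated exists (mainly because a link can exist while its target does not). [Further reading](https://docs.python.org/2.7/library/os.html#os.readlink) on resolving links may be helpful to some, depending on how you intend to use `os.walk`. (3 upvotes)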
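
One possible way to deal with the caveats in this comment is sketched below. The function name `walk_visible_html` is hypothetical, and "hidden" is taken here to mean a name starting with a dot, which is the Unix convention only and does not cover the Windows hidden attribute.

```python
import os


def walk_visible_html(path, followlinks=False):
    """Yield .htm/.html paths, skipping dot-names and dangling links."""
    for root, dirs, files in os.walk(path, followlinks=followlinks):
        # Editing dirs in place stops os.walk from descending into the
        # removed directories (this relies on the default topdown=True).
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for name in files:
            if name.startswith("."):
                continue  # hidden by the dot-name convention (assumption)
            full = os.path.join(root, name)
            # os.path.exists() follows links, so it returns False for a
            # link whose target is missing.
            if not os.path.exists(full):
                continue
            if name.endswith((".html", ".htm")):
                yield full
```

By default `os.walk` does not descend into directories reached through symbolic links. Passing `followlinks=True` changes that, but a link that points back to one of its own parent directories can then cause endless recursion, so it is left off in this sketch.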

Answer 2

Pra*_*Das 8

I had a similar thing to work on, and this is how I did it.

import os

rootdir = os.getcwd()

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        # build the full path in a platform-independent way
        filepath = os.path.join(subdir, file)

        # match both extensions mentioned in the question
        if filepath.endswith((".html", ".htm")):
            print(filepath)

Hope this helps.

Answer 3

Spa*_*pas 6

In Python 3 (version 3.5 and later), you can use os.scandir():

import os

path = "."  # assumed starting directory; replace with your own
for entry in os.scandir(path):
    if entry.is_file():
        print('File: ' + entry.path)
    elif entry.is_dir():
        print('Folder: ' + entry.path)
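
Note that os.scandir() lists a single directory only, so reaching the subfolders takes an extra step. A minimal sketch of one way to do that with a recursive generator follows. The name `scan_html` is hypothetical, and using os.scandir() as a context manager assumes Python 3.6 or later.

```python
import os


def scan_html(path):
    """Recursively yield .htm/.html file paths using os.scandir()."""
    with os.scandir(path) as entries:
        for entry in entries:
            # follow_symlinks=False keeps the recursion from following
            # links to directories, which avoids link loops.
            if entry.is_dir(follow_symlinks=False):
                yield from scan_html(entry.path)
            elif entry.is_file() and entry.name.endswith((".html", ".htm")):
                yield entry.path
```

Calling `list(scan_html("."))` would then give the list of paths below the current directory that the question asks for.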

Answer 4

小智 5

Use newDirName = os.path.abspath(dir) to create the full directory path name for the subdirectory, then list its contents as you did for the parent directory (i.e. newDirList = os.listdir(newDirName)).

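You can create a separate method for your code snippet and call it recursively through the subdirectory structure. The first parameter is the directory path name. This will change for each subdirectory.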
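
The recursive approach described here could look roughly like the sketch below. The function name `collect_html` and its `found` parameter are hypothetical. The sketch also joins each name with its parent directory using os.path.join() rather than calling os.path.abspath(), because abspath() resolves a bare name against the current working directory, which gives the wrong path once the recursion is below the first level.

```python
import os


def collect_html(dirname, found=None):
    """Recursively collect .htm/.html paths below dirname with os.listdir()."""
    if found is None:
        found = []
    for name in os.listdir(dirname):
        # Join with the parent directory so the path stays valid at
        # every depth of the recursion.
        full = os.path.join(dirname, name)
        if os.path.isdir(full):
            collect_html(full, found)  # same steps, one level deeper
        elif name.endswith((".html", ".htm")):
            found.append(full)
    return found
```

Because os.path.isdir() follows symbolic links, a link that points back to a parent directory would make this version recurse without end. os.walk() does not follow such links by default.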

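This answer is based on the version 3.1.1 documentation of the Python Library. There is a good model example of this on page 228 of the Python 3.1.1 Library Reference (Chapter 10 - File and Directory Access). Good luck!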
