如何列出目录的所有文件?

duhhunjonn 3474 python directory

如何在Python中列出目录的所有文件并将其添加到list

pycruft.. 3752

os.listdir() 将为您提供目录中的所有内容 - 文件和目录.

如果您想要文件,可以使用以下方法对其进行过滤os.path:

from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

或者您可以使用os.walk()哪个会为它访问的每个目录生成两个列表 - 为您分割成文件和目录.如果你只想要顶级目录,你可以在它第一次产生时中断

from os import walk

f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
    break

最后,正如该示例所示,将一个列表添加到另一个列表,您可以使用os.listdir()

from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

就个人而言,我更喜欢 os.path

  • `f.extend(filenames)`实际上并不等同于`f = f + filenames`.`extend`将在原地修改`f`,而添加在新的内存位置创建一个新列表.这意味着`extend`通常比`+`更有效,但如果多个对象持有对列表的引用,它有时会导致混淆.最后,值得注意的是`f + = filenames`相当于`f.extend(filenames)`,_not_`f = f + filenames`. (136认同)
  • 更简单一点:`(_,_,filenames)= walk(mypath).next()`(如果你确信walk会返回至少一个值,它应该.) (77认同)
  • @misterbee,你的解决方案是最好的,只是一个小改进:``_,_,filenames = next(walk(mypath),(None,None,[]))`` (30认同)
  • 在python 3.x中使用```(_,_,filenames)= next(os.walk(mypath))``` (25认同)
  • 对存储完整路径的轻微修改:对于os.walk(mypath)中的(dirpath,dirnames,filenames):checksum_files.extend(文件名中的文件名的os.path.join(dirpath,filename))break (8认同)
  • 有没有办法使它包含每个文件的完整路径? (2认同)

adamk.. 1507

我更喜欢使用glob模块,因为它模式匹配和扩展.

import glob
print(glob.glob("/home/adam/*.txt"))

它将返回包含查询文件的列表:

['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]

  • 澄清一下,这确实没有回归"完整的道路"; 它只是返回glob的扩展,无论它是什么.例如,给定`/ home/user/foo/bar/hello.txt`,然后,如果在目录`foo`中运行,`glob("bar/*.txt")`将返回`bar/hello.txt` .有些情况下你确实想要完整的(即绝对的)路径; 对于这些情况,请参阅http://stackoverflow.com/questions/51520/how-to-get-an-absolute-file-path-in-python (26认同)
  • 这是listdir + fnmatch的快捷方式http://docs.python.org/library/fnmatch.html#fnmatch.fnmatch (14认同)
  • 没有回答这个问题。`glob.glob(“ *”)`将会。 (5认同)

sepp2k.. 733

import os
os.listdir("somedirectory")

将返回"somedirectory"中所有文件和目录的列表.

  • @Jixiang:`os.listdir()`总是返回_mere文件名_(不是相对路径).`glob.glob()`返回的内容由输入模式的路径格式驱动. (18认同)
  • 与"glob.glob"返回的完整路径相比,这将返回文件的相对路径 (9认同)

Giovanni G. .. 673

获取Python 2和3的文件列表


我也在这里做了一个简短的视频: Python:如何获取目录中的文件列表


os.listdir()

或者.....如何获取当前目录中的所有文件(和目录)(Python 3)

在Python 3中将文件放在当前目录中的最简单方法是这样.这很简单; 使用os.listdir()模块和os函数,你将在该目录中有文件(和目录中的最终文件夹,但你不会在子目录中有文件,因为你可以使用walk - 我将在稍后讨论它).

 import os
 arr = os.listdir()
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

使用glob

我发现glob更容易选择相同类型的文件或共同的东西.请看以下示例:

import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)

使用列表理解

import glob

mylist = [f for f in glob.glob("*.txt")]

使用os.path.abspath获取完整路径名

如您所知,您在上面的代码中没有该文件的完整路径.如果您需要具有绝对路径,则可以使用所listdir()调用模块的另一个函数glob,将您获得的文件glob作为参数.还有其他方法可以获得完整路径,我们稍后会检查(我更换了,如mexmex所建议的那样,_getfullpathname with glob).

import glob

def filebrowser():
    return [f for f in glob.glob("*")]

x = filebrowser()
print(x)

>>> ['example.txt', 'fb.py', 'filebrowser.py', 'help']

获取所有子目录中的文件类型的完整路径名 glob

我发现这对于在许多目录中查找内容非常有用,它帮助我找到了一个我不记得名字的文件:

import glob

def filebrowser(word=""):
    """Returns a list with all files with the word/extension in it"""
    file = []
    for f in glob.glob("*"):
        if word in f:
            file.append(f)
            return file

flist = filebrowser("example")
print(flist)
flist = filebrowser(".py")
print(flist)

>>> ['example.txt']
>>> ['fb.py', 'filebrowser.py']

os.listdir():获取当前目录中的文件(Python 2)

在Python 2中,如果您想要当前目录中的文件列表,则必须将参数设置为".".或os.listdir方法中的os.getcwd().

 import os
 files_path = [os.path.abspath(x) for x in os.listdir()]
 print(files_path)

 >>> ['F:\\documenti\applications.txt', 'F:\\documenti\collections.txt']

进入目录树

import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if ".docx" in file:
            print(os.path.join(r, file))

获取文件:特定目录中的os.listdir()(Python 2和3)

 import os
 arr = os.listdir('.')
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

使用os.listdir()获取特定子目录的文件

# Method 1
x = os.listdir('..')

# Method 2
x= os.listdir('/')

os.walk('.') - 当前目录

 import os
 arr = os.listdir('F:\\python')
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

glob模块 - 所有文件

import os

x = os.listdir("./content")

next(os.walk('.'))和os.path.join('dir','file')

 import os
 arr = next(os.walk('.'))[2]
 print(arr)

 >>> ['5bs_Turismo1.pdf', '5bs_Turismo1.pptx', 'esperienza.txt']

next(os.walk('F:\') - 获取完整路径 - 列表理解

 import os
 arr = []
 for d,r,f in next(os.walk("F:\\_python")):
     for file in f:
         arr.append(os.path.join(r,file))

 for f in arr:
     print(files)

>>> F:\\_python\\dict_class.py
>>> F:\\_python\\programmi.txt

os.walk - 获取完整路径 - 子目录中的所有文件

 [os.path.join(r,file) for r,d,f in next(os.walk("F:\\_python")) for file in f]

 >>> ['F:\\_python\\dict_class.py', 'F:\\_python\\programmi.txt']

os.listdir() - 只获取txt文件

x = [os.path.join(r,file) for r,d,f in os.walk("F:\\_python") for file in f]
print(x)

>>> ['F:\\_python\\dict.py', 'F:\\_python\\progr.txt', 'F:\\_python\\readl.py']

glob - 只获取txt文件

 arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
 print(arr_txt)

 >>> ['work.txt', '3ebooks.txt']

使用glob来获取文件的完整路径

如果我需要文件的绝对路径:

from path import path
from glob import glob
x = [path(f).abspath() for f in glob("F:\\*.txt")]
for f in x:
    print(f)

>>> F:\acquistionline.txt
>>> F:\acquisti_2018.txt
>>> F:\bootstrap_jquery_ecc.txt

其他使用glob

如果我想要目录中的所有文件:

import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

>>> ['a simple game.py', 'data.txt', 'decorator.py']

使用os.path.isfile来避免列表中的目录

import pathlib

flist = []
for p in pathlib.Path('.').iterdir():
    if p.is_file():
        print(p)
        flist.append(p)

 >>> error.PNG
 >>> exemaker.bat
 >>> guiprova.mp3
 >>> setup.py
 >>> speak_gui2.py
 >>> thumb.PNG

使用pathlib(Python 3.4)

flist = [p for p in pathlib.Path('.').iterdir() if p.is_file()]

如果你想使用列表理解

import pathlib

py = pathlib.Path().glob("*.py")
for file in py:
    print(file)

>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py

*您也可以使用pathlib.Path()而不是pathlib.Path(".")

在pathlib.Path()中使用glob方法

import os
x = [i[2] for i in os.walk('.')]
y=[]
for t in x:
    for f in t:
        y.append(f)
print(y)

>>> ['append_to_list.py', 'data.txt', 'data1.txt', 'data2.txt', 'data_180617', 'os_walk.py', 'READ2.py', 'read_data.py', 'somma_defaltdic.py', 'substitute_words.py', 'sum_data.py', 'data.txt', 'data1.txt', 'data_180617']

输出:

 import os
 x = next(os.walk('F://python'))[2]
 print(x)

 >>> ['calculator.bat','calculator.py']

使用os.walk获取所有和唯一的文件

 import os
 next(os.walk('F://python'))[1] # for the current dir use ('.')

 >>> ['python3','others']

只获取带有next的文件并进入目录

for r,d,f in os.walk("F:\\_python"):
    for dirs in d:
        print(dirs)

>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints

只获取下一个目录并进入目录

import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)

>>> ['calculator.bat','calculator.py']

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os
with os.scandir() as i:
    for entry in i:
        if entry.is_file():
            print(entry.name)

>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG

获取所有子目录名称 os.path.abspath

import os

def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + "files"

print(count("F:\\python"))

>>> 'F:\\\python' : 12057 files'

来自Python 3.5的os.scandir()

import os
import shutil
from path import path

destination = "F:\\file_copied"
# os.makedirs(destination)

def copyfile(dir, filetype='pptx', counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = pack[0] + "\\" + f
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print('-' * 30)
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")

for dir in os.listdir():
    "searches for folders that starts with `_`"
    if dir[0] == '_':
        # copyfile(dir, filetype='pdf')
        copyfile(dir, filetype='txt')


>>> _compiti18\Compito Contabilità 1\conti.txt
>>> _compiti18\Compito Contabilità 1\modula4.txt
>>> _compiti18\Compito Contabilità 1\moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files

防爆.1:子目录中有多少个文件?

在此示例中,我们查找包含在所有目录及其子目录中的文件数.

import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)

例2:如何将目录中的所有文件复制到另一个目录?

一个脚本,用于在计算机中查找所有类型的文件(默认值:pptx)并将其复制到新文件夹中.

"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""

import os

# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding='utf-8') as testo:
    for root, dirs, files in os.walk("D:\\"):
        for file in files:
            listafile.append(file)
            percorso.append(root + "\\" + file)
            testo.write(file + "\n")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "\n")

with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "\n")

os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")

防爆.3:如何获取txt文件中的所有文件

如果您要创建包含所有文件名的txt文件:

import os

with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\\"):
        for file in f:
            filewrite.write(f"{r + file}\n")

示例:txt包含硬盘驱动器的所有文件

import os

def searchfiles(extension='.ttf', folder='H:\\'):
    "Create a txt file with all the file of a type"
    with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
        for r, d, f in os.walk(folder):
            for file in f:
                if file.endswith(extension):
                    filewrite.write(f"{r + file}\n")

# looking for png file (fonts) in the hard disk H:\
searchfiles('.png', 'H:\\')

>>> H:\4bs_18\Dolphins5.png
>>> H:\4bs_18\Dolphins6.png
>>> H:\4bs_18\Dolphins7.png
>>> H:\5_18\marketing html\assets\imageslogo2.png
>>> H:\7z001.png
>>> H:\7z002.png

C:\\的所有文件都在一个文本文件中

这是以前代码的较短版本.如果需要从其他位置开始,请更改文件夹从哪里开始查找文件.此代码在我的计算机上生成一个50 MB的文本文件,其中包含少于500.000行,文件包含完整路径.

import tkinter as tk
import os

def searchfiles(extension='.txt', folder='H:\\'):
    "insert all files in the listbox"
    for r, d, f in os.walk(folder):
        for file in f:
            if file.endswith(extension):
                lb.insert(0, r + "\\" + file)

def open_file():
    os.startfile(lb.get(lb.curselection()[0]))

root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda:searchfiles('.png', 'H:\\'))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()

搜索特定类型文件的功能

 import os
 arr = os.listdir()
 print(arr)

 >>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']


Remi.. 152

获取文件列表(无子目录)的单行解决方案:

filenames = next(os.walk(path))[2]

或绝对路径名:

paths = [os.path.join(path, fn) for fn in next(os.walk(path))[2]]

  • 如果你已经"导入os",那么只有一个单行.对我来说,似乎不像`glob()`简洁. (7认同)
  • glob的问题是glob('/ home/adam /*.*')会返回一个名为'something.something'的文件夹 (4认同)
  • 在OS X上,有一种叫做bundle的东西.这是一个目录,通常应该被视为一个文件(如.tar).你想要那些被视为文件或目录的人吗?使用`glob()`会将其视为文件.您的方法会将其视为目录. (4认同)

Johnny.. 126

从目录及其所有子目录获取完整文件路径

import os

def get_filepaths(directory):
    """
    This function will generate the file names in a directory 
    tree by walking the tree either top-down or bottom-up. For each 
    directory in the tree rooted at directory top (including top itself), 
    it yields a 3-tuple (dirpath, dirnames, filenames).
    """
    file_paths = []  # List which will store all of the full filepaths.

    # Walk the tree.
    for root, directories, files in os.walk(directory):
        for filename in files:
            # Join the two strings in order to form the full filepath.
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)  # Add it to the list.

    return file_paths  # Self-explanatory.

# Run the above function and store its results in a variable.   
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")

  • 我在上面的函数中提供的路径包含3个文件 - 其中两个位于根目录中,另一个位于名为"SUBFOLDER"的子文件夹中.您现在可以执行以下操作:
  • print full_file_paths 这将打印列表:

    • ['/Users/johnny/Desktop/TEST/file1.txt', '/Users/johnny/Desktop/TEST/file2.txt', '/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat']

如果您愿意,可以打开并阅读内容,或只关注扩展名为".dat"的文件,如下面的代码所示:

for f in full_file_paths:
  if f.endswith(".dat"):
    print f

/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat


SzieberthAda.. 75

从版本3.4开始,有内置的迭代器,它比os.listdir()以下更有效:

pathlib:版本3.4中的新功能.

>>> import pathlib
>>> [p for p in pathlib.Path('.').iterdir() if p.is_file()]

根据PEP 428,pathlib库的目的是提供一个简单的类层次结构来处理文件系统路径以及用户对它们执行的常见操作.

os.scandir():3.5版中的新功能.

>>> import os
>>> [entry for entry in os.scandir('.') if entry.is_file()]

请注意,os.walk()使用os.scandir()而不是os.listdir()版本3.5,根据PEP 471,其速度提高了2-20倍.

我还建议您阅读下面的ShadowRanger评论.

  • 注意:`os.scandir`解决方案比使用`os.path.is_file`检查等的`os.listdir`更有效,即使你需要`list`(所以你没有受益来自lazy迭代),因为`os.scandir`使用OS提供的API,在迭代时免费提供`is_file`信息,没有按文件往返磁盘到`stat`它们(在Windows上, `DirEntry`s让你免费完成`stat`信息,在*NIX系统上它需要`stat`以获取超出`is_file`,`is_dir`等的信息,但`DirEntry`缓存在第一个'stat`以方便). (6认同)

CristiFati.. 55

初步说明

  • 虽然问题文本中的文件目录术语之间存在明显区别,但有些人可能认为目录实际上是特殊文件
  • 声明:" 目录的所有文件 "可以用两种方式解释:
    1. 所有直接(或1级)的后代
    2. 整个目录树中的所有后代(包括子目录中的后代)
  • 当问到这个问题时,我认为Python 2LTS版本,但代码示例将由Python 3(.5)运行(我将尽可能保持它们与Python 2兼容;同样,任何代码属于我要发布的Python来自v3.5.4 - 除非另有说明).这会产生与问题中另一个关键字相关的后果:" 将它们添加到列表中 ":

    • Python 2之前的版本中,序列(iterables)主要由列表(元组,集合......)表示
    • In Python 2.2, the concept of generator ([Python.Wiki]: Generators) - courtesy of [Python 3]: The yield statement) - was introduced. As time passed, generator counterparts started to appear for functions that returned/worked with lists
    • In Python 3, generator is the default behavior
    • Not sure if returning a list is still mandatory (or a generator would do as well), but passing a generator to the list constructor, will create a list out of it (and also consume it). The example below illustrates the differences on [Python 3]: map(function, iterable, ...)
    >>> import sys
    >>> sys.version
    '2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)]'
    >>> m = map(lambda x: x, [1, 2, 3])  # Just a dummy lambda function
    >>> m, type(m)
    ([1, 2, 3], <type 'list'>)
    >>> len(m)
    3
    


    >>> import sys
    >>> sys.version
    '3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)]'
    >>> m = map(lambda x: x, [1, 2, 3])
    >>> m, type(m)
    (<map object at 0x000001B4257342B0>, <class 'map'>)
    >>> len(m)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: object of type 'map' has no len()
    >>> lm0 = list(m)  # Build a list from the generator
    >>> lm0, type(lm0)
    ([1, 2, 3], <class 'list'>)
    >>>
    >>> lm1 = list(m)  # Build a list from the same generator
    >>> lm1, type(lm1)  # Empty list now - generator already consumed
    ([], <class 'list'>)
    
  • 这些示例将基于名为root_dir的目录,具有以下结构(此示例适用于Win,但我在Lnx上也使用相同的树):

    E:\Work\Dev\StackOverflow\q003207219>tree /f "root_dir"
    Folder PATH listing for volume Work
    Volume serial number is 00000029 3655:6FED
    E:\WORK\DEV\STACKOVERFLOW\Q003207219\ROOT_DIR
    ¦   file0
    ¦   file1
    ¦
    +---dir0
    ¦   +---dir00
    ¦   ¦   ¦   file000
    ¦   ¦   ¦
    ¦   ¦   +---dir000
    ¦   ¦           file0000
    ¦   ¦
    ¦   +---dir01
    ¦   ¦       file010
    ¦   ¦       file011
    ¦   ¦
    ¦   +---dir02
    ¦       +---dir020
    ¦           +---dir0200
    +---dir1
    ¦       file10
    ¦       file11
    ¦       file12
    ¦
    +---dir2
    ¦   ¦   file20
    ¦   ¦
    ¦   +---dir20
    ¦           file200
    ¦
    +---dir3
    


解决方案

程序化方法:

  1. [Python 3]:os.listdir(path ='.')

    返回一个列表,其中包含path给出的目录中的条目名称.该列表是任意顺序,不包括特殊条目'.''..'...


    >>> import os
    >>> root_dir = "root_dir"  # Path relative to current dir (os.getcwd())
    >>>
    >>> os.listdir(root_dir)  # List all the items in root_dir
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [item for item in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, item))]  # Filter items and only keep files (strip out directories)
    ['file0', 'file1']
    

    一个更详细的例子(code_os_listdir.py):

    import os
    from pprint import pformat
    
    
    def _get_dir_content(path, include_folders, recursive):
        entries = os.listdir(path)
        for entry in entries:
            entry_with_path = os.path.join(path, entry)
            if os.path.isdir(entry_with_path):
                if include_folders:
                    yield entry_with_path
                if recursive:
                    for sub_entry in _get_dir_content(entry_with_path, include_folders, recursive):
                        yield sub_entry
            else:
                yield entry_with_path
    
    
    def get_dir_content(path, include_folders=True, recursive=True, prepend_folder_name=True):
        path_len = len(path) + len(os.path.sep)
        for item in _get_dir_content(path, include_folders, recursive):
            yield item if prepend_folder_name else item[path_len:]
    
    
    def _get_dir_content_old(path, include_folders, recursive):
        entries = os.listdir(path)
        ret = list()
        for entry in entries:
            entry_with_path = os.path.join(path, entry)
            if os.path.isdir(entry_with_path):
                if include_folders:
                    ret.append(entry_with_path)
                if recursive:
                    ret.extend(_get_dir_content_old(entry_with_path, include_folders, recursive))
            else:
                ret.append(entry_with_path)
        return ret
    
    
    def get_dir_content_old(path, include_folders=True, recursive=True, prepend_folder_name=True):
        path_len = len(path) + len(os.path.sep)
        return [item if prepend_folder_name else item[path_len:] for item in _get_dir_content_old(path, include_folders, recursive)]
    
    
    def main():
        root_dir = "root_dir"
        ret0 = get_dir_content(root_dir, include_folders=True, recursive=True, prepend_folder_name=True)
        lret0 = list(ret0)
        print(ret0, len(lret0), pformat(lret0))
        ret1 = get_dir_content_old(root_dir, include_folders=False, recursive=True, prepend_folder_name=False)
        print(len(ret1), pformat(ret1))
    
    
    if __name__ == "__main__":
        main()
    

    备注:

    • 有两种实现:
      • 一个使用生成器(当然这里似乎没用,因为我立即将结果转换为列表)
      • 经典之一(函数名以_old结尾)
    • 使用递归(进入子目录)
    • 对于每个实现,有两个功能:
      • 一个以下划线(_)开头的:"私有"(不应该直接调用) - 这样做可以完成所有工作
      • public one(前一个包装器):它只是从返回的条目中剥离初始路径(如果需要).这是一个丑陋的实现,但这是我在这一点上可以带来的唯一想法
    • 在性能方面,生成器通常要快一点(考虑创建迭代时间),但我没有在递归函数中测试它们,而且我在函数内部迭代内部生成器 - 不知道性能如何这是友好的
    • 使用参数来获得不同的结果


    输出:

    (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" "code_os_listdir.py"
    <generator object get_dir_content at 0x000001BDDBB3DF10> 22 ['root_dir\\dir0',
     'root_dir\\dir0\\dir00',
     'root_dir\\dir0\\dir00\\dir000',
     'root_dir\\dir0\\dir00\\dir000\\file0000',
     'root_dir\\dir0\\dir00\\file000',
     'root_dir\\dir0\\dir01',
     'root_dir\\dir0\\dir01\\file010',
     'root_dir\\dir0\\dir01\\file011',
     'root_dir\\dir0\\dir02',
     'root_dir\\dir0\\dir02\\dir020',
     'root_dir\\dir0\\dir02\\dir020\\dir0200',
     'root_dir\\dir1',
     'root_dir\\dir1\\file10',
     'root_dir\\dir1\\file11',
     'root_dir\\dir1\\file12',
     'root_dir\\dir2',
     'root_dir\\dir2\\dir20',
     'root_dir\\dir2\\dir20\\file200',
     'root_dir\\dir2\\file20',
     'root_dir\\dir3',
     'root_dir\\file0',
     'root_dir\\file1']
    11 ['dir0\\dir00\\dir000\\file0000',
     'dir0\\dir00\\file000',
     'dir0\\dir01\\file010',
     'dir0\\dir01\\file011',
     'dir1\\file10',
     'dir1\\file11',
     'dir1\\file12',
     'dir2\\dir20\\file200',
     'dir2\\file20',
     'file0',
     'file1']
    


  1. [Python 3]:os.scandir(path ='.')(Python 3.5 +,backport:[PyPI]:scandir)

    Return an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. The entries are yielded in arbitrary order, and the special entries '.' and '..' are not included.

    Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.


    >>> import os
    >>> root_dir = os.path.join(".", "root_dir")  # Explicitly prepending current directory
    >>> root_dir
    '.\\root_dir'
    >>>
    >>> scandir_iterator = os.scandir(root_dir)
    >>> scandir_iterator
    <nt.ScandirIterator object at 0x00000268CF4BC140>
    >>> [item.path for item in scandir_iterator]
    ['.\\root_dir\\dir0', '.\\root_dir\\dir1', '.\\root_dir\\dir2', '.\\root_dir\\dir3', '.\\root_dir\\file0', '.\\root_dir\\file1']
    >>>
    >>> [item.path for item in scandir_iterator]  # Will yield an empty list as it was consumed by previous iteration (automatically performed by the list comprehension)
    []
    >>>
    >>> scandir_iterator = os.scandir(root_dir)  # Reinitialize the generator
    >>> for item in scandir_iterator :
    ...     if os.path.isfile(item.path):
    ...             print(item.name)
    ...
    file0
    file1
    

    Notes:

    • It's similar to os.listdir
    • But it's also more flexible (and offers more functionality), more Pythonic (and in some cases, faster)


  1. [Python 3]: os.walk(top, topdown=True, onerror=None, followlinks=False)

    Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).


    >>> import os
    >>> root_dir = os.path.join(os.getcwd(), "root_dir")  # Specify the full path
    >>> root_dir
    'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir'
    >>>
    >>> walk_generator = os.walk(root_dir)
    >>> root_dir_entry = next(walk_generator)  # First entry corresponds to the root dir (passed as an argument)
    >>> root_dir_entry
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir', ['dir0', 'dir1', 'dir2', 'dir3'], ['file0', 'file1'])
    >>>
    >>> root_dir_entry[1] + root_dir_entry[2]  # Display dirs and files (direct descendants) in a single list
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [os.path.join(root_dir_entry[0], item) for item in root_dir_entry[1] + root_dir_entry[2]]  # Display all the entries in the previous list by their full path
    ['E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file1']
    >>>
    >>> for entry in walk_generator:  # Display the rest of the elements (corresponding to every subdir)
    ...     print(entry)
    ...
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', ['dir00', 'dir01', 'dir02'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00', ['dir000'], ['file000'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00\\dir000', [], ['file0000'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir01', [], ['file010', 'file011'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02', ['dir020'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020', ['dir0200'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020\\dir0200', [], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', [], ['file10', 'file11', 'file12'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', ['dir20'], ['file20'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2\\dir20', [], ['file200'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', [], [])
    

    Notes:

    • Under the scenes, it uses os.scandir (os.listdir on older versions)
    • It does the heavy lifting by recurring in subfolders


  1. [Python 3]: glob.glob(pathname,*, recursive=False) ([Python 3]: glob.iglob(pathname,*, recursive=False))

    Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).
    ...
    Changed in version 3.5: Support for recursive globs using "**".


    >>> import glob, os
    >>> wildcard_pattern = "*"
    >>> root_dir = os.path.join("root_dir", wildcard_pattern)  # Match every file/dir name
    >>> root_dir
    'root_dir\\*'
    >>>
    >>> glob_list = glob.glob(root_dir)
    >>> glob_list
    ['root_dir\\dir0', 'root_dir\\dir1', 'root_dir\\dir2', 'root_dir\\dir3', 'root_dir\\file0', 'root_dir\\file1']
    >>>
    >>> [item.replace("root_dir" + os.path.sep, "") for item in glob_list]  # Strip the dir name and the path separator from begining
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> for entry in glob.iglob(root_dir + "*", recursive=True):
    ...     print(entry)
    ...
    root_dir\
    root_dir\dir0
    root_dir\dir0\dir00
    root_dir\dir0\dir00\dir000
    root_dir\dir0\dir00\dir000\file0000
    root_dir\dir0\dir00\file000
    root_dir\dir0\dir01
    root_dir\dir0\dir01\file010
    root_dir\dir0\dir01\file011
    root_dir\dir0\dir02
    root_dir\dir0\dir02\dir020
    root_dir\dir0\dir02\dir020\dir0200
    root_dir\dir1
    root_dir\dir1\file10
    root_dir\dir1\file11
    root_dir\dir1\file12
    root_dir\dir2
    root_dir\dir2\dir20
    root_dir\dir2\dir20\file200
    root_dir\dir2\file20
    root_dir\dir3
    root_dir\file0
    root_dir\file1
    

    Notes:

    • Uses os.listdir
    • For large trees (especially if recursive is on), iglob is preferred
    • Allows advanced filtering based on name (due to the wildcard)


  1. [Python 3]: class pathlib.Path(*pathsegments) (Python 3.4+, backport: [PyPI]: pathlib2)

    >>> import pathlib
    >>> root_dir = "root_dir"
    >>> root_dir_instance = pathlib.Path(root_dir)
    >>> root_dir_instance
    WindowsPath('root_dir')
    >>> root_dir_instance.name
    'root_dir'
    >>> root_dir_instance.is_dir()
    True
    >>>
    >>> [item.name for item in root_dir_instance.glob("*")]  # Wildcard searching for all direct descendants
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [os.path.join(item.parent.name, item.name) for item in root_dir_instance.glob("*") if not item.is_dir()]  # Display paths (including parent) for files only
    ['root_dir\\file0', 'root_dir\\file1']
    

    Notes:

    • This is one way of achieving our goal
    • It's the OOP style of handling paths
    • Offers lots of functionalities


  1. [Python 2]: dircache.listdir(path) (Python 2 only)


    def listdir(path):
        """List directory contents, using cache."""
        try:
            cached_mtime, list = cache[path]
            del cache[path]
        except KeyError:
            cached_mtime, list = -1, []
        mtime = os.stat(path).st_mtime
        if mtime != cached_mtime:
            list = os.listdir(path)
            list.sort()
        cache[path] = mtime, list
        return list
    


  1. [man7]: OPENDIR(3)/[man7]: READDIR(3)/[man7]: CLOSEDIR(3) via [Python 3]: ctypes - A foreign function library for Python (POSIX specific)

    ctypes is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python.

    code_ctypes.py:

    #!/usr/bin/env python3
    
    import sys
    from ctypes import Structure, \
        c_ulonglong, c_longlong, c_ushort, c_ubyte, c_char, c_int, \
        CDLL, POINTER, \
        create_string_buffer, get_errno, set_errno, cast
    
    
    DT_DIR = 4
    DT_REG = 8
    
    char256 = c_char * 256
    
    
    class LinuxDirent64(Structure):
        _fields_ = [
            ("d_ino", c_ulonglong),
            ("d_off", c_longlong),
            ("d_reclen", c_ushort),
            ("d_type", c_ubyte),
            ("d_name", char256),
        ]
    
    LinuxDirent64Ptr = POINTER(LinuxDirent64)
    
    libc_dll = this_process = CDLL(None, use_errno=True)
    # ALWAYS set argtypes and restype for functions, otherwise it's UB!!!
    opendir = libc_dll.opendir
    readdir = libc_dll.readdir
    closedir = libc_dll.closedir
    
    
    def get_dir_content(path):
        ret = [path, list(), list()]
        dir_stream = opendir(create_string_buffer(path.encode()))
        if (dir_stream == 0):
            print("opendir returned NULL (errno: {:d})".format(get_errno()))
            return ret
        set_errno(0)
        dirent_addr = readdir(dir_stream)
        while dirent_addr:
            dirent_ptr = cast(dirent_addr, LinuxDirent64Ptr)
            dirent = dirent_ptr.contents
            name = dirent.d_name.decode()
            if dirent.d_type & DT_DIR:
                if name not in (".", ".."):
                    ret[1].append(name)
            elif dirent.d_type & DT_REG:
                ret[2].append(name)
            dirent_addr = readdir(dir_stream)
        if get_errno():
            print("readdir returned NULL (errno: {:d})".format(get_errno()))
        closedir(dir_stream)
        return ret
    
    
    def main():
        print("{:s} on {:s}\n".format(sys.version, sys.platform))
        root_dir = "root_dir"
        entries = get_dir_content(root_dir)
        print(entries)
    
    
    if __name__ == "__main__":
        main()
    

    Notes:

    • It loads the three functions from libc (loaded in the current process) and calls them (for more details check [SO]: How do I check whether a file exists without exceptions? (@CristiFati's answer) - last notes from item #4.). That would place this approach very close to the Python/C edge
    • LinuxDirent64 is the ctypes representation of struct dirent64 from [man7]: dirent.h(0P) (so are the DT_ constants) from my machine: Ubtu 16 x64 (4.10.0-40-generic and libc6-dev:amd64). On other flavors/versions, the struct definition might differ, and if so, the ctypes alias should be updated, otherwise it will yield Undefined Behavior
    • It returns data in the os.walk's format. I didn't bother to make it recursive, but starting from the existing code, that would be a fairly trivial task
    • Everything is doable on Win as well, the data (libraries, functions, structs, constants, ...) differ


    Output:

    [cfati@cfati-ubtu16x64-0:~/Work/Dev/StackOverflow/q003207219]> ./code_ctypes.py
    3.5.2 (default, Nov 12 2018, 13:43:14)
    [GCC 5.4.0 20160609] on linux
    
    ['root_dir', ['dir2', 'dir1', 'dir3', 'dir0'], ['file1', 'file0']]
    


  1. [ActiveState]: win32file.FindFilesW (Win specific)

    Retrieves a list of matching filenames, using the Windows Unicode API. An interface to the API FindFirstFileW/FindNextFileW/Find close functions.


    >>> import os, win32file, win32con
    >>> root_dir = "root_dir"
    >>> wildcard = "*"
    >>> root_dir_wildcard = os.path.join(root_dir, wildcard)
    >>> entry_list = win32file.FindFilesW(root_dir_wildcard)
    >>> len(entry_list)  # Don't display the whole content as it's too long
    8
    >>> [entry[-2] for entry in entry_list]  # Only display the entry names
    ['.', '..', 'dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [entry[-2] for entry in entry_list if entry[0] & win32con.FILE_ATTRIBUTE_DIRECTORY and entry[-2] not in (".", "..")]  # Filter entries and only display dir names (except self and parent)
    ['dir0', 'dir1', 'dir2', 'dir3']
    >>>
    >>> [os.path.join(root_dir, entry[-2]) for entry in entry_list if entry[0] & (win32con.FILE_ATTRIBUTE_NORMAL | win32con.FILE_ATTRIBUTE_ARCHIVE)]  # Only display file "full" names
    ['root_dir\\file0', 'root_dir\\file1']
    

    Notes:


  1. Install some (other) third-party package that does the trick
    • Most likely, will rely on one (or more) of the above (maybe with slight customizations)


Notes:

  • Code is meant to be portable (except places that target a specific area - which are marked) or cross:

    • platform (Nix, Win, )
    • Python version (2, 3, )
  • Multiple path styles (absolute, relatives) were used across the above variants, to illustrate the fact that the "tools" used are flexible in this direction

  • os.listdir and os.scandir use opendir/readdir/closedir ([MS.Docs]: FindFirstFileW function/[MS.Docs]: FindNextFileW function/[MS.Docs]: FindClose function) (via [GitHub]: python/cpython - (master) cpython/Modules/posixmodule.c)

  • win32file.FindFilesW uses those (Win specific) functions as well (via [GitHub]: mhammond/pywin32 - (master) pywin32/win32/src/win32file.i)

  • _get_dir_content (from point #1.) can be implemented using any of these approaches (some will require more work and some less)

    • Some advanced filtering (instead of just file vs. dir) could be done: e.g. the include_folders argument could be replaced by another one (e.g. filter_func) which would be a function that takes a path as an argument: filter_func=lambda x: True (this doesn't strip out anything) and inside _get_dir_content something like: if not filter_func(entry_with_path): continue (if the function fails for one entry, it will be skipped), but the more complex the code becomes, the longer it will take to execute
  • Nota bene! Since recursion is used, I must mention that I did some tests on my laptop (Win 10 x64), totally unrelated to this problem, and when the recursion level was reaching values somewhere in the (990 .. 1000) range (recursionlimit - 1000 (default)), I got StackOverflow :). If the directory tree exceeds that limit (I am not an FS expert, so I don't know if that is even possible), that could be a problem.
    I must also mention that I didn't try to increase recursionlimit because I have no experience in the area (how much can I increase it before having to also increase the stack at OS level), but in theory there will always be the possibility for failure, if the dir depth is larger than the highest possible recursionlimit (on that machine)

  • The code samples are for demonstrative purposes only. That means that I didn't take into account error handling (I don't think there's any try/except/else/finally block), so the code is not robust (the reason is: to keep it as simple and short as possible). For production, error handling should be added as well

Other approaches:

  1. Use Python only as a wrapper

    • Everything is done using another technology
    • That technology is invoked from Python
    • The most famous flavor that I know is what I call the system administrator approach:

      • Use Python (or any programming language for that matter) in order to execute shell commands (and parse their outputs)
      • Some consider this a neat hack
      • I consider it more like a lame workaround (gainarie), as the action per se is performed from shell (cmd in this case), and thus doesn't have anything to do with Python.
      • Filtering (grep/findstr) or output formatting could be done on both sides, but I'm not going to insist on it. Also, I deliberately used os.system instead of subprocess.Popen.
      (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os;os.system(\"dir /b root_dir\")"
      dir0
      dir1
      dir2
      dir3
      file0
      file1
      

    In general this approach is to be avoided, since if some command output format slightly differs between OS versions/flavors, the parsing code should be adapted as well; not to mention differences between locales).


ArtOfWarfare.. 48

我真的很喜欢adamk的回答,建议您使用glob()同名模块.这允许您与*s 进行模式匹配.

但正如其他人在评论中指出的那样,glob()可能会因不一致的斜线方向而被绊倒.为了解决这个问题,我建议您使用模块中的join()expanduser()函数,也可以使用os.path模块中的getcwd()函数os.

例如:

from glob import glob

# Return everything under C:\Users\admin that contains a folder called wlp.
glob('C:\Users\admin\*\wlp')

上面的内容非常糟糕 - 路径已被硬编码,并且只能在Windows上以驱动器名称和\硬编码到路径之间的方式工作.

from glob    import glob
from os.path import join

# Return everything under Users, admin, that contains a folder called wlp.
glob(join('Users', 'admin', '*', 'wlp'))

上面的工作更好,但它依赖Users于Windows上常见的文件夹名称,而在其他操作系统上则不常见.它还依赖于具有特定名称的用户admin.

from glob    import glob
from os.path import expanduser, join

# Return everything under the user directory that contains a folder called wlp.
glob(join(expanduser('~'), '*', 'wlp'))

这适用于所有平台.

另一个很好的例子,它可以跨平台完美运行,并且做一些不同的事情

from glob    import glob
from os      import getcwd
from os.path import join

# Return everything under the current directory that contains a folder called wlp.
glob(join(getcwd(), '*', 'wlp'))

希望这些示例可以帮助您了解在标准Python库模块中可以找到的一些函数的强大功能.

  • 额外的全部乐趣:从Python 3.5开始,只要你设置`recursive = True`,`**`就可以了.请参阅此处的文档:https://docs.python.org/3.5/library/glob.html#glob.glob (4认同)

Apogentus.. 35

def list_files(path):
    # returns a list of names (with extension, without full path) of all files 
    # in folder path
    files = []
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            files.append(name)
    return files 


Yauhen Yakim.. 23

如果你正在寻找一个find的Python实现,这是我经常使用的一个配方:

from findtools.find_files import (find_files, Match)

# Recursively find all *.sh files in **/usr/bin**
sh_files_pattern = Match(filetype='f', name='*.sh')
found_files = find_files(path='/usr/bin', match=sh_files_pattern)

for found_file in found_files:
    print found_file

所以我用它制作了一个PyPI ,还有一个GitHub存储库.我希望有人发现它可能对此代码有用.


The2ndSon.. 12

返回绝对文件路径列表,不会递归到子目录中

L = [os.path.join(os.getcwd(),f) for f in os.listdir('.') if os.path.isfile(os.path.join(os.getcwd(),f))]

  • 注意:`os.path.abspath(f)`将是一个更便宜的替代`os.path.join(os.getcwd(),f)`. (2认同)

ARGeo.. 12

为了获得更好的结果,您可以使用模块的listdir()方法和os生成器(生成器是一个保持其状态的强大迭代器,还记得吗?).以下代码适用于两个版本:Python 2和Python 3.

这是一个代码:

import os

def files(path):  
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file

for file in files("."):  
    print (file)

listdir()方法返回给定目录的条目列表.如果给定条目是文件,则os.path.isfile()返回该方法True.并且yield运算符退出func但保持其当前状态,并且仅返回检测为文件的条目的名称.以上所有允许我们循环生成器函数.

希望这可以帮助.


pah8J.. 10

import os
import os.path


def get_files(target_dir):
    item_list = os.listdir(target_dir)

    file_list = list()
    for item in item_list:
        item_dir = os.path.join(target_dir,item)
        if os.path.isdir(item_dir):
            file_list += get_files(item_dir)
        else:
            file_list.append(item_dir)
    return file_list

在这里,我使用递归结构.


fralau.. 8

一位聪明的老师曾经告诉我:

当有几种确定的方法可以做某事时,没有一种方法适合所有情况。

因此,我将为问题的一个子集添加一个解决方案:很多时候,我们只想检查文件是否匹配开始字符串和结束字符串,而无需进入子目录。因此,我们想要一个返回文件名列表的函数,例如:

filenames = dir_filter('foo/baz', radical='radical', extension='.txt')

如果您想先声明两个函数,可以这样做:

def file_filter(filename, radical='', extension=''):
    "Check if a filename matches a radical and extension"
    if not filename:
        return False
    filename = filename.strip()
    return(filename.startswith(radical) and filename.endswith(extension))

def dir_filter(dirname='', radical='', extension=''):
    "Filter filenames in directory according to radical and extension"
    if not dirname:
        dirname = '.'
    return [filename for filename in os.listdir(dirname)
                if file_filter(filename, radical, extension)]

此解决方案可以使用正则表达式轻松进行一般化(pattern如果您不希望模式始终坚持文件名的开头或结尾,则可能需要添加一个参数)。


shantanoo.. 6

使用发电机

import os
def get_files(search_path):
     for (dirpath, _, filenames) in os.walk(search_path):
         for filename in filenames:
             yield os.path.join(dirpath, filename)
list_files = get_files('.')
for filename in list_files:
    print(filename)


归档时间:

查看次数:

3745069 次

最近记录:

9 月,3 周 前