pyc*_*uft 3752
os.listdir() gives you everything in a directory, both files and directories.
If you want only files, you can filter the result with os.path:
from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
Alternatively, you can use os.walk(), which yields two lists for every directory it visits, splitting the entries into files and directories for you. If you only want the top-level directory, you can break the first time it yields:
from os import walk
f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
    break
Finally, as that example shows, to add one list to another you can use either .extend() or the + operator.
Personally, I prefer .extend().
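The two ways of adding one list to another can be compared in a minimal sketch. The list names are made up for illustration and are not part of the answer:

```python
# Hypothetical file-name lists, used only for illustration.
top_level = ["a.txt", "b.txt"]
nested = ["c.txt"]

# "+" builds a brand-new list and leaves both operands untouched.
combined = top_level + nested

# extend() appends in place, so every reference to the list sees the change.
alias = top_level
top_level.extend(nested)
```

The practical difference is that extend() mutates the existing list, which matters when another name refers to the same list.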
ada*_*amk 1507
I prefer using the glob module, as it does pattern matching and expansion.
import glob
print(glob.glob("/home/adam/*.txt"))
It returns a list of the matching files:
['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]
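If the matching files may also sit in subdirectories, glob can descend into them. A minimal sketch, assuming Python 3.5 or later; the helper name find_txt_files is hypothetical:

```python
import glob
import os

def find_txt_files(root):
    """Return every *.txt path under `root`, including subdirectories."""
    # "**" only matches nested directories when recursive=True (Python 3.5+).
    pattern = os.path.join(root, "**", "*.txt")
    return sorted(glob.glob(pattern, recursive=True))
```

Building the pattern with os.path.join keeps the separator correct on every platform.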
sep*_*p2k 733
import os
os.listdir("somedirectory")
returns a list of all files and directories in "somedirectory".
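Note that os.listdir() returns bare names, not paths. A minimal sketch of turning them into usable paths; the function name list_entries_with_paths is an assumption for illustration:

```python
import os

def list_entries_with_paths(directory):
    """Return the entries of `directory` joined with the directory path."""
    # os.listdir() yields bare names, so join them to get usable paths.
    return sorted(os.path.join(directory, name) for name in os.listdir(directory))
```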
Gio*_* PY 673
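I also made a short video here: Python: how to get a list of files in a directory
os.listdir(): how to get all the files (and directories) in the current directory (Python 3)
The simplest way to get the files in the current directory in Python 3 is the following. It is really simple: use the os module and its listdir() function and you will have the files in that directory (along with any folders it contains, but not the files in subdirectories; for those you can use walk, which I discuss later).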
import os
arr = os.listdir()
print(arr)
>>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']
Using glob
I find glob easier for selecting files of the same type or with something in common. Look at the following example:
import glob
txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)
Using a list comprehension
import glob
mylist = [f for f in glob.glob("*.txt")]
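As you may have noticed, the code above does not give you the full path of the file. If you need the absolute path, you can use another function of the os.path module, abspath, passing it the file name you got from os.listdir() as the argument. There are other ways to get the full path, which we will check later (as suggested by mexmex, I replaced _getfullpathname with abspath).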
import glob
def filebrowser():
    return [f for f in glob.glob("*")]
x = filebrowser()
print(x)
>>> ['example.txt', 'fb.py', 'filebrowser.py', 'help']
I find this very useful for finding things across many directories; it helped me find a file whose name I could not remember:
import glob
def filebrowser(word=""):
    """Returns a list with all files with the word/extension in it"""
    file = []
    for f in glob.glob("*"):
        if word in f:
            file.append(f)
    return file
flist = filebrowser("example")
print(flist)
flist = filebrowser(".py")
print(flist)
>>> ['example.txt']
>>> ['fb.py', 'filebrowser.py']
os.listdir(): get the files in the current directory (Python 2)
In Python 2, if you want the list of files in the current directory, you have to pass "." or os.getcwd() as the argument to os.listdir.
import os
files_path = [os.path.abspath(x) for x in os.listdir()]
print(files_path)
>>> ['F:\\documenti\\applications.txt', 'F:\\documenti\\collections.txt']
import os
# Getting the current work directory (cwd)
thisdir = os.getcwd()
# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if ".docx" in file:
            print(os.path.join(r, file))
import os
arr = os.listdir('.')
print(arr)
>>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']
# Method 1
x = os.listdir('..')
# Method 2
x= os.listdir('/')
import os
arr = os.listdir('F:\\python')
print(arr)
>>> ['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']
import os
x = os.listdir("./content")
import os
arr = next(os.walk('.'))[2]
print(arr)
>>> ['5bs_Turismo1.pdf', '5bs_Turismo1.pptx', 'esperienza.txt']
import os
arr = []
# next() returns only the first (top-level) tuple produced by os.walk
r, d, f = next(os.walk("F:\\_python"))
for file in f:
    arr.append(os.path.join(r, file))
for f in arr:
    print(f)
>>> F:\_python\dict_class.py
>>> F:\_python\programmi.txt
os.walk - get the full path - all files in subdirectories
[os.path.join(r,file) for r,d,f in [next(os.walk("F:\\_python"))] for file in f]
>>> ['F:\\_python\\dict_class.py', 'F:\\_python\\programmi.txt']
x = [os.path.join(r,file) for r,d,f in os.walk("F:\\_python") for file in f]
print(x)
>>> ['F:\\_python\\dict.py', 'F:\\_python\\progr.txt', 'F:\\_python\\readl.py']
arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
print(arr_txt)
>>> ['work.txt', '3ebooks.txt']
If I need the absolute path of the files:
from path import path
from glob import glob
x = [path(f).abspath() for f in glob("F:\\*.txt")]
for f in x:
print(f)
>>> F:\acquistionline.txt
>>> F:\acquisti_2018.txt
>>> F:\bootstrap_jquery_ecc.txt
If I want all the files in the directory:
import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)
>>> ['a simple game.py', 'data.txt', 'decorator.py']
import pathlib
flist = []
for p in pathlib.Path('.').iterdir():
    if p.is_file():
        print(p)
        flist.append(p)
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speak_gui2.py
>>> thumb.PNG
If you want to use a list comprehension:
flist = [p for p in pathlib.Path('.').iterdir() if p.is_file()]
import pathlib
py = pathlib.Path().glob("*.py")
for file in py:
    print(file)
>>> stack_overflow_list.py
>>> stack_overflow_list_tkinter.py
* You can also use pathlib.Path() instead of pathlib.Path(".")
import os
x = [i[2] for i in os.walk('.')]
y = []
for t in x:
    for f in t:
        y.append(f)
print(y)
>>> ['append_to_list.py', 'data.txt', 'data1.txt', 'data2.txt', 'data_180617', 'os_walk.py', 'READ2.py', 'read_data.py', 'somma_defaltdic.py', 'substitute_words.py', 'sum_data.py', 'data.txt', 'data1.txt', 'data_180617']
Output:
import os
x = next(os.walk('F://python'))[2]
print(x)
>>> ['calculator.bat','calculator.py']
import os
next(os.walk('F://python'))[1] # for the current dir use ('.')
>>> ['python3','others']
for r,d,f in os.walk("F:\\_python"):
    for dirs in d:
        print(dirs)
>>> .vscode
>>> pyexcel
>>> pyschool.py
>>> subtitles
>>> _metaprogramming
>>> .ipynb_checkpoints
import os
x = [f.name for f in os.scandir() if f.is_file()]
print(x)
>>> ['calculator.bat','calculator.py']
# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir.
# In this case, it shows the files only in the current directory
# where the script is executed.
import os
with os.scandir() as i:
    for entry in i:
        if entry.is_file():
            print(entry.name)
>>> ebookmaker.py
>>> error.PNG
>>> exemaker.bat
>>> guiprova.mp3
>>> setup.py
>>> speakgui4.py
>>> speak_gui2.py
>>> speak_gui3.py
>>> thumb.PNG
os.path.abspath
import os
def count(dir, counter=0):
    "returns number of files in dir and subdirs"
    for pack in os.walk(dir):
        for f in pack[2]:
            counter += 1
    return dir + " : " + str(counter) + " files"
print(count("F:\\python"))
>>> F:\python : 12057 files
import os
import shutil
destination = "F:\\file_copied"
# os.makedirs(destination)
def copyfile(dir, filetype='pptx', counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = pack[0] + "\\" + f
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print('-' * 30)
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")
for dir in os.listdir():
    "searches for folders that starts with `_`"
    if dir[0] == '_':
        # copyfile(dir, filetype='pdf')
        copyfile(dir, filetype='txt')
>>> _compiti18\Compito Contabilità 1\conti.txt
>>> _compiti18\Compito Contabilità 1\modula4.txt
>>> _compiti18\Compito Contabilità 1\moduloa4.txt
>>> ------------------------
>>> ==> Found in: `_compiti18` : 3 files
The count() example looks up the number of files contained in a directory and all of its subdirectories.
import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)
The copyfile() script finds all files of a given type (default: pptx) on the computer and copies them into a new folder.
"""
We are going to save a txt file with all the files in your directory.
We will use the function walk()
"""
import os
# see all the methods of os
# print(*dir(os), sep=", ")
listafile = []
percorso = []
with open("lista_file.txt", "w", encoding='utf-8') as testo:
    for root, dirs, files in os.walk("D:\\"):
        for file in files:
            listafile.append(file)
            percorso.append(root + "\\" + file)
            testo.write(file + "\n")
listafile.sort()
print("N. of files", len(listafile))
with open("lista_file_ordinata.txt", "w", encoding="utf-8") as testo_ordinato:
    for file in listafile:
        testo_ordinato.write(file + "\n")
with open("percorso.txt", "w", encoding="utf-8") as file_percorso:
    for file in percorso:
        file_percorso.write(file + "\n")
os.system("lista_file.txt")
os.system("lista_file_ordinata.txt")
os.system("percorso.txt")
If you want to create a txt file with all the file names:
import os
with open("file.txt", "w", encoding="utf-8") as filewrite:
    for r, d, f in os.walk("C:\\"):
        for file in f:
            # os.path.join inserts the separator that r + file would omit
            filewrite.write(f"{os.path.join(r, file)}\n")
import os
def searchfiles(extension='.ttf', folder='H:\\'):
    "Create a txt file with all the file of a type"
    with open(extension[1:] + "file.txt", "w", encoding="utf-8") as filewrite:
        for r, d, f in os.walk(folder):
            for file in f:
                if file.endswith(extension):
                    # os.path.join inserts the separator that r + file would omit
                    filewrite.write(f"{os.path.join(r, file)}\n")
# looking for png files in the hard disk H:\
searchfiles('.png', 'H:\\')
>>> H:\4bs_18\Dolphins5.png
>>> H:\4bs_18\Dolphins6.png
>>> H:\4bs_18\Dolphins7.png
>>> H:\5_18\marketing html\assets\imageslogo2.png
>>> H:\7z001.png
>>> H:\7z002.png
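This is a shorter version of the previous code. Change the folder where the search starts if you need to begin from a different location. On my computer this code generates a 50 MB text file with fewer than 500,000 lines, each containing a file with its full path.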
import tkinter as tk
import os
def searchfiles(extension='.txt', folder='H:\\'):
    "insert all files in the listbox"
    for r, d, f in os.walk(folder):
        for file in f:
            if file.endswith(extension):
                lb.insert(0, r + "\\" + file)
def open_file():
    os.startfile(lb.get(lb.curselection()[0]))
root = tk.Tk()
root.geometry("400x400")
bt = tk.Button(root, text="Search", command=lambda:searchfiles('.png', 'H:\\'))
bt.pack()
lb = tk.Listbox(root)
lb.pack(fill="both", expand=1)
lb.bind("<Double-Button>", lambda x: open_file())
root.mainloop()
Rem*_*emi 152
A one-line solution to get only a list of files (no subdirectories):
filenames = next(os.walk(path))[2]
or absolute path names:
paths = [os.path.join(path, fn) for fn in next(os.walk(path))[2]]
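One caveat with the one-liner: os.walk() yields nothing for a directory that does not exist or cannot be read, so a bare next() raises StopIteration. A minimal sketch that falls back to an empty list; the function name top_level_files is hypothetical:

```python
import os

def top_level_files(path):
    """Return file names directly inside `path`, or [] if it cannot be walked."""
    # os.walk() silently yields nothing for a missing or unreadable directory,
    # so give next() a default to avoid StopIteration.
    _, _, filenames = next(os.walk(path), (None, None, []))
    return filenames
```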
Joh*_*nny 126
Getting full file paths from a directory and all of its subdirectories
import os
def get_filepaths(directory):
    """
    This function will generate the file names in a directory
    tree by walking the tree either top-down or bottom-up. For each
    directory in the tree rooted at directory top (including top itself),
    it yields a 3-tuple (dirpath, dirnames, filenames).
    """
    file_paths = []  # List which will store all of the full filepaths.
    # Walk the tree.
    for root, directories, files in os.walk(directory):
        for filename in files:
            # Join the two strings in order to form the full filepath.
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)  # Add it to the list.
    return file_paths  # Self-explanatory.
# Run the above function and store its results in a variable.
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")
print(full_file_paths)
This prints the list:
['/Users/johnny/Desktop/TEST/file1.txt', '/Users/johnny/Desktop/TEST/file2.txt', '/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat']
If you wish, you can open and read the contents, or focus only on files with the extension ".dat", as in the code below:
for f in full_file_paths:
    if f.endswith(".dat"):
        print(f)
/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat
Szi*_*dam 75
Since version 3.4 there are built-in iterators for this that are much more efficient than os.listdir():
pathlib: New in version 3.4.
>>> import pathlib
>>> [p for p in pathlib.Path('.').iterdir() if p.is_file()]
According to PEP 428, the aim of the pathlib library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.
os.scandir(): New in version 3.5.
>>> import os
>>> [entry for entry in os.scandir('.') if entry.is_file()]
Note that os.walk() uses os.scandir() instead of os.listdir() from version 3.5, and according to PEP 471 its speed increased by 2-20 times.
I also recommend reading ShadowRanger's comment below.
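A minimal sketch of the os.scandir() approach wrapped in a function, assuming Python 3.6 or later (where the iterator can be used as a context manager); the name files_in is made up for illustration:

```python
import os

def files_in(directory="."):
    """Return the names of regular files directly inside `directory`."""
    # The with-block closes the scandir iterator promptly (Python 3.6+).
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())
```

Because each DirEntry usually carries the file type already, is_file() typically avoids an extra system call per entry.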
Cri*_*ati 55
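When this question was asked, I imagine Python 2 was the LTS version; however, the code samples will be run by Python 3(.5) (I'll keep them as Python 2 compliant as possible; also, any code belonging to Python that I'm going to post is from v3.5.4, unless otherwise specified). That has consequences related to another keyword in the question: "add them into a list":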
>>> import sys
>>> sys.version
'2.7.10 (default, Mar 8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)]'
>>> m = map(lambda x: x, [1, 2, 3])  # Just a dummy lambda function
>>> m, type(m)
([1, 2, 3], <type 'list'>)
>>> len(m)
3
>>> import sys
>>> sys.version
'3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)]'
>>> m = map(lambda x: x, [1, 2, 3])
>>> m, type(m)
(<map object at 0x000001B4257342B0>, <class 'map'>)
>>> len(m)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'map' has no len()
>>> lm0 = list(m)  # Build a list from the generator
>>> lm0, type(lm0)
([1, 2, 3], <class 'list'>)
>>>
>>> lm1 = list(m)  # Build a list from the same generator
>>> lm1, type(lm1)  # Empty list now - generator already consumed
([], <class 'list'>)
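The examples will be based on a directory called root_dir with the following structure (this example is for Win, but I'm using the same tree on Lnx as well):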
E:\Work\Dev\StackOverflow\q003207219>tree /f "root_dir"
Folder PATH listing for volume Work
Volume serial number is 00000029 3655:6FED
E:\WORK\DEV\STACKOVERFLOW\Q003207219\ROOT_DIR
¦   file0
¦   file1
¦
+---dir0
¦   +---dir00
¦   ¦   ¦   file000
¦   ¦   ¦
¦   ¦   +---dir000
¦   ¦           file0000
¦   ¦
¦   +---dir01
¦   ¦       file010
¦   ¦       file011
¦   ¦
¦   +---dir02
¦       +---dir020
¦           +---dir0200
+---dir1
¦       file10
¦       file11
¦       file12
¦
+---dir2
¦   ¦   file20
¦   ¦
¦   +---dir20
¦           file200
¦
+---dir3
[Python 3]:os.listdir(path ='.')
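Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' ...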
>>> import os
>>> root_dir = "root_dir"  # Path relative to current dir (os.getcwd())
>>>
>>> os.listdir(root_dir)  # List all the items in root_dir
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [item for item in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, item))]  # Filter items and only keep files (strip out directories)
['file0', 'file1']
A more elaborate example (code_os_listdir.py):
import os
from pprint import pformat
def _get_dir_content(path, include_folders, recursive):
    entries = os.listdir(path)
    for entry in entries:
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            if include_folders:
                yield entry_with_path
            if recursive:
                for sub_entry in _get_dir_content(entry_with_path, include_folders, recursive):
                    yield sub_entry
        else:
            yield entry_with_path
def get_dir_content(path, include_folders=True, recursive=True, prepend_folder_name=True):
    path_len = len(path) + len(os.path.sep)
    for item in _get_dir_content(path, include_folders, recursive):
        yield item if prepend_folder_name else item[path_len:]
def _get_dir_content_old(path, include_folders, recursive):
    entries = os.listdir(path)
    ret = list()
    for entry in entries:
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            if include_folders:
                ret.append(entry_with_path)
            if recursive:
                ret.extend(_get_dir_content_old(entry_with_path, include_folders, recursive))
        else:
            ret.append(entry_with_path)
    return ret
def get_dir_content_old(path, include_folders=True, recursive=True, prepend_folder_name=True):
    path_len = len(path) + len(os.path.sep)
    return [item if prepend_folder_name else item[path_len:] for item in _get_dir_content_old(path, include_folders, recursive)]
def main():
    root_dir = "root_dir"
    ret0 = get_dir_content(root_dir, include_folders=True, recursive=True, prepend_folder_name=True)
    lret0 = list(ret0)
    print(ret0, len(lret0), pformat(lret0))
    ret1 = get_dir_content_old(root_dir, include_folders=False, recursive=True, prepend_folder_name=False)
    print(len(ret1), pformat(ret1))
if __name__ == "__main__":
    main()
Notes:
Output:
(py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" "code_os_listdir.py"
<generator object get_dir_content at 0x000001BDDBB3DF10> 22 ['root_dir\\dir0',
 'root_dir\\dir0\\dir00',
 'root_dir\\dir0\\dir00\\dir000',
 'root_dir\\dir0\\dir00\\dir000\\file0000',
 'root_dir\\dir0\\dir00\\file000',
 'root_dir\\dir0\\dir01',
 'root_dir\\dir0\\dir01\\file010',
 'root_dir\\dir0\\dir01\\file011',
 'root_dir\\dir0\\dir02',
 'root_dir\\dir0\\dir02\\dir020',
 'root_dir\\dir0\\dir02\\dir020\\dir0200',
 'root_dir\\dir1',
 'root_dir\\dir1\\file10',
 'root_dir\\dir1\\file11',
 'root_dir\\dir1\\file12',
 'root_dir\\dir2',
 'root_dir\\dir2\\dir20',
 'root_dir\\dir2\\dir20\\file200',
 'root_dir\\dir2\\file20',
 'root_dir\\dir3',
 'root_dir\\file0',
 'root_dir\\file1']
11 ['dir0\\dir00\\dir000\\file0000',
 'dir0\\dir00\\file000',
 'dir0\\dir01\\file010',
 'dir0\\dir01\\file011',
 'dir1\\file10',
 'dir1\\file11',
 'dir1\\file12',
 'dir2\\dir20\\file200',
 'dir2\\file20',
 'file0',
 'file1']
[Python 3]:os.scandir(path ='.')(Python 3.5 +,backport:[PyPI]:scandir)
Return an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. The entries are yielded in arbitrary order, and the special entries '.' and '..' are not included.
Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.
>>> import os
>>> root_dir = os.path.join(".", "root_dir")  # Explicitly prepending current directory
>>> root_dir
'.\\root_dir'
>>>
>>> scandir_iterator = os.scandir(root_dir)
>>> scandir_iterator
<nt.ScandirIterator object at 0x00000268CF4BC140>
>>> [item.path for item in scandir_iterator]
['.\\root_dir\\dir0', '.\\root_dir\\dir1', '.\\root_dir\\dir2', '.\\root_dir\\dir3', '.\\root_dir\\file0', '.\\root_dir\\file1']
>>>
>>> [item.path for item in scandir_iterator]  # Will yield an empty list as it was consumed by previous iteration (automatically performed by the list comprehension)
[]
>>>
>>> scandir_iterator = os.scandir(root_dir)  # Reinitialize the generator
>>> for item in scandir_iterator :
...     if os.path.isfile(item.path):
...         print(item.name)
...
file0
file1
Notes:
os.listdir
[Python 3]: os.walk(top, topdown=True, onerror=None, followlinks=False)
Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
>>> import os
>>> root_dir = os.path.join(os.getcwd(), "root_dir")  # Specify the full path
>>> root_dir
'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir'
>>>
>>> walk_generator = os.walk(root_dir)
>>> root_dir_entry = next(walk_generator)  # First entry corresponds to the root dir (passed as an argument)
>>> root_dir_entry
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir', ['dir0', 'dir1', 'dir2', 'dir3'], ['file0', 'file1'])
>>>
>>> root_dir_entry[1] + root_dir_entry[2]  # Display dirs and files (direct descendants) in a single list
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [os.path.join(root_dir_entry[0], item) for item in root_dir_entry[1] + root_dir_entry[2]]  # Display all the entries in the previous list by their full path
['E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file1']
>>>
>>> for entry in walk_generator:  # Display the rest of the elements (corresponding to every subdir)
...     print(entry)
...
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', ['dir00', 'dir01', 'dir02'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00', ['dir000'], ['file000'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00\\dir000', [], ['file0000'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir01', [], ['file010', 'file011'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02', ['dir020'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020', ['dir0200'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020\\dir0200', [], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', [], ['file10', 'file11', 'file12'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', ['dir20'], ['file20'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2\\dir20', [], ['file200'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', [], [])
Notes:
os.scandir (os.listdir on older versions)
[Python 3]: glob.glob(pathname, *, recursive=False) ([Python 3]: glob.iglob(pathname, *, recursive=False))
Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).
...
Changed in version 3.5: Support for recursive globs using "**".
>>> import glob, os
>>> wildcard_pattern = "*"
>>> root_dir = os.path.join("root_dir", wildcard_pattern)  # Match every file/dir name
>>> root_dir
'root_dir\\*'
>>>
>>> glob_list = glob.glob(root_dir)
>>> glob_list
['root_dir\\dir0', 'root_dir\\dir1', 'root_dir\\dir2', 'root_dir\\dir3', 'root_dir\\file0', 'root_dir\\file1']
>>>
>>> [item.replace("root_dir" + os.path.sep, "") for item in glob_list]  # Strip the dir name and the path separator from beginning
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> for entry in glob.iglob(root_dir + "*", recursive=True):
...     print(entry)
...
root_dir\
root_dir\dir0
root_dir\dir0\dir00
root_dir\dir0\dir00\dir000
root_dir\dir0\dir00\dir000\file0000
root_dir\dir0\dir00\file000
root_dir\dir0\dir01
root_dir\dir0\dir01\file010
root_dir\dir0\dir01\file011
root_dir\dir0\dir02
root_dir\dir0\dir02\dir020
root_dir\dir0\dir02\dir020\dir0200
root_dir\dir1
root_dir\dir1\file10
root_dir\dir1\file11
root_dir\dir1\file12
root_dir\dir2
root_dir\dir2\dir20
root_dir\dir2\dir20\file200
root_dir\dir2\file20
root_dir\dir3
root_dir\file0
root_dir\file1
Notes:
os.listdir
[Python 3]: class pathlib.Path(*pathsegments) (Python 3.4+, backport: [PyPI]: pathlib2)
>>> import pathlib
>>> root_dir = "root_dir"
>>> root_dir_instance = pathlib.Path(root_dir)
>>> root_dir_instance
WindowsPath('root_dir')
>>> root_dir_instance.name
'root_dir'
>>> root_dir_instance.is_dir()
True
>>>
>>> [item.name for item in root_dir_instance.glob("*")]  # Wildcard searching for all direct descendants
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [os.path.join(item.parent.name, item.name) for item in root_dir_instance.glob("*") if not item.is_dir()]  # Display paths (including parent) for files only
['root_dir\\file0', 'root_dir\\file1']
Notes:
[Python 2]: dircache.listdir(path) (Python 2 only)
os.listdir with caching
def listdir(path):
    """List directory contents, using cache."""
    try:
        cached_mtime, list = cache[path]
        del cache[path]
    except KeyError:
        cached_mtime, list = -1, []
    mtime = os.stat(path).st_mtime
    if mtime != cached_mtime:
        list = os.listdir(path)
        list.sort()
    cache[path] = mtime, list
    return list
[man7]: OPENDIR(3)/[man7]: READDIR(3)/[man7]: CLOSEDIR(3) via [Python 3]: ctypes - A foreign function library for Python (POSIX specific)
ctypes is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python.
code_ctypes.py:
#!/usr/bin/env python3
import sys
from ctypes import Structure, \
    c_ulonglong, c_longlong, c_ushort, c_ubyte, c_char, c_int, \
    c_char_p, c_void_p, \
    CDLL, POINTER, \
    create_string_buffer, get_errno, set_errno, cast
DT_DIR = 4
DT_REG = 8
char256 = c_char * 256
class LinuxDirent64(Structure):
    _fields_ = [
        ("d_ino", c_ulonglong),
        ("d_off", c_longlong),
        ("d_reclen", c_ushort),
        ("d_type", c_ubyte),
        ("d_name", char256),
    ]
LinuxDirent64Ptr = POINTER(LinuxDirent64)
libc_dll = this_process = CDLL(None, use_errno=True)
# ALWAYS set argtypes and restype for functions, otherwise it's UB!!!
# (without them, 64-bit pointers returned by libc are truncated to int)
opendir = libc_dll.opendir
opendir.argtypes = [c_char_p]
opendir.restype = c_void_p
readdir = libc_dll.readdir
readdir.argtypes = [c_void_p]
readdir.restype = LinuxDirent64Ptr
closedir = libc_dll.closedir
closedir.argtypes = [c_void_p]
closedir.restype = c_int
def get_dir_content(path):
    ret = [path, list(), list()]
    dir_stream = opendir(create_string_buffer(path.encode()))
    if not dir_stream:
        print("opendir returned NULL (errno: {:d})".format(get_errno()))
        return ret
    set_errno(0)
    dirent_addr = readdir(dir_stream)
    while dirent_addr:
        dirent_ptr = cast(dirent_addr, LinuxDirent64Ptr)
        dirent = dirent_ptr.contents
        name = dirent.d_name.decode()
        if dirent.d_type & DT_DIR:
            if name not in (".", ".."):
                ret[1].append(name)
        elif dirent.d_type & DT_REG:
            ret[2].append(name)
        dirent_addr = readdir(dir_stream)
    if get_errno():
        print("readdir returned NULL (errno: {:d})".format(get_errno()))
    closedir(dir_stream)
    return ret
def main():
    print("{:s} on {:s}\n".format(sys.version, sys.platform))
    root_dir = "root_dir"
    entries = get_dir_content(root_dir)
    print(entries)
if __name__ == "__main__":
    main()
Notes:
The result is returned in os.walk's format. I didn't bother to make it recursive, but starting from the existing code, that would be a fairly trivial task
Output:
[cfati@cfati-ubtu16x64-0:~/Work/Dev/StackOverflow/q003207219]> ./code_ctypes.py
3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux

['root_dir', ['dir2', 'dir1', 'dir3', 'dir0'], ['file1', 'file0']]
[ActiveState]: win32file.FindFilesW (Win specific)
Retrieves a list of matching filenames, using the Windows Unicode API. An interface to the API FindFirstFileW/FindNextFileW/Find close functions.
>>> import os, win32file, win32con
>>> root_dir = "root_dir"
>>> wildcard = "*"
>>> root_dir_wildcard = os.path.join(root_dir, wildcard)
>>> entry_list = win32file.FindFilesW(root_dir_wildcard)
>>> len(entry_list)  # Don't display the whole content as it's too long
8
>>> [entry[-2] for entry in entry_list]  # Only display the entry names
['.', '..', 'dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [entry[-2] for entry in entry_list if entry[0] & win32con.FILE_ATTRIBUTE_DIRECTORY and entry[-2] not in (".", "..")]  # Filter entries and only display dir names (except self and parent)
['dir0', 'dir1', 'dir2', 'dir3']
>>>
>>> [os.path.join(root_dir, entry[-2]) for entry in entry_list if entry[0] & (win32con.FILE_ATTRIBUTE_NORMAL | win32con.FILE_ATTRIBUTE_ARCHIVE)]  # Only display file "full" names
['root_dir\\file0', 'root_dir\\file1']
Notes:
win32file.FindFilesW is part of [GitHub]: mhammond/pywin32 - Python for Windows (pywin32) Extensions, which is a Python wrapper over WINAPIs
Notes:
Code is meant to be portable (except places that target a specific area - which are marked) or cross:
Multiple path styles (absolute, relatives) were used across the above variants, to illustrate the fact that the "tools" used are flexible in this direction
os.listdir and os.scandir use opendir/readdir/closedir ([MS.Docs]: FindFirstFileW function/[MS.Docs]: FindNextFileW function/[MS.Docs]: FindClose function) (via [GitHub]: python/cpython - (master) cpython/Modules/posixmodule.c)
win32file.FindFilesW uses those (Win specific) functions as well (via [GitHub]: mhammond/pywin32 - (master) pywin32/win32/src/win32file.i)
_get_dir_content (from point #1.) can be implemented using any of these approaches (some will require more work and some less)
filter_func=lambda x: True (this doesn't strip out anything) and inside _get_dir_content something like: if not filter_func(entry_with_path): continue (if the function fails for one entry, it will be skipped), but the more complex the code becomes, the longer it will take to execute
Nota bene! Since recursion is used, I must mention that I did some tests on my laptop (Win 10 x64), totally unrelated to this problem, and when the recursion level was reaching values somewhere in the (990 .. 1000) range (recursionlimit - 1000 (default)), I got StackOverflow :). If the directory tree exceeds that limit (I am not an FS expert, so I don't know if that is even possible), that could be a problem.
I must also mention that I didn't try to increase recursionlimit because I have no experience in the area (how much can I increase it before having to also increase the stack at OS level), but in theory there will always be the possibility for failure, if the dir depth is larger than the highest possible recursionlimit (on that machine)
The code samples are for demonstrative purposes only. That means that I didn't take into account error handling (I don't think there's any try/except/else/finally block), so the code is not robust (the reason is: to keep it as simple and short as possible). For production, error handling should be added as well
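The recursion-depth and error-handling caveats above can both be sidestepped with an explicit stack. This is only a sketch under stated assumptions, not the answer's own code; the name iter_files is hypothetical, and skipping unreadable directories and symlinked directories are choices made for the illustration:

```python
import os

def iter_files(top):
    """Yield paths of all files under `top` without using recursion."""
    pending = [top]  # explicit stack of directories still to visit
    while pending:
        current = pending.pop()
        try:
            entries = sorted(os.listdir(current))
        except OSError:
            # Unreadable or vanished directory: skip it instead of crashing.
            continue
        for name in entries:
            full = os.path.join(current, name)
            # Do not descend into symlinked directories, to avoid cycles.
            if os.path.isdir(full) and not os.path.islink(full):
                pending.append(full)
            elif os.path.isfile(full):
                yield full
```

Since the pending directories live in a plain list, the depth of the tree is bounded by available memory rather than by the interpreter's recursion limit.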
Use Python only as a wrapper
The most famous flavor that I know is what I call the system administrator approach:
grep/findstr) or output formatting could be done on both sides, but I'm not going to insist on it. Also, I deliberately used os.system instead of subprocess.Popen.
(py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os;os.system(\"dir /b root_dir\")"
dir0
dir1
dir2
dir3
file0
file1
In general this approach is to be avoided, since if some command output format slightly differs between OS versions/flavors, the parsing code should be adapted as well (not to mention differences between locales).
Art*_*are 48
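I really like adamk's answer, suggesting that you use glob(), from the module of the same name. This allows you to do pattern matching with *s.
But as other people pointed out in the comments, glob() can get tripped up over inconsistent slash directions. To help with that, I suggest you use the join() and expanduser() functions in the os.path module, and perhaps the getcwd() function in the os module as well.
As examples: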
from glob import glob
# Return everything under C:\Users\admin that contains a folder called wlp.
glob(r'C:\Users\admin\*\wlp')
The above is terrible: the path is hardcoded and will only ever work on Windows, because both the drive name and the \ separators are hardcoded into the path.
from glob import glob
from os.path import join
# Return everything under Users, admin, that contains a folder called wlp.
glob(join('Users', 'admin', '*', 'wlp'))
The above works better, but it relies on the folder name Users, which is common on Windows and much less common on other operating systems. It also relies on the user having a specific name, admin.
from glob import glob
from os.path import expanduser, join
# Return everything under the user directory that contains a folder called wlp.
glob(join(expanduser('~'), '*', 'wlp'))
This works perfectly across all platforms.
Another great example that works perfectly across platforms and does something a bit different:
from glob import glob
from os import getcwd
from os.path import join
# Return everything under the current directory that contains a folder called wlp.
glob(join(getcwd(), '*', 'wlp'))
Hope these examples help you see the power of a few of the functions you can find in the standard Python library modules.
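On Python 3.4+, the same portability can be had with pathlib. The following is a minimal sketch rather than part of the answer above; the function name find_named_dirs and its parameters are made up for illustration:

```python
from pathlib import Path

def find_named_dirs(base, name="wlp"):
    """Return paths matching base/*/name, using pathlib to build the pattern."""
    # Path.glob() handles the platform's separator for us.
    return sorted(Path(base).glob("*/" + name))

# Path.home() plays the role of expanduser('~'); Path.cwd() that of getcwd().
# Example call (not run here): find_named_dirs(Path.home())
```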
Apo*_*tus 35
import os
def list_files(path):
    # returns a list of names (with extension, without full path) of all files
    # in folder path
    files = []
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            files.append(name)
    return files
Yau*_*ich 23
If you're looking for a Python implementation of find, this is a recipe I use rather frequently:
from findtools.find_files import (find_files, Match)
# Recursively find all *.sh files in **/usr/bin**
sh_files_pattern = Match(filetype='f', name='*.sh')
found_files = find_files(path='/usr/bin', match=sh_files_pattern)
for found_file in found_files:
    print(found_file)
So I made a PyPI package out of it, and there is also a GitHub repository. I hope someone finds it potentially useful.
The*_*Son 12
Returns a list of absolute file paths, without recursing into subdirectories
L = [os.path.join(os.getcwd(),f) for f in os.listdir('.') if os.path.isfile(os.path.join(os.getcwd(),f))]
ARG*_*Geo 12
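For better results, you can use the listdir() method of the os module together with a generator (a generator is a powerful iterator that keeps its state, remember?). The following code works with both versions: Python 2 and Python 3.
Here is the code: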
import os
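def files(path):
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file
for file in files("."):
    print(file)
The listdir() method returns the list of entries for the given directory. The method os.path.isfile() returns True if the given entry is a file. The yield operator exits the function but keeps its current state, and it returns only the name of the entry detected as a file. All of the above allows us to loop over the generator function.
Hope this helps.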
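One practical benefit of the generator is that a caller can stop early. A minimal sketch, restating the files() generator so the block is self-contained; first_files is a hypothetical helper. Note that os.listdir() itself still reads the whole directory; only the isfile() checks are deferred:

```python
import os
from itertools import islice

def files(path):
    """Lazily yield names of regular files directly inside `path`."""
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            yield name

def first_files(path, limit):
    """Return at most `limit` file names, stopping the generator early."""
    return list(islice(files(path), limit))
```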
pah*_*h8J 10
import os
import os.path
def get_files(target_dir):
    item_list = os.listdir(target_dir)
    file_list = list()
    for item in item_list:
        item_dir = os.path.join(target_dir,item)
        if os.path.isdir(item_dir):
            file_list += get_files(item_dir)
        else:
            file_list.append(item_dir)
    return file_list
Here I use a recursive structure.
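A wise teacher once told me:
When there are several established ways to do something, none of them is good for all cases.
I will therefore add a solution for a subset of the problem: quite often, we only want to check whether a file name matches a start string and an end string, without going into subdirectories. We would thus like a function that returns a list of file names, like: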
filenames = dir_filter('foo/baz', radical='radical', extension='.txt')
If you care to declare two functions first, this can be done:
import os
def file_filter(filename, radical='', extension=''):
    "Check if a filename matches a radical and extension"
    if not filename:
        return False
    filename = filename.strip()
    return(filename.startswith(radical) and filename.endswith(extension))
def dir_filter(dirname='', radical='', extension=''):
    "Filter filenames in directory according to radical and extension"
    if not dirname:
        dirname = '.'
    return [filename for filename in os.listdir(dirname)
            if file_filter(filename, radical, extension)]
This solution could easily be generalized with regular expressions (and you might want to add a pattern argument if you do not want your patterns to always stick to the start or end of the file name).
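A minimal sketch of that regular-expression generalization. The function name dir_filter_regex and the pattern parameter are assumptions for illustration, not part of the answer:

```python
import os
import re

def dir_filter_regex(dirname=".", pattern=".*"):
    """Return names in `dirname` whose whole name matches the regex `pattern`."""
    regex = re.compile(pattern)
    # fullmatch() anchors at both ends; use regex.search() to match anywhere.
    return sorted(name for name in os.listdir(dirname) if regex.fullmatch(name))
```

For example, the pattern r"radical.*\.txt" plays the role of radical='radical', extension='.txt' in the two-function version.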
Using generators
import os
def get_files(search_path):
    for (dirpath, _, filenames) in os.walk(search_path):
        for filename in filenames:
            yield os.path.join(dirpath, filename)
list_files = get_files('.')
for filename in list_files:
    print(filename)
Another very readable variant for Python 3.4+ is using pathlib.Path.glob:
from pathlib import Path
folder = '/foo'
[f for f in Path(folder).glob('*') if f.is_file()]
Making it more specific is simple too, e.g. only look for Python source files that are not symbolic links, in all subdirectories as well:
[f for f in Path(folder).glob('**/*.py') if not f.is_symlink()]
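The same search can be written with Path.rglob(), which is shorthand for a glob pattern prefixed with "**/". A minimal sketch; the function name python_sources is hypothetical:

```python
from pathlib import Path

def python_sources(folder):
    """Return all *.py files under `folder` that are regular, non-symlink files."""
    # rglob("*.py") is shorthand for glob("**/*.py").
    return sorted(p for p in Path(folder).rglob("*.py")
                  if p.is_file() and not p.is_symlink())
```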