I'm searching for all .csv files located in subfolders using glob, like this:
import glob
import os

def scan_for_files(path):
    file_list = []
    for path, dirs, files in os.walk(path):
        for d in dirs:
            for f in glob.iglob(os.path.join(path, d, '*.csv')):
                file_list.append(f)
    return file_list
If I call:
path = r'/data/realtimedata/trades/bitfinex/'
scan_for_files(path)
I get the correct recursive list of files:
['/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_12.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_13.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_15.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_11.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_09.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_10.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_08.csv',
'/data/realtimedata/trades/bitfinex/btcusd/bitfinex_btcusd_trades_2018_05_14.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_14.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_12.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_10.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_08.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_09.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_15.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_11.csv',
'/data/realtimedata/trades/bitfinex/ethusd/bitfinex_ethusd_trades_2018_05_13.csv']
But when I pass the actual subdirectory that contains the files I want, it returns an empty list. Any idea why this happens? Thanks.
path = r'/data/realtimedata/trades/bitfinex/btcusd/'
scan_for_files(path)
Returns: []
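It looks like btcusd is a bottom-level directory. That means that when you call os.walk with the path r'/data/realtimedata/trades/bitfinex/btcusd/', the dirs variable is an empty list [], so the inner loop for d in dirs: never executes.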
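To see this behaviour in isolation, here is a minimal sketch that builds a throwaway tree in a temporary directory and records what os.walk reports at each step. The helper name walk_dirs and the file name a.csv are made up for illustration; only the btcusd folder name comes from the question.

```python
import os
import tempfile

def walk_dirs(top):
    """Return (root relative to top, sorted dirs) for every step of os.walk."""
    return [(os.path.relpath(root, top), sorted(dirs))
            for root, dirs, _ in os.walk(top)]

# Hypothetical layout: <base>/btcusd/a.csv
base = tempfile.mkdtemp()
leaf = os.path.join(base, 'btcusd')
os.mkdir(leaf)
open(os.path.join(leaf, 'a.csv'), 'w').close()

# Starting from the parent, dirs contains 'btcusd' on the first step.
print(walk_dirs(base))
# Starting from the leaf, dirs is empty, so `for d in dirs` has nothing to loop over.
print(walk_dirs(leaf))
```

When the walk starts at the leaf directory, the CSV files only ever show up in the files variable, which the original function never looks at.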
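My suggestion is to rewrite your function to iterate over the files directly rather than over the directories. Don't worry, you will still reach every file eventually: os.walk visits each directory in the tree in turn, and that is the nature of a directory tree.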
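import os

def scan_for_files(path):
    file_list = []
    # os.walk visits path itself and every directory below it.
    for root, _, files in os.walk(path):
        for f in files:
            # f is a bare file name, so filter on it and join it to root.
            if f.endswith('.csv'):
                file_list.append(os.path.join(root, f))
    return file_list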
However, in newer versions of Python (3.5+), you can use a recursive glob:
import glob
import os

def scan_for_files(path):
    return glob.glob(os.path.join(path, '**', '*.csv'), recursive=True)
Source.
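As a quick check that the recursive glob handles both cases from the question, the sketch below runs it against a temporary tree, once from the parent directory and once from the leaf. The names scan_recursive and scan_with_pathlib, the sorted() calls, and the sample file names are assumptions added for illustration. Path.rglob is shown as a similar alternative from the standard library, not as something the answer above proposed.

```python
import glob
import os
import tempfile
from pathlib import Path

def scan_recursive(path):
    # With recursive=True, '**' matches zero or more directory levels,
    # so CSV files directly inside path are included as well.
    return sorted(glob.glob(os.path.join(path, '**', '*.csv'), recursive=True))

def scan_with_pathlib(path):
    # Path.rglob('*.csv') walks path and all of its subdirectories.
    return sorted(str(p) for p in Path(path).rglob('*.csv'))

# Hypothetical layout: <base>/btcusd/{a.csv, notes.txt}
base = tempfile.mkdtemp()
leaf = os.path.join(base, 'btcusd')
os.mkdir(leaf)
open(os.path.join(leaf, 'a.csv'), 'w').close()
open(os.path.join(leaf, 'notes.txt'), 'w').close()

print(scan_recursive(base))     # finds a.csv one level down
print(scan_recursive(leaf))     # finds the same file from the leaf directory
print(scan_with_pathlib(leaf))  # same result via pathlib
```

Both calls should list the single CSV file and skip notes.txt, whichever directory the search starts from.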