Fak*_*ame 74 python filesystems filepath
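I have a path (including the directory and the file name).
I need to test whether the file name is valid, i.e. whether the file system will allow me to create a file with such a name.
The file name has some Unicode characters in it.
It's safe to assume that the directory segment of the path is valid and accessible (I was trying to make the question more generally applicable, and apparently I went too far).
I very much do not want to have to escape anything unless I have to.
I'd post some of the example characters I am dealing with, but apparently they get removed automatically by the Stack Exchange system. Anyway, I want to keep standard Unicode characters such as ö, and only escape things that are invalid in a file name.
Here is the catch. There may (or may not) already be a file at the target of the path. I need to keep that file if it exists, and not create a file if it doesn't.
Basically, I want to check whether I could write to a path without actually opening the path for writing (and the automatic file creation/file clobbering that typically entails).
As such: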
try:
    open(filename, 'w')
except OSError:
    # handle error here
    pass
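is not acceptable, because it will overwrite the existing file, which I do not want to touch (if it's there), or create said file if it's not.
I know I can do: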
if not os.access(filePath, os.W_OK):
    try:
        open(filePath, 'w').close()
        os.unlink(filePath)
    except OSError:
        # handle error here
        pass
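But that will create the file at filePath, which I would then have to os.unlink.
In the end, it seems to take 6 or 7 lines to do something that should be as simple as os.isvalidpath(filePath) or similar.
As an aside, I need this to run on (at least) Windows and macOS, so I'd like to avoid platform-specific stuff.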
Cec*_*rry 111
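Call the is_path_exists_or_creatable() function defined below.
Strictly Python 3. That's just how we roll.
The question of "How do I test pathname validity and, for valid pathnames, the existence or writability of those paths?" is clearly two separate questions. Both are interesting, and neither has received a genuinely satisfactory answer here... or, well, anywhere that I could grep.
We're gonna fix all that.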
Before hurling our fragile meat suits into the python-riddled moshpits of pain, we should probably define what we mean by "pathname validity." What defines validity, exactly?
By "pathname validity," we mean the syntactic correctness of a pathname with respect to the root filesystem of the current system – regardless of whether that path or parent directories thereof physically exist. A pathname is syntactically correct under this definition if it complies with all syntactic requirements of the root filesystem.
By "root filesystem," we mean:
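- On POSIX-compatible systems, the filesystem mounted to the root directory (/).
- On Windows, the filesystem mounted to %HOMEDRIVE%, the colon-suffixed drive letter containing the current Windows installation (typically but not necessarily C:).
The meaning of "syntactic correctness," in turn, depends on the type of root filesystem. For ext4 (and most but not all POSIX-compatible) filesystems, a pathname is syntactically correct if and only if that pathname:
- Contains no null bytes (i.e., \x00 in Python). This is a hard requirement for all POSIX-compatible filesystems.
- Contains no path components longer than 255 bytes (e.g., 'a'*256 in Python). A path component is a longest substring of a pathname containing no / character (e.g., bergtatt, ind, i, and fjeldkamrene in the pathname /bergtatt/ind/i/fjeldkamrene).
Syntactic correctness. Root filesystem. That's it.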
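As a rough illustration of the two ext4-style rules above, here is a purely syntactic check that never touches the filesystem. The function name and the constant are made up for this sketch, and it assumes "/" as the separator and a 255-byte component limit. It is not the approach the rest of this answer takes, which asks the OS itself.

```python
import os

MAX_COMPONENT_BYTES = 255  # assumed ext4-style limit; other filesystems differ


def looks_valid_on_ext4(pathname: str) -> bool:
    """Hypothetical helper: apply the two syntactic rules and nothing more."""
    # Rule 1: no null bytes anywhere (empty pathnames are rejected as well).
    if not pathname or '\x00' in pathname:
        return False
    # Rule 2: no path component longer than 255 *bytes* (not characters).
    return all(
        len(os.fsencode(part)) <= MAX_COMPONENT_BYTES
        for part in pathname.split('/')
    )


print(looks_valid_on_ext4('/bergtatt/ind/i/fjeldkamrene'))
print(looks_valid_on_ext4('/bergtatt/' + 'a' * 256))
```

Measuring the encoded length matters for non-ASCII names: a name made of 128 "ö" characters is only 128 characters long but occupies 256 bytes in UTF-8.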
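Validating pathnames in Python is surprisingly non-intuitive. I'm in firm agreement with Fake Name here: the official os.path package should provide an out-of-the-box solution for this. For unknown (and probably uncompelling) reasons, it doesn't. Fortunately, unrolling your own ad-hoc solution isn't that gut-wrenching...
O.K., it actually is. It's hairy; it's nasty; it probably chortles as it burbles and giggles as it glows. But what you gonna do? Nuthin'.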
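We'll soon descend into the radioactive abyss of low-level code. But first, let's talk high-level shop. The standard os.stat() and os.lstat() functions raise the following exceptions when passed invalid pathnames:
- For pathnames residing in non-existing directories, instances of FileNotFoundError.
- For pathnames residing in existing directories:
  - Under Windows, instances of WindowsError whose winerror attribute is 123 (i.e., ERROR_INVALID_NAME).
  - Under all other OSes:
    - For pathnames containing null bytes (i.e., '\x00'), instances of TypeError (ValueError under current Python 3 releases).
    - For pathnames containing path components longer than 255 bytes, instances of OSError whose errno attribute is:
      - Under SunOS and the *BSD family of OSes, errno.ERANGE. (This appears to be an OS-level bug, otherwise known as "selective interpretation" of the POSIX standard.)
      - Under all other OSes, errno.ENAMETOOLONG.
Crucially, this implies that only pathnames residing in existing directories are validatable. The os.stat() and os.lstat() functions raise generic FileNotFoundError exceptions when passed pathnames residing in non-existing directories, regardless of whether those pathnames are invalid or not. Directory existence takes precedence over pathname invalidity.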
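To see that precedence for yourself, a small probe along the following lines can help. The helper name is hypothetical, and the exact exception and error code you get depend on the OS, the filesystem, and the Python version, so no expected output is shown.

```python
import errno
import os
import tempfile


def describe_lstat_failure(pathname):
    """Hypothetical helper: name the exception os.lstat() raises, if any."""
    try:
        os.lstat(pathname)
    except OSError as exc:
        # errno.errorcode maps numbers such as 2 to names such as 'ENOENT'.
        code = errno.errorcode.get(exc.errno, str(exc.errno))
        return f'{type(exc).__name__} ({code})'
    except (TypeError, ValueError) as exc:
        # Embedded null bytes are rejected before the OS is even consulted.
        return type(exc).__name__
    return 'no exception'


with tempfile.TemporaryDirectory() as existing_dir:
    missing_dir = os.path.join(existing_dir, 'no_such_dir')
    # Overlong name inside a missing directory: the missing directory wins.
    print(describe_lstat_failure(os.path.join(missing_dir, 'a' * 256)))
    # Same overlong name inside an existing directory: the real problem shows.
    print(describe_lstat_failure(os.path.join(existing_dir, 'a' * 256)))
    print(describe_lstat_failure('\x00'))
```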
Does this mean that pathnames residing in non-existing directories are not validatable? Yes – unless we modify those pathnames to reside in existing directories. Is that even safely feasible, however? Shouldn't modifying a pathname prevent us from validating the original pathname?
To answer this question, recall from above that syntactically correct pathnames on the ext4 filesystem contain no path components (A) containing null bytes or (B) over 255 bytes in length. Hence, an ext4 pathname is valid if and only if all path components in that pathname are valid. This is true of most real-world filesystems of interest.
Does that pedantic insight actually help us? Yes. It reduces the larger problem of validating the full pathname in one fell swoop to the smaller problem of only validating all path components in that pathname. Any arbitrary pathname is validatable (regardless of whether that pathname resides in an existing directory or not) in a cross-platform manner by applying the following algorithm:
1. Split that pathname into path components (e.g., the pathname /troldskog/faren/vild into the list ['', 'troldskog', 'faren', 'vild']).
2. For each such component:
   1. Join the pathname of a directory guaranteed to exist with that component into a new temporary pathname (e.g., /troldskog).
   2. Pass that pathname to os.stat() or os.lstat(). If that pathname and hence that component is invalid, this call is guaranteed to raise an exception exposing the type of invalidity rather than a generic FileNotFoundError exception. Why? Because that pathname resides in an existing directory. (Circular logic is circular.)
Is there a directory guaranteed to exist? Yes, but typically only one: the topmost directory of the root filesystem (as defined above).
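The split-and-rejoin step is easier to picture with the stat call left out. In this minimal sketch, probe_pathnames() is a made-up name and the root directory defaults to os.path.sep, which is an assumption that only suits POSIX systems; the full implementation in this answer picks the root per platform.

```python
import os


def probe_pathnames(pathname, root_dirname=os.path.sep):
    """Hypothetical helper: list the temporary pathnames that would be statted."""
    # Drop any Windows drive specifier; this is a no-op elsewhere.
    _, tail = os.path.splitdrive(pathname)
    # Normalise the root so that it ends in exactly one separator.
    root = root_dirname.rstrip(os.path.sep) + os.path.sep
    # Re-root every component directly under the directory known to exist.
    return [root + part for part in tail.split(os.path.sep)]


print(probe_pathnames(os.path.join(os.path.sep, 'troldskog', 'faren', 'vild')))
```

Each returned pathname sits directly under the root, so a later os.lstat() on it can only fail because of the component itself (or because it simply does not exist, which is ignorable), never because an intermediate directory is missing.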
Passing pathnames residing in any other directory (and hence not guaranteed to exist) to os.stat() or os.lstat() invites race conditions, even if that directory was previously tested to exist. Why? Because external processes cannot be prevented from concurrently removing that directory after that test has been performed but before that pathname is passed to os.stat() or os.lstat(). Unleash the dogs of mind-fellating insanity!
There exists a substantial side benefit to the above approach as well: security. (Isn't that nice?) Specifically:
Front-facing applications validating arbitrary pathnames from untrusted sources by simply passing such pathnames to os.stat() or os.lstat() are susceptible to Denial of Service (DoS) attacks and other black-hat shenanigans. Malicious users may attempt to repeatedly validate pathnames residing on filesystems known to be stale or otherwise slow (e.g., NFS or Samba shares); in that case, blindly statting incoming pathnames is liable to either eventually fail with connection timeouts or consume more time and resources than your feeble capacity to withstand unemployment.
The above approach obviates this by only validating the path components of a pathname against the root directory of the root filesystem. (If even that's stale, slow, or inaccessible, you've got larger problems than pathname validation.)
Lost? Great. Let's begin. (Python 3 assumed. See "What Is Fragile Hope for 300, leycec?")
import errno, os, sys

# Sadly, Python fails to provide the following magic number for us.
ERROR_INVALID_NAME = 123
'''
Windows-specific error code indicating an invalid pathname.

See Also
----------
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681382%28v=vs.85%29.aspx
    Official listing of all such codes.
'''

def is_pathname_valid(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS;
    `False` otherwise.
    '''
    # If this pathname is either not a string or is but is empty, this pathname
    # is invalid.
    try:
        if not isinstance(pathname, str) or not pathname:
            return False

        # Strip this pathname's Windows-specific drive specifier (e.g., `C:\`)
        # if any. Since Windows prohibits path components from containing `:`
        # characters, failing to strip this `:`-suffixed prefix would
        # erroneously invalidate all valid absolute Windows pathnames.
        _, pathname = os.path.splitdrive(pathname)

        # Directory guaranteed to exist. If the current OS is Windows, this is
        # the drive to which Windows was installed (e.g., the "%HOMEDRIVE%"
        # environment variable); else, the typical root directory.
        root_dirname = os.environ.get('HOMEDRIVE', 'C:') \
            if sys.platform == 'win32' else os.path.sep
        assert os.path.isdir(root_dirname)   # ...Murphy and her ironclad Law

        # Append a path separator to this directory if needed.
        root_dirname = root_dirname.rstrip(os.path.sep) + os.path.sep

        # Test whether each path component split from this pathname is valid or
        # not, ignoring non-existent and non-readable path components.
        for pathname_part in pathname.split(os.path.sep):
            try:
                os.lstat(root_dirname + pathname_part)
            # If an OS-specific exception is raised, its error code
            # indicates whether this pathname is valid or not. Unless this
            # is the case, this exception implies an ignorable kernel or
            # filesystem complaint (e.g., path not found or inaccessible).
            #
            # Only the following exceptions indicate invalid pathnames:
            #
            # * Instances of the Windows-specific "WindowsError" class
            #   defining the "winerror" attribute whose value is
            #   "ERROR_INVALID_NAME". Under Windows, "winerror" is more
            #   fine-grained and hence useful than the generic "errno"
            #   attribute. When a too-long pathname is passed, for example,
            #   "errno" is "ENOENT" (i.e., no such file or directory) rather
            #   than "ENAMETOOLONG" (i.e., file name too long).
            # * Instances of the cross-platform "OSError" class defining the
            #   generic "errno" attribute whose value is either:
            #   * Under most POSIX-compatible OSes, "ENAMETOOLONG".
            #   * Under some edge-case OSes (e.g., SunOS, *BSD), "ERANGE".
            except OSError as exc:
                if hasattr(exc, 'winerror'):
                    if exc.winerror == ERROR_INVALID_NAME:
                        return False
                elif exc.errno in {errno.ENAMETOOLONG, errno.ERANGE}:
                    return False
    # If a "TypeError" (or, under current Python 3 releases, a "ValueError")
    # exception was raised, it almost certainly has the error message
    # "embedded NUL character" or "embedded null byte" indicating an invalid
    # pathname.
    except (TypeError, ValueError):
        return False
    # If no exception was raised, all path components and hence this
    # pathname itself are valid. (Praise be to the curmudgeonly python.)
    else:
        return True
    # If any other exception was raised, this is an unrelated fatal issue
    # (e.g., a bug). Permit this exception to unwind the call stack.
    #
    # Did we mention this should be shipped with Python already?
Done. Don't squint at that code. (It bites.)
Testing the existence or creatability of possibly invalid pathnames is, given the above solution, mostly trivial. The little key here is to call the previously defined function before testing the passed path:
def is_path_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create the passed
    pathname; `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()
    return os.access(dirname, os.W_OK)

def is_path_exists_or_creatable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS _and_
    either currently exists or is hypothetically creatable; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False
Done and done. Except not quite.
There exists a caveat. Of course there does.
As the official os.access() documentation admits:
Note: I/O operations may fail even when os.access() indicates that they would succeed, particularly for operations on network filesystems which may have permissions semantics beyond the usual POSIX permission-bit model.
To no one's surprise, Windows is the usual suspect here. Thanks to extensive use of Access Control Lists (ACL) on NTFS filesystems, the simplistic POSIX permission-bit model maps poorly to the underlying Windows reality. While this (arguably) isn't Python's fault, it might nonetheless be of concern for Windows-compatible applications.
If this is you, a more robust alternative is wanted. If the passed path does not exist, we instead attempt to create a temporary file guaranteed to be immediately deleted in the parent directory of that path – a more portable (if expensive) test of creatability:
import os, tempfile

def is_path_sibling_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create **siblings**
    (i.e., arbitrary files in the parent directory) of the passed pathname;
    `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()

    try:
        # For safety, explicitly close and hence delete this temporary file
        # immediately after creating it in the passed path's parent directory.
        with tempfile.TemporaryFile(dir=dirname): pass
        return True
    # While the exact type of exception raised by the above function depends on
    # the current version of the Python interpreter, all such types subclass the
    # following exception superclass ("EnvironmentError" is merely an alias of
    # "OSError" under Python >= 3.3).
    except OSError:
        return False

def is_path_exists_or_creatable_portable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname on the current OS _and_
    either currently exists or is hypothetically creatable in a cross-platform
    manner optimized for POSIX-unfriendly filesystems; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_sibling_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False
Note, however, that even this may not be enough.
Thanks to User Account Control (UAC), the ever-inimical Windows Vista and all subsequent iterations thereof blatantly lie about permissions pertaining to system directories. When non-Administrator users attempt to create files in either the canonical C:\Windows or C:\Windows\system32 directories, UAC superficially permits the user to do so while actually isolating all created files into a "Virtual Store" in that user's profile. (Who could have possibly imagined that deceiving users would have harmful long-term consequences?)
This is crazy. This is Windows.
Dare we? It's time to test-drive the above tests.
Since NULL is the only character prohibited in pathnames on UNIX-oriented filesystems, let's leverage that to demonstrate the cold, hard truth – ignoring non-ignorable Windows shenanigans, which frankly bore and anger me in equal measure:
>>> print('"foo.bar" valid? ' + str(is_pathname_valid('foo.bar')))
"foo.bar" valid? True
>>> print('Null byte valid? ' + str(is_pathname_valid('\x00')))
Null byte valid? False
>>> print('Long path valid? ' + str(is_pathname_valid('a' * 256)))
Long path valid? False
>>> print('"/dev" exists or creatable? ' + str(is_path_exists_or_creatable('/dev')))
"/dev" exists or creatable? True
>>> print('"/dev/foo.bar" exists or creatable? ' + str(is_path_exists_or_creatable('/dev/foo.bar')))
"/dev/foo.bar" exists or creatable? False
>>> print('Null byte exists or creatable? ' + str(is_path_exists_or_creatable('\x00')))
Null byte exists or creatable? False
Beyond sanity. Beyond pain. You will find Python portability concerns.
Nob*_*ody 37
if os.path.exists(filePath):
    # the file is there
    pass
elif os.access(os.path.dirname(filePath), os.W_OK):
    # the file does not exist but write privileges are given
    pass
else:
    # cannot write there
    pass
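Note that path.exists can fail for more reasons than just "the file is not there", so you might have to do finer tests, such as testing whether the containing directory exists, and so on.
After my discussion with the OP, it turned out that the main problem seems to be that the file name might contain characters that are not allowed by the filesystem. Of course they need to be removed, but the OP wants to maintain as much human readability as the filesystem allows.
Sadly, I do not know of any good solution for this. However, Cecil Curry's answer takes a closer look at detecting the problem.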
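One of the "finer tests" mentioned here could be to walk up to the closest ancestor directory that actually exists and ask os.access() about that one. The helper names below are invented for this sketch, and the usual caveat applies: os.access() is only a hint, especially on Windows and on network filesystems.

```python
import os
import tempfile


def nearest_existing_dir(pathname):
    """Hypothetical helper: closest ancestor directory that currently exists."""
    dirname = os.path.dirname(os.path.abspath(pathname))
    while not os.path.isdir(dirname):
        parent = os.path.dirname(dirname)
        if parent == dirname:  # reached the filesystem root without a hit
            break
        dirname = parent
    return dirname


def parent_chain_writable(pathname):
    """True if the closest existing ancestor looks writable to os.access()."""
    return os.access(nearest_existing_dir(pathname), os.W_OK | os.X_OK)


with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, 'not', 'yet', 'there.txt')
    print(nearest_existing_dir(target))
    print(parent_chain_writable(target))
```

This says nothing about whether the file name itself is acceptable to the filesystem; it only narrows down where a write would first be attempted.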
With Python 3, how about:
try:
    with open(filename, 'x') as tempfile:  # OSError if file exists or is invalid
        pass
except OSError:
    # handle error here
    pass
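With the 'x' option we also don't have to worry about race conditions. See the open() documentation.
Now, this WILL create a very short-lived temporary file if it does not already exist, unless the name is invalid. If you can live with that, it simplifies things a lot. Note that the snippet above leaves the new file in place, so it has to be removed afterwards.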
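A slightly fuller sketch of the same idea, with the cleanup made explicit, might look like this. The function name is hypothetical, and treating "already exists" as success is a choice made for this example to match the question (existing files must stay untouched).

```python
import os
import tempfile


def exists_or_creatable(filename):
    """Hypothetical helper: probe with mode 'x', then remove the probe file."""
    try:
        # 'x' raises FileExistsError instead of clobbering an existing file.
        with open(filename, 'x'):
            pass
    except FileExistsError:
        return True   # already there, and left untouched
    except (OSError, ValueError):
        return False  # invalid name, missing directory, no permission, ...
    os.remove(filename)  # only reached when this call created the file
    return True


with tempfile.TemporaryDirectory() as base:
    print(exists_or_creatable(os.path.join(base, 'höre.txt')))
    print(exists_or_creatable(os.path.join(base, 'missing_dir', 'x.txt')))
```

There is still a small window between creating and removing the probe file in which another process could see it, so this fits best where a briefly visible empty file is harmless.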
I found a PyPI module called pathvalidate:
https://pypi.org/project/pathvalidate/
pip install pathvalidate
It has a function called sanitize_filepath, which takes a file path and converts it into a valid file path:
from pathvalidate import sanitize_filepath

file1 = "ap:lle/fi:le"
print(sanitize_filepath(file1))
# with the default settings the invalid ':' characters are stripped,
# so this should print aplle/file
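It also works with reserved names. If you give it the file path con, it will return con_.
So, with this knowledge, we can check whether the entered file path is equal to the sanitized one; if it is, the file path is valid: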
import os
from pathvalidate import sanitize_filepath

def check(filePath):
    # An existing path is valid by definition.
    if os.path.exists(filePath):
        return True
    # Otherwise the path is valid only if sanitizing leaves it unchanged.
    if filePath == sanitize_filepath(filePath):
        return True
    return False
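If adding a third-party dependency is not an option, the Windows naming rules touched on here (forbidden characters, reserved device names such as con) can be approximated with the standard library. The sketch below is an assumption-laden approximation written for this page, not pathvalidate's implementation; the helper name is made up, and it checks a single file name rather than a whole path.

```python
import re

# Characters Windows rejects in a file name, plus ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names Windows reserves, with or without an extension.
_RESERVED = {'CON', 'PRN', 'AUX', 'NUL',
             *(f'COM{i}' for i in range(1, 10)),
             *(f'LPT{i}' for i in range(1, 10))}


def is_plausible_windows_filename(name):
    """Hypothetical helper: check one file name (not a whole path)."""
    if not name or _FORBIDDEN.search(name):
        return False
    if name[-1] in ' .':  # a trailing space or dot is not allowed
        return False
    # Reserved names stay reserved even when an extension is attached.
    stem = name.split('.', 1)[0].rstrip(' ').upper()
    return stem not in _RESERVED


for candidate in ('höre.txt', 'fi:le', 'con', 'console.txt'):
    print(candidate, is_plausible_windows_filename(candidate))
```

Non-ASCII characters such as ö pass untouched, which matches the goal of escaping only what the filesystem actually refuses.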
open(filename, 'r')  # 2nd argument is 'r' and not 'w'
This will open the file, or raise an error if it doesn't exist. If there's an error, you can then try to write to the path; if you can't, you get a second error:
def can_read_or_write(filename):  # wrapper (name made up) so that "return" is valid
    try:
        open(filename, 'r').close()
        return True
    except IOError:
        try:
            open(filename, 'w').close()
            return True
        except IOError:
            return False
Also see here for information about Windows permissions.