Check whether a path is valid in Python without creating a file at the path's target

Fak*_*ame 74 python filesystems filepath

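I have a path (including directory and file name).
I need to test whether the file name is valid, e.g. whether the filesystem will allow me to create a file with such a name.
The file name has some unicode characters in it.

It's safe to assume the directory segment of the path is valid and accessible (I was trying to make the question more generally applicable, and apparently I went too far).

I very much do not want to have to escape anything unless I have to.

I'd post some of the example characters I am dealing with, but apparently they get automatically removed by the Stack Exchange system. Anyway, I want to keep standard unicode entities like ö, and only escape things that are invalid in a filename.


Here is the catch. There may (or may not) already be a file at the target of the path. I need to keep that file if it does exist, and not create a file if it does not.

Basically I want to check whether I could write to a path without actually opening the path for writing (and the automatic file creation/file clobbering that typically entails).

As such: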

try:
    open(filename, 'w')
except OSError:
    # handle error here
    pass

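(from here)

is not acceptable, because it will overwrite the existing file, which I do not want to touch (if it's there), or create said file if it's not.

I know I can do: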

if not os.access(filePath, os.W_OK):
    try:
        open(filePath, 'w').close()
        os.unlink(filePath)
    except OSError:
        # handle error here
        pass

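But that will create the file at filePath, which I would then have to os.unlink.

In the end, it seems like it's spending 6 or 7 lines to do something that should be as simple as os.isvalidpath(filePath) or similar.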


As an aside, I need this to run on (at least) Windows and MacOS, so I'd like to avoid platform-specific stuff.

Cec*_*rry 111

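tl;dr

Call the is_path_exists_or_creatable() function defined below.

Strictly Python 3. That's just how we roll.

A Tale of Two Questions

The question of "How do I test pathname validity and, for valid pathnames, the existence or writability of those paths?" is clearly two separate questions. Both are interesting, and neither has received a genuinely satisfactory answer here... or, well, anywhere that I could grep.

vikki's answer probably hews the closest, but has the remarkable disadvantages of:

  • Needlessly opening (...and then failing to reliably close) file handles.
  • Needlessly writing (...and then failing to reliably close or delete) 0-byte files.
  • Ignoring OS-specific errors differentiating between non-ignorable invalid pathnames and ignorable filesystem issues. Unsurprisingly, this is critical under Windows. (See below.)
  • Ignoring race conditions resulting from external processes concurrently (re)moving parent directories of the pathname to be tested. (See below.)
  • Ignoring connection timeouts resulting from this pathname residing on stale, slow, or otherwise temporarily inaccessible filesystems. This could expose public-facing services to potential DoS-driven attacks. (See below.)

We're gonna fix all that.

Question #0: What's Pathname Validity Again?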

Before hurling our fragile meat suits into the python-riddled moshpits of pain, we should probably define what we mean by "pathname validity." What defines validity, exactly?

By "pathname validity," we mean the syntactic correctness of a pathname with respect to the root filesystem of the current system – regardless of whether that path or parent directories thereof physically exist. A pathname is syntactically correct under this definition if it complies with all syntactic requirements of the root filesystem.

By "root filesystem," we mean:

  • On POSIX-compatible systems, the filesystem mounted to the root directory (/).
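  • On Windows, the filesystem mounted to %HOMEDRIVE%, the colon-suffixed drive letter containing the current Windows installation (typically but not necessarily C:).

The meaning of "syntactic correctness," in turn, depends on the type of root filesystem. For ext4 (and most but not all POSIX-compatible) filesystems, a pathname is syntactically correct if and only if that pathname:

  • Contains no null bytes (i.e., \x00 in Python). This is a hard requirement for all POSIX-compatible filesystems.
  • Contains no path components longer than 255 bytes (e.g., 'a'*256 in Python). A path component is a longest substring of a pathname containing no / character (e.g., bergtatt, ind, i, and fjeldkamrene in the pathname /bergtatt/ind/i/fjeldkamrene).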
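
Those two rules can be checked on the string alone, without touching the filesystem. What follows is a minimal sketch, assuming UTF-8 is the encoding the filesystem sees; looks_syntactically_valid_ext4 is a hypothetical name, not part of this answer's code or of the standard library:

```python
def looks_syntactically_valid_ext4(pathname: str) -> bool:
    """Hypothetical helper: apply only the two ext4 rules listed above."""
    # Rule 1: no null bytes anywhere in the pathname.
    if "\x00" in pathname:
        return False
    # Rule 2: no path component longer than 255 bytes. The limit counts
    # bytes, not characters, so encode first (UTF-8 is an assumption here).
    return all(
        len(component.encode("utf-8", "surrogateescape")) <= 255
        for component in pathname.split("/")
    )


print(looks_syntactically_valid_ext4("/bergtatt/ind/i/fjeldkamrene"))  # True
print(looks_syntactically_valid_ext4("a" * 256))                       # False
print(looks_syntactically_valid_ext4("ö" * 128))                       # False: 256 bytes in UTF-8
```

Such a check only mirrors the ext4 rules as stated; it says nothing about other filesystems, which is why the rest of this answer defers to the operating system instead.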

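Syntactic correctness. Root filesystem. That's it.

Question #1: How Now Shall We Do Pathname Validity?

Validating pathnames in Python is surprisingly non-intuitive. I'm in firm agreement with Fake Name here: the official os.path package should provide an out-of-the-box solution for this. For unknown (and probably uncompelling) reasons, it doesn't. Fortunately, unrolling your own ad-hoc solution isn't that gut-wrenching...

O.K., it actually is. It's hairy; it's nasty; it probably chortles as it burbles and giggles as it glows. But what you gonna do? Nuthin'.

We'll soon descend into the radioactive abyss of low-level code. But first, let's talk high-level shop. The standard os.stat() and os.lstat() functions raise the following exceptions when passed invalid pathnames: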

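  • For pathnames residing in non-existing directories, instances of FileNotFoundError.
  • For pathnames residing in existing directories:
    • Under Windows, instances of WindowsError whose winerror attribute is 123 (i.e., ERROR_INVALID_NAME).
    • Under all other OSes:
      • For pathnames containing null bytes (i.e., '\x00'), instances of TypeError (or, on recent Python 3 releases, ValueError).
      • For pathnames containing path components longer than 255 bytes, instances of OSError whose errno attribute is:
        • Under SunOS and the *BSD family of OSes, errno.ERANGE. (This appears to be an OS-level bug, otherwise referred to as "selective interpretation" of the POSIX standard.)
        • Under all other OSes, errno.ENAMETOOLONG.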
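
To see which branch of that list a given pathname lands in, the exception can be classified directly. This is a rough sketch, assuming a recent Python 3 (where an embedded null byte surfaces as ValueError rather than TypeError); describe_lstat_failure and its return strings are made up for illustration:

```python
import errno
import os

ERROR_INVALID_NAME = 123  # Windows error code for a syntactically bad name


def describe_lstat_failure(pathname: str) -> str:
    """Hypothetical helper: name the kind of failure os.lstat() reports."""
    try:
        os.lstat(pathname)
    except (TypeError, ValueError):
        # Rejected by Python itself before reaching the OS (e.g. null byte).
        return "invalid: unrepresentable"
    except OSError as exc:
        # "winerror" only exists on Windows; getattr keeps this portable.
        if getattr(exc, "winerror", None) == ERROR_INVALID_NAME:
            return "invalid: bad name (Windows)"
        if exc.errno in (errno.ENAMETOOLONG, errno.ERANGE):
            return "invalid: component too long"
        if isinstance(exc, FileNotFoundError):
            return "not found"
        return "other OS error"
    return "exists"


print(describe_lstat_failure("foo\x00bar"))
print(describe_lstat_failure(os.getcwd()))
```

The Windows check comes first on purpose: the generic errno is less informative there, as the comments in the full implementation below explain.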

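Crucially, this implies that only pathnames residing in existing directories are validatable. The os.stat() and os.lstat() functions raise generic FileNotFoundError exceptions when passed pathnames residing in non-existing directories, regardless of whether those pathnames are invalid or not. Directory existence takes precedence over pathname invalidity.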

Does this mean that pathnames residing in non-existing directories are not validatable? Yes – unless we modify those pathnames to reside in existing directories. Is that even safely feasible, however? Shouldn't modifying a pathname prevent us from validating the original pathname?

To answer this question, recall from above that syntactically correct pathnames on the ext4 filesystem contain no path components (A) containing null bytes or (B) over 255 bytes in length. Hence, an ext4 pathname is valid if and only if all path components in that pathname are valid. This is true of most real-world filesystems of interest.

Does that pedantic insight actually help us? Yes. It reduces the larger problem of validating the full pathname in one fell swoop to the smaller problem of only validating all path components in that pathname. Any arbitrary pathname is validatable (regardless of whether that pathname resides in an existing directory or not) in a cross-platform manner by following the following algorithm:

  1. Split that pathname into path components (e.g., the pathname /troldskog/faren/vild into the list ['', 'troldskog', 'faren', 'vild']).
  2. For each such component:
    1. Join the pathname of a directory guaranteed to exist with that component into a new temporary pathname (e.g., /troldskog).
    2. Pass that pathname to os.stat() or os.lstat(). If that pathname and hence that component is invalid, this call is guaranteed to raise an exception exposing the type of invalidity rather than a generic FileNotFoundError exception. Why? Because that pathname resides in an existing directory. (Circular logic is circular.)
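
Steps 1 and 2.1 amount to the following, sketched here for POSIX-style pathnames only, with "/" assumed as both the separator and the directory guaranteed to exist; probe_paths is a hypothetical helper:

```python
def probe_paths(pathname: str, root: str = "/") -> list:
    """Hypothetical helper: one temporary pathname per path component."""
    # Each component is re-rooted directly under `root`, so the parent
    # directory of every probe pathname is known to exist.
    return [root + component for component in pathname.split("/")]


print(probe_paths("/troldskog/faren/vild"))
# ['/', '/troldskog', '/faren', '/vild']
```

The leading empty component simply maps to the root itself. Each resulting pathname is what step 2.2 would hand to os.stat() or os.lstat().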

Is there a directory guaranteed to exist? Yes, but typically only one: the topmost directory of the root filesystem (as defined above).

Passing pathnames residing in any other directory (and hence not guaranteed to exist) to os.stat() or os.lstat() invites race conditions, even if that directory was previously tested to exist. Why? Because external processes cannot be prevented from concurrently removing that directory after that test has been performed but before that pathname is passed to os.stat() or os.lstat(). Unleash the dogs of mind-fellating insanity!

There exists a substantial side benefit to the above approach as well: security. (Isn't that nice?) Specifically:

Front-facing applications validating arbitrary pathnames from untrusted sources by simply passing such pathnames to os.stat() or os.lstat() are susceptible to Denial of Service (DoS) attacks and other black-hat shenanigans. Malicious users may attempt to repeatedly validate pathnames residing on filesystems known to be stale or otherwise slow (e.g., NFS or Samba shares); in that case, blindly statting incoming pathnames is liable to either eventually fail with connection timeouts or consume more time and resources than your feeble capacity to withstand unemployment.

The above approach obviates this by only validating the path components of a pathname against the root directory of the root filesystem. (If even that's stale, slow, or inaccessible, you've got larger problems than pathname validation.)

Lost? Great. Let's begin. (Python 3 assumed. See "What Is Fragile Hope for 300, leycec?")

import errno, os, sys

# Sadly, Python fails to provide the following magic number for us.
ERROR_INVALID_NAME = 123
'''
Windows-specific error code indicating an invalid pathname.

See Also
----------
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681382%28v=vs.85%29.aspx
    Official listing of all such codes.
'''

def is_pathname_valid(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS;
    `False` otherwise.
    '''
    # If this pathname is either not a string or is but is empty, this pathname
    # is invalid.
    try:
        if not isinstance(pathname, str) or not pathname:
            return False

        # Strip this pathname's Windows-specific drive specifier (e.g., `C:\`)
        # if any. Since Windows prohibits path components from containing `:`
        # characters, failing to strip this `:`-suffixed prefix would
        # erroneously invalidate all valid absolute Windows pathnames.
        _, pathname = os.path.splitdrive(pathname)

        # Directory guaranteed to exist. If the current OS is Windows, this is
        # the drive to which Windows was installed (e.g., the "%HOMEDRIVE%"
        # environment variable); else, the typical root directory.
        root_dirname = os.environ.get('HOMEDRIVE', 'C:') \
            if sys.platform == 'win32' else os.path.sep
        assert os.path.isdir(root_dirname)   # ...Murphy and her ironclad Law

        # Append a path separator to this directory if needed.
        root_dirname = root_dirname.rstrip(os.path.sep) + os.path.sep

        # Test whether each path component split from this pathname is valid or
        # not, ignoring non-existent and non-readable path components.
        for pathname_part in pathname.split(os.path.sep):
            try:
                os.lstat(root_dirname + pathname_part)
            # If an OS-specific exception is raised, its error code
            # indicates whether this pathname is valid or not. Unless this
            # is the case, this exception implies an ignorable kernel or
            # filesystem complaint (e.g., path not found or inaccessible).
            #
            # Only the following exceptions indicate invalid pathnames:
            #
            # * Instances of the Windows-specific "WindowsError" class
            #   defining the "winerror" attribute whose value is
            #   "ERROR_INVALID_NAME". Under Windows, "winerror" is more
            #   fine-grained and hence useful than the generic "errno"
            #   attribute. When a too-long pathname is passed, for example,
            #   "errno" is "ENOENT" (i.e., no such file or directory) rather
            #   than "ENAMETOOLONG" (i.e., file name too long).
            # * Instances of the cross-platform "OSError" class defining the
            #   generic "errno" attribute whose value is either:
            #   * Under most POSIX-compatible OSes, "ENAMETOOLONG".
            #   * Under some edge-case OSes (e.g., SunOS, *BSD), "ERANGE".
            except OSError as exc:
                if hasattr(exc, 'winerror'):
                    if exc.winerror == ERROR_INVALID_NAME:
                        return False
                elif exc.errno in {errno.ENAMETOOLONG, errno.ERANGE}:
                    return False
    # If a "ValueError" (or, on older Python 3 releases, a "TypeError")
    # exception was raised, it almost certainly has an error message such as
    # "embedded null byte" indicating an invalid pathname.
    except (TypeError, ValueError):
        return False
    # If no exception was raised, all path components and hence this
    # pathname itself are valid. (Praise be to the curmudgeonly python.)
    else:
        return True
    # If any other exception was raised, this is an unrelated fatal issue
    # (e.g., a bug). Permit this exception to unwind the call stack.
    #
    # Did we mention this should be shipped with Python already?

Done. Don't squint at that code. (It bites.)

Question #2: Possibly Invalid Pathname Existence or Creatability, Eh?

Testing the existence or creatability of possibly invalid pathnames is, given the above solution, mostly trivial. The little key here is to call the previously defined function before testing the passed path:

def is_path_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create the passed
    pathname; `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()
    return os.access(dirname, os.W_OK)

def is_path_exists_or_creatable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname for the current OS _and_
    either currently exists or is hypothetically creatable; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

Done and done. Except not quite.

Question #3: Possibly Invalid Pathname Existence or Writability on Windows

There exists a caveat. Of course there does.

As the official os.access() documentation admits:

Note: I/O operations may fail even when os.access() indicates that they would succeed, particularly for operations on network filesystems which may have permissions semantics beyond the usual POSIX permission-bit model.

To no one's surprise, Windows is the usual suspect here. Thanks to extensive use of Access Control Lists (ACL) on NTFS filesystems, the simplistic POSIX permission-bit model maps poorly to the underlying Windows reality. While this (arguably) isn't Python's fault, it might nonetheless be of concern for Windows-compatible applications.

If this is you, a more robust alternative is wanted. If the passed path does not exist, we instead attempt to create a temporary file guaranteed to be immediately deleted in the parent directory of that path – a more portable (if expensive) test of creatability:

import os, tempfile

def is_path_sibling_creatable(pathname: str) -> bool:
    '''
    `True` if the current user has sufficient permissions to create **siblings**
    (i.e., arbitrary files in the parent directory) of the passed pathname;
    `False` otherwise.
    '''
    # Parent directory of the passed path. If empty, we substitute the current
    # working directory (CWD) instead.
    dirname = os.path.dirname(pathname) or os.getcwd()

    try:
        # For safety, explicitly close and hence delete this temporary file
        # immediately after creating it in the passed path's parent directory.
        with tempfile.TemporaryFile(dir=dirname): pass
        return True
    # While the exact type of exception raised by the above function depends on
    # the current version of the Python interpreter, all such types subclass the
    # following exception superclass.
    except EnvironmentError:
        return False

def is_path_exists_or_creatable_portable(pathname: str) -> bool:
    '''
    `True` if the passed pathname is a valid pathname on the current OS _and_
    either currently exists or is hypothetically creatable in a cross-platform
    manner optimized for POSIX-unfriendly filesystems; `False` otherwise.

    This function is guaranteed to _never_ raise exceptions.
    '''
    try:
        # To prevent "os" module calls from raising undesirable exceptions on
        # invalid pathnames, is_pathname_valid() is explicitly called first.
        return is_pathname_valid(pathname) and (
            os.path.exists(pathname) or is_path_sibling_creatable(pathname))
    # Report failure on non-fatal filesystem complaints (e.g., connection
    # timeouts, permissions issues) implying this path to be inaccessible. All
    # other exceptions are unrelated fatal issues and should not be caught here.
    except OSError:
        return False

Note, however, that even this may not be enough.

Thanks to User Account Control (UAC), the ever-inimicable Windows Vista and all subsequent iterations thereof blatantly lie about permissions pertaining to system directories. When non-Administrator users attempt to create files in either the canonical C:\Windows or C:\Windows\system32 directories, UAC superficially permits the user to do so while actually isolating all created files into a "Virtual Store" in that user's profile. (Who could have possibly imagined that deceiving users would have harmful long-term consequences?)

This is crazy. This is Windows.

Prove It

Dare we? It's time to test-drive the above tests.

Since NULL is the only character prohibited in pathnames on UNIX-oriented filesystems, let's leverage that to demonstrate the cold, hard truth – ignoring non-ignorable Windows shenanigans, which frankly bore and anger me in equal measure:

>>> print('"foo.bar" valid? ' + str(is_pathname_valid('foo.bar')))
"foo.bar" valid? True
>>> print('Null byte valid? ' + str(is_pathname_valid('\x00')))
Null byte valid? False
>>> print('Long path valid? ' + str(is_pathname_valid('a' * 256)))
Long path valid? False
>>> print('"/dev" exists or creatable? ' + str(is_path_exists_or_creatable('/dev')))
"/dev" exists or creatable? True
>>> print('"/dev/foo.bar" exists or creatable? ' + str(is_path_exists_or_creatable('/dev/foo.bar')))
"/dev/foo.bar" exists or creatable? False
>>> print('Null byte exists or creatable? ' + str(is_path_exists_or_creatable('\x00')))
Null byte exists or creatable? False

Beyond sanity. Beyond pain. You will find Python portability concerns.

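  • **Yes, it was me!** Attempting to cobble together a cross-portable pathname-validating regex is an exercise in futility and guaranteed to fail for common edge cases. Consider pathname length on Windows, for example: "The maximum path of 32,767 characters is approximate, because the '\\?\' prefix may be expanded to a longer string by the system at run time, and this expansion applies to the total length." Given that, it's actually **technically infeasible** to construct a regex matching only valid pathnames. It's much more reasonable just to defer to Python instead. (3 upvotes)
  • **Ah.** I (reluctantly) see. You're doing something even stranger than hacking up a regex. Yeah, [that](/sf/answers/1949653261/) is guaranteed to fail even harder. That also completely fails to address the question in question, which is _not_ "How do I strip invalid substrings from a Windows-specific basename?" (...which, by your own omission, you fail to solve, again due to edge cases) but "How do I cross-portably test pathname validity and, for valid pathnames, the existence or writability of those paths?" (2 upvotes)
  • As for nomenclature, I'm a big fan of prefixing tester names by `is_`. This is my character flaw. Nonetheless, duly noted: you can't please everybody, and sometimes you can't please anybody. ;) (2 upvotes)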

Nob*_*ody 37

if os.path.exists(filePath):
    # the file is there
    pass
elif os.access(os.path.dirname(filePath), os.W_OK):
    # the file does not exist, but write privileges are given
    pass
else:
    # cannot write there
    pass

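Note that path.exists may fail for more reasons than just the file not being there, so you might have to do finer tests like testing whether the containing directory exists and so on.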
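
One way such finer tests might look is sketched below. It assumes nothing beyond the standard os module; nearest_existing_ancestor and diagnose are hypothetical names, and os.access carries the usual caveat that its answer is advisory only:

```python
import os


def nearest_existing_ancestor(path: str) -> str:
    """Hypothetical helper: closest ancestor of `path` that exists on disk."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def diagnose(path: str) -> str:
    """Hypothetical helper: say why a path is (or is not) usable for writing."""
    if os.path.exists(path):
        return "exists"
    parent = os.path.dirname(os.path.abspath(path))
    ancestor = nearest_existing_ancestor(path)
    if ancestor != parent:
        return "parent directory missing (nearest existing: %s)" % ancestor
    if not os.path.isdir(parent):
        return "parent is not a directory"
    if not os.access(parent, os.W_OK):
        return "parent directory not writable"
    return "creatable"
```

Nothing here creates or modifies files, but it also does not validate the file name itself; it only separates "missing directory" from "no permission".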


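After my discussion with the OP it turned out that the main problem seems to be that the file name might contain characters that are not allowed by the filesystem. Of course they need to be removed, but the OP wants to maintain as much human readability as the filesystem allows.

Sadly I do not know of any good solution for this. However, Cecil Curry's answer takes a closer look at detecting the problem.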

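  • **I have no idea why this answer was upvoted.** It doesn't come _remotely_ adjacent to addressing the core question, which, in short, is: "Validate pathnames, please?" Validating path permissions is an ancillary (and largely ignorable) question here. While the call to os.path.exists(filePath) does technically raise exceptions on invalid pathnames, those exceptions would need to be explicitly caught and differentiated from other unrelated exceptions. Moreover, the same call returns "False" on existing paths to which the current user does _not_ have read permissions. In short, badness. (2 upvotes)
  • @CecilCurry: To answer your questions: have a look at the edit history of the question. As with most questions, it was not as clear-cut in the beginning, and even now the wording of the title alone might be understood differently from what you said. (2 upvotes)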

Ste*_*ler 8

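With Python 3, how about:

try:
    with open(filename, 'x') as tempfile:  # OSError if file exists or is invalid
        pass
except OSError:
    # handle error here
    pass

With the 'x' option we also don't have to worry about race conditions. See the documentation here.

Now, this will create a very short-lived temporary file if it does not exist yet, unless the name is invalid. (As written, the snippet does not delete that empty file; it is short-lived only if you remove it afterwards.) If you can live with that, it simplifies things a lot.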
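
A possible way to wrap this so the probe file does not linger is sketched below, under the assumption that briefly creating an empty file is acceptable; probe_exclusive_create is a hypothetical name:

```python
import os


def probe_exclusive_create(filename: str) -> str:
    """Hypothetical helper built on open(..., 'x')."""
    try:
        # 'x' raises FileExistsError instead of truncating an existing file.
        with open(filename, "x"):
            pass
    except FileExistsError:
        return "exists"
    except (OSError, ValueError):
        # Invalid name, missing parent directory, no permission, null byte...
        return "not creatable"
    # The probe file was created by this call, so remove it again.
    os.remove(filename)
    return "creatable"
```

An existing file is never opened for writing, so its contents are untouched. There is still a short window between creation and removal in which other processes can see the empty file.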

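  • At this point, the project that needed this has moved so far beyond the point where an answer is even relevant that I can't really accept an answer. (2 upvotes)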

Zac*_*ary 7

Found a PyPI module called pathvalidate

https://pypi.org/project/pathvalidate/

pip install pathvalidate

Inside of it there is a function called sanitize_filepath that will take a file path and convert it into a valid file path

from pathvalidate import sanitize_filepath
file1 = "ap:lle/fi:le"
print(sanitize_filepath(file1))
# prints aplle/file (the invalid ':' characters are stripped)

It also works with reserved names. If you feed it the file path con, it will return con_
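
For context, the reserved names in question are the Windows device names. Below is a stdlib-only sketch of detecting them, independent of pathvalidate; is_windows_reserved and WINDOWS_RESERVED are hypothetical names, and the set is the commonly documented one (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9):

```python
# Device names that Windows reserves, with or without a file extension.
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM%d" % i for i in range(1, 10)}
    | {"LPT%d" % i for i in range(1, 10)}
)


def is_windows_reserved(basename: str) -> bool:
    """Hypothetical helper: True if `basename` collides with a device name."""
    # Everything after the first dot is ignored, so "NUL.tar.gz" counts too.
    stem = basename.split(".", 1)[0].rstrip(" ")
    return stem.upper() in WINDOWS_RESERVED


print(is_windows_reserved("con.txt"))      # True
print(is_windows_reserved("console.txt"))  # False
```

This is only meant to show why a name like con needs special handling; it is not a substitute for the library's own checks.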

So with this knowledge, we can check whether the entered file path is equal to the sanitized one; if it is, the file path is valid

import os
from pathvalidate import sanitize_filepath

def check(filePath):
    if os.path.exists(filePath):
        return True
    if filePath == sanitize_filepath(filePath):
        return True
    return False


vik*_*kki 5

open(filename,'r')   #2nd argument is r and not w

will open the file or give an error if it doesn't exist. If there's an error, then you can try to write to the path; if you can't, then you get a second error

def can_open(filename):  # wrapper name is illustrative, not from the original answer
    try:
        open(filename,'r')
        return True
    except IOError:
        try:
            open(filename, 'w')
            return True
        except IOError:
            return False

Also have a look here for information about permissions on Windows


Nil*_*esh -9

Try os.path.exists; this will check the path and return True if it exists or False if it does not.

  • Actually, that is filesystem-specific. (2 upvotes)