Python: overriding os.path.supports_unicode_filenames on Ubuntu

3 python filesystems unicode ubuntu utf-8

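I run a Python web application on an Ubuntu server and develop locally on OS X.

I use a lot of unicode strings for Hebrew, including in the file names of the images I process, so those files are saved to the file system with Hebrew characters in their names.

My Ubuntu server is fully configured for UTF-8: the file system already holds other images with Hebrew names, Hebrew-named directories, and so on (outside this application).

However, my application returns an error when I try to save an image with a Hebrew file name on Ubuntu (but not on OS X).

The error is: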

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

After a lot of investigation, I have arrived at what seems to be the last possible cause:

# Inside my virtualenv, Mac OS X
>>> import os.path
>>> os.path.supports_unicode_filenames
True

# Inside my virtualenv, Ubuntu 12.04
>>> import os.path
>>> os.path.supports_unicode_filenames
False

For the curious, here are my Ubuntu locale settings:

locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Update: adding code and a sample string:

# a string, of the type I would get for instance.product.name, as used below.
u'\u05e7\u05e8\u05d5\u05d1-\u05e8\u05d7\u05d5\u05e7'


#utils.py
# I get an image object from django, and I run this function so django 
# can use the generated filepath for the image.
def get_upload_path(instance, filename):

    tmp = filename.split('.')
    extension = '.' + tmp[-1]

    if instance.__class__.__name__ == 'MyClass':

        seo_filename = unislugify(instance.product.name)
        # unislugify takes a string and strips spaces, etc.
        value = IMAGES_PRODUCT_DIR + seo_filename + extension

    else:

        value = IMAGES_GENERAL_DIR + unislugify(filename)

    return value

Sample stack trace:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 60-66: ordinal not in range(128)

Stacktrace (most recent call last):

  File "django/core/handlers/base.py", line 111, in get_response
    response = callback(request, *callback_args, **callback_kwargs)

  File "django/contrib/admin/options.py", line 366, in wrapper
    return self.admin_site.admin_view(view)(*args, **kwargs)

  File "django/utils/decorators.py", line 91, in _wrapped_view
    response = view_func(request, *args, **kwargs)

  File "django/views/decorators/cache.py", line 89, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)

  File "django/contrib/admin/sites.py", line 196, in inner
    return view(request, *args, **kwargs)

  File "django/utils/decorators.py", line 25, in _wrapper
    return bound_func(*args, **kwargs)

  File "django/utils/decorators.py", line 91, in _wrapped_view
    response = view_func(request, *args, **kwargs)

  File "django/utils/decorators.py", line 21, in bound_func
    return func(self, *args2, **kwargs2)

  File "django/db/transaction.py", line 209, in inner
    return func(*args, **kwargs)

  File "django/contrib/admin/options.py", line 1055, in change_view
    self.save_related(request, form, formsets, True)

  File "django/contrib/admin/options.py", line 733, in save_related
    self.save_formset(request, form, formset, change=change)

  File "django/contrib/admin/options.py", line 721, in save_formset
    formset.save()

  File "django/forms/models.py", line 497, in save
    return self.save_existing_objects(commit) + self.save_new_objects(commit)

  File "django/forms/models.py", line 628, in save_new_objects
    self.new_objects.append(self.save_new(form, commit=commit))

  File "django/forms/models.py", line 731, in save_new
    obj.save()

  File "django/db/models/base.py", line 463, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)

  File "django/db/models/base.py", line 551, in save_base
    result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)

  File "django/db/models/manager.py", line 203, in _insert
    return insert_query(self.model, objs, fields, **kwargs)

  File "django/db/models/query.py", line 1593, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)

  File "django/db/models/sql/compiler.py", line 909, in execute_sql
    for sql, params in self.as_sql():

  File "django/db/models/sql/compiler.py", line 872, in as_sql
    for obj in self.query.objs

  File "django/db/models/fields/files.py", line 249, in pre_save
    file.save(file.name, file, save=False)

  File "django/db/models/fields/files.py", line 86, in save
    self.name = self.storage.save(name, content)

  File "django/core/files/storage.py", line 44, in save
    name = self.get_available_name(name)

  File "django/core/files/storage.py", line 70, in get_available_name
    while self.exists(name):

  File "django/core/files/storage.py", line 230, in exists
    return os.path.exists(self.path(name))

  File "python2.7/genericpath.py", line 18, in exists
    os.stat(path)

mat*_*ata 5

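os.path.supports_unicode_filenames is always False on posix systems other than darwin. That is because they don't really care about the encoding of a file name: to them it is just a sequence of bytes. The locale specifies how those bytes are interpreted, which is why you can see garbled characters in a terminal when the locale is set incorrectly.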
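To illustrate the "just a sequence of bytes" point, here is a minimal sketch that creates a file whose name is given as UTF-8 bytes and lists the directory back as bytes. The function name `roundtrip_bytes_filename` and the sample name are made up for this example, and it assumes a POSIX file system such as the one on the Ubuntu server.

```python
import os
import shutil
import sys
import tempfile


def roundtrip_bytes_filename():
    """Create a file from a UTF-8 encoded name and list it back as bytes."""
    name = u'\u05e7\u05e8\u05d5\u05d1.txt'   # hypothetical Hebrew file name
    raw_name = name.encode('utf-8')          # the bytes the kernel will store

    tmp_dir_text = tempfile.mkdtemp()
    # Work with a bytes directory path so that os.listdir returns bytes too.
    if isinstance(tmp_dir_text, bytes):
        tmp_dir = tmp_dir_text
    else:
        tmp_dir = tmp_dir_text.encode(sys.getfilesystemencoding() or 'utf-8')

    try:
        open(os.path.join(tmp_dir, raw_name), 'wb').close()
        # No locale is consulted here: bytes in, bytes out.
        return os.listdir(tmp_dir)
    finally:
        shutil.rmtree(tmp_dir_text)
```

Decoding the returned names with `.decode('utf-8')` is a separate step, and it is that interpretation step, not the storage, that depends on an encoding.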

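How is your web application run? If you run it through a web server (apache?) using cgi or wsgi, the locale may not be the one you see in your shell, which may be why python tries to encode the path name with the ascii codec.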
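One way to check this is to log the encodings the web process itself sees, rather than the ones in an interactive shell. The helper below is only a diagnostic sketch; `describe_encodings` is a hypothetical name, and where you call it from (a view, a startup hook) is up to you.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the encoding-related settings of the current process."""
    return {
        # Encoding Python uses to convert unicode file names to bytes.
        'filesystem_encoding': sys.getfilesystemencoding(),
        # Encoding used for implicit unicode/bytes conversions.
        'default_encoding': sys.getdefaultencoding(),
        # Encoding derived from the locale (False: don't call setlocale).
        'preferred_encoding': locale.getpreferredencoding(False),
        # Locale variables as inherited by this process; may be None.
        'LANG': os.environ.get('LANG'),
        'LC_ALL': os.environ.get('LC_ALL'),
        'LC_CTYPE': os.environ.get('LC_CTYPE'),
    }
```

If the values reported from inside the web server differ from what the `locale` command prints in your shell, the server is probably not inheriting the shell's locale settings.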

To make it work, you can manually encode the path name to utf-8 when opening the file.

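Edit:
So what fails is the call to os.stat. It is called with a unicode string, which python tries to convert to a byte string using the default encoding (sys.getdefaultencoding()), and that seems to always be ascii when running python2 in a uWSGI environment. To fix this, make sure any unicode string is encoded to utf-8 before it is passed to os.stat.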
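A minimal sketch of that workaround, assuming the file system stores names as UTF-8. The helpers `to_utf8_bytes` and `exists_utf8` are hypothetical names for this example, not part of Django or the standard library.

```python
import os
import sys

# Name of the text type on either major Python version.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (only defined on Python 2)


def to_utf8_bytes(path):
    """Encode a unicode path to UTF-8 bytes; leave byte strings untouched."""
    if isinstance(path, text_type):
        return path.encode('utf-8')
    return path


def exists_utf8(path):
    """Like os.path.exists, but never relies on an implicit ascii encode."""
    return os.path.exists(to_utf8_bytes(path))
```

Because the path is already a byte string when it reaches os.stat, python has nothing left to encode and the ascii codec never gets involved. Where exactly to apply this in a Django project (for example in a custom storage class) depends on your setup.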