Correctly handling Turkish uppercase and lowercase: do built-in functions need to be modified/overridden?

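I am working with multilingual text data, including Russian (written in Cyrillic) and Turkish. I basically need to compare the words in two files, my_file and check_file: if a word in my_file can be found in check_file, I write it to an output file, preserving the meta-information about that word from both input files.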

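Some words are lowercase while others are capitalized, so I have to lowercase all words to compare them. Since I use Python 3.6.5 and Python 3 uses Unicode by default, lowercasing the words and later capitalizing them again works correctly for Cyrillic. For Turkish, however, some letters are not handled correctly. Uppercase 'İ' should correspond to lowercase 'i', uppercase 'I' should correspond to lowercase 'ı', and lowercase 'i' should correspond to uppercase 'İ', which is not the case if I type the following in the console: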

>>> print('İ'.lower())
i̇  # not rendered quite correctly; corresponds to Unicode 'i\u0307'
>>> print('I'.lower())
i
>>> print('i'.upper())
I
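One way to get the Turkish mappings without touching the built-in methods is a pair of small helper functions. The sketch below is only an illustration: the names `turkish_lower` and `turkish_upper` are made up here, and it assumes precomposed (NFC) input, so a capital I followed by a separate combining dot is not handled.

```python
# Translation tables for the two letter pairs that differ in Turkish.
# U+0130 is capital dotted I, U+0131 is small dotless i.
_TR_LOWER = str.maketrans({'\u0130': 'i', 'I': '\u0131'})
_TR_UPPER = str.maketrans({'i': '\u0130', '\u0131': 'I'})


def turkish_lower(text: str) -> str:
    """Lowercase text, mapping dotted and dotless I the Turkish way."""
    # Swap the special capitals first, then let str.lower() do the rest.
    return text.translate(_TR_LOWER).lower()


def turkish_upper(text: str) -> str:
    """Uppercase text, mapping dotted and dotless i the Turkish way."""
    # Swap the special small letters first, then let str.upper() do the rest.
    return text.translate(_TR_UPPER).upper()
```

Applying the table before calling `str.lower()` or `str.upper()` matters, because the default Unicode mapping would otherwise turn 'I' into 'i' before the Turkish rule gets a chance to apply.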

I am doing the following (simplified example code):

# python my_file check_file language

import sys

language = sys.argv[3]

# code to get the files as lists

my_file_list = [['ıspanak', 'N'], ['ısır', 'N'], ['acık', 'V']]
check_file_list = [['109', 'Ispanak', 'food_drink'], ['470', 'Isır', 'action_words'], [409, 'Acık', 'action_words']] …
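The rest of the code is cut off above, so the following is not the original comparison logic. It is a minimal sketch of how the lookup could be made language-aware, assuming the language argument is `'tr'` for Turkish (a hypothetical code), that the word sits at index 0 in each my_file entry and at index 1 in each check_file entry as in the sample lists, and that matches are collected as merged lists rather than written to a file. The names `normalize` and `match_words` are made up for illustration.

```python
def normalize(word: str, language: str) -> str:
    """Lowercase a word, applying the Turkish I rules when language is 'tr'."""
    if language == 'tr':
        # U+0130 (capital dotted I) -> 'i', 'I' -> U+0131 (small dotless i)
        word = word.replace('\u0130', 'i').replace('I', '\u0131')
    return word.lower()


def match_words(my_entries, check_entries, language):
    """Return merged entries for words that occur in both lists."""
    # Index check_file entries by their normalized word (position 1).
    lookup = {normalize(entry[1], language): entry for entry in check_entries}
    matches = []
    for word, *meta in my_entries:
        hit = lookup.get(normalize(word, language))
        if hit is not None:
            # Keep the meta-information from both input entries.
            matches.append([word, *meta, *hit])
    return matches


# Sample data in the shape shown in the question.
my_file_list = [['ıspanak', 'N'], ['ısır', 'N'], ['acık', 'V']]
check_file_list = [['109', 'Ispanak', 'food_drink'],
                   ['470', 'Isır', 'action_words'],
                   [409, 'Acık', 'action_words']]

matches = match_words(my_file_list, check_file_list, 'tr')
```

Keeping the Turkish rule inside `normalize` leaves the Cyrillic path on plain `str.lower()`, which the question reports as already working.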

python turkish built-in python-3.x cyrillic

Fab*_*ble

lucky-day

11
Score

1
Answers

413
Views