new*_*bie 7 python regex comma
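I'm trying to find a regex that comma-separates a large number according to the South Asian numbering system.
A few examples:
1,000,000 (Arabic) is 10,00,000 (Indian/Hindu/South Asian)
1,000,000,000 (Arabic) is 100,00,00,000 (Indian/H/SA)
The comma pattern repeats every 7 digits. For example,
1,00,00,000,00,00,000.
From Friedl's book Mastering Regular Expressions, I have the following regex for the Arabic numbering system: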
r'(?<=\d)(?=(\d{3})+(?!\d))'
For the Indian numbering system, I came up with the following expression, but it doesn't work for numbers with more than 8 digits:
r'(?<=\d)(?=(((\d{2}){0,2}\d{3})(?=\b)))'
With the pattern above, I get 100000000,00,00,000.
I'm using the Python re module (re.sub()). Any ideas?
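To reproduce the problem, here is a minimal sketch that applies both patterns with `re.sub()`. The helper name `group` and the sample inputs are illustrative assumptions and do not come from the original question.

```python
import re

# Friedl's pattern for groups of three (Western grouping).
WESTERN = r'(?<=\d)(?=(\d{3})+(?!\d))'
# The attempted Indian pattern from the question.
ATTEMPT = r'(?<=\d)(?=(((\d{2}){0,2}\d{3})(?=\b)))'

def group(pattern, digits):
    """Insert a comma at every position where the lookarounds match."""
    return re.sub(pattern, ",", digits)

print(group(WESTERN, "1000000"))      # 1,000,000
print(group(ATTEMPT, "1000000"))      # 10,00,000
print(group(ATTEMPT, "10000000000"))  # 1000,00,00,000
```

The attempt only matches when 3, 5 or 7 digits remain before the word boundary, so everything to the left of the last seven digits is never split.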
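I know Tim has answered the question you asked, but assuming you start with numbers rather than strings, have you considered whether you need a regex at all? If the machine you are using supports an Indian locale, you can use the locale module: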
>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, "en_IN")
'en_IN'
>>> locale.format_string("%d", 10000000, grouping=True)
'1,00,00,000'
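This interpreter session was copied from an Ubuntu system, but be aware that Windows systems may not support a suitable locale (at least mine doesn't). So while this is in some ways a "cleaner" solution, it may or may not be usable depending on your environment.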
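If the locale is unavailable, one option is to fall back to plain Python. The following is a minimal sketch and not part of the original answer: the function names `group_south_asian` and `format_indian` are hypothetical, and the fallback implements the 3, 2, 2 grouping that repeats every seven digits, as described in the question. A platform's `en_IN` locale may group very large numbers differently from that pattern, so the two branches are not guaranteed to agree.

```python
import locale
from itertools import cycle

def group_south_asian(n):
    """Group digits from the right in sizes 3, 2, 2, repeating every 7 digits."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    groups = []
    for size in cycle((3, 2, 2)):
        if not digits:
            break
        groups.append(digits[-size:])   # take the rightmost group
        digits = digits[:-size]         # and drop it from the remainder
    return sign + ",".join(reversed(groups))

def format_indian(n):
    """Try the en_IN locale first; fall back to pure Python if it is missing.

    Note: setlocale changes process-wide state (an assumption worth checking
    before using this in a larger program).
    """
    try:
        locale.setlocale(locale.LC_NUMERIC, "en_IN")
    except locale.Error:
        return group_south_asian(n)
    return locale.format_string("%d", n, grouping=True)

print(group_south_asian(10000000))  # 1,00,00,000
```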
Try this:
(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))
For example:
>>> import re
>>> inp = ["1" + "0"*i for i in range(20)]
>>> [re.sub(r"(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))", ",", i)
...  for i in inp]
['1', '10', '100', '1,000', '10,000', '1,00,000', '10,00,000', '1,00,00,000',
'10,00,00,000', '100,00,00,000', '1,000,00,00,000', '10,000,00,00,000',
'1,00,000,00,00,000', '10,00,000,00,00,000', '1,00,00,000,00,00,000',
'10,00,00,000,00,00,000', '100,00,00,000,00,00,000',
'1,000,00,00,000,00,00,000', '10,000,00,00,000,00,00,000',
'1,00,000,00,00,000,00,00,000']
The same regex with comments:
result = re.sub(
    r"""(?x)       # Enable verbose mode (comments)
    (?<=\d)        # Assert that we're not at the start of the number.
    (?=            # Assert that it's possible to match:
     (\d{2}){0,2}  # 0, 2 or 4 digits,
     \d{3}         # followed by 3 digits,
     (\d{7})*      # followed by 0, 7, 14, 21 ... digits,
     (?!\d)        # and no more digits after that.
    )              # End of lookahead assertion.""",
    ",", subject)
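For repeated use, the commented pattern can be compiled once and wrapped in a function. This is a sketch under assumptions: the names `INDIAN_GROUPING` and `add_commas` are hypothetical, the groups are made non-capturing for convenience, and the input is assumed to be a string.

```python
import re

INDIAN_GROUPING = re.compile(
    r"""
    (?<=\d)          # not at the start of the number
    (?=              # lookahead: the digits remaining to the right are
      (?:\d{2}){0,2} #   0, 2 or 4 digits,
      \d{3}          #   then 3 digits,
      (?:\d{7})*     #   then any multiple of 7 digits,
      (?!\d)         #   and no further digit after that
    )
    """,
    re.VERBOSE,
)

def add_commas(text):
    """Insert South Asian style commas into every run of digits in text."""
    return INDIAN_GROUPING.sub(",", text)

print(add_commas("100000000000000"))         # 1,00,00,000,00,00,000
print(add_commas("Population: 1400000000"))  # Population: 140,00,00,000
```

Because the pattern looks at any run of digits, the digits after a decimal point are grouped as well (for example, `"0.5678"` becomes `"0.5,678"`), so it is safest to apply this to integer strings only.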