Regex to comma-separate large numbers in the South Asian numbering system

new*_*bie 7 python regex comma

I'm trying to find a regular expression that comma-separates a large number according to the South Asian numbering system.

A few examples:

  • 1,000,000 (Arabic) is 10,00,000 (Indian/Hindu/South Asian)
  • 1,000,000,000 (Arabic) is 100,00,00,000 (Indian/H/SA).

The comma pattern repeats every 7 digits, for example 1,00,00,000,00,00,000.
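As a plain-Python reference for the rule described above (no regex involved), the digits can be chunked from the right in a repeating 3-2-2 cycle, which gives the 7-digit period. This is only a sketch: the helper name group_south_asian is hypothetical, and it assumes the input is a bare string of digits with no sign or decimal part.

```python
from itertools import cycle


def group_south_asian(digits: str) -> str:
    """Insert commas into a string of digits, grouping from the right.

    Hypothetical helper: groups are 3 digits, then 2 and 2, after which the
    3-2-2 cycle repeats, so the comma pattern has a period of 7 digits.
    Assumes `digits` contains only 0-9 (no sign, no decimal part).
    """
    groups = []
    end = len(digits)
    for size in cycle((3, 2, 2)):
        if end <= 0:
            break
        start = max(0, end - size)
        groups.append(digits[start:end])
        end = start
    return ",".join(reversed(groups))


print(group_south_asian("1000000"))       # 10,00,000
print(group_south_asian("1000000000"))    # 100,00,00,000
print(group_south_asian("1" + "0" * 14))  # 1,00,00,000,00,00,000
```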

In Friedl's book Mastering Regular Expressions, I found the following regex for the Arabic numbering system:

r'(?<=\d)(?=(\d{3})+(?!\d))'
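With re.sub(), this pattern matches empty positions rather than characters, so replacing each match with "," inserts commas without consuming any digits. Here is a minimal sketch. The name WESTERN is only illustrative.

```python
import re

# Friedl's pattern: match the empty position that has a digit before it and
# a whole number of 3-digit groups between it and the end of the digit run.
WESTERN = re.compile(r'(?<=\d)(?=(\d{3})+(?!\d))')

print(WESTERN.sub(",", "1000000000"))            # 1,000,000,000
print(WESTERN.sub(",", "Total: 1234567 units"))  # Total: 1,234,567 units
```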

For the Indian numbering system I came up with the following expression, but it doesn't work for numbers with more than 8 digits:

r'(?<=\d)(?=(((\d{2}){0,2}\d{3})(?=\b)))'

With the pattern above, I get 100000000,00,00,000.
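The output can be reproduced to see why the attempt stops. The lookahead only accepts positions with exactly 3, 5 or 7 digits to their right, and nothing in it lets the pattern repeat. This is a minimal sketch that assumes the input was a 1 followed by 15 zeros, because the question does not show the input.

```python
import re

# The pattern from the question. (\d{2}){0,2}\d{3} can only cover exactly
# 3, 5 or 7 digits, and (?=\b) then requires the number to end right there,
# so at most three commas are ever inserted into a single number.
ATTEMPT = r'(?<=\d)(?=(((\d{2}){0,2}\d{3})(?=\b)))'

s = "1" + "0" * 15  # assumed input: 16 digits
print(re.sub(ATTEMPT, ",", s))  # 100000000,00,00,000
```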

I'm using Python's re module (re.sub()). Any ideas?

Dun*_*can 7

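I know Tim has answered the question you asked, but assuming you're starting from numbers rather than strings, have you considered whether you need a regex at all? If the machine you're using supports an Indian locale, you can use the locale module instead: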

>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, "en_IN")
'en_IN'
>>> locale.format_string("%d", 10000000, grouping=True)
'1,00,00,000'

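That interpreter session was copied from an Ubuntu system. Be aware that Windows systems may not support a suitable locale (mine doesn't, at least). So although this is in some ways a "cleaner" solution, it may or may not work depending on your environment.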
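Here is a slightly more defensive sketch of the same idea. It treats a missing locale (locale.Error) as "not available" instead of crashing. It uses locale.format_string(), because locale.format() was deprecated in Python 3.7 and removed in 3.12. It also restores the previous LC_NUMERIC setting, since setlocale() changes process-wide state. The function name format_with_locale and the choice to return None are assumptions made for illustration. The grouping comes from the operating system's locale data, so it may not follow the question's 7-digit repetition for very large numbers.

```python
import locale
from typing import Optional


def format_with_locale(n: int, name: str = "en_IN") -> Optional[str]:
    """Group the digits of n using the LC_NUMERIC rules of locale `name`.

    Hypothetical helper: returns None if that locale is not available.
    """
    previous = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, name)
    except locale.Error:
        return None  # locale not installed or not supported here
    try:
        # format_string() replaces locale.format(), removed in Python 3.12.
        return locale.format_string("%d", n, grouping=True)
    finally:
        # setlocale() changes process-wide state, so put the old value back.
        locale.setlocale(locale.LC_NUMERIC, previous)


print(format_with_locale(10000000))
```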


Tim*_*ker 6

Try this:

(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))

For example:

>>> import re
>>> inp = ["1" + "0"*i for i in range(20)]
>>> [re.sub(r"(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))", ",", i)
...  for i in inp]
['1', '10', '100', '1,000', '10,000', '1,00,000', '10,00,000', '1,00,00,000', 
 '10,00,00,000', '100,00,00,000', '1,000,00,00,000', '10,000,00,00,000', 
 '1,00,000,00,00,000', '10,00,000,00,00,000', '1,00,00,000,00,00,000', 
 '10,00,00,000,00,00,000', '100,00,00,000,00,00,000', 
 '1,000,00,00,000,00,00,000', '10,000,00,00,000,00,00,000',
 '1,00,000,00,00,000,00,00,000']

The same regex, with comments:

import re

subject = "1000000000"  # example input (10 digits)
result = re.sub(
    r"""(?x)       # Enable verbose mode (comments)
    (?<=\d)        # Assert that we're not at the start of the number.
    (?=            # Assert that it's possible to match:
     (\d{2}){0,2}  # 0, 2 or 4 digits,
     \d{3}         # followed by 3 digits,
     (\d{7})*      # followed by 0, 7, 14, 21 ... digits,
     (?!\d)        # and no more digits after that.
    )              # End of lookahead assertion.""",
    ",", subject)
print(result)  # 100,00,00,000
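To check what the lookahead allows, you can test its body alone against runs of n digits with re.fullmatch(). The accepted lengths are 3, 5 and 7, each plus any multiple of 7. This is the repeating pattern the question describes. Here is a small sketch. The names BODY and accepted are only illustrative.

```python
import re

# The body of the lookahead, tested on its own against runs of n digits.
BODY = re.compile(r"(\d{2}){0,2}\d{3}(\d{7})*")

# A comma is inserted wherever the number of digits to its right is in this
# list (and at least one digit sits to its left, per the lookbehind).
accepted = [n for n in range(1, 25) if BODY.fullmatch("0" * n)]
print(accepted)  # [3, 5, 7, 10, 12, 14, 17, 19, 21, 24]
```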