Jav*_* C. 31 python localization
I have some strings representing numbers in a specific currency format, for example:
money="$6,150,593.22"
I would like to convert this string into the number
6150593.22
What is the best way to achieve this?
And*_*are 48
Try this:
from re import sub
from decimal import Decimal
money = '$6,150,593.22'
value = Decimal(sub(r'[^\d.]', '', money))
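This has some advantages: it uses Decimal instead of float (which is better for representing currency), and it avoids locale issues with the currency symbol by not hard-coding any specific symbol.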
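The difference between Decimal and float can be seen in a minimal sketch. The helper name parse_money is hypothetical and not part of the answer; it only wraps the same regular expression. Note that the pattern assumes '.' is the decimal separator and that it also removes a leading minus sign, so negative amounts would need extra handling.

```python
import re
from decimal import Decimal

def parse_money(text):
    """Hypothetical helper: drop everything except digits and the decimal point."""
    return Decimal(re.sub(r'[^\d.]', '', text))

value = parse_money('$6,150,593.22')
print(value)                            # 6150593.22

# Decimal arithmetic is exact for decimal fractions, float arithmetic is not.
print(Decimal('0.10') + Decimal('0.20') == Decimal('0.30'))  # True
print(0.10 + 0.20 == 0.30)                                   # False

# Limitation: the sign is stripped along with the currency symbol.
print(parse_money('-$1,234.56'))        # 1234.56
```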
And*_*ark 14
If your locale is set properly, you can use locale.atof, but you will still need to strip off the '$' manually:
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF8')
'en_US.UTF8'
>>> money = "$6,150,593.22"
>>> locale.atof(money.strip("$"))
6150593.2199999997
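The float result above carries a rounding artifact. On Python 3, locale.atof accepts a func argument, so the same locale-aware parsing can return a Decimal instead. The following is a sketch under assumptions: the helper name atof_decimal is hypothetical, and the en_US.UTF-8 locale has to be installed on the system, otherwise setlocale raises locale.Error.

```python
import locale
from decimal import Decimal

def atof_decimal(text, locale_name):
    """Hypothetical helper: parse a localized number into a Decimal.

    Switches LC_NUMERIC temporarily and restores the previous setting.
    """
    saved = locale.setlocale(locale.LC_NUMERIC)   # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        return locale.atof(text, func=Decimal)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)

try:
    amount = atof_decimal("$6,150,593.22".strip("$"), "en_US.UTF-8")
    print(amount)
except locale.Error:
    # Assumption: the requested locale may not be installed on every machine.
    print("en_US.UTF-8 locale is not available")
```

Keep in mind that setlocale changes process-wide state, so a helper like this is not safe to call from several threads at once.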
For a solution that does not hard-code the currency position or symbol:
raw_price = "17,30 €"
import locale
locale.setlocale(locale.LC_ALL, 'fr_FR.UTF8')
conv = locale.localeconv()
# On Python 3 currency_symbol is already a str, so no .decode('utf-8') is needed
raw_numbers = raw_price.strip(conv['currency_symbol'])
amount = locale.atof(raw_numbers)
Extended to include negative numbers written in parentheses (Python 2 session):
In [1]: import locale, string
In [2]: from decimal import Decimal
In [3]: n = ['$1,234.56','-$1,234.56','($1,234.56)', '$ -1,234.56']
In [4]: tbl = string.maketrans('(','-')
In [5]: %timeit -n10000 [locale.atof( x.translate(tbl, '$)')) for x in n]
10000 loops, best of 3: 31.9 µs per loop
In [6]: %timeit -n10000 [Decimal( x.translate(tbl, '$,)')) for x in n]
10000 loops, best of 3: 21 µs per loop
In [7]: %timeit -n10000 [float( x.replace('(','-').translate(None, '$,)')) for x in n]
10000 loops, best of 3: 3.49 µs per loop
In [8]: %timeit -n10000 [float( x.translate(tbl, '$,)')) for x in n]
10000 loops, best of 3: 2.19 µs per loop
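Note that the commas must be removed before calling float() or Decimal(). Either replace() or translate() with a translation table can be used to convert the opening ( to -; translate() is slightly faster. float() is fastest by 10-15x, but lacks precision and could present locale issues. Decimal() has precision and is 50% faster than locale.atof(), but also has locale issues. locale.atof() is the slowest, but the most general.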
Edit: with the new str.translate API in Python 3 (the characters to be mapped to None moved from the str.translate call into the translation table):
In [1]: import locale, string
from decimal import Decimal
locale.setlocale(locale.LC_ALL, '')
n = ['$1,234.56','-$1,234.56','($1,234.56)', '$ -1,234.56']
In [2]: tbl = str.maketrans('(', '-', '$)')
%timeit -n10000 [locale.atof( x.translate(tbl)) for x in n]
18 µs ± 296 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [3]: tbl2 = str.maketrans('(', '-', '$,)')
%timeit -n10000 [Decimal( x.translate(tbl2)) for x in n]
3.77 µs ± 50.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [4]: %timeit -n10000 [float( x.translate(tbl2)) for x in n]
3.13 µs ± 66.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [5]: tbl3 = str.maketrans('', '', '$,)')
%timeit -n10000 [float( x.replace('(','-').translate(tbl3)) for x in n]
3.51 µs ± 84.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
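The timings above do not show the parsed values. As a small sketch on Python 3, the same translation table (tbl2 in the session) can be wrapped in a helper; the name parse_accounting is hypothetical. It assumes '.' as the decimal separator and ',' as the grouping separator, and it relies on Decimal ignoring leading whitespace.

```python
from decimal import Decimal

# Map '(' to '-' and delete '$', ',' and ')' in a single pass.
TABLE = str.maketrans('(', '-', '$,)')

def parse_accounting(text):
    """Hypothetical helper: '($1,234.56)' style strings become negative Decimals."""
    return Decimal(text.translate(TABLE))

samples = ['$1,234.56', '-$1,234.56', '($1,234.56)', '$ -1,234.56']
for s in samples:
    print(repr(s), '->', parse_accounting(s))
```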
I found the babel package very helpful for working around this.
It makes it easy to parse a number in a localized format:
>>> import babel.numbers
>>> babel.numbers.parse_decimal('1,024.64', locale='en')
Decimal('1024.64')
>>> babel.numbers.parse_decimal('1.024,64', locale='de')
Decimal('1024.64')
You can use babel.numbers.get_currency_symbol('USD') to strip the prefix/suffix without hard-coding it.
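Combining the two calls might look like the sketch below. It assumes the third-party babel package is installed; the helper name parse_currency is hypothetical, and it assumes the symbol in the string is the one Babel reports for that currency in that locale (for example, USD is rendered differently in a French locale than in en_US).

```python
from babel.numbers import get_currency_symbol, parse_decimal

def parse_currency(text, currency, locale):
    """Hypothetical helper: strip the locale's currency symbol, then parse."""
    symbol = get_currency_symbol(currency, locale=locale)
    # Remove the symbol wherever it sits (prefix or suffix) and trim spaces.
    cleaned = text.replace(symbol, '').strip()
    return parse_decimal(cleaned, locale=locale)

print(parse_currency('$6,150,593.22', 'USD', 'en_US'))
print(parse_currency('17,30 €', 'EUR', 'fr_FR'))
```

Unlike the locale module, this does not change any process-wide locale setting.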
hth, dtk