Python Pandas df: best way to replace $, M and K in currency values with int

Question

Lui*_*Gan 3 python regex dataframe pandas

I'm working on a personal project to practice pandas and Beautiful Soup. I scraped this data and put it into a pandas df, which looks like this:

0        €8.5M
1           €0
2        €9.5M
3          €2M
4         €21M
         ...  
16534    €1.8M
16535    €1.1M
16536    €550K
16537    €650K
16538    €1.1M
Name: Value, Length: 16539, dtype: object
0        €67K
1          €0
2        €15K
3        €11K
4        €13K
         ... 
16534     €3K
16535     €2K
16536     €2K
16537     €7K
16538     €3K
Name: Wage, Length: 16539, dtype: object

To analyze this information, I want to clean the data and convert it to integers. This is what I came up with:

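import re

# Wage: strip the euro sign, then expand K into three zeros
df['Wage'] = df['Wage'].apply(lambda x: re.sub('€','',x))
df['Wage'] = df['Wage'].apply(lambda x: re.sub('K','000',x))

# Value: strip the euro sign; an M value with a decimal point gets five zeros
# (this assumes exactly one digit after the point), then the point is dropped
df['Value'] = df['Value'].apply(lambda x: re.sub('€','',x))
df['Value'] = df['Value'].apply(lambda x : re.sub('M','00000',x) if (('M' in x) and ('.' in x))else x)
df['Value'] = df['Value'].apply(lambda x : re.sub('[.]','',x))
# Remaining M values (no decimal point) get six zeros, K values get three
df['Value'] = df['Value'].apply(lambda x : re.sub('M','000000',x))
df['Value'] = df['Value'].apply(lambda x : re.sub('K','000',x))

# Convert the cleaned strings to integers
df['Wage'] = df['Wage'].astype(int)
df['Value'] = df['Value'].astype(int)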

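I first strip the currency symbol, then check for a decimal point so that I can replace M with 5 zeros, then replace the remaining M with 6 zeros, then replace K with 3 zeros, and finally change the type to int. But I feel this is not a good approach. What do you think? What would be a better way to do this? I tried to write a function but couldn't get it to work.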
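
To illustrate the concern, here is a minimal sketch that wraps the same substitution steps in a hypothetical helper, `pad_zeros` (the name is not from the original post), and runs it on a single string. The zero-padding assumes exactly one digit after the decimal point, so an assumed value such as `€1.25M`, which does not appear in the sample data, comes out ten times too large.

```python
import re

def pad_zeros(value: str) -> int:
    """Hypothetical helper reproducing the zero-padding steps from the question."""
    value = re.sub('€', '', value)
    # Decimal M values get five zeros (correct only for one decimal digit)
    if 'M' in value and '.' in value:
        value = re.sub('M', '00000', value)
    value = re.sub('[.]', '', value)
    # Remaining M values get six zeros, K values get three
    value = re.sub('M', '000000', value)
    value = re.sub('K', '000', value)
    return int(value)

print(pad_zeros('€8.5M'))   # one decimal digit: 8500000, as intended
print(pad_zeros('€1.25M'))  # two decimal digits: 12500000 instead of 1250000
```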

Answer 1

Sea*_*ean 5

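Updated solution:

New solution: use only .replace() and astype().
It does not rely on pd.eval to evaluate a formula:

You can convert M and K to the corresponding magnitude in exponent notation:

K is converted to e+03 in scientific notation

M is converted to e+06 in scientific notation

(This supports integers as well as floats with any number of decimal places.)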
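
A quick check in plain Python may help show why this substitution works: `float()` already parses text in scientific notation, so once the suffix is swapped for an exponent the string is a valid number. The helper name `to_scientific` is hypothetical, and `€1.25M` is an assumed value that does not appear in the scraped data.

```python
# Suffix-to-exponent mapping taken from the description above
exponents = {'K': 'e+03', 'M': 'e+06'}

def to_scientific(text: str) -> str:
    """Hypothetical helper: strip the euro sign and swap K/M for an exponent."""
    text = text.replace('€', '').replace(' ', '')
    for suffix, exponent in exponents.items():
        text = text.replace(suffix, exponent)
    return text

for raw in ['€8.5M', '€550K', '€1.25M', '€0']:
    sci = to_scientific(raw)
    # float() understands the exponent form; int() gives the final integer
    print(raw, '->', sci, '->', int(float(sci)))
```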

Then convert the text in scientific notation to float, and then to integer for the final desired format, as follows:

df['Value'] = df['Value'].replace({'€': '', ' ': '', 'M': 'e+06', 'K': 'e+03'}, regex=True).astype(float).astype(int)

Input data:

         Value
0        €8.5M
1           €0
2        €9.5M
3          €2M
4         €21M
16534    €1.8M
16535    €1.1M
16536    €550K
16537    €650K
16538    €1.1M

Output:

print(df)

          Value
0       8500000
1             0
2       9500000
3       2000000
4      21000000
16534   1800000
16535   1100000
16536    550000
16537    650000
16538   1100000
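
Putting the pieces together, the following self-contained sketch applies the same one-liner to both columns from the question. The small DataFrame is made up from a few of the sample rows, the `suffixes` name is only for illustration, and `dtype=object` is set to match the `dtype: object` shown in the question's output.

```python
import pandas as pd

# A few sample rows from the question, stored as plain Python strings
df = pd.DataFrame(
    {
        'Value': ['€8.5M', '€0', '€21M', '€550K'],
        'Wage': ['€67K', '€0', '€15K', '€3K'],
    },
    dtype=object,
)

# Same replacement table as in the answer
suffixes = {'€': '', ' ': '', 'M': 'e+06', 'K': 'e+03'}

for col in ['Value', 'Wage']:
    # Text -> scientific-notation text -> float -> int
    df[col] = df[col].replace(suffixes, regex=True).astype(float).astype(int)

print(df)
```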

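Old solution:

You can convert M and K into formulas and then use pd.eval to compute the numeric values.

K is converted to the formula * 1000

M is converted to the formula * 1000000

This way we can support base values with any number of decimal places (with or without a decimal point, and however long the fractional part is). The formula gives the correct result regardless of how many digits follow the decimal point.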

df['Value'] = df['Value'].str.replace('€', '')
df['Value'] = df['Value'].str.replace('M', ' * 1000000')
df['Value'] = df['Value'].str.replace('K', ' * 1000')
df['Value'] = df['Value'].map(pd.eval).astype(int)
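
To make the last step concrete, here is a small sketch of what `pd.eval` does with individual strings once the suffixes have been rewritten as multiplications. The three strings are taken from the sample data after substitution; the loop itself is only for illustration.

```python
import pandas as pd

# Strings as they look after the str.replace steps
formulas = ['8.5 * 1000000', '550 * 1000', '0']

for text in formulas:
    # pd.eval evaluates the arithmetic expression and returns a number
    result = pd.eval(text)
    print(text, '->', int(result))
```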

Or simplify the code into a single line, thanks to @MustafaAydın's suggestion:

df['Value'] = df['Value'].replace({"€": "", "M": "*1E6", "K": "*1E3"}, regex=True).map(pd.eval).astype(int)

Result:

print(df)


          Value
0       8500000
1             0
2       9500000
3       2000000
4      21000000
16534   1800000
16535   1100000
16536    550000
16537    650000
16538   1100000

The sample input data is as follows:

         Value
0        €8.5M
1           €0
2        €9.5M
3          €2M
4         €21M
16534    €1.8M
16535    €1.1M
16536    €550K
16537    €650K
16538    €1.1M

Before the final step, we get:

               Value
0      8.5 * 1000000
1                  0
2      9.5 * 1000000
3        2 * 1000000
4       21 * 1000000
16534  1.8 * 1000000
16535  1.1 * 1000000
16536     550 * 1000
16537     650 * 1000
16538  1.1 * 1000000

Then we pass it to pd.eval, which evaluates it and converts it to a numeric value (float) that we can further convert to an integer.
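
For readers who prefer an explicit function over evaluating strings, which is what the question originally asked about, one possible sketch follows. It is not the answer's method: the helper name `money_to_int`, the `MULTIPLIERS` table, and the regular expression are all assumptions for illustration, and the pattern only handles values shaped like the samples (digits with an optional decimal part, followed by an optional `K` or `M`).

```python
import pandas as pd

# Assumed suffix table; extend it if the data contains other suffixes
MULTIPLIERS = {'K': 1_000, 'M': 1_000_000}

def money_to_int(series: pd.Series) -> pd.Series:
    """Hypothetical helper: split '€8.5M' into number and suffix, then multiply."""
    parts = series.str.extract(r'(?P<number>\d+(?:\.\d+)?)\s*(?P<suffix>[KM]?)')
    number = parts['number'].astype(float)
    # Values without a suffix are not in the table, so they get a factor of 1
    factor = parts['suffix'].map(MULTIPLIERS).fillna(1)
    # Round before casting so float noise cannot truncate to the wrong integer
    return (number * factor).round().astype(int)

values = pd.Series(['€8.5M', '€0', '€550K', '€1.8M'], dtype=object)
print(money_to_int(values))
```

Rounding before the cast is a deliberate choice here, since `astype(int)` truncates toward zero.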

+1; if you like, you could do the replacements with a dictionary, `df.Value.replace({"€": "", "M": "*1E6", "K": "*1E3"}, regex=True)` (and wrap it with `pd.eval`); but maybe yours is more readable. (2 upvotes)
