Pandas: convert a Series containing strings like "10%" and "0.10" to numeric values

Question

dum*_*umb 7 python string format number-formatting pandas

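What is the best way to convert a Pandas Series containing strings of the form "10%" and "0.10" to numeric values?

I know that if I have a Series containing only "0.10"-style strings, I can just use pd.to_numeric.

I also know that if I have a Series of "10%"-style strings, I can do str.replace("%","") and then pd.to_numeric and divide by 100.

My problem is a Series that mixes "0.10"- and "10%"-style strings. How can I best convert it to a Series with the correct numeric type?

I think I could do it by first building a temporary Series of True/False values, depending on whether each string contains "%", and then applying a function based on that. But this seems inefficient.

Is there a better way?

What I have tried, for reference: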

import pandas as pd

mixed = pd.Series(["10%","0.10","5.5%","0.02563"])
mixed.str.replace("%","").astype("float")/100

0    0.100000
1    0.001000
2    0.055000
3    0.000256
dtype: float64
# This doesn't work, because even the 0.10 and 0.02563 are divided by 100.

Answer 1

Rab*_*zel 8

Either way, you need a condition. Here is one possible approach:

l = pd.Series((float(x.strip('%'))/100 if '%' in x else float(x) for x in mixed))
print(l)

0    0.10000
1    0.10000
2    0.05500
3    0.02563
dtype: float64
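
The same condition can also be written as a small named function, which leaves room for details such as stray whitespace. This is only a sketch: `parse_fraction` is a hypothetical name, the padded `" 5.5% "` entry is added for illustration, and every entry is assumed to be a string.

```python
import pandas as pd


def parse_fraction(text: str) -> float:
    """Hypothetical helper: '10%' -> 0.10, '0.10' -> 0.10."""
    text = text.strip()  # tolerate surrounding whitespace
    if text.endswith("%"):
        # Percent string: drop the sign and scale to a fraction.
        return float(text[:-1]) / 100
    # Plain decimal string: convert as-is.
    return float(text)


mixed = pd.Series(["10%", "0.10", " 5.5% ", "0.02563"])
result = mixed.map(parse_fraction)
print(result)
```

This still calls a Python function once per element, so it behaves like the generator expression above in terms of speed.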

Answer 2

Sul*_*yev 8

A very neat solution, based on this answer, is:

from pandas import Series, to_numeric

mixed = Series(["10%", "0.10", "5.5%", "0.02563"])

print(to_numeric(mixed.str.replace("%", "e-2")))
# 0    0.10000
# 1    0.10000
# 2    0.05500
# 3    0.02563
# dtype: float64
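
Why this works: replacing "%" with "e-2" turns each percent string into scientific notation, and "10e-2" means 10 × 10^-2 = 0.10. The sketch below shows the intermediate strings. Passing `regex=False` is my addition, to make the literal replacement explicit regardless of the pandas version.

```python
import pandas as pd

mixed = pd.Series(["10%", "0.10", "5.5%", "0.02563"])

# Literal replacement: "%" becomes the exponent suffix "e-2".
as_sci = mixed.str.replace("%", "e-2", regex=False)
print(as_sci.tolist())  # ['10e-2', '0.10', '5.5e-2', '0.02563']

# Strings without "%" are untouched, so a single to_numeric call handles both kinds.
result = pd.to_numeric(as_sci)
print(result)
```

One caveat, as far as I can tell: an entry that is already in scientific notation and also carries a percent sign, such as "1e-3%", would become "1e-3e-2", which is not a valid float literal.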

Answer 3

Sul*_*yev 5

The simplest solution is to select the entries with a mask and process them in bulk:

from pandas import Series, to_numeric

mixed = Series(["10%", "0.10", "5.5%", "0.02563"])

# make an empty series with similar shape and dtype float
converted = Series(index=mixed.index, dtype='float')

# use a mask to select specific entries
mask = mixed.str.contains("%")

converted.loc[mask] = to_numeric(mixed.loc[mask].str.replace("%", "")) / 100
converted.loc[~mask] = to_numeric(mixed.loc[~mask])

print(converted)
# 0    0.10000
# 1    0.10000
# 2    0.05500
# 3    0.02563
# dtype: float64
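
A possible variant of the same mask idea avoids the pre-allocated Series: convert everything once, then divide only the masked entries using `Series.where`. This is a sketch, assuming "%" only ever appears at the end of a string. The names `is_percent` and `values` are mine.

```python
import pandas as pd

mixed = pd.Series(["10%", "0.10", "5.5%", "0.02563"])

# True where the entry is a percent string.
is_percent = mixed.str.endswith("%")

# Strip any trailing "%" and convert the whole Series in one call.
values = pd.to_numeric(mixed.str.rstrip("%"))

# Keep non-percent values as they are; replace percent values with value / 100.
converted = values.where(~is_percent, values / 100)
print(converted)
```

`Series.where(cond, other)` keeps the entries where `cond` is True and takes `other` elsewhere, which is why the mask is negated here.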

Answer 4

BeR*_*2me 5

mixed = mixed.apply(lambda x: float(x[:-1])/100 if '%' in x else float(x))

Output:

0    0.10000
1    0.10000
2    0.05500
3    0.02563
dtype: float64
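
Note that the lambda assumes every entry is a clean string. If the data may contain missing or unparseable entries (an assumption on my part, since the question does not mention them), a vectorized sketch like the following turns them into NaN instead of raising. The `raw` data is made up for illustration.

```python
import pandas as pd

raw = pd.Series(["10%", "0.10", None, "n/a"])  # illustrative data, not from the question

# na=False treats missing entries as "not a percent string".
is_percent = raw.str.endswith("%", na=False)

# errors="coerce" converts anything unparseable into NaN.
values = pd.to_numeric(raw.str.rstrip("%"), errors="coerce")

# Scale only the percent entries; NaN stays NaN.
cleaned = values.where(~is_percent, values / 100)
print(cleaned)
```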
