Applying a regex function to a pandas DataFrame

Question

Mic*_*l N 3 python regex data-manipulation dataframe pandas

I have a DataFrame in pandas, for example:

0                       1                   2
([0.8898668778942382    0.89533945283595]   0)
([1.2632564814188714    1.0207660696232244] 0)
([1.006649166957976     1.1180973832359227] 0)
([0.9653632916751714    0.8625538463644129] 0)
([1.038366333873932     0.9091449796555554] 0)

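All values are strings. I want to remove all special characters and convert the values to doubles (floats). I'd like to apply a function that removes every special character except the dot, like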

import re
re.sub('[^0-9.]+', '',x)

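So I want to apply it to every cell of the DataFrame. How can I do that? I found the df.applymap function, but I don't know how to pass the string as an argument. I tried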

def remSp(x): 
    re.sub('^[0-9]+', '',x)

df.applymap(remSp())

But I don't know how to pass the cell to the function. Is there a better way to do this?

Thanks

Answer 1

Flo*_*oor 6

Why not use the DataFrame's built-in replace method with a regex directly, i.e.

df = df.replace(r'[^\d.]', '', regex=True).astype(float)

          0         1    2
0  0.889867  0.895339  0.0
1  1.263256  1.020766  0.0
2  1.006649  1.118097  0.0
3  0.965363  0.862554  0.0
4  1.038366  0.909145  0.0

This is still faster than the other answers.
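
For completeness, here is a minimal sketch of how the cell-wise attempt from the question could be made to work, next to the replace call above. The attempt appears to fail for three reasons: the function never returns its result, it is called (remSp()) instead of being passed as an object, and the ^ sits outside the brackets, so it anchors the match instead of negating the character class. The name rem_sp and the two-row sample frame are illustrative assumptions, not taken from the original posts.

```python
import re

import pandas as pd

# Sample data modelled on the question: every cell is a string with stray brackets.
df = pd.DataFrame({
    0: ["([0.8898668778942382", "([1.2632564814188714"],
    1: ["0.89533945283595]", "1.0207660696232244]"],
    2: ["0)", "0)"],
})


def rem_sp(x):
    # Return the cleaned string; without `return` every cell would become None.
    # The ^ goes inside the brackets so the class means "anything but digits and dots".
    return re.sub(r"[^0-9.]+", "", x)


# Pass the function object itself (no parentheses); pandas calls it once per cell.
# Series.map is applied column by column so the sketch does not rely on
# DataFrame.applymap, which newer pandas releases deprecate.
cleaned = df.apply(lambda col: col.map(rem_sp)).astype(float)

# The vectorised replace from the answer, for comparison.
replaced = df.replace(r"[^\d.]", "", regex=True).astype(float)

print(cleaned)
print(cleaned.equals(replaced))
```

On older pandas versions df.applymap(rem_sp) should behave the same way. Note that either pattern also strips minus signs and exponent markers, so it only suits data like the sample here, where every value is a plain non-negative decimal.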
