pandas: DataFrame.replace() with a regular expression

Boo*_*d16 1 python string floating-point replace pandas

I have a table that looks like this:

df_raw = pd.DataFrame(dict(A = pd.Series(['1.00','-1']), B = pd.Series(['1.0','-45.00','-'])))

    A       B
0   1.00    1.0
1   -1      -45.00
2   NaN     -

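I want to use DataFrame.replace() to replace '-' with '0.00', but I'm struggling because of the negative values '-1' and '-45.00'.

How can I leave the negative values alone and replace only the cells that contain just '-' with '0.00'?

My code: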

df_raw = df_raw.replace(['-','\*'], ['0.00','0.00'], regex=True).astype(np.float64)

The error:

ValueError: invalid literal for float(): 0.0045.00

EdC*_*ica 6

Your regex matches every - character, including the minus sign in front of the negative numbers:

In [48]:
df_raw.replace(['-','\*'], ['0.00','0.00'], regex=True)

Out[48]:
       A          B
0   1.00        1.0
1  0.001  0.0045.00
2    NaN       0.00

If you add anchors so that the pattern only matches a cell consisting of that single character and nothing else, it works as expected:

In [47]:
df_raw.replace(['^-$'], ['0.00'], regex=True)

Out[47]:
      A       B
0  1.00     1.0
1    -1  -45.00
2   NaN    0.00

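^ matches the start of the string and $ matches the end of the string, so the pattern only matches a value that is exactly that single character.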
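To round this out against the original goal, here is a minimal sketch that applies the anchored pattern and then converts to float. It assumes the same df_raw as in the question; the names cleaned and as_float are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question; column A is shorter, so row 2 becomes NaN.
df_raw = pd.DataFrame({
    "A": pd.Series(["1.00", "-1"]),
    "B": pd.Series(["1.0", "-45.00", "-"]),
})

# The anchored pattern matches only cells that are exactly "-",
# so the minus signs of "-1" and "-45.00" are left untouched.
cleaned = df_raw.replace(r"^-$", "0.00", regex=True)

# Every remaining string is now a valid float literal (NaN stays NaN).
as_float = cleaned.astype(np.float64)
print(as_float)
```

With the unanchored pattern '-' the same astype call fails, because '-45.00' has already been rewritten to '0.0045.00'.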

Alternatively, you can call replace without regex=True, in which case it only replaces values that match exactly:

In [29]:

df_raw.replace('-',0)
Out[29]:
      A       B
0  1.00     1.0
1    -1  -45.00
2   NaN       0
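As a rough sketch of how the exact-match variant could feed into the float conversion the question was after: this assumes the same df_raw, and it uses the string '0.00' rather than the integer 0 as the replacement so the column stays all-string until the cast. The names exact and as_float are only illustrative.

```python
import pandas as pd

# Same data as in the question.
df_raw = pd.DataFrame({
    "A": pd.Series(["1.00", "-1"]),
    "B": pd.Series(["1.0", "-45.00", "-"]),
})

# Without regex=True, replace() compares whole cell values,
# so "-45.00" and "-1" are not equal to "-" and are left alone.
exact = df_raw.replace("-", "0.00")

# Cast the cleaned strings to floats.
as_float = exact.astype(float)
print(as_float)
```

The exact-match form avoids regex escaping altogether, which may be preferable when the placeholder is a fixed token such as '-'.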