Applying a regex to replace values in pandas

Question

I have read some pricing data into a pandas DataFrame; the values look like this:

$40,000*
$40000 conditions attached

I want to strip these down to just the numeric value. I know I can loop over the rows and apply the regex

[0-9]+

to each field and then join the resulting list back together, but is there a way to do it without looping?

Thanks

Answer 1

unu*_*tbu 78

You can use Series.str.replace:

import pandas as pd

df = pd.DataFrame(['$40,000*','$40000 conditions attached'], columns=['P'])
print(df)
#                             P
# 0                    $40,000*
# 1  $40000 conditions attached

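# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal by default
df['P'] = df['P'].str.replace(r'\D+', '', regex=True).astype('int')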
print(df)

yields

       P
0  40000
1  40000

since \D matches any character that is not a decimal digit.

Answer 2

Plu*_*uto 14

You can use pandas' replace method; this way you can also keep the thousands separator ',' and the decimal separator '.':

import pandas as pd

df = pd.DataFrame(['$40,000.32*','$40000 conditions attached'], columns=['pricing'])
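# Raw strings keep the backslashes intact; assigning the result back avoids
# relying on inplace=True against a column selection.
df['pricing'] = df['pricing'].replace(to_replace=r"\$([0-9,\.]+).*", value=r"\1", regex=True)
print(df)
#      pricing
# 0  40,000.32
# 1      40000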
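
The values kept by this answer are still strings. As a minimal sketch of one way to go on to a numeric column (the `pricing_value` column name is made up for illustration, and the sample data is the same as above), the thousands separators can be dropped before converting to float:

```python
import pandas as pd

# Same sample values as in the answer above.
df = pd.DataFrame(['$40,000.32*', '$40000 conditions attached'], columns=['pricing'])

# Step 1: keep only the digits, commas and dots that follow the dollar sign.
cleaned = df['pricing'].replace(to_replace=r'\$([0-9,\.]+).*', value=r'\1', regex=True)

# Step 2: drop the thousands separators (literal match), then convert to float.
# 'pricing_value' is a hypothetical column name used only for this sketch.
df['pricing_value'] = cleaned.str.replace(',', '', regex=False).astype(float)

print(df['pricing_value'].tolist())
```

This assumes every value starts with a dollar amount; a string with no match is left unchanged by replace and would make astype(float) raise.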

Answer 3

Jer*_*rry 13

You can remove all non-digits using re.sub():

import re

value = re.sub(r"[^0-9]+", "", value)

regex101 demo

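OK, I think I got it for pandas: df['Pricing'].replace(to_replace='[^0-9]+', value='', inplace==True, regex=True). The .replace method applies re.sub. (20 upvotes)
@KillerSnail Your solution needs one correction: the double equals sign (==) after inplace should be a single equals sign (=): df['Pricing'].replace(to_replace='[^0-9]+', value='', inplace=True, regex=True) (3 upvotes)
What is the best way to apply this to a column in a DataFrame? So I have df['pricing']; do I just loop row by row? (2 upvotes)
Note: removing all non-digit characters will remove minus signs and decimal points, and will join unrelated numbers together. For example, "$8.99 but $2 off with coupon" becomes "8992", "$5.99" becomes "599", and "$5" becomes "5". (2 upvotes)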
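
The pitfall in the last comment can be reproduced with the standard library alone. The sketch below compares the strip-all-non-digits approach with a more conservative pattern that takes only the first price-like number and keeps its sign and decimal part. `PRICE_RE` and `first_price` are hypothetical names introduced here, and the pattern is an assumption about what a price looks like, not something from the answers.

```python
import re

# Hypothetical pattern: optional minus sign, optional "$", digits with optional
# thousands commas, and an optional decimal part.
PRICE_RE = re.compile(r'(-?)\$?\s*(\d[\d,]*(?:\.\d+)?)')

def first_price(text):
    """Return the first price-like number in text as a float, or None."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    sign, digits = match.groups()
    return float(sign + digits.replace(',', ''))

samples = ['$8.99 but $2 off with coupon', '-$4.99', '$40,000*']

# Stripping every non-digit glues unrelated numbers together and loses the sign.
naive = [re.sub(r'[^0-9]+', '', s) for s in samples]

# Taking only the first match keeps the sign and the decimal point.
careful = [first_price(s) for s in samples]

print(naive)
print(careful)
```

On a DataFrame column, a function like this could be applied without an explicit loop via df['pricing'].map(first_price), at the cost of running Python code per row. Note that the pattern also treats a hyphen directly before the amount as a minus sign, which may not suit every data set.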

Answer 4

sam*_*and 5

You don't need a regex. This should work:

# Note: convert_objects was removed in pandas 1.0, so this line only runs on older versions.
df['col'] = df['col'].astype(str).convert_objects(convert_numeric=True)
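
convert_objects is not available in recent pandas releases; pd.to_numeric with errors='coerce' is the closest current equivalent. The following is a minimal sketch assuming pandas 1.0 or later, with a made-up column `col` and a made-up result column `col_num`. It also shows that strings still carrying a "$" or "," are coerced to NaN rather than parsed, so for the data in the question some regex cleanup would probably still be needed first.

```python
import pandas as pd

# Hypothetical column 'col': the first two values are already clean numbers,
# the last one is a raw price string like those in the question.
df = pd.DataFrame({'col': ['40000', '40000.32', '$40,000*']})

# errors='coerce' turns anything that cannot be parsed into NaN instead of raising.
df['col_num'] = pd.to_numeric(df['col'], errors='coerce')

print(df)
```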
