Til*_*ilo 6 python python-3.x pandas
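I have a data frame df with 7 phone-number columns. I want to create new, renamed columns, e.g. ph1 .. ph7, and fill them with cleaned-up phone numbers, i.e. with spaces, "/", "-", "+" and the like removed.
In R I can do this easily with lapply. Is there a way to do the same thing in Python? I know do.call() can do the same, but I run into the same problem when coding it.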
con_1 <- con[, c("ph1", "ph2", "ph3", "ph4", "ph5", "ph6", "ph7") :=
lapply(.SD, function(x) { gsub(paste(unlist(list(" ", "/", "-", "+")), collapse = "|"), replace = "", x) }),
.SDcols = c("phone1", "phone2", "phone3", "phone4", "phone5", "phone6", "phone7")]
The data frame con is:
kac play_id phone1 phone2 phone3 phone4 phone5 phone6 phone7
1: 5004490 20002075 0900031349 090891349 <NA> <NA> <NA> <NA> <NA>
2: 5003807 00601731 <NA> <NA> <NA> <NA> 088235311 <NA> <NA>
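I need the Python equivalent of the above.
Suppose you have the following data frame (not exactly the same as yours, because yours contains nothing to clean up):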
# import module
import pandas as pd
# define data frame
df = pd.DataFrame(
[["5004490", "20002075", "09-00-03-13-49", "090891349", "", "", "", "", ""],
["5003807", "00601731", "", "", "", "", "08+82+35+31/1", "", ""],
["5003808", "00601731", "", "", "", "", "", "", "08/82/35/31/1"]],
columns=['kac', 'play_id', 'phone1','phone2', 'phone3', 'phone4', 'phone5','phone6', 'phone7']
)
# Display
print(df)
# kac play_id phone1 phone2 phone3 phone4 phone5 phone6 phone7
# 0 5004490 20002075 09-00-03-13-49 090891349
# 1 5003807 00601731 08+82+35+31/1
# 2 5003808 00601731 08/82/35/31/1
You can define a function that is applied to every cell; applymap does the job. Here I define a function clean_up_df that removes '+', '-' and '/':
def clean_up_df(data):
    rep = data.replace('/', '')  # Replace '/' by ''
    rep = rep.replace('-', '')   # Replace '-' by ''
    rep = rep.replace('+', '')   # Replace '+' by ''
    return rep
# Columns to process
phone_columns = ['phone1', 'phone2', 'phone3',
                 'phone4', 'phone5', 'phone6', 'phone7']
# Apply clean_up_df to every cell of the phone columns
# (pandas 2.1+ deprecates applymap in favour of DataFrame.map)
df[phone_columns] = df[phone_columns].applymap(clean_up_df)
# Display
print(df)
# kac play_id phone1 phone2 phone3 phone4 phone5 phone6 phone7
# 0 5004490 20002075 0900031349 090891349
# 1 5003807 00601731 088235311
# 2 5003808 00601731 088235311
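The question also asks for new columns named ph1 .. ph7 and for spaces to be removed, which the cell-by-cell function above does not cover. As a minimal sketch of one way to get closer to the R lapply call, the snippet below applies a regex replacement column by column and joins the results back under new names. The sample values, the two-column layout and the names new_names, pattern and cleaned are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout (values are made up).
df = pd.DataFrame(
    {
        "kac": ["5004490", "5003807"],
        "phone1": ["09-00 03/13+49", None],
        "phone2": ["090 891 349", "08+82+35+31/1"],
    }
)

phone_columns = ["phone1", "phone2"]
# Map phone1 -> ph1, phone2 -> ph2, ...
new_names = {col: col.replace("phone", "ph") for col in phone_columns}

# Character class matching a space, '/', '+' or '-' (the dash is escaped).
pattern = r"[ /+\-]"

# apply() passes each column to the lambda as a Series, much like lapply over .SD.
cleaned = df[phone_columns].apply(
    lambda s: s.str.replace(pattern, "", regex=True)
)

# Keep the original columns and add the cleaned ones under the new names.
df = df.join(cleaned.rename(columns=new_names))

print(df[["ph1", "ph2"]])
```

Missing values pass through the .str accessor unchanged, so empty entries such as the NA cells in the question stay missing instead of raising an error.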
Now, if you want to process one specific column, you can use apply with axis=1, which means: apply this function to every row of the data frame. Here is an example:
# Column to process
phone_col_name = "phone1"
# Same function, reading the specified column from each row
def clean_up(data):
    rep = data[phone_col_name].replace('/', '')
    rep = rep.replace('-', '')
    rep = rep.replace('+', '')
    return rep
# Process
df[phone_col_name] = df.apply(clean_up, axis=1)
# Display
print(df)
# kac play_id phone1 phone2 phone3 phone4 phone5 phone6 phone7
# 0 5004490 20002075 0900031349 090891349
# 1 5003807 00601731 08+82+35+31/1
# 2 5003808 00601731 08/82/35/31/1
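For a single column, a row-wise apply is not strictly needed, because the column's own string methods can do the cleanup directly. The following is a rough sketch of that alternative using Series.str.translate with a deletion table. The sample values and the name drop_chars are made up for illustration, and the space character is included only because the question mentions it.

```python
import pandas as pd

# Hypothetical one-column sample; values are made up for illustration.
df = pd.DataFrame({"phone1": ["09-00-03-13-49", "08/82 35+31", ""]})

# Translation table that deletes a space, '/', '+' and '-'.
drop_chars = str.maketrans("", "", " /+-")

# Series.str.translate applies the table to every string in the column.
df["phone1"] = df["phone1"].str.translate(drop_chars)

print(df)
```

This touches only the named column and leaves the other phone columns as they were, which matches the behaviour of the apply example above.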