I have a pandas Series of strings. I want to perform multiple replacements of several substrings in each row, see:
import pandas as pd

testdf = pd.Series([
    'Mary went to school today',
    'John went to hospital today'
])
to_sub = {
    'Mary': 'Alice',
    'school': 'hospital',
    'today': 'yesterday',
    'tal': 'zzz',
}
testdf = testdf.replace(to_sub, regex=True)  # does not work (only replaces one instance per row)
print(testdf)
In the example above, the desired output is:
Alice went to hospital yesterday.
John went to hospizzz yesterday.
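Note that the first row needs three replacements from the dictionary.
How can I do this efficiently, other than going row by row (in a for loop)?
I tried many answers from other questions based on df.replace(...), but only one substring gets replaced, giving results such as: Alice went to school today, where school and today were not replaced.
Also note that, for any single row, all replacements should happen together in one pass. (See that hospital in the first row is not replaced a second time to become hospizzz, which would be wrong.)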
You can use:
# Borrowed from an external website
def multipleReplace(text, wordDict):
    # Apply each replacement in turn, in dictionary order
    for key in wordDict:
        text = text.replace(key, wordDict[key])
    return text

print(testdf.apply(lambda x: multipleReplace(x, to_sub)))
0 Alice went to hospital yesterday
1 John went to hospital yesterday
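The question asks for all replacements to happen in a single pass per row. Chained str.replace calls, as in the loop above, rescan text written by earlier replacements, so with the 'tal' entry present a 'hospital' produced from 'school' would itself be turned into 'hospizzz'. A minimal sketch of a one-pass alternative, assuming the keys are literal substrings (not regex patterns) and using the hypothetical helper name replace_once, could look like this:

```python
import re

to_sub = {
    'Mary': 'Alice',
    'school': 'hospital',
    'today': 'yesterday',
    'tal': 'zzz',
}

# Build one alternation from all keys. Keys are escaped because they are
# assumed to be literal text; longer keys come first so that a longer key
# wins over a shorter one starting at the same position.
pattern = re.compile('|'.join(
    re.escape(key) for key in sorted(to_sub, key=len, reverse=True)
))

def replace_once(text):
    """Replace every key of to_sub in one left-to-right pass over text."""
    # The replacement text is never rescanned, so 'hospital' coming from
    # 'school' is not turned into 'hospizzz'.
    return pattern.sub(lambda match: to_sub[match.group(0)], text)

rows = ['Mary went to school today', 'John went to hospital today']
results = [replace_once(row) for row in rows]
print(results)
# ['Alice went to hospital yesterday', 'John went to hospizzz yesterday']
```

On the Series from the question this would presumably be applied as testdf.apply(replace_once). It still calls a Python function per row, so it avoids the explicit for loop rather than the per-row cost.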
Edit
Using the dictionary below:
to_sub = {
    'Mary': 'Alice',
    'school': 'hospital',
    'today': 'yesterday',
    'tal': 'zzz'
}
testdf.apply(lambda x: ' '.join([to_sub.get(i, i) for i in x.split()]))
Output:
0 Alice went to hospital yesterday
1 John went to hospital yesterday
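The split-based lookup above only matches whole whitespace-separated tokens. That is why 'tal' inside 'hospital' is left alone, and it also means a token with attached punctuation, such as 'today.', would not be found in the dictionary. If whole-word replacement is what is wanted, a rough sketch that keeps punctuation and spacing intact might use re.sub with a callback (replace_words is a hypothetical name, not part of the original answer):

```python
import re

to_sub = {
    'Mary': 'Alice',
    'school': 'hospital',
    'today': 'yesterday',
    'tal': 'zzz',
}

def replace_words(text, mapping):
    """Look up each whole word in mapping; leave everything else as is."""
    # \w+ matches one run of word characters at a time, so spaces and
    # punctuation between words are copied through unchanged.
    return re.sub(
        r'\w+',
        lambda match: mapping.get(match.group(0), match.group(0)),
        text,
    )

print(replace_words('Mary went to school today.', to_sub))
# Alice went to hospital yesterday.
```

This variant never applies 'tal', because it never appears as a word on its own. It therefore does not produce the hospizzz result the question asks for in the second row.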