Joh*_*Doe 6 python dataframe pandas
I have two dataframes:
df:
id string_data
1 My name is Jeff
2 Hello, I am John
3 I like Brad he is cool.
Another dataframe, named allnames, contains a list of names like this:
id name
1 Jeff
2 Brad
3 John
4 Emily
5 Ross
I want to replace every word in df that appears in allnames['name'] with "Firstname".
Expected output:
id string_data
1 My name is Firstname
2 Hello, I am Firstname
3 I like Firstname he is cool.
I tried this:
nameList = '|'.join(allnames['name'])
df['string_data'].str.replace(nameList, "FirstName", case = False)
But it replaces almost 99% of the words.
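Your solution should work if you add word boundaries (\b) around each name in the pattern passed to Series.str.replace:
import re
# \b makes each name match only as a whole word; re.escape guards against regex metacharacters
nameList = '|'.join(r"\b{}\b".format(re.escape(x)) for x in allnames['name'])
# regex=True is needed on pandas 2.0+, where str.replace defaults to regex=False
df['string_data'] = df['string_data'].str.replace(nameList, "FirstName", case=False, regex=True)
print(df)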
id string_data
0 1 My name is FirstName
1 2 Hello, I am FirstName
2 3 I like FirstName he is cool.
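To make the word-boundary fix easy to try, here is a minimal, self-contained sketch that rebuilds the two dataframes from the question. It wraps all names in one non-capturing group with a single pair of \b anchors, which should behave the same as anchoring each name separately. The second half uses a hypothetical short name, "Al", which is not in the question's data: one plausible reason for the "99% of words" symptom (an assumption, since the full name list is not shown) is that short names match inside longer words.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "string_data": ["My name is Jeff", "Hello, I am John", "I like Brad he is cool."],
})
allnames = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Jeff", "Brad", "John", "Emily", "Ross"],
})

# One shared pair of \b anchors around a non-capturing group of escaped names.
pattern = r"\b(?:" + "|".join(re.escape(n) for n in allnames["name"]) + r")\b"
replaced = df["string_data"].str.replace(pattern, "Firstname", case=False, regex=True)
print(replaced.tolist())

# Hypothetical short name "Al" (not in the question's data) to show the
# difference between an unanchored and an anchored pattern.
demo = pd.Series(["I also like Al"])
unanchored = demo.str.replace("Al", "Firstname", case=False, regex=True)
anchored = demo.str.replace(r"\bAl\b", "Firstname", case=False, regex=True)
print(unanchored[0])  # the "al" inside "also" is replaced too
print(anchored[0])    # only the standalone word is replaced
```

Passing regex=True explicitly keeps the behaviour the same across pandas versions, since the default for this parameter changed in pandas 2.0.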
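Alternatively, split each string into words, look each word up in a dictionary with get, and join the words back together:
# map every name to the replacement string
d = dict.fromkeys(allnames['name'], 'Firstname')
# replace tokens found in d; leave all other tokens unchanged
f = lambda x: ' '.join(d.get(y, y) for y in x.split())
df['string_data'] = df['string_data'].apply(f)
print(df)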
id string_data
0 1 My name is Firstname
1 2 Hello, I am Firstname
2 3 I like Firstname he is cool.
Run Code Online (Sandbox Code Playgroud)
EDIT: For a case-insensitive match, convert both the dictionary keys and each word to lowercase with lower:
d = dict.fromkeys([x.lower() for x in allnames['name']], 'Firstname')
f = lambda x: ' '.join(d.get(y.lower(), y) for y in x.split())
df['string_data'] = df['string_data'].apply(f)
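One limitation of the dictionary approach is that it compares whole whitespace-separated tokens, so a name followed by punctuation (for example "Brad,") is not found, and joining with a single space collapses any other whitespace. A possible variation, which is not part of the original answer, is to keep the lookup but let re.sub pick out the word tokens. The helper name mask_names and the sample strings below are made up for illustration.

```python
import re
import pandas as pd

allnames = pd.DataFrame({"name": ["Jeff", "Brad", "John", "Emily", "Ross"]})
# Lowercased set for fast, case-insensitive membership tests.
names = {n.lower() for n in allnames["name"]}

def mask_names(text: str) -> str:
    """Hypothetical helper: replace each word token that is a known name."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        return "Firstname" if word.lower() in names else word
    # \w+ selects word tokens; punctuation and spacing are left untouched.
    return re.sub(r"\w+", swap, text)

sample = pd.Series(["I like Brad, he is cool.", "hello JOHN!", "Bradley is not a match"])
masked = sample.apply(mask_names)
print(masked.tolist())
```

Because each token is checked against a set, this sketch avoids building one very long alternation pattern, which may matter when allnames is large.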