I want to create a column df['score'] that returns, for each cell, the number of values it has in common with a list.
Input:
correct_list = ['cats','dogs']
answer
0 cats, dogs, pigs
1 cats, dogs
2 dogs, pigs
3 cats
4 pigs
def animal_count(dataframe):
    count = 0
    for term in df['answer']:
        if term in symptom_list:
            df['score'] = count + 1
animal_count(df)
Expected output:
correct_list = ['cats','dogs']
answer score
0 cats, dogs, pigs 2
1 cats, dogs 2
2 dogs, pigs 1
3 cats 1
4 pigs 0
Any ideas? Thanks!
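For reference, the loop in the question has three problems. It iterates over whole cells, so term is the full string 'cats, dogs, pigs' rather than individual animals. It refers to symptom_list, which is never defined. And it overwrites the entire score column on every match while count is never incremented. A minimal loop-based sketch that fixes these follows; the second parameter name wanted is my own choice, and the function returns one count per row instead of mutating df.

```python
import pandas as pd

correct_list = ['cats', 'dogs']
df = pd.DataFrame({'answer': ['cats, dogs, pigs', 'cats, dogs', 'dogs, pigs', 'cats', 'pigs']})

def animal_count(dataframe, wanted):
    """Return a list with one count per row of dataframe['answer']."""
    scores = []
    for cell in dataframe['answer']:
        # Split the comma-separated cell into individual, whitespace-stripped animals.
        animals = [a.strip() for a in cell.split(',')]
        # Count how many of those animals appear in the wanted list.
        scores.append(sum(1 for a in animals if a in wanted))
    return scores

df['score'] = animal_count(df, correct_list)
print(df)
```

The vectorized answers below are usually shorter, but the loop makes the per-row logic explicit.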
Another solution, using Series.str.count:
df['score'] = df['answer'].str.count('|'.join(correct_list))
[out]
answer score
0 cats, dogs, pigs 2
1 cats, dogs 2
2 dogs, pigs 1
3 cats 1
4 pigs 0
As @PrinceFrancis pointed out, if catsdogs should not count as 2, you can change the regex pattern to match whole words only:
df = pd.DataFrame({'answer': ['cats, dogs, pigs', 'cats, dogs', 'dogs, pigs', 'cats', 'pigs', 'catsdogs']})
pat = '|'.join([fr'\b{x}\b' for x in correct_list])
df['score'] = df['answer'].str.count(pat)
[out]
answer score
0 cats, dogs, pigs 2
1 cats, dogs 2
2 dogs, pigs 1
3 cats 1
4 pigs 0
5 catsdogs 0
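To see what the \b word boundaries change, the sketch below counts the same cells with the plain pattern and with the bounded one side by side. The 'wildcats' row is hypothetical test data added to show that a substring match inside a longer word is also excluded.

```python
import pandas as pd

correct_list = ['cats', 'dogs']
df = pd.DataFrame({'answer': ['cats, dogs, pigs', 'catsdogs', 'wildcats']})

# Plain alternation: matches 'cats'/'dogs' anywhere, even inside longer words.
plain = '|'.join(correct_list)
# \b requires a word boundary on both sides, so only whole words match.
bounded = '|'.join(fr'\b{x}\b' for x in correct_list)

df['plain'] = df['answer'].str.count(plain)
df['bounded'] = df['answer'].str.count(bounded)
print(df)
```

With the plain pattern, catsdogs scores 2 and wildcats scores 1; with word boundaries, both score 0.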
We can also use Series.explode:
df['score']=df['answer'].str.split(', ').explode().isin(correct_list).groupby(level=0).sum()
print(df)
answer score
0 cats, dogs, pigs 2.0
1 cats, dogs 2.0
2 dogs, pigs 1.0
3 cats 1.0
4 pigs 0.0
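The explode chain is shown step by step below as a self-contained sketch. The output above shows float scores, which may depend on the pandas version; the trailing .astype(int) is an addition that forces integer scores. Because this approach compares whole tokens rather than substrings, catsdogs would score 0 without any word-boundary pattern. Note that it assumes cells are separated by exactly ', '.

```python
import pandas as pd

correct_list = ['cats', 'dogs']
df = pd.DataFrame({'answer': ['cats, dogs, pigs', 'cats, dogs', 'dogs, pigs', 'cats', 'pigs']})

# One row per animal; each row keeps the index of the cell it came from.
exploded = df['answer'].str.split(', ').explode()
# True where the individual animal is in correct_list.
hits = exploded.isin(correct_list)
# Sum the True values per original row (index level 0) and cast to int.
df['score'] = hits.groupby(level=0).sum().astype(int)
print(df)
```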