Tao*_*Han · 6 votes · tags: python, string, nlp, dataframe, pandas
I have a dataframe like the one below:
import pandas as pd

data = {'speaker': ['Adam', 'Ben', 'Clair'],
        'speech': ['Thank you very much and good afternoon.',
                   'Let me clarify that because I want to make sure we have got everything right',
                   'By now you should have some good rest']}
df = pd.DataFrame(data)
I want to count the words in the speech column, but only words that appear in a predefined list. For example, the list is:
wordlist = ['much', 'good','right']
I want to create a new column showing how many times these three words occur in each row. So my expected output is:
speaker speech words
0 Adam Thank you very much and good afternoon. 2
1 Ben Let me clarify that because I want to make sur... 1
2 Clair By now you should have some good rest 1
I tried:
df['total'] = 0
for word in df['speech'].str.split():
    if word in wordlist:
        df['total'] += 1
But after I run it, the total column is always zero. What is wrong with my code?
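The loop fails for two reasons. First, df['speech'].str.split() returns a Series whose elements are whole lists of words, so each `word` in the loop is a list such as ['Thank', 'you', ...]; `word in wordlist` asks whether that entire list equals one of the strings in wordlist, which is never true. Second, even if the condition were true, df['total'] += 1 adds 1 to every row, not only the current one. Below is a minimal row-wise sketch of a corrected version, assuming punctuation should be ignored and matching stays case-sensitive as in the question; the helper name count_listed_words is hypothetical.

```python
import re

import pandas as pd

data = {'speaker': ['Adam', 'Ben', 'Clair'],
        'speech': ['Thank you very much and good afternoon.',
                   'Let me clarify that because I want to make sure we have got everything right',
                   'By now you should have some good rest']}
df = pd.DataFrame(data)
wordlist = ['much', 'good', 'right']

# Each element produced by str.split() is a list of tokens, not a single word
print(type(df['speech'].str.split().iloc[0]))  # <class 'list'>

wordset = set(wordlist)  # set membership is a constant-time lookup


def count_listed_words(text):
    """Hypothetical helper: count the tokens of `text` that appear in wordset."""
    # \w+ keeps only word characters, so "afternoon." becomes "afternoon"
    tokens = re.findall(r'\w+', text)
    return sum(token in wordset for token in tokens)


# apply() calls the helper once per row and returns one count per row
df['total'] = df['speech'].apply(count_listed_words)
print(df[['speaker', 'total']])
```

Because the count is computed per row and assigned as a whole Series, each row gets its own value instead of every row being incremented together.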
You can use the following vectorized approach:
import pandas as pd

data = {'speaker': ['Adam', 'Ben', 'Clair'],
        'speech': ['Thank you very much and good afternoon.',
                   'Let me clarify that because I want to make sure we have got everything right',
                   'By now you should have some good rest']}
df = pd.DataFrame(data)
wordlist = ['much', 'good', 'right']

# Wrap the whole alternation in \b...\b so each listed word only matches as a whole word
pattern = r'\b(?:' + '|'.join(wordlist) + r')\b'
df['total'] = df['speech'].str.count(pattern)
This gives:
>>> df
speaker speech total
0 Adam Thank you very much and good afternoon. 2
1 Ben Let me clarify that because I want to make sur... 1
2 Clair By now you should have some good rest 1
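One detail of the regex matters: r'\b|\b'.join(wordlist) produces much\b|\bgood\b|\bright, so the first word has no leading \b and the last word has no trailing \b. The counts on the sample data are unaffected, but words such as "inasmuch" or "righteous" would be counted. The sketch below compares that pattern with one that wraps the whole alternation in a single pair of boundaries; the sample sentences and the names loose and strict are made up for illustration.

```python
import re

import pandas as pd

wordlist = ['much', 'good', 'right']
s = pd.Series(['inasmuch as it is righteous', 'much good, all right'])

# Joining with r'\b|\b' leaves the two outer ends unanchored
loose = r'\b|\b'.join(wordlist)  # 'much\b|\bgood\b|\bright'

# Wrapping the alternation puts \b on both sides of every word;
# re.escape guards against list entries that contain regex metacharacters
strict = r'\b(?:' + '|'.join(map(re.escape, wordlist)) + r')\b'

print(s.str.count(loose).tolist())   # [2, 3]
print(s.str.count(strict).tolist())  # [0, 3]

# Series.str.count also accepts re flags, e.g. to count "Good" as well as "good"
print(pd.Series(['Good and good']).str.count(strict, flags=re.IGNORECASE).tolist())  # [2]
```

The flags argument is optional; without it the match stays case-sensitive, as in the question.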
Views: 803