Ent*_*ast 2 python string pandas
I have a pandas DataFrame like the one below, with a column named 'texts':
texts
throne one
bar one
foo two
bar three
foo two
bar two
foo one
foo three
one three
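To try the answers below, the sample frame can be rebuilt roughly like this. This is a minimal sketch that assumes the column is literally named 'texts' with one string per row. The `expected` list is simply copied from the desired output further down and is not part of the original question's code.

```python
import pandas as pd

# Rebuild the sample frame from the question: one string per row in 'texts'.
df = pd.DataFrame({'texts': [
    'throne one', 'bar one', 'foo two', 'bar three', 'foo two',
    'bar two', 'foo one', 'foo three', 'one three',
]})

# Whole-word counts copied from the desired output shown in the question.
expected = [1, 1, 1, 1, 1, 1, 1, 1, 2]
print(df)
```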
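For each row, I want to count occurrences of the three words 'one', 'two' and 'three' and return the number of matches, counting a word only when it appears as a whole word. The output should look like this: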
texts counts
throne one 1
bar one 1
foo two 1
bar three 1
foo two 1
bar two 1
foo one 1
foo three 1
one three 2
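As you can see, the count for the first row is 1: 'throne' should not be counted as one of the searched values, because the 'one' it contains is not a whole word but part of 'throne'.
Any help with this?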
Use pd.Series.str.count with a regex built by joining words with '|':
words = 'one two three'.split()
df.assign(counts=df.texts.str.count('|'.join(words)))
texts counts
0 throne one 2
1 bar one 1
2 foo two 1
3 bar three 1
4 foo two 1
5 bar two 1
6 foo one 1
7 foo three 1
8 one three 2
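The 2 in the first row comes from the unanchored alternation 'one|two|three' also matching the 'one' inside 'throne'. The check below uses the standard library's re module directly (pandas' str.count compiles the pattern with re, so the matching behaviour should be the same). It lists each match together with its position.

```python
import re

words = 'one two three'.split()
plain = '|'.join(words)  # 'one|two|three', no word boundaries

# Unanchored alternation: the first 'one' (positions 3-6) sits inside 'throne'.
matches = [(m.group(), m.span()) for m in re.finditer(plain, 'throne one')]
print(matches)  # [('one', (3, 6)), ('one', (7, 10))]
```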
To rule out 'throne', we can add word boundaries (\b) to the regex:
words = 'one two three'.split()
df.assign(counts=df.texts.str.count('|'.join(map(r'\b{}\b'.format, words))))
texts counts
0 throne one 1
1 bar one 1
2 foo two 1
3 bar three 1
4 foo two 1
5 bar two 1
6 foo one 1
7 foo three 1
8 one three 2
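Spelled out, the pattern built with map(r'\b{}\b'.format, words) is just an alternation of three word-bounded terms. The sketch below reproduces the per-row counts in plain Python with len(re.findall(...)), which should match what str.count returns because both count non-overlapping matches. The list `texts` is a hypothetical stand-in for the DataFrame column, so the sketch runs without pandas.

```python
import re

words = 'one two three'.split()
pattern = '|'.join(map(r'\b{}\b'.format, words))
print(pattern)  # \bone\b|\btwo\b|\bthree\b

# Hypothetical stand-in for df.texts, copied from the question.
texts = ['throne one', 'bar one', 'foo two', 'bar three', 'foo two',
         'bar two', 'foo one', 'foo three', 'one three']

# \b needs a word/non-word transition, so 'one' inside 'throne' no longer matches.
counts = [len(re.findall(pattern, t)) for t in texts]
print(counts)  # [1, 1, 1, 1, 1, 1, 1, 1, 2]
```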
For flair, in Python 3.6+ you can use a raw f-string:
words = 'one two three'.split()
df.assign(counts=df.texts.str.count('|'.join(fr'\b{w}\b' for w in words)))
texts counts
0 throne one 1
1 bar one 1
2 foo two 1
3 bar three 1
4 foo two 1
5 bar two 1
6 foo one 1
7 foo three 1
8 one three 2
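The raw f-string fr'\b{w}\b' is only a different spelling of the same pattern, which is why the output above is identical. A quick check, assuming Python 3.6 or later: the r prefix keeps \b as a literal backslash-b instead of a backspace character, and the f prefix substitutes each word.

```python
words = 'one two three'.split()

via_format = '|'.join(map(r'\b{}\b'.format, words))
via_fstring = '|'.join(fr'\b{w}\b' for w in words)

# Both spellings build the identical regex string.
print(via_format == via_fstring)  # True
print(via_fstring)                # \bone\b|\btwo\b|\bthree\b
```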