Return the count of multiple words present in a pandas column

Ent*_*ast 2 python string pandas

I have a pandas DataFrame like the one below, with a column named 'texts'

texts
throne one
bar one
foo two
bar three
foo two
bar two
foo one
foo three
one three

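For each row, I want to count the occurrences of the three words 'one', 'two' and 'three', counting a match only when it is a whole word. The output should look like this.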

    texts   counts
    throne one  1
    bar one     1
    foo two     1
    bar three   1
    foo two     1
    bar two     1
    foo one     1
    foo three   1
    one three   2

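As you can see, the count for the first row is 1: 'throne' is not one of the searched values, because the 'one' in it is not a whole word but part of 'throne'.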

Any help with this?

piR*_*red 7

Use pd.Series.str.count with a regular expression built by joining words with '|'

words = 'one two three'.split()

df.assign(counts=df.texts.str.count('|'.join(words)))

        texts  counts
0  throne one       2
1     bar one       1
2     foo two       1
3   bar three       1
4     foo two       1
5     bar two       1
6     foo one       1
7   foo three       1
8   one three       2
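For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, and the names `pattern` and `counts` are only illustrative. It shows why the first row comes out as 2: the plain alternation also matches the 'one' inside 'throne'.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be a single string column).
df = pd.DataFrame({'texts': [
    'throne one', 'bar one', 'foo two', 'bar three', 'foo two',
    'bar two', 'foo one', 'foo three', 'one three',
]})

words = 'one two three'.split()

# Plain alternation: 'one|two|three'. It matches substrings,
# so the 'one' inside 'throne' is counted as well.
pattern = '|'.join(words)

counts = df.texts.str.count(pattern)
print(df.assign(counts=counts))
```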

To deal with 'throne', we can add word boundaries to the regex

words = 'one two three'.split()

df.assign(counts=df.texts.str.count('|'.join(map(r'\b{}\b'.format, words))))

        texts  counts
0  throne one       1
1     bar one       1
2     foo two       1
3   bar three       1
4     foo two       1
5     bar two       1
6     foo one       1
7   foo three       1
8   one three       2
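A possible variant of the same idea, not part of the approach above: wrap the whole alternation in one non-capturing group so the `\b` anchors are written only once, and pass each word through `re.escape` in case a search term ever contains regex metacharacters. This is a sketch under the assumption that the data matches the question's sample; `pattern` and `counts` are illustrative names.

```python
import re

import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({'texts': [
    'throne one', 'bar one', 'foo two', 'bar three', 'foo two',
    'bar two', 'foo one', 'foo three', 'one three',
]})

words = 'one two three'.split()

# One pair of word boundaries around a non-capturing group of escaped words.
# For these words the result is r'\b(?:one|two|three)\b'.
pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'

# 'throne' no longer contributes a match, so the first row counts 1.
counts = df.texts.str.count(pattern)
print(df.assign(counts=counts))
```

Note that `\b` only behaves as a whole-word boundary when the search terms start and end with word characters (letters, digits or underscore), which holds for 'one', 'two' and 'three'.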

For flair, use a raw f-string (Python 3.6 or later)

words = 'one two three'.split()

df.assign(counts=df.texts.str.count('|'.join(fr'\b{w}\b' for w in words)))

        texts  counts
0  throne one       1
1     bar one       1
2     foo two       1
3   bar three       1
4     foo two       1
5     bar two       1
6     foo one       1
7   foo three       1
8   one three       2