Searching multiple strings for multiple words

N08*_*N08 5 python dataframe pandas

I have a DataFrame with one sentence per row. I need to search these sentences for occurrences of certain words. This is how I currently do it:

import pandas as pd

p = pd.DataFrame({"sentence": ["this is a test", "yet another test", "now two tests", "test a", "no test"]})

test_words = ["yet", "test"]
# Placeholder columns; -1 matches what str.find returns for "not found"
p["word_test"] = -1
p["word_yet"] = -1

for i in range(len(p)):
    for word in test_words:
        # Single .loc[row, column] call avoids chained assignment on a copy
        p.loc[i, "word_" + word] = p.loc[i, "sentence"].find(word)

This works as expected, but can it be optimized? It runs quite slowly on large DataFrames.

cs9*_*s95 5

IIUC, use a simple dict comprehension and call str.find for each word:

u = pd.DataFrame({
    # equivalent without f-strings: 'word_{}'.format(w)
    f'word_{w}': p.sentence.str.find(w) for w in test_words}, index=p.index)
u

   word_yet  word_test
0        -1         10
1         0         12
2        -1          8
3        -1          0
4        -1          3

# p is the question's DataFrame, here holding only the 'sentence' column
pd.concat([p, u], axis=1)

           sentence  word_yet  word_test
0    this is a test        -1         10
1  yet another test         0         12
2     now two tests        -1          8
3            test a        -1          0
4           no test        -1          3
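The comprehension above can be wrapped in a small reusable helper. This is only a minimal sketch, assuming the frame has a string column named sentence; the function name find_word_positions and its parameters are hypothetical and not part of pandas.

```python
import pandas as pd


def find_word_positions(frame, words, column="sentence"):
    """Return a copy of frame with one 'word_<w>' column per word.

    Each new column holds the index of the first occurrence of the word
    in the given column, or -1 if it does not occur (str.find semantics).
    """
    positions = pd.DataFrame(
        {f"word_{w}": frame[column].str.find(w) for w in words},
        index=frame.index,
    )
    return pd.concat([frame, positions], axis=1)


p = pd.DataFrame({"sentence": ["this is a test", "yet another test",
                               "now two tests", "test a", "no test"]})
result = find_word_positions(p, ["yet", "test"])
print(result)
```

Building the new columns in a separate frame and concatenating once leaves the input untouched, which should make the helper safe to call repeatedly on the same data.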


Vai*_*ali 5

You can use str.find:

p['word_test'] = p.sentence.str.find('test')
p['word_yet'] = p.sentence.str.find('yet')

           sentence  word_test  word_yet
0    this is a test         10        -1
1  yet another test         12         0
2     now two tests          8        -1
3            test a          0        -1
4           no test          3        -1
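Note that str.find reports the position of the first substring match and returns -1 when there is none, so 'test' also matches inside 'tests'. If only a yes/no flag is needed, comparing the result against 0 is one option. The sketch below assumes the same sample data; the column names has_test and has_yet are hypothetical.

```python
import pandas as pd

p = pd.DataFrame({"sentence": ["this is a test", "yet another test",
                               "now two tests", "test a", "no test"]})

# str.find returns -1 for "not found", so >= 0 means the substring occurs.
p["has_test"] = p["sentence"].str.find("test") >= 0
p["has_yet"] = p["sentence"].str.find("yet") >= 0
print(p)
```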


WeN*_*Ben 5

Since you mentioned better performance, try np.char.find:

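import numpy as np

# One row of positions per word, then transpose so the words become columns
df = pd.DataFrame(data=[np.char.find(p.sentence.values.astype(str), x) for x in test_words],
                  index=test_words, columns=p.index)
pd.concat([p, df.T], axis=1)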
Out[32]: 
           sentence  yet  test
0    this is a test   -1    10
1  yet another test    0    12
2     now two tests   -1     8
3            test a   -1     0
4           no test   -1     3
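To check whether the vectorized approaches agree and how they compare on your own data, something like the following could be used. It is a rough sketch: the helper names with_str_find and with_np_char_find are hypothetical, the row count is arbitrary, and timings will vary by machine and library version, so no particular result is claimed here.

```python
import timeit

import numpy as np
import pandas as pd

test_words = ["yet", "test"]
sentences = ["this is a test", "yet another test",
             "now two tests", "test a", "no test"]
# Repeat the sample rows to get a larger frame; the factor is arbitrary.
p = pd.DataFrame({"sentence": sentences * 2000})


def with_str_find(frame):
    # pandas string accessor, one vectorized call per word
    return pd.DataFrame(
        {w: frame["sentence"].str.find(w) for w in test_words},
        index=frame.index,
    )


def with_np_char_find(frame):
    # Convert once to a NumPy unicode array, then search it per word
    values = frame["sentence"].to_numpy(dtype=str)
    return pd.DataFrame(
        {w: np.char.find(values, w) for w in test_words},
        index=frame.index,
    )


a = with_str_find(p)
b = with_np_char_find(p)
# Both approaches should agree on every position.
same = bool((a.to_numpy() == b.to_numpy()).all())
print("results agree:", same)

for fn in (with_str_find, with_np_char_find):
    seconds = timeit.timeit(lambda: fn(p), number=5)
    print(f"{fn.__name__}: {seconds:.4f} s for 5 runs")
```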