Keyword matching between list elements and a pandas column

Lea*_*ner 3 python dataframe python-3.x pandas

I have a list of elements:

A = ['loans','s-class','veyron','trump','rihana','drake','election']

I also have a pandas DataFrame B with the columns category and words, where words holds comma-separated strings:

category              words
audi                  a4, a6
bugatti               veyron, chiron
mercedez              s-class, e-class
dslr                  canon, nikon
apple                 iphone,macbook,ipod
finance               sales,loans,sales price
politics              trump, election, votes
entertainment         spiderman,thor, ironmen
music                 beiber, rihana,drake
........              ..............
.........             .........

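I just want to match each element of list A against the words column and collect the corresponding category in a new list. So the expected output is: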

matched_categories=['finance','mercedez','bugatti','politics','music','music','politics']

jez*_*ael 5

Filter with boolean indexing, then use iat to select the first matched value (df below is the DataFrame B from the question):

# works only if every value in A has at least one match;
# otherwise .iat[0] raises IndexError on the empty selection
matched_categories = [df.loc[df['words'].str.contains(x), 'category'].iat[0] for x in A]
print(matched_categories)
['finance', 'mercedez', 'bugatti', 'politics', 'music', 'music', 'politics']
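
To try the one-liner above, here is a minimal self-contained sketch. It assumes the frame is named df (B in the question) and rebuilds only the rows shown there. Note that str.contains does substring matching and treats the pattern as a regular expression by default; the regex=False argument below is an addition, not part of the answer, so that characters such as - or . in a keyword are taken literally.

```python
import pandas as pd

# Sample rows copied from the question; the frame is named df here (B in the question).
df = pd.DataFrame({
    "category": ["audi", "bugatti", "mercedez", "dslr", "apple",
                 "finance", "politics", "entertainment", "music"],
    "words": ["a4, a6", "veyron, chiron", "s-class, e-class", "canon, nikon",
              "iphone,macbook,ipod", "sales,loans,sales price",
              "trump, election, votes", "spiderman,thor, ironmen",
              "beiber, rihana,drake"],
})

A = ['loans', 's-class', 'veyron', 'trump', 'rihana', 'drake', 'election']

# For each keyword, keep the rows whose words string contains it,
# then take the category of the first such row.
# regex=False (an addition, not in the answer) makes this a literal substring test.
matched_categories = [
    df.loc[df['words'].str.contains(x, regex=False), 'category'].iat[0]
    for x in A
]
print(matched_categories)
```

Because the test is a substring test, a keyword such as class would also match the mercedez row, which may or may not be what you want.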

A more general solution for the case where some values have no match; it returns 'not matched' for those:

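# 'aaa' is appended as a value with no match in df
A = ['loans','s-class','veyron','trump','rihana','drake','election','aaa']

# next(iter(...), default) takes the first matching category, or the default if there is none
matched_categories = [next(iter(df.loc[df['words'].str.contains(x),'category']),'not matched')
                      for x in A]
print(matched_categories)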
['finance', 'mercedez', 'bugatti', 'politics', 'music', 'music', 'politics', 'not matched']
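
Both versions above scan the whole words column once per keyword and match substrings. If exact keyword matches are wanted instead, one possible alternative (a different technique from the answer, sketched here under the assumption that pandas 0.25 or later is available for explode) is to split words into one keyword per row and build a plain dict lookup. The names tokens and lookup are made up for this illustration.

```python
import pandas as pd

# Sample rows copied from the question; the frame is named df here (B in the question).
df = pd.DataFrame({
    "category": ["audi", "bugatti", "mercedez", "dslr", "apple",
                 "finance", "politics", "entertainment", "music"],
    "words": ["a4, a6", "veyron, chiron", "s-class, e-class", "canon, nikon",
              "iphone,macbook,ipod", "sales,loans,sales price",
              "trump, election, votes", "spiderman,thor, ironmen",
              "beiber, rihana,drake"],
})

A = ['loans', 's-class', 'veyron', 'trump', 'rihana', 'drake', 'election', 'aaa']

# Split each words string on commas and put one keyword per row.
tokens = (df.assign(word=df['words'].str.split(','))
            .explode('word')
            .reset_index(drop=True))
# Remove the spaces left around keywords such as " a6".
tokens['word'] = tokens['word'].str.strip()

# Map each keyword to the first category it is listed under.
lookup = tokens.drop_duplicates('word').set_index('word')['category'].to_dict()

# Exact-match lookup with a fallback for keywords that are not listed.
matched_categories = [lookup.get(x, 'not matched') for x in A]
print(matched_categories)
```

With this lookup a partial keyword such as class is reported as 'not matched', whereas the str.contains versions would return mercedez for it.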