Matching keywords in a pandas column against other lists of elements

Lea*_*ner 5 python python-3.x pandas

I have a pandas DataFrame:

word_list
['nuclear','election','usa','baseball']
['football','united','thriller']
['marvels','hollywood','spiderman']
....................
....................
....................

I also have several lists named after categories, for example:

movies=['spiderman','marvels','thriller']

sports=['baseball','hockey','football']

politics=['election','china','usa']

and many other categories.

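I want to match the keywords in the pandas column word_list against my category lists and, in a separate column, assign the names of the lists that contain a match. If a keyword does not match any list, it should be labeled miscellaneous. So the output I am looking for is: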

word_list                                          matched_list_names
['nuclear','election','usa','baseball']            politics,sports,miscellaneous
['football','united','thriller']                   sports,movies,miscellaneous               
['marvels','spiderman','hockey']                   movies,sports

....................                               .....................
....................                               .....................
....................                               ....................

I managed to get the matching keywords:

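# Each cell of df['word_list'] is a list, so iterate over its words
for words in df['word_list']:
    for word in words:
        # Membership test against the category list
        if word in movies:
            print(word)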

But this only gives me the matched keywords. How can I get the list names and add them to a pandas column?

jez*_*ael 3

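You can first flatten the dictionary of lists into a keyword-to-category mapping, then look up each word with .get, using miscellaneous as the default for non-matched values, then convert to sets to get the unique categories, and finally convert to strings with join. Because set iteration order is arbitrary, the order of names within each cell may vary: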

movies=['spiderman','marvels','thriller']
sports=['baseball','hockey','football']
politics=['election','china','usa']
d = {'movies':movies, 'sports':sports, 'politics':politics}
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

f = lambda x: ','.join(set([d1.get(y, 'miscellaneous') for y in x]))
df['matched_list_names'] = df['word_list'].apply(f)
print (df)

                                 word_list             matched_list_names
0       [nuclear, election, usa, baseball]  politics,miscellaneous,sports
1             [football, united, thriller]    miscellaneous,sports,movies
2  [marvels, hollywood, spiderman, budget]           miscellaneous,movies
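
The desired output in the question lists the matched categories first and miscellaneous last, which a set cannot guarantee. Below is a minimal, self-contained sketch of one way to get that ordering. The names `categories`, `keyword_to_category`, and `match_categories` are hypothetical and not part of the original answer; the sample rows are copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "word_list": [
        ["nuclear", "election", "usa", "baseball"],
        ["football", "united", "thriller"],
        ["marvels", "hollywood", "spiderman"],
    ]
})

categories = {
    "movies": ["spiderman", "marvels", "thriller"],
    "sports": ["baseball", "hockey", "football"],
    "politics": ["election", "china", "usa"],
}

# Flatten to keyword -> category name, same idea as d1 in the answer.
keyword_to_category = {
    keyword: name
    for name, keywords in categories.items()
    for keyword in keywords
}


def match_categories(words, fallback="miscellaneous"):
    """Hypothetical helper: category names in first-match order, fallback last."""
    names = [keyword_to_category.get(word) for word in words]
    # dict.fromkeys drops duplicates while keeping insertion order.
    matched = list(dict.fromkeys(name for name in names if name is not None))
    # Append the fallback once if at least one word matched no category.
    if None in names:
        matched.append(fallback)
    return ",".join(matched)


df["matched_list_names"] = df["word_list"].apply(match_categories)
print(df)
```

This assumes each keyword belongs to at most one category; if a keyword appeared in two lists, the flattened dictionary would keep only the last one.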

A similar solution with a list comprehension:

df['matched_list_names'] = [','.join(set([d1.get(y, 'miscellaneous') for y in x])) 
                            for x in df['word_list']]
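
As a possible alternative that is not from the original answer, the same lookup can be sketched with `Series.explode` (available in pandas 0.25 and later), mapping every word at once and then regrouping by the original row index. The intermediate names `exploded` and `labels` are illustrative, and sorting the names is an assumption made here only to get a reproducible order.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "word_list": [
        ["nuclear", "election", "usa", "baseball"],
        ["football", "united", "thriller"],
        ["marvels", "hollywood", "spiderman"],
    ]
})

d = {
    "movies": ["spiderman", "marvels", "thriller"],
    "sports": ["baseball", "hockey", "football"],
    "politics": ["election", "china", "usa"],
}
# Flatten to keyword -> category name, as in the answer.
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

# One row per word; the original row index is repeated for each word.
exploded = df["word_list"].explode()

# Words missing from d1 map to NaN, which is then replaced by the fallback.
labels = exploded.map(d1).fillna("miscellaneous")

# Regroup by the original row index and join the unique names, sorted.
df["matched_list_names"] = labels.groupby(level=0).agg(
    lambda s: ",".join(sorted(set(s)))
)
print(df)
```

Note that with this sketch an empty list in word_list explodes to a missing value and therefore ends up labeled miscellaneous.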