Lea*_*ner 5 python python-3.x pandas
I have a pandas DataFrame:
word_list
['nuclear','election','usa','baseball']
['football','united','thriller']
['marvels','hollywood','spiderman']
....................
....................
....................
I also have several lists named after categories, for example:
movies=['spiderman','marvels','thriller']
sports=['baseball','hockey','football']
politics=['election','china','usa']
and many other categories.
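I want to match the keywords in the pandas column word_list against my category lists and put the names of the matching lists in a separate column. If a keyword does not match any list, it should be labeled miscellaneous. The output I am looking for is: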
word_list matched_list_names
['nuclear','election','usa','baseball'] politics,sports,miscellaneous
['football','united','thriller'] sports,movies,miscellaneous
['marvels','spiderman','hockey'] movies,sports
.................... .....................
.................... .....................
.................... ....................
I managed to get the matching keywords with:
for words in df['word_list']:
    for word in words:
        # check each keyword of the row against the category list
        if word in movies:
            print(word)
But this only gives me the matching keywords. How can I get the list names and add them to a pandas column?
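You can first flatten the dictionary of lists into a keyword-to-category mapping, then look up each keyword with .get, using miscellaneous for unmatched values, convert the result to a set to get unique categories, and turn it into a string with join: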
movies=['spiderman','marvels','thriller']
sports=['baseball','hockey','football']
politics=['election','china','usa']
d = {'movies':movies, 'sports':sports, 'politics':politics}
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
f = lambda x: ','.join(set([d1.get(y, 'miscellaneous') for y in x]))
df['matched_list_names'] = df['word_list'].apply(f)
print (df)
word_list matched_list_names
0 [nuclear, election, usa, baseball] politics,miscellaneous,sports
1 [football, united, thriller] miscellaneous,sports,movies
2 [marvels, hollywood, spiderman, budget] miscellaneous,movies
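To make the flattening step easier to follow, here is a minimal sketch of what the inverted dictionary d1 contains and how .get behaves for a keyword that is in no category. It reuses the names from the answer and needs no pandas.

```python
movies = ['spiderman', 'marvels', 'thriller']
sports = ['baseball', 'hockey', 'football']
politics = ['election', 'china', 'usa']

d = {'movies': movies, 'sports': sports, 'politics': politics}

# Invert {category: [keywords]} into {keyword: category}
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}

print(d1['usa'])                           # politics
print(d1.get('nuclear', 'miscellaneous'))  # miscellaneous (no category contains it)
```

Note that if a keyword appeared in more than one list, only the last category seen would be kept in d1.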
A similar solution using a list comprehension:
df['matched_list_names'] = [','.join(set([d1.get(y, 'miscellaneous') for y in x]))
                            for x in df['word_list']]
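Because set does not keep any particular order, the category names in the joined string can come out in a different order from row to row. If a stable order matters, one possible variation (not part of the original answer) is to remove duplicates with dict.fromkeys, which keeps first-seen order. The helper name match_categories and the sample rows below are hypothetical, chosen only for illustration.

```python
movies = ['spiderman', 'marvels', 'thriller']
sports = ['baseball', 'hockey', 'football']
politics = ['election', 'china', 'usa']
d = {'movies': movies, 'sports': sports, 'politics': politics}

# Invert {category: [keywords]} into {keyword: category}
d1 = {word: name for name, words in d.items() for word in words}


def match_categories(words, lookup=d1, default='miscellaneous'):
    """Return comma-separated category names for words, in first-seen order."""
    names = [lookup.get(w, default) for w in words]
    # dict.fromkeys drops duplicates while preserving insertion order (Python 3.7+)
    return ','.join(dict.fromkeys(names))


# Sample rows standing in for df['word_list']
rows = [
    ['nuclear', 'election', 'usa', 'baseball'],
    ['football', 'united', 'thriller'],
    ['marvels', 'hollywood', 'spiderman'],
]
results = [match_categories(r) for r in rows]
print(results)
# ['miscellaneous,politics,sports', 'sports,miscellaneous,movies', 'movies,miscellaneous']
```

With a DataFrame, the same function could be applied as df['word_list'].apply(match_categories), assuming each cell holds a real Python list rather than its string representation.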