Lea*_*ner 3 python dataframe python-3.x pandas
I have a list of elements:
A= ['loans','s-class','veyron','trump','rihana','drake','election']
I also have a large pandas DataFrame B with columns category and words, where words holds comma-separated strings:
category       words
audi           a4, a6
bugatti        veyron, chiron
mercedez       s-class, e-class
dslr           canon, nikon
apple          iphone,macbook,ipod
finance        sales,loans,sales price
politics       trump, election, votes
entertainment  spiderman,thor, ironmen
music          beiber, rihana,drake
........       ..............
.........      .........
I just want to look up each element of list A in the words column and collect the corresponding category into a new list. So the expected output is:
matched_categories=['finance','mercedez','bugatti','politics','music','music','politics']
Filter with boolean indexing, then use iat to select the first matched value:
# works only if every value in A has a match; df is the DataFrame called B in the question
matched_categories = [df.loc[df['words'].str.contains(x), 'category'].iat[0] for x in A]
print (matched_categories)
['finance', 'mercedez', 'bugatti', 'politics', 'music', 'music', 'politics']
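To try the one-liner above end to end, here is a minimal sketch. It rebuilds the sample rows from the question under the name df (the question calls the frame B) and passes regex=False to str.contains. That flag is an addition here, not part of the original answer; it makes each list item match as a literal substring instead of being interpreted as a regular expression.

```python
import pandas as pd

# Sample rows copied from the question; `df` stands in for the frame called B there.
df = pd.DataFrame({
    'category': ['audi', 'bugatti', 'mercedez', 'dslr', 'apple',
                 'finance', 'politics', 'entertainment', 'music'],
    'words': ['a4, a6', 'veyron, chiron', 's-class, e-class', 'canon, nikon',
              'iphone,macbook,ipod', 'sales,loans,sales price',
              'trump, election, votes', 'spiderman,thor, ironmen',
              'beiber, rihana,drake'],
})

A = ['loans', 's-class', 'veyron', 'trump', 'rihana', 'drake', 'election']

# regex=False is an assumption added for safety: items are matched literally.
# .iat[0] raises IndexError if an item has no matching row.
matched_categories = [
    df.loc[df['words'].str.contains(x, regex=False), 'category'].iat[0]
    for x in A
]
print(matched_categories)
```

With the sample rows this should give the same list as the expected output in the question.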
A more general solution for the case where some values have no match; it returns 'not matched' for those values:
# added 'aaa' as a last value; it has no match in df
A = ['loans','s-class','veyron','trump','rihana','drake','election','aaa']
matched_categories = [next(iter(df.loc[df['words'].str.contains(x),'category']),'not matched')
for x in A]
print (matched_categories)
['finance', 'mercedez', 'bugatti', 'politics', 'music', 'music', 'politics', 'not matched']
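One caveat: str.contains does substring matching, so an item such as 'class' would match the row 's-class, e-class'. If exact word matches are wanted, one possible alternative (not from the original answer) is to split words into one row per word and build a dictionary from word to category. The sketch below assumes a shortened copy of the sample data and that the first category wins when a word appears under several categories.

```python
import pandas as pd

# Shortened copy of the sample data from the question.
df = pd.DataFrame({
    'category': ['bugatti', 'mercedez', 'finance', 'politics', 'music'],
    'words': ['veyron, chiron', 's-class, e-class', 'sales,loans,sales price',
              'trump, election, votes', 'beiber, rihana,drake'],
})
A = ['loans', 's-class', 'veyron', 'trump', 'rihana', 'drake', 'election', 'aaa']

# Split the comma-separated strings and produce one row per word.
exploded = df.assign(words=df['words'].str.split(',')).explode('words')
# Remove the stray spaces around each word.
exploded['words'] = exploded['words'].str.strip()

# Map each word to its category; on duplicates the first category is kept.
lookup = (exploded.drop_duplicates('words')
                  .set_index('words')['category']
                  .to_dict())

# Exact-match lookup with a default for unmatched items.
matched_categories = [lookup.get(x, 'not matched') for x in A]
print(matched_categories)
```

Building the dictionary once also avoids scanning the whole words column for every item in A, which may matter when both A and the frame are large.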