Pandas: get the first 10 elements of a Series

chi*_*n s 6 python indexing list python-2.7 pandas

I have a DataFrame with a column, tfidf_sorted, that looks like this:

   tfidf_sorted

0  [(morrell, 45.9736796), (football, 25.58352014...
1  [(melatonin, 48.0010051405), (lewy, 27.5842077...
2  [(blues, 36.5746634797), (harpdog, 20.58669641...
3  [(lem, 35.1570832476), (rottensteiner, 30.8800...
4  [(genka, 51.4667410433), (legendaarne, 30.8800...

type(df.tfidf_sorted) returns pandas.core.series.Series.

This column was created as follows:

df['tfidf_sorted'] = df['tfidf'].apply(lambda y: sorted(y.items(), key=lambda x: x[1], reverse=True))

where tfidf is a dictionary.

How can I get the first 10 key-value pairs of tfidf_sorted?

jez*_*ael 4

IIUC, you can use:

from itertools import chain

# flatten the nested lists into one list of (term, score) pairs
a = list(chain.from_iterable(df['tfidf_sorted']))
# sort by score, highest first
a.sort(key=lambda x: x[1], reverse=True)
# take the top 10 pairs across all rows
print(a[:10])
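As a rough sketch of the same "top N across all rows" idea, heapq.nlargest can pick the highest-scoring pairs without sorting the whole flattened list. The sample DataFrame, the helper name top_pairs, and n=3 are assumptions made up for illustration, not part of the original post.

```python
from heapq import nlargest
from itertools import chain

import pandas as pd

# Hypothetical sample data; the real column holds many more pairs per row.
df = pd.DataFrame({
    'tfidf_sorted': [
        [('morrell', 45.97), ('football', 25.58)],
        [('melatonin', 48.00), ('lewy', 27.58)],
        [('blues', 36.57), ('harpdog', 20.59)],
    ]
})


def top_pairs(series, n=10):
    """Return the n highest-scoring (term, score) pairs across all rows."""
    flat = chain.from_iterable(series)  # lazily flatten the per-row lists
    return nlargest(n, flat, key=lambda pair: pair[1])


top3 = top_pairs(df['tfidf_sorted'], n=3)
print(top3)
```

For small data the plain sort shown earlier is just as good; nlargest mainly helps when the flattened list is large and n is small.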

Or, if you need the top 10 per row, add [:10]:

df['tfidf_sorted'] = df['tfidf'].apply(lambda y: (sorted(y.items(), key=lambda x: x[1], reverse=True))[:10])
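If tfidf_sorted already exists and each row is already sorted, it should also be possible to slice each list directly instead of re-sorting the dictionaries. This is a minimal sketch on made-up data: the sample values, the column name top_n, and the cutoff of 2 are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the real column.
df = pd.DataFrame({
    'tfidf_sorted': [
        [('morrell', 45.97), ('football', 25.58), ('club', 10.1)],
        [('melatonin', 48.00), ('lewy', 27.58)],
    ]
})

N = 2  # use 10 for the case in the question

# Each row is already sorted by score, so a plain slice keeps the top N pairs.
# Rows with fewer than N pairs are returned unchanged.
df['top_n'] = df['tfidf_sorted'].apply(lambda pairs: pairs[:N])

print(df['top_n'].tolist())
```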