Lui*_*uez 0 python count dataframe pandas
I have a DataFrame that contains a list of products and their respective reviews:
+-----------+-------------------------------------------------+
| product   | review                                          |
+-----------+-------------------------------------------------+
| product_a | This is good for a casual lunch                 |
+-----------+-------------------------------------------------+
| product_b | Avery is one of the most knowledgeable baristas |
+-----------+-------------------------------------------------+
| product_c | The tour guide told us the secrets              |
+-----------+-------------------------------------------------+
How can I get all the unique words in the DataFrame?
I wrote a function:
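from collections import Counter

def count_words(text):
    # Count the lowercase, whitespace-separated words in one review
    try:
        words = text.lower().split()
        counts = Counter(words)
    except AttributeError:
        # Non-string values (e.g. NaN) have no .lower(); keep the original fallback
        counts = {'': 0}
    return counts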
and applied it to the DataFrame, but that only gives me the word counts for each row.
reviews['words_count'] = reviews['review'].apply(count_words)
Starting from this:
dfx
               review
0      United Kingdom
1  The United Kingdom
2     Dublin, Ireland
3    Mardan, Pakistan
To get all the unique words in the "review" column:
list(dfx['review'].str.split(' ', expand=True).stack().unique())
['United', 'Kingdom', 'The', 'Dublin,', 'Ireland', 'Mardan,', 'Pakistan']
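Note that splitting on a single space keeps punctuation and case, so 'Dublin,' and 'Mardan,' above still carry their commas. If that matters, a minimal sketch of one way to normalize first is shown here. It assumes pandas 0.25 or later (for `Series.explode`) and that a "word" is any run of `\w` characters; the sample frame is rebuilt by hand and `words` / `unique_words` are illustrative names, not part of the answer above.

```python
import pandas as pd

# Same sample data as dfx above, rebuilt so the snippet runs on its own
dfx = pd.DataFrame({
    "review": ["United Kingdom", "The United Kingdom",
               "Dublin, Ireland", "Mardan, Pakistan"]
})

# Lowercase each review, extract runs of word characters as a list,
# then flatten the lists to one word per row
words = dfx["review"].str.lower().str.findall(r"\w+").explode()

# Unique words, sorted for a stable order
unique_words = sorted(words.unique())
print(unique_words)
# ['dublin', 'ireland', 'kingdom', 'mardan', 'pakistan', 'the', 'united']
```

Whether to lowercase or strip punctuation depends on the reviews; drop `.str.lower()` if case should be preserved.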
To get the word counts for the "review" column:
dfx['review'].str.split(' ', expand=True).stack().value_counts()
United      2
Kingdom     2
Mardan,     1
The         1
Ireland     1
Dublin,     1
Pakistan    1
dtype: int64
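If you would rather stay close to the `Counter` approach from the question, the per-row counts can also be accumulated into a single `Counter` for the whole column. This is only a sketch: the sample frame below adds a hypothetical missing review to show how non-string rows can be skipped with `dropna()`, and `total` is an illustrative name.

```python
from collections import Counter

import pandas as pd

# Sample data from dfx above plus one missing review (an assumption for illustration)
dfx = pd.DataFrame({
    "review": ["United Kingdom", "The United Kingdom", None,
               "Dublin, Ireland", "Mardan, Pakistan"]
})

# One Counter for the whole column instead of one per row
total = Counter()
for text in dfx["review"].dropna():
    # Lowercase and split on whitespace, as count_words in the question does
    total.update(text.lower().split())

print(total["united"])  # 2
print(len(total))       # 7 distinct tokens (punctuation is still attached)
```

`list(total)` then gives the unique words, and `total.most_common()` gives them ordered by frequency.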