Count the frequency of values in a pandas column where the values in another column are similar

Question

cal*_*Guy 1 python frequency count pandas

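Given a pandas DataFrame with the columns column_a and column_b, as shown below, how can I construct two additional columns: one that counts how often each value of column_a occurs across all rows, and another that counts the number of unique column_b values among the rows that share the same column_a value?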

column_a | column_b | col_a_count | count_unique_b_where_a
   0     |    1     |      4      |    3
   0     |    1     |      4      |    3
   0     |    2     |      4      |    3
   0     |    3     |      4      |    3
   2     |    0     |      3      |    1
   2     |    0     |      3      |    1
   2     |    0     |      3      |    1
   5     |    3     |      1      |    1
   9     |    5     |      6      |    5
   9     |    5     |      6      |    5
   9     |    3     |      6      |    5
   9     |    4     |      6      |    5
   9     |    2     |      6      |    5
   9     |    1     |      6      |    5

Answer 1

use*_*203 5

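Use groupby and agg, then reindex the result by column_a so that it has one row per row of the original DataFrame: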

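# Per group of column_a: count the rows and count the distinct column_b values,
# then repeat each group's row once per original row via reindex.
s = (df.groupby('column_a').agg(
        {'column_a': 'count', 'column_b': 'nunique'}).reindex(df.column_a))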

          column_a  column_b   
column_a                       
0                4         3   
0                4         3   
0                4         3   
0                4         3   
2                3         1   
2                3         1   
2                3         1   
5                1         1   
9                6         5   
9                6         5   
9                6         5   
9                6         5   
9                6         5   
9                6         5
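
The result above is indexed by column_a and still has to be attached to the original frame. One possible way to do that, sketched here as a variant rather than as the exact code above, is to use named aggregation on column_b and join the per-group statistics back on column_a. The sample data is taken from the question; the names `stats` and `result` are illustrative only, and 'size' is used in place of 'count' on the assumption that every row should be counted.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column_a": [0, 0, 0, 0, 2, 2, 2, 5, 9, 9, 9, 9, 9, 9],
    "column_b": [1, 1, 2, 3, 0, 0, 0, 3, 5, 5, 3, 4, 2, 1],
})

# One row per distinct column_a value: the group size and the
# number of distinct column_b values inside that group.
stats = df.groupby("column_a")["column_b"].agg(
    col_a_count="size",
    count_unique_b_where_a="nunique",
)

# Broadcast the per-group statistics back onto every original row
# by matching df["column_a"] against the index of stats.
result = df.join(stats, on="column_a")
print(result)
```

Joining on column_a keeps the original row index of df, so the two new columns line up with the existing rows without relying on positional alignment.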
