How to calculate the conditional probability of values in a pandas DataFrame in Python?

Question

CSM*_*ick 7 python probability dataframe pandas

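I want to calculate the probability of each rating ('A', 'B', 'C') in the rating column, and the conditional probability of each type given the rating.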

    company     model    rating   type
0   ford       mustang     A      coupe
1   chevy      camaro      B      coupe
2   ford       fiesta      C      sedan
3   ford       focus       A      sedan
4   ford       taurus      B      sedan
5   toyota     camry       B      sedan

Expected output:

Prob(rating=A) = 0.333333 
Prob(rating=B) = 0.500000 
Prob(rating=C) = 0.166667 

Prob(type=coupe|rating=A) = 0.500000 
Prob(type=sedan|rating=A) = 0.500000 
Prob(type=coupe|rating=B) = 0.333333 
Prob(type=sedan|rating=B) = 0.666667 
Prob(type=coupe|rating=C) = 0.000000 
Prob(type=sedan|rating=C) = 1.000000

Any help is appreciated, thanks!

Answer 1

Ste*_*fan 13

You can use .groupby() and the built-in .div():

rating_probs = df.groupby('rating').size().div(len(df))

rating
A    0.333333
B    0.500000
C    0.166667
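As a hedged alternative for the marginal probabilities, value_counts with normalize=True should give the same numbers in one call. This is a minimal sketch that rebuilds the question's table so it runs on its own; the sort_index() call is only there to get the A, B, C order.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
    'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
    'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
    'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan'],
})

# normalize=True divides each count by the total number of rows,
# so the result is P(rating) rather than a raw count
rating_probs = df['rating'].value_counts(normalize=True).sort_index()
print(rating_probs)
```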

And the conditional probabilities:

df.groupby(['type', 'rating']).size().div(len(df)).div(rating_probs, axis=0, level='rating')

coupe  A         0.500000
       B         0.333333
sedan  A         0.500000
       B         0.666667
       C         1.000000

Answer 2

Gus*_*rra 4

You can use groupby:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
                     'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
                     'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
                     'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan']})

In [3]: df.groupby('rating').count()['model'] / len(df)
Out[3]:
rating
A    0.333333
B    0.500000
C    0.166667
Name: model, dtype: float64

In [4]: (df.groupby(['rating', 'type']).count() / df.groupby('rating').count())['model']
Out[4]:
rating  type
A       coupe    0.500000
        sedan    0.500000
B       coupe    0.333333
        sedan    0.666667
C       sedan    1.000000
Name: model, dtype: float64
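To print the results in the exact "Prob(...) = ..." format from the question, including the zero-probability (coupe, C) line, something along these lines may work. It is a sketch, not part of the original answer: the names counts, cond and lines are illustrative, and unstack(fill_value=0) is used to fill in the combinations that never occur.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'company': ['ford', 'chevy', 'ford', 'ford', 'ford', 'toyota'],
    'model': ['mustang', 'camaro', 'fiesta', 'focus', 'taurus', 'camry'],
    'rating': ['A', 'B', 'C', 'A', 'B', 'B'],
    'type': ['coupe', 'coupe', 'sedan', 'sedan', 'sedan', 'sedan'],
})

# Marginal probabilities P(rating)
rating_probs = df['rating'].value_counts(normalize=True).sort_index()

# Count each (rating, type) pair, pivot type into columns, and fill
# missing pairs with 0 so they appear with probability 0
counts = df.groupby(['rating', 'type']).size().unstack(fill_value=0)

# Divide each row by its total to get P(type | rating)
cond = counts.div(counts.sum(axis=1), axis=0)

lines = [f"Prob(rating={r}) = {p:.6f}" for r, p in rating_probs.items()]
for r in cond.index:
    for t in cond.columns:
        lines.append(f"Prob(type={t}|rating={r}) = {cond.loc[r, t]:.6f}")

print("\n".join(lines))
```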
