piR*_*red 18 python numpy pandas
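You can calculate skew and kurtosis with the pandas methods DataFrame.skew and DataFrame.kurtosis.
However, there is no convenient way to calculate the coskew or cokurtosis between variables or, even better, the coskew or cokurtosis matrix.
Consider the pd.DataFrame df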
import pandas as pd
import numpy as np
np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(10, 2), columns=list('ab'))
df
          a         b
0  0.444939  0.407554
1  0.460148  0.465239
2  0.462691  0.016545
3  0.850445  0.817744
4  0.777962  0.757983
5  0.934829  0.831104
6  0.879891  0.926879
7  0.721535  0.117642
8  0.145906  0.199844
9  0.437564  0.100702
How do I calculate the coskew and cokurtosis between a and b?
piR*_*red 17
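References
coskew
My interpretation of coskew is the "correlation" between one series and the variance of another. As a result, there are actually two types of coskew, depending on which series we take the variance of. Wikipedia shows these two formulas.
Fortunately, when we calculate the coskew matrix, one is the transpose of the other.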
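For reference, the two coskewness definitions given on Wikipedia can be written out as follows, where $\mu$ and $\sigma$ are the mean and standard deviation of each variable:

```latex
S(X, X, Y) = \frac{\operatorname{E}\left[(X - \mu_X)^2 \, (Y - \mu_Y)\right]}{\sigma_X^2 \, \sigma_Y}
\qquad
S(X, Y, Y) = \frac{\operatorname{E}\left[(X - \mu_X) \, (Y - \mu_Y)^2\right]}{\sigma_X \, \sigma_Y^2}
```

Swapping the roles of X and Y turns one formula into the other, which is why the two matrices are transposes of each other.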
def coskew(df, bias=False):
    v = df.values
    s1 = sigma = v.std(0, keepdims=True)
    means = v.mean(0, keepdims=True)

    # means is 1 x n (n is the number of columns),
    # so this difference broadcasts appropriately
    v1 = v - means

    s2 = sigma ** 2

    v2 = v1 ** 2

    m = v.shape[0]

    # entry [i, j] pairs the variance of column i with column j
    skew = pd.DataFrame(v2.T.dot(v1) / s2.T.dot(s1) / m, df.columns, df.columns)

    if not bias:
        # sample correction, so the diagonal matches df.skew()
        skew *= ((m - 1) * m) ** .5 / (m - 2)

    return skew
coskew(df)
          a         b
a -0.369380  0.096974
b  0.325311  0.067020
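To make the matrix product less opaque, here is a minimal sketch (plain NumPy, biased estimate only) that computes a single coskew entry from the explicit formula and compares it with the matrix form. The helper name `coskew_pair` and the random test data are hypothetical, introduced only for illustration.

```python
import numpy as np

def coskew_pair(x, y):
    """Biased coskew S(x, x, y): 'correlation' of y with the variance of x.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    # population standard deviations (ddof=0), matching v.std(0) in coskew()
    return (dx ** 2 * dy).mean() / (x.std() ** 2 * y.std())

# illustrative data, not the DataFrame from the question
rng = np.random.default_rng(0)
data = rng.random((10, 2))
a, b = data[:, 0], data[:, 1]

# matrix form, mirroring the biased part of coskew()
v1 = data - data.mean(0, keepdims=True)
sigma = data.std(0, keepdims=True)
biased = (v1 ** 2).T.dot(v1) / (sigma ** 2).T.dot(sigma) / len(data)

print(np.isclose(biased[0, 1], coskew_pair(a, b)))  # row a, column b
print(np.isclose(biased[1, 0], coskew_pair(b, a)))  # the transposed variant
```

Row i, column j of the matrix pairs the squared deviations of column i with the deviations of column j, so transposing the matrix swaps which series contributes the variance.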
We can compare this with df.skew() and check that the diagonals are the same
df.skew()
a -0.36938
b 0.06702
dtype: float64
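The diagonal check can also be done programmatically. The sketch below assumes the same bias-correction factor used in `coskew` and compares a corrected single-series skew with `pd.Series.skew()`. The name `sample_skew` and the random data are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def sample_skew(x):
    """Bias-corrected skew of one series, i.e. a diagonal coskew entry.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    m = x.size
    d = x - x.mean()
    g1 = (d ** 3).mean() / x.std() ** 3         # biased estimate, ddof=0
    return g1 * ((m - 1) * m) ** 0.5 / (m - 2)  # same factor as in coskew()

# illustrative data, not the DataFrame from the question
s = pd.Series(np.random.default_rng(1).random(25))
print(np.isclose(sample_skew(s.to_numpy()), s.skew()))
```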
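cokurtosis
My interpretation of cokurtosis is one of two things:
1. the "correlation" between one series and the skew of another
2. the "correlation" between the variances of two series
For option 1, we again have a left and a right variant, which in matrix form are transposes of one another, so we only need to focus on the left variant. That leaves us with a total of two variations to calculate.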
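For reference, the pairwise cokurtosis statistics as defined on Wikipedia can be written as follows, again with $\mu$ and $\sigma$ denoting means and standard deviations. With X as the row variable and Y as the column variable, they correspond to the `'left'`, `'middle'`, and `'right'` variants in the code:

```latex
K(X, X, X, Y) = \frac{\operatorname{E}\left[(X - \mu_X)^3 \, (Y - \mu_Y)\right]}{\sigma_X^3 \, \sigma_Y}
\qquad
K(X, X, Y, Y) = \frac{\operatorname{E}\left[(X - \mu_X)^2 \, (Y - \mu_Y)^2\right]}{\sigma_X^2 \, \sigma_Y^2}
\qquad
K(X, Y, Y, Y) = \frac{\operatorname{E}\left[(X - \mu_X) \, (Y - \mu_Y)^3\right]}{\sigma_X \, \sigma_Y^3}
```

The middle statistic is symmetric in X and Y, so its matrix is symmetric, while the first and last are transposes of each other.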
def cokurt(df, bias=False, fisher=True, variant='middle'):
    v = df.values
    s1 = sigma = v.std(0, keepdims=True)
    means = v.mean(0, keepdims=True)

    # means is 1 x n (n is the number of columns),
    # so this difference broadcasts appropriately
    v1 = v - means

    s2 = sigma ** 2
    s3 = sigma ** 3

    v2 = v1 ** 2
    v3 = v1 ** 3

    m = v.shape[0]

    if variant in ['left', 'right']:
        kurt = pd.DataFrame(v3.T.dot(v1) / s3.T.dot(s1) / m, df.columns, df.columns)
        if variant == 'right':
            kurt = kurt.T
    elif variant == 'middle':
        kurt = pd.DataFrame(v2.T.dot(v2) / s2.T.dot(s2) / m, df.columns, df.columns)
    else:
        raise ValueError("variant must be 'left', 'right', or 'middle'")

    if bias:
        # the biased estimate is not in excess form yet, so subtract 3
        kurt = kurt - 3
    else:
        # sample correction; the result is excess (Fisher) kurtosis
        kurt = kurt * (m ** 2 - 1) / (m - 2) / (m - 3) - 3 * (m - 1) ** 2 / (m - 2) / (m - 3)

    if not fisher:
        kurt += 3

    return kurt
cokurt(df, variant='middle', bias=False, fisher=False)
          a        b
a  1.882817  0.86649
b  0.866490  1.63200

cokurt(df, variant='left', bias=False, fisher=False)
          a        b
a  1.882817  0.19175
b -0.020567  1.63200
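The diagonal should be equal to the kurtosis. Since df.kurtosis() returns excess kurtosis, we add 3 to compare with fisher=False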
df.kurtosis() + 3
a 1.882817
b 1.632000
dtype: float64
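As with skew, this diagonal check can be scripted. The sketch below applies the same sample correction used in `cokurt` to a single series and compares the result with `pd.Series.kurt()`, which reports excess kurtosis. The name `sample_kurt` and the random data are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def sample_kurt(x, fisher=True):
    """Bias-corrected kurtosis of one series, i.e. a diagonal cokurt entry.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    m = x.size
    d = x - x.mean()
    k = (d ** 4).mean() / x.std() ** 4  # biased, non-excess kurtosis (ddof=0)
    # same correction as in cokurt(); the result is excess kurtosis
    k = k * (m ** 2 - 1) / (m - 2) / (m - 3) - 3 * (m - 1) ** 2 / (m - 2) / (m - 3)
    return k if fisher else k + 3

# illustrative data, not the DataFrame from the question
s = pd.Series(np.random.default_rng(2).random(25))
print(np.isclose(sample_kurt(s.to_numpy()), s.kurt()))
print(np.isclose(sample_kurt(s.to_numpy(), fisher=False), s.kurt() + 3))
```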