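The linked page defines the total variation distance between two probability distributions.
I am trying to compute it in Python. I have two datasets. First I estimated a probability density function for each from its histogram, then I tried to take the maximum difference between the two distributions. The value it returns is very small, so it seems I am doing something wrong. Could you help me fix it?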
import numpy as np
import scipy.stats as st

# original: NumPy array of shape (45222, 1)
# synthetic: NumPy array of shape (45222, 1)
summation = 0

# Use one common range so both histograms share the same bin edges
minHist = min(original.min(), synthetic.min())
maxHist = max(original.max(), synthetic.max())

# Build a histogram-based distribution for each dataset
originalHist = np.histogram(original, range=(minHist, maxHist))
hist_dist1 = st.rv_histogram(originalHist)
generatedHist = np.histogram(synthetic, range=(minHist, maxHist))
hist_dist2 = st.rv_histogram(generatedHist)

# Evaluate both estimated densities on a fine grid and take the largest gap
x = np.linspace(minHist, maxHist, 45000)
summation += max(abs(hist_dist1.pdf(x) - hist_dist2.pdf(x)))
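For comparison, here is a minimal sketch of one common way to estimate the total variation distance from binned data. It relies on the fact that for discrete distributions the distance equals half the sum of absolute differences between the probability masses, so it works with per-bin probabilities rather than density values. A difference of densities scales with the units of the data, which may be why the maximum pdf gap comes out very small. The function name `total_variation_binned`, the bin count, and the random sample data are assumptions made for illustration only, and the result depends on the binning chosen.

```python
import numpy as np

def total_variation_binned(a, b, bins=10):
    """Estimate the total variation distance between two samples
    by binning them on shared edges (a histogram approximation)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    # Shared bin edges so that bin i covers the same interval for both samples
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    # Convert counts to probability masses (each vector sums to 1)
    p = p / p.sum()
    q = q / q.sum()
    # For discrete distributions: TV = 0.5 * sum_i |p_i - q_i|
    return 0.5 * np.abs(p - q).sum()

# Placeholder data standing in for the real arrays (assumption)
rng = np.random.default_rng(0)
original = rng.normal(0.0, 1.0, size=(45222, 1))
synthetic = rng.normal(0.5, 1.0, size=(45222, 1))

tv_same = total_variation_binned(original, original)
tv_diff = total_variation_binned(original, synthetic)
print(tv_same, tv_diff)
```

The estimate always lies between 0 (identical binned distributions) and 1 (no overlapping bins). Using more bins generally captures finer differences but makes the estimate noisier for a fixed sample size.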