Cod*_*dex 13 r frequency entropy
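I have been trying to calculate entropy for hours and I know I am missing something. Hopefully someone here can give me an idea!
EDIT: I think my formula is wrong!
Code: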
info <- function(CLASS.FREQ){
  freq.class <- CLASS.FREQ
  info <- 0
  for(i in 1:length(freq.class)){
    if(freq.class[[i]] != 0){ # zero check in class
      entropy <- -sum(freq.class[[i]] * log2(freq.class[[i]])) # I calculate the entropy for each class i here
    }else{
      entropy <- 0
    }
    info <- info + entropy # sum up entropy from all classes
  }
  return(info)
}
I hope my post is clear, since this is my first time posting here.
This is my dataset:
buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
credit <- c("fair", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "excellent")
student <- c("no", "no", "no","no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes", "no")
income <- c("high", "high", "high", "medium", "low", "low", "low", "medium", "low", "medium", "medium", "medium", "high", "medium")
age <- c(25, 27, 35, 41, 48, 42, 36, 29, 26, 45, 23, 33, 37, 44) # we change the age from categorical to numeric
cde*_*man 20
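Ultimately, I find no error in your code, as it runs without error. The part I think you are missing is the calculation of the class frequencies; once you have those, you will get your answer. Looking quickly through the different objects you provide, I suspect you are interested in buys.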
buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
freqs <- table(buys)/length(buys)
info(freqs)
[1] 0.940286
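To make the frequency step concrete, here is a minimal sketch that spells it out for buys and checks the result by hand. The names counts, h_manual and h_vector are only illustrative and are not part of the original post.

```r
buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")

# Raw counts per class: 5 "no" and 9 "yes"
counts <- table(buys)

# Relative frequencies; prop.table(counts) is equivalent to counts / sum(counts)
freqs <- prop.table(counts)

# Entropy written out term by term for the two classes
h_manual <- -(5/14 * log2(5/14) + 9/14 * log2(9/14))

# Same value from the vectorised expression
h_vector <- -sum(freqs * log2(freqs))

print(c(manual = h_manual, vectorised = h_vector))
```

Both values should agree with the 0.940286 shown above, which suggests the function itself is fine as long as it receives relative frequencies rather than raw labels.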
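As a matter of improving your code, you can simplify it dramatically: if you are given a vector of class frequencies, you do not need a loop.
For example: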
# calculate shannon-entropy
-sum(freqs * log2(freqs))
[1] 0.940286
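One caveat with the loop-free version: in R, 0 * log2(0) evaluates to NaN, so the one-liner loses the zero check that the original loop had. A possible sketch that keeps both properties is below; shannon_entropy is a hypothetical helper name, not something from the original post or from a package.

```r
# Hypothetical helper: vectorised Shannon entropy (in bits) that skips
# zero-frequency classes, mirroring the zero check in the original loop.
shannon_entropy <- function(p) {
  p <- as.numeric(p)
  p <- p[p > 0]          # drop empty classes; they contribute 0 by convention
  -sum(p * log2(p))
}

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
freqs <- table(buys) / length(buys)

print(shannon_entropy(freqs))            # same input as the example above
print(shannon_entropy(c(0.5, 0.5, 0)))   # an empty class no longer produces NaN
```

Filtering before taking the logarithm avoids the NaN entirely, rather than patching it afterwards.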
As a side note, the function entropy.empirical is in the entropy package, where you can set the unit to log2, which allows some more flexibility. Example:
library(entropy)
entropy.empirical(freqs, unit="log2")
[1] 0.940286
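As far as I can tell, entropy.empirical normalises its input itself, so raw counts should give the same result as relative frequencies. The sketch below assumes that behaviour and falls back to base R when the entropy package is not installed; the variable names counts and h are illustrative only.

```r
buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
counts <- as.numeric(table(buys))   # raw class counts: 5 and 9

if (requireNamespace("entropy", quietly = TRUE)) {
  # Package route: counts are passed directly, the unit is set to bits
  h <- entropy::entropy.empirical(counts, unit = "log2")
} else {
  # Base R fallback: normalise by hand, then apply the same formula
  p <- counts / sum(counts)
  h <- -sum(p * log2(p))
}

print(h)
```

Either branch should reproduce the 0.940286 shown above for buys.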