rep*_*mer 5 r associations apriori
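Problem:
The apriori function of the arules package infers association rules from input transactions and reports the support, confidence, and lift of each rule. Association rules are derived from frequent itemsets. I want to obtain the frequent itemsets in the input transactions. Specifically, I want all itemsets that have a given minimum support. The support of an itemset is the ratio of the number of transactions containing the itemset to the total number of transactions.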
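To make the definition of support concrete, here is a minimal base-R sketch that computes it by brute force for two small baskets, {a,b} and {a,b,c}. The helper name `itemset_support` and the variables `candidates` and `min_support` are hypothetical, introduced only for illustration; this is not how arules computes support, and enumerating every subset is only feasible for tiny inputs.

```r
# Two sample baskets, one character vector per transaction.
transactions <- list(c("a", "b"), c("a", "b", "c"))

# Support of an itemset = share of transactions containing every item in it.
# (itemset_support is a hypothetical helper, not part of arules.)
itemset_support <- function(itemset, transactions) {
  contains <- vapply(transactions, function(tx) all(itemset %in% tx), logical(1))
  mean(contains)
}

# Enumerate every non-empty itemset over the observed items (tiny inputs only).
items <- sort(unique(unlist(transactions)))
candidates <- unlist(
  lapply(seq_along(items), function(k) combn(items, k, simplify = FALSE)),
  recursive = FALSE
)

support <- vapply(candidates, itemset_support, numeric(1), transactions = transactions)
result <- data.frame(
  itemset = vapply(candidates,
                   function(s) paste0("{", paste(s, collapse = ","), "}"),
                   character(1)),
  support = support
)

# Keep itemsets meeting the (assumed) minimum support, most frequent first.
min_support <- 0.001
result <- result[result$support >= min_support, ]
result <- result[order(-result$support, result$itemset), ]
print(result, row.names = FALSE)
```

For these two baskets, every itemset made up only of a and b has support 1, and every itemset containing c has support 0.5.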
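Requirement:
Functionality like that provided by itemFrequency. Unfortunately, that function only reports itemsets containing a single item. I am interested in itemsets of all lengths that meet a minimum support. Sample input: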
a,b
a,b,c
Program:
# The following is how I'm using apriori to infer the association rules.
library(package = "arules")
transactions = read.transactions(file = file("stdin"), format = "basket", sep = ",")
rules = apriori(transactions, parameter = list(minlen=1, sup = 0.001, conf = 0.001))
write(rules, file = "", sep = ",", quote = TRUE, col.names = NA)
Current output:
"","rules","support","confidence","lift"
"1","{} => {c}",0.5,0.5,1
"2","{} => {b}",1,1,1
"3","{} => {a}",1,1,1
"4","{c} => {b}",0.5,1,1
"5","{b} => {c}",0.5,0.5,1
"6","{c} => {a}",0.5,1,1
"7","{a} => {c}",0.5,0.5,1
"8","{b} => {a}",1,1,1
"9","{a} => {b}",1,1,1
"10","{b,c} => {a}",0.5,1,1
"11","{a,c} => {b}",0.5,1,1
"12","{a,b} => {c}",0.5,0.5,1
Desired output:
"itemset","support"
"{a}",1
"{a,b}",1
"{b}",1
"{a,b,c}",0.5
"{a,c}",0.5
"{b,c}",0.5
"{c}",0.5
I found the function generatingItemsets in the reference manual of the arules package.
library(package = "arules")
# Read comma-separated baskets from standard input, one transaction per line.
transactions = read.transactions(file = file("stdin"), format = "basket", sep = ",")
rules = apriori(transactions, parameter = list(minlen=1, sup = 0.001, conf = 0.001))
# Several rules can come from the same itemset, so keep each generating itemset once.
itemsets <- unique(generatingItemsets(rules))
itemsets.df <- as(itemsets, "data.frame")
# Sort by descending support, then by itemset label.
frequentItemsets <- itemsets.df[with(itemsets.df, order(-support,items)),]
names(frequentItemsets)[1] <- "itemset"
write.table(frequentItemsets, file = "", sep = ",", row.names = FALSE)
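As an alternative to going through rules, apriori can also be asked to mine itemsets directly by setting target = "frequent itemsets" in its parameter list. The sketch below is not the approach used above. It assumes arules is installed and, so that it runs on its own, uses an in-memory copy of the sample input (the variable `baskets` is made up for illustration) instead of reading stdin.

```r
library(arules)

# In-memory copy of the sample input (assumption: replaces reading from stdin).
baskets <- list(c("a", "b"), c("a", "b", "c"))
transactions <- as(baskets, "transactions")

# target = "frequent itemsets" makes apriori() return itemsets rather than rules,
# so no confidence threshold is needed.
itemsets <- apriori(
  transactions,
  parameter = list(target = "frequent itemsets", support = 0.001, minlen = 1),
  control = list(verbose = FALSE)
)

# Convert to a data frame and keep only the itemset label and its support.
itemsets.df <- as(itemsets, "data.frame")
frequentItemsets <- itemsets.df[order(-itemsets.df$support, itemsets.df$items),
                                c("items", "support")]
names(frequentItemsets)[1] <- "itemset"
write.table(frequentItemsets, file = "", sep = ",", row.names = FALSE)
```

Because the itemsets are mined directly, the unique(generatingItemsets(...)) step is not needed here.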