Question by Bab*_*nya (score 6). Tags: r, arules, market-basket-analysis
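I want to create a transactions object in basket format that I can reuse for analysis at any time. The data consists of comma-separated items and contains 1001 transactions. The first 10 transactions look like this: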
hering,corned_b,olives,ham,turkey,bourbon,ice_crea
baguette,soda,hering,cracker,heineken,olives,corned_b
avocado,cracker,artichok,heineken,ham,turkey,sardines
olives,bourbon,coke,turkey,ice_crea,ham,peppers
hering,corned_b,apples,olives,steak,avocado,turkey
sardines,heineken,chicken,coke,ice_crea,peppers,ham
olives,bourbon,coke,turkey,ice_crea,heineken,apples
corned_b,peppers,bourbon,cracker,chicken,ice_crea,baguette
soda,olives,bourbon,cracker,heineken,peppers,baguette
corned_b,peppers,bourbon,cracker,chicken,bordeaux,hering
...
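I noticed that the data contains duplicate transactions and removed them, but every time I try to read the transactions I get:
Error in asMethod(object) : can not coerce list with transactions with duplicated items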
Here is my code:
data <- read.csv("AssociationsItemList.txt",header=F)
data <- data[!duplicated(data),]
pop <- NULL
for(i in 1:length(data)){
pop <- paste(pop, data[i],sep="\n")
}
write(pop, file = "Trans", sep = ",")
transdata <- read.transactions("Trans", format = "basket", sep=",")
I'm sure I'm missing something trivial. Please help.
Answer by Vin*_*ynd (score 16)
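The problem is not duplicated transactions (the same line appearing twice) but duplicated items (the same item appearing twice within one transaction, for example "olives" on line 4).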
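To see which baskets are affected before reading them, a small base-R check like the one below may help. The helper `find_duplicate_items()` is hypothetical (it is not part of arules). It assumes one basket per line with comma-separated items, as in the question's file, and the three sample lines are made up for illustration.

```r
# Hypothetical helper: return the indices of lines (baskets) that list
# the same item more than once. Assumes one basket per line.
find_duplicate_items <- function(lines, sep = ",") {
  baskets <- strsplit(lines, sep, fixed = TRUE)
  # trim surrounding whitespace so "olives" and " olives" count as one item
  baskets <- lapply(baskets, trimws)
  # anyDuplicated() returns 0 when a basket has no repeated item
  has_dup <- vapply(baskets, function(b) anyDuplicated(b) > 0L, logical(1))
  which(has_dup)
}

# Made-up sample: only the second basket repeats an item ("olives").
lines <- c("hering,corned_b,olives,ham",
           "olives,bourbon,olives,turkey",
           "soda,olives,bourbon")
find_duplicate_items(lines)
```

On the real data, something like `find_duplicate_items(readLines("AssociationsItemList.txt"))` would list the offending line numbers, assuming the file has no header line.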
read.transactions has an rm.duplicates argument that removes such duplicated items:
read.transactions("Trans", format = "basket", sep=",", rm.duplicates=TRUE)
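Below is a self-contained sketch of the fix. It assumes the arules package is installed. It writes a tiny, made-up basket file in which one transaction lists "olives" twice, then reads the file with `rm.duplicates = TRUE`. The file contents are illustrative only and are not the asker's data.

```r
library(arules)

# Made-up two-basket file; the second basket lists "olives" twice.
tmp <- tempfile(fileext = ".txt")
writeLines(c("hering,corned_b,olives,ham",
             "olives,bourbon,olives,turkey"), tmp)

# rm.duplicates = TRUE drops the repeated item inside each transaction;
# read.transactions may also print a short report on the duplicates found.
trans <- read.transactions(tmp, format = "basket", sep = ",",
                           rm.duplicates = TRUE)

inspect(trans)
size(trans)  # number of items per transaction after de-duplication
```

The question's file already has one comma-separated basket per line, so the same call could probably be pointed straight at AssociationsItemList.txt. That would make the read.csv/paste/write round trip unnecessary.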