How do we find the support and confidence for Apriori rules?

Question

I am doing item association analysis on transaction data, using the arules package in R to build the rules. My sample data is shared at this link: https://1drv.ms/u/s!Ak1rt2E1f2gFgV9t7hMVAn0P4gd0

library(arules)
library(arulesViz)
df = read.csv("trans.csv")
trans = as(split(df[,"Item"], df[,"Billno"]), "transactions")
inspect(trans[1:20])
summary(trans)
rules1 = apriori(trans,parameter = list(support = 0.6, confidence = 0.6, 
target = "rules"))
summary(rules1) ##Output is "Set of 0 rules"

The output I get is:

summary(rules1)

set of 0 rules

Before posting, I referred to https://stats.stackexchange.com/questions/56034/association-analysis-returns-0-useful-rules. I also tried arbitrary values for support and confidence, but nothing worked.

Answer 1

Mic*_*ler 5

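Struggling to find the right minimum support and minimum confidence values, and ending up with 0 frequent itemsets or 0 association rules, is a very common problem. If you need a refresher, read this article on what support and confidence actually mean.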
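
As a quick refresher, here is a minimal base-R sketch of the two measures. The basket data and the helper names `support()` and `confidence()` are made up for illustration and are not part of arules; arules computes the same quantities internally on its own data structures.

```r
# Toy data (hypothetical, not from the question): five small baskets.
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk"),
  c("bread", "milk")
)

# Support: fraction of baskets that contain every item in `items`.
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Confidence of lhs => rhs: support(lhs and rhs together) / support(lhs).
confidence <- function(lhs, rhs, baskets) {
  support(c(lhs, rhs), baskets) / support(lhs, baskets)
}

supp_rule <- support(c("bread", "milk"), baskets)   # {bread, milk}
conf_rule <- confidence("bread", "milk", baskets)   # {bread} => {milk}
print(c(support = supp_rule, confidence = conf_rule))
```

A rule is only reported when both values reach the thresholds passed to apriori(), so a threshold above the support of the most frequent single item can never return anything.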

Let's first look at your transaction data:

summary(trans)
transactions as itemMatrix in sparse format with
 2531 rows (elements/itemsets/transactions) and
 6632 columns (items) and a density of 0.0005951533 

most frequent items:
AR845311 AR800369 AR828249 AR839869 AR831167  (Other) 
      84       35       31       29       24     9787 

element (itemset/transaction) length distribution:
sizes
   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21 
 767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4 
 23  24  25  27  28  32  34  36  48 
  3   4   2   3   1   1   1   1   1 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   2.000   3.947   5.000  48.000

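The first issue to deal with is the minimum support. The summary shows that your most frequent item (AR845311) occurs 84 times in the data set. Overall, your items have very low support: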

summary(itemFrequency(trans))

      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
      0.0003951 0.0003951 0.0003951 0.0005952 0.0003951 0.0331900

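You use a minimum support of 0.6, but the most frequent single item only has a support of 0.033! You need to reduce the support. If you want to find itemsets/rules that occur at least 10 times in your data, you can set the minimum support to: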

 10/length(trans)

 [1] 0.003951008
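
To make the mismatch concrete, the following sketch converts between relative support and absolute counts using only the numbers from the summary above (2531 transactions, 84 occurrences of the top item). The variable names are illustrative.

```r
n_trans   <- 2531   # number of transactions, from summary(trans)
top_count <- 84     # occurrences of the most frequent item (AR845311)

# Support of the most frequent single item.
top_support <- top_count / n_trans

# Number of transactions an itemset would need to reach support = 0.6.
needed_count <- ceiling(0.6 * n_trans)

# Relative support that corresponds to "at least 10 occurrences".
min_support <- 10 / n_trans

print(c(top_support = top_support,
        needed_count = needed_count,
        min_support = min_support))
```

With support = 0.6 an itemset would have to appear in 1519 transactions, while even the most frequent item appears in only 84.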

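The second issue is that your data is very sparse (the summary shows a density of about 0.0006). This means that your transactions are rather short (i.e., they contain only a few items).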

table(size(trans))

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21 
767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4 
 23  24  25  27  28  32  34  36  48 
  3   4   2   3   1   1   1   1   1
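
The link between density and transaction length can be checked by hand: density is the fraction of filled cells in the transaction-by-item matrix, so multiplying it by the number of items gives the mean transaction size. A small sketch using the values printed by summary(trans):

```r
density <- 0.0005951533   # from summary(trans)
n_items <- 6632           # number of columns (items)

# Mean number of items per transaction implied by the density.
mean_size <- density * n_items
print(round(mean_size, 3))
```

This agrees with the mean of 3.947 in the length distribution, and the median is only 2 items per transaction.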

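Short transactions imply that the confidence of the rules will probably be low. For your data it turns out to be very low, so I start with a minimum confidence of 0.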

rules <- apriori(trans,
  parameter = list(support = 0.004, confidence = 0, target = "rules"))
Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime support minlen maxlen
          0    0.1    1 none FALSE            TRUE       5   0.004      1     10
 target   ext
  rules FALSE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 10 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[6632 item(s), 2531 transaction(s)] done [0.00s].
sorting and recoding items ... [40 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 done [0.00s].
writing ... [46 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
summary(rules)
set of 46 rules

rule length distribution (lhs + rhs):sizes
 1  2 
40  6 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    1.00    1.00    1.13    1.00    2.00 

summary of quality measures:
    support           confidence            lift            count      
 Min.   :0.004346   Min.   :0.004346   Min.   : 1.000   Min.   :11.00  
 1st Qu.:0.004741   1st Qu.:0.004840   1st Qu.: 1.000   1st Qu.:12.00  
 Median :0.005531   Median :0.005729   Median : 1.000   Median :14.00  
 Mean   :0.006803   Mean   :0.057301   Mean   : 3.316   Mean   :17.22  
 3rd Qu.:0.007112   3rd Qu.:0.008890   3rd Qu.: 1.000   3rd Qu.:18.00  
 Max.   :0.033188   Max.   :0.705882   Max.   :21.269   Max.   :84.00  

mining info:
  data ntransactions support confidence
 trans          2531   0.004          0

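The result shows that at least one rule has a confidence of about 0.7 (the maximum is 0.705882). You can run APRIORI again with a higher minimum confidence. Here are the rules with the highest confidence: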

inspect(head(rules, by = "confidence"))
    lhs           rhs        support     confidence lift     count
[1] {AR835501} => {AR845311} 0.004741209 0.7058824  21.26891 12   
[2] {AR743988} => {AR845311} 0.004346108 0.6470588  19.49650 11   
[3] {AR800369} => {AR845311} 0.007111814 0.5142857  15.49592 18   
[4] {AR845311} => {AR800369} 0.007111814 0.2142857  15.49592 18   
[5] {AR845311} => {AR835501} 0.004741209 0.1428571  21.26891 12   
[6] {AR845311} => {AR743988} 0.004346108 0.1309524  19.49650 11
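
The quality measures of rule [1] can be reproduced by hand, which may help when reading the table. The sketch below uses the counts shown above (12 joint occurrences, 84 occurrences of AR845311, 2531 transactions). The count of 17 for AR835501 is not printed anywhere; it is an assumption inferred from 12 / 0.7058824.

```r
n_trans    <- 2531  # transactions in the data set
count_both <- 12    # transactions containing AR835501 and AR845311
count_rhs  <- 84    # transactions containing AR845311
count_lhs  <- 17    # assumption: inferred from count_both / confidence

support_rule    <- count_both / n_trans
confidence_rule <- count_both / count_lhs
# Lift compares the rule's confidence with the baseline support of the rhs.
lift_rule       <- confidence_rule / (count_rhs / n_trans)

print(c(support = support_rule,
        confidence = confidence_rule,
        lift = lift_rule))
```

A lift of about 21 means AR845311 shows up roughly 21 times more often in transactions containing AR835501 than in the data set overall, although with only 12 supporting transactions the estimate should be read with some caution.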

For a complete example of how to use association rule mining, see here.

Hope this helps!
