Identifying the frequency of common patterns across rows of integers

Pet*_*ung 2 r frequency pattern-matching frequency-distribution frequency-analysis

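I have many rows of integers, 7 columns per row, recorded from an experiment on some biological sites. The numbers only range from 1 to 7, and I want to identify the common patterns among the integers that occur.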

first few rows of df:

        [,1] [,2] [,3] [,4] [,5] [,6] [,7]
   [1,]    1    2    3    4    6    7    7
   [2,]    1    2    2    3    3    5    7
   [3,]    1    2    2    3    3    4    5
   [4,]    2    3    4    7    7    7    7
   [5,]    1    1    3    4    5    6    7
   [6,]    2    2    3    3    4    6    6
   [7,]    1    1    2    3    3    6    6
   [8,]    2    2    3    4    6    6    7
   ...

For example,

desired output:

pattern freq
1 2 3 4 1
2 3 4 6 2
1 2 3   4
2 2 3   4
...
...

Please advise, thanks.

joh*_*ohn 6

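dt <- read.table(header = TRUE,
text = "X1 X2 X3 X4 X5 X6 X7
1    2    3    4    6    7    7
1    2    2    3    3    5    7
1    2    2    3    3    4    5
2    3    4    7    7    7    7
1    1    3    4    5    6    7
", stringsAsFactors = FALSE)

# create a new column `x` with the 7 columns of each row collapsed into one string
dt$x <- apply(dt[, names(dt)], 1, paste, collapse = " ")

library(quanteda)

# recent quanteda versions no longer accept `ngrams`/`skip` in dfm(),
# so build the n-grams and skip-grams on the tokens first
toks <- tokens_ngrams(tokens(dt$x), n = 2:7, skip = 0:7, concatenator = " ")

# boolean tf with unary df: each pattern counts once per row,
# so the feature totals are the number of rows containing that pattern
d <- dfm_tfidf(dfm(toks), scheme_tf = "boolean", scheme_df = "unary")
topfeatures(d, 25)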
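If installing quanteda is not an option, a similar count can be sketched in base R. This is only a rough alternative, not the method above: it counts contiguous runs only (no skip-grams), and the helper names `row_ngrams` and `pattern_freq` and the default pattern lengths `2:4` are made up for illustration.

```r
# Sample rows taken from the question
m <- as.matrix(read.table(header = FALSE, text = "1 2 3 4 6 7 7
1 2 2 3 3 5 7
1 2 2 3 3 4 5
2 3 4 7 7 7 7
1 1 3 4 5 6 7"))

# Hypothetical helper: every contiguous run of length n in one row, as a string
row_ngrams <- function(v, n) {
  if (length(v) < n) return(character(0))
  starts <- seq_len(length(v) - n + 1)
  vapply(starts, function(i) paste(v[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical helper: number of rows in which each pattern occurs
pattern_freq <- function(m, sizes = 2:4) {
  pats <- unlist(lapply(seq_len(nrow(m)), function(i) {
    # unique() so that a pattern is counted at most once per row
    unique(unlist(lapply(sizes, function(n) row_ngrams(m[i, ], n))))
  }))
  freq <- sort(table(pats), decreasing = TRUE)
  data.frame(pattern = names(freq), freq = as.integer(freq),
             stringsAsFactors = FALSE)
}

res <- pattern_freq(m)
head(res, 10)
```

On the five sample rows, for instance, the pattern `2 3` is found in four rows and `2 3 4` in two. Patterns with gaps between the numbers, which the `skip` argument in the quanteda answer covers, are not counted by this sketch.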