Pet*_*ung 2 r frequency pattern-matching frequency-distribution frequency-analysis
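I have many rows of integers, 7 columns per row, recorded at biological sampling points in an experiment. The values only range from 1 to 7, and I want to identify the common patterns in which these integers occur.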
first few rows of df:
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 2 3 4 6 7 7
[2,] 1 2 2 3 3 5 7
[3,] 1 2 2 3 3 4 5
[4,] 2 3 4 7 7 7 7
[5,] 1 1 3 4 5 6 7
[6,] 2 2 3 3 4 6 6
[7,] 1 1 2 3 3 6 6
[8,] 2 2 3 4 6 6 7
...
For example,
desired output:
pattern freq
1 2 3 4 1
2 3 4 6 2
1 2 3 4
2 2 3 4
...
...
Please advise, thanks.
dt = read.table(header = TRUE,
text ="X1 X2 X3 X4 X5 X6 X7
1 2 3 4 6 7 7
1 2 2 3 3 5 7
1 2 2 3 3 4 5
2 3 4 7 7 7 7
1 1 3 4 5 6 7
", stringsAsFactors = FALSE)
# create a new column `x` with the columns collapsed together
dt$x <- apply( dt[ , names(dt) ] , 1 , paste , collapse = " ")
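library(quanteda)
# tokenize each row, then build all n-grams of length 2 to 7, allowing skips of 0 to 7 tokens
# (recent quanteda releases dropped the `ngrams`/`skip` arguments of dfm(); tokens_ngrams() does this step)
toks <- tokens_ngrams(tokens(dt$x), n = 2:7, skip = 0:7)
# boolean tf with unary df: each pattern counts once per row it occurs in
d <- dfm_tfidf(dfm(toks), scheme_tf = "boolean", scheme_df = "unary")
topfeatures(d, 25)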
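If only runs of adjacent values matter, a base R version without extra packages might look like the sketch below. It assumes a "pattern" means `k` consecutive values within a row (my reading of the desired output, not something the question states), and `count_patterns` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: count every run of k adjacent values across all rows.
# Assumption: a "pattern" is a contiguous window within a single row.
count_patterns <- function(m, k = 4) {
  m <- as.matrix(m)
  stopifnot(k >= 1, k <= ncol(m))
  starts <- seq_len(ncol(m) - k + 1)
  # For each row, paste each window of k adjacent values into one string
  pats <- unlist(lapply(seq_len(nrow(m)), function(i) {
    vapply(starts,
           function(s) paste(m[i, s:(s + k - 1)], collapse = " "),
           character(1))
  }))
  # Tabulate the windows and put the most frequent first
  freq <- sort(table(pats), decreasing = TRUE)
  data.frame(pattern = names(freq), freq = as.integer(freq),
             stringsAsFactors = FALSE)
}

# The five sample rows used in the answer above
m <- matrix(c(1, 2, 3, 4, 6, 7, 7,
              1, 2, 2, 3, 3, 5, 7,
              1, 2, 2, 3, 3, 4, 5,
              2, 3, 4, 7, 7, 7, 7,
              1, 1, 3, 4, 5, 6, 7), ncol = 7, byrow = TRUE)
res <- count_patterns(m, k = 4)
head(res)
```

Unlike the skip-gram approach, this counts a pattern once per occurrence rather than once per row, and it ignores patterns with gaps. Change `k` to look at other pattern lengths.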