Frequency of all subsequences of size 3 in a given 0-1 sequence?

Question


Given the data

s<-c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)


I can count the 1s and 0s with table or ftable

ftable(s,row.vars =1:1)


and the total number of 11s, 01s, 10s and 00s occurring in s with

table(s[-length(s)],s[-1])


What would be a clever way to count the occurrences of 111s, 011s, ..., 100s, 000s? Ideally, I would like a table of counts x like

   0 1
11 x x
01 x x
10 x x
00 x x


Is there a general way to count the total occurrences of all possible subsequences of length k = 1, 2, 3, 4, ...?

Answer 1

Sha*_*pie 5

Well, it looks like you first need to generate n-tuples from your vector. The following function should do that:

makeTuples <- function( x, n ){

  # Very inefficient way to loop... but what the heck
  tuples <- list()

  for( i in 1:n ){

    tuples[[i]] <- x[i:(length(x)-n+i)]

  }

  return(tuples)

}

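A quick way to see what these shifted slices look like is to bind them column-wise, so that each row is one tuple. This is only an illustrative sketch: it rebuilds the same slices inline with lapply() rather than calling makeTuples(), and the names `shifted` and `windows` are mine, not part of the answer.

```r
s <- c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)
n <- 3

# Slice i holds the i-th element of every tuple (same indexing as makeTuples).
shifted <- lapply(seq_len(n), function(i) s[i:(length(s) - n + i)])

# Bind the slices column-wise: row k is the tuple starting at position k.
windows <- do.call(cbind, shifted)

nrow(windows)      # 13 tuples, i.e. length(s) - n + 1
windows[1:4, ]     # first four tuples: 100, 000, 001, 010
```

Every slice has the same length, length(s) - n + 1, which is what lets table() cross-tabulate them position by position.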

You can then feed the result of makeTuples() to table() using do.call():

do.call( table, makeTuples(s,3) )

, ,  = 0


    0 1
  0 4 1
  1 3 1

, ,  = 1


    0 1
  0 2 1
  1 0 1


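This works because makeTuples() returns a list of n vectors, one per tuple position, and do.call() passes each of them to table() as a separate argument. The output isn't as nice as what you asked for, but you could write a function to reformat it into something like: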

     0 1
  00 4 1
  01 3 1


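It would need to loop over the outer n-2 dimensions of the n-dimensional array returned by table(), build the row names, and bind things together.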
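One possible sketch of such a reformatting step is below. It is not the function hinted at above: instead of looping over dimensions, it relies on R storing arrays with the first index varying fastest, and it assumes n >= 2 and that the rows should be labelled by the first n-1 elements with the last element as the column. The helper name `flattenCounts` is hypothetical.

```r
s <- c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)

# Hypothetical helper: flatten an n-dimensional table so that rows are the
# first n-1 elements of each tuple and columns are the last element.
flattenCounts <- function(tab) {
  n  <- length(dim(tab))
  dn <- unname(dimnames(tab))
  # Every combination of the first n-1 labels, first dimension varying
  # fastest, which is the order in which R stores the array's cells.
  grid     <- expand.grid(dn[-n], stringsAsFactors = FALSE)
  prefixes <- do.call(paste0, unname(as.list(grid)))
  matrix(as.vector(tab), ncol = dim(tab)[n],
         dimnames = list(prefixes, dn[[n]]))
}

# The same 3-dimensional table as do.call(table, makeTuples(s, 3)).
tab  <- do.call(table, lapply(1:3, function(i) s[i:(length(s) - 3 + i)]))
flat <- flattenCounts(tab)
flat
```

With the example data, the row labelled 00 holds 4 and 2 (000 occurs four times, 001 twice) and the row labelled 10 holds 3 and 0.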

Update

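So, I was just sitting in a stochastic processes class when I came up with a more or less straightforward way to produce the output you want without trying to untangle the output of table(). First, you need a function that generates all possible permutations of n choices from your population. The permutations can be generated with expand.grid(), but it needs a little sugar coating: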

permute <- function( population, n ){

  permutations <- do.call( expand.grid, rep( list(population), n ) )

  permutations <- apply( permutations, 1, paste, collapse = '' )

  return( permutations )

}

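To make the two steps inside permute() concrete, here they are run by hand for a population of c(1, 0) and n = 2. This is just a sketch; the variable names `grid` and `combos` are mine.

```r
population <- c(1, 0)

# One column per position; the first column varies fastest.
grid <- expand.grid(rep(list(population), 2))

# Paste each row into a single label.
combos <- apply(grid, 1, paste, collapse = "")
unname(combos)   # "11" "01" "10" "00"
```

Because the first column varies fastest, the labels come out as 11, 01, 10, 00, which is the row order of the result tables further down.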

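The basic idea is to iterate over the list of permutations and count the number of tuples that match a given permutation. Since you want the results split into a table, we should take permutations of n-1 elements from the population and let the last position form the columns of the table. Here is a function that takes a permutation of size n-1, the tuples (pasted into strings), and the population the tuples were drawn from, and produces a named vector of match counts: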

countFrequency <- function(permutation,tuples,population){

  permutations <- paste( permutation, population, sep = '' )

  # Inner lapply applies the equality operator `==` to each
  # permutation and returns a list of TRUE/FALSE vectors.
  # Outer lapply sums the number of TRUE values in each vector. 
  frequencies <- lapply(lapply(permutations,`==`,tuples),sum)

  names( frequencies ) <- as.character( population )

  return( unlist(frequencies) )

}

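The counting step is easier to follow on toy input. The sketch below repeats the body of countFrequency() by hand for the prefix "0"; the four tuples are made up for illustration and are not taken from s.

```r
tuples     <- c("00", "01", "00", "10")   # made-up tuples, already pasted
population <- c(1, 0)

# Extend the prefix "0" with every element of the population: "01", "00".
candidates <- paste("0", population, sep = "")

# One TRUE/FALSE vector per candidate, then count the TRUEs in each.
matches <- lapply(candidates, `==`, tuples)
counts  <- unlist(lapply(matches, sum))
names(counts) <- as.character(population)

counts   # "01" matches once, "00" matches twice
```

The names of the result are the population elements, so each name becomes one column once ldply() stacks the vectors into rows.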

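Finally, all three functions can be combined into one larger function that takes a vector, splits it into n-tuples and returns a frequency table. The final aggregation is done with ldply() from Hadley Wickham's plyr package, since it nicely preserves information such as which permutation corresponds to which row of output: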

permutationFrequency <- function( vector, n, population = unique( vector ) ){

  # Split the vector into tuples.
  tuples <- makeTuples( vector, n )

  # Coerce and compact the tuples to a vector of strings.
  tuples <- do.call(cbind,tuples)
  tuples <- apply( tuples, 1, paste, collapse = '' )

  # Generate permutations of n-1 elements from the population.
  # Turn into a named list for ldply() to work its magic.
  permutations <- permute( population, n-1 )
  names( permutations ) <- permutations

  frequencies <- ldply( permutations, countFrequency,
    tuples = tuples, population = population )

  return( frequencies )

}


And there you go:

require( plyr )
permutationFrequency( s, 2 )
  .id 1 0
1   1 2 3
2   0 2 7

permutationFrequency( s, 3 )
  .id 1 0
1  11 1 1
2  01 1 1
3  10 0 3
4  00 2 4

permutationFrequency( s, 4 )
  .id 1 0
1 111 0 1
2 011 1 0
3 101 0 0
4 001 1 1
5 110 0 1
6 010 0 1
7 100 0 2
8 000 2 2

permutationFrequency( sample( -1:1, 10, replace = TRUE ), 2 )
  .id 1 -1 0
1   1 1  2 0
2  -1 0  1 2
3   0 1  0 2

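If you would rather not depend on plyr, the same counts can be cross-checked with base R alone. The sketch below is a swapped-in alternative, not the method above: the function name `windowCounts` is hypothetical, it assumes 2 <= n <= length(x), and it lets table() do the counting by turning the prefixes and last elements into factors.

```r
s <- c(1,0,0,0,1,0,0,0,0,0,1,1,1,0,0)

# Hypothetical base-R helper: rows are the first n-1 elements of each
# length-n window of x, columns are the last element.
windowCounts <- function(x, n, population = unique(x)) {
  stopifnot(n >= 2, n <= length(x))
  n_windows <- length(x) - n + 1

  # Column j holds the j-th element of every window.
  windows <- sapply(seq_len(n), function(j) x[j:(j + n_windows - 1)])
  windows <- matrix(windows, nrow = n_windows)  # stay a matrix for one window

  prefix <- apply(windows[, -n, drop = FALSE], 1, paste, collapse = "")
  last   <- windows[, n]

  # All possible prefixes, in expand.grid() order (first position fastest).
  all_prefixes <- apply(expand.grid(rep(list(population), n - 1)),
                        1, paste, collapse = "")

  table(factor(prefix, levels = unname(all_prefixes)),
        factor(last, levels = population))
}

counts3 <- windowCounts(s, 3)
counts3
```

Using factors with explicit levels keeps prefixes that never occur (such as 101 followed by anything in the n = 4 case) in the table with a count of zero, and the total always equals length(s) - n + 1.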

Apologies to my stochastic processes professor, but functional programming problems in R are just more interesting than Gambler's Ruin today...
