From a data frame to a vertex/edge array

dmv*_*nna 6 r igraph vertexdata dataframe

I have a data frame:

test <- structure(list(
     y2002 = c("freshman","freshman","freshman","sophomore","sophomore","senior"),
     y2003 = c("freshman","junior","junior","sophomore","sophomore","senior"),
     y2004 = c("junior","sophomore","sophomore","senior","senior",NA),
     y2005 = c("senior","senior","senior",NA, NA, NA)), 
              .Names = c("2002","2003","2004","2005"),
              row.names = c(c(1:6)),
              class = "data.frame")
> test
       2002      2003      2004   2005
1  freshman  freshman    junior senior
2  freshman    junior sophomore senior
3  freshman    junior sophomore senior
4 sophomore sophomore    senior   <NA>
5 sophomore sophomore    senior   <NA>
6    senior    senior      <NA>   <NA>

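I need to create a vertex/edge list (for use with igraph) with one row each time a student's category changes between consecutive years, ignoring years with no change, like this: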

testvertices <- structure(list(
 vertex = 
  c("freshman","junior", "freshman","junior","sophomore","freshman",
    "junior","sophomore","sophomore","sophomore"),
 edge = 
  c("junior","senior","junior","sophomore","senior","junior",
    "sophomore","senior","senior","senior"),
 id =
  c("1","1","2","2","2","3","3","3","4","5")),
                       .Names = c("vertex","edge", "id"),
                       row.names = c(1:10),
                       class = "data.frame")
> testvertices
      vertex      edge id
1   freshman    junior  1
2     junior    senior  1
3   freshman    junior  2
4     junior sophomore  2
5  sophomore    senior  2
6   freshman    junior  3
7     junior sophomore  3
8  sophomore    senior  3
9  sophomore    senior  4
10 sophomore    senior  5

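At this point I am ignoring the ids, and my graph should weight each edge by its count (i.e., freshman -> junior = 3). The idea is to make a tree graph. I know this is beside the main data-munging point, but I mention it in case you ask.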

Gab*_*rdi 3

If I understand you correctly, you need something like this:

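elist <- lapply(seq_len(nrow(test)), function(i) {
  x <- as.character(test[i,])            # row i as a character vector
  x <- unique(na.omit(x))                # drop NAs and repeated categories
  x <- rep(x, each=2)                    # list every vertex twice
  x <- x[-1]                             # drop the first entry ...
  x <- x[-length(x)]                     # ... and the last, leaving (from, to) pairs
  r <- matrix(x, ncol=2, byrow=TRUE)     # one edge per row
  # add the student id as a third column; the zero-edge case is handled separately
  if (nrow(r) > 0) { r <- cbind(r, i) } else { r <- cbind(r, numeric()) }
  r
})

# stack the per-student matrices into a single edge list
do.call(rbind, elist)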

#                              i  
# [1,] "freshman"  "junior"    "1"
# [2,] "junior"    "senior"    "1"
# [3,] "freshman"  "junior"    "2"
# [4,] "junior"    "sophomore" "2"
# [5,] "sophomore" "senior"    "2"
# [6,] "freshman"  "junior"    "3"
# [7,] "junior"    "sophomore" "3"
# [8,] "sophomore" "senior"    "3"
# [9,] "sophomore" "senior"    "4"
#[10,] "sophomore" "senior"    "5"

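This is not the most efficient solution, but I think it is fairly didactic. We create the edges for each row of the input separately, hence the lapply. To create the edges from a row, we first remove NAs and duplicates, then include each vertex twice, and finally remove the first and last entries. What remains is an edge list in vector form, which we format into two columns (it would actually be more efficient to leave it as a vector, but never mind).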
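To make the steps concrete, here is a minimal sketch that walks a single row (student 2 of the sample data, typed in by hand) through the duplicate-and-trim trick. The names `v`, `pairs`, `pairs2` and `y` are illustrative only. The last part uses a made-up row, not one from the question, to show one caveat: `unique()` drops every repeated category, not just consecutive repeats, so if a student could return to an earlier category, `rle()` might match "changes between consecutive years" more closely. For the sample data both give the same result.

```r
# Student 2 from the sample data, after NAs and repeats are removed
x <- c("freshman", "junior", "sophomore", "senior")

# Duplicate every vertex, then drop the first and last entries
v <- rep(x, each = 2)
v <- v[-1]
v <- v[-length(v)]
v
# "freshman" "junior" "junior" "sophomore" "sophomore" "senior"

# Read two at a time: each row of the matrix is one edge
pairs <- matrix(v, ncol = 2, byrow = TRUE)

# An equivalent shortcut: pair each vertex with its successor
pairs2 <- cbind(head(x, -1), tail(x, -1))
all(pairs == pairs2)
# TRUE

# Hypothetical row (not in the question) where a category recurs
y <- c("freshman", "junior", "junior", "freshman")
unique(y)       # "freshman" "junior"            (the return is lost)
rle(y)$values   # "freshman" "junior" "freshman" (only consecutive repeats collapse)
```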

When adding the extra column, we have to be careful to check whether the edge list matrix has zero rows.
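As an illustration of the zero-row case, student 6 in the sample data only ever appears as "senior", so there is no transition to record. The sketch below, with illustrative variable names, follows the same steps as the answer's code for that row and shows the empty matrix being padded to three columns so that it can still be stacked with the others.

```r
# Student 6 has a single category, so no edges are produced
x <- "senior"
v <- rep(x, each = 2)
v <- v[-1]
v <- v[-length(v)]                       # nothing left
r <- matrix(v, ncol = 2, byrow = TRUE)
dim(r)
# 0 2

# Pad with an empty third column, as in the else branch of the answer
r3 <- cbind(r, numeric())
dim(r3)
# 0 3

# A zero-row, three-column matrix stacks cleanly with real edges
rbind(r3, c("sophomore", "senior", "5"))
```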

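The do.call function just glues everything together. The result is a matrix, which you can convert to a data frame with as.data.frame() if you like, and then you can also convert the third column to numeric. You can change the column names as well.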
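A possible follow-up, sketched under a few assumptions: the matrix `m` below is typed in from the output printed above, the column names `from`, `to` and `weight` are my own choice rather than anything the question or answer prescribes, and the igraph step only runs if the package is installed. It converts the matrix to a data frame and counts each transition to get the edge weights the question asks about.

```r
# Edge matrix copied from the printed result (from, to, student id)
m <- matrix(c(
  "freshman",  "junior",    "1",
  "junior",    "senior",    "1",
  "freshman",  "junior",    "2",
  "junior",    "sophomore", "2",
  "sophomore", "senior",    "2",
  "freshman",  "junior",    "3",
  "junior",    "sophomore", "3",
  "sophomore", "senior",    "3",
  "sophomore", "senior",    "4",
  "sophomore", "senior",    "5"
), ncol = 3, byrow = TRUE)

# Convert to a data frame, name the columns, make the id numeric
edges <- as.data.frame(m, stringsAsFactors = FALSE)
names(edges) <- c("from", "to", "id")
edges$id <- as.numeric(edges$id)

# Count how many students make each transition (the edge weight)
weights <- aggregate(id ~ from + to, data = edges, FUN = length)
names(weights)[3] <- "weight"
weights

# Optional: build a weighted directed graph if igraph is available
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_data_frame(weights, directed = TRUE)
  print(igraph::E(g)$weight)
}
```

With the sample data, the freshman -> junior transition gets a weight of 3, matching the count mentioned in the question.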