How to extract specific items from a nested list and append them to a new column?

Col*_*ath 3 python r nested-lists networkx dataframe

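I have a data frame with a column that contains nested lists. I am struggling to extract the usernames from these nested lists (I am quite new to this).

Dummy data: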

myNestedList <- list("1" = list('username' = "test",
                              "uninteresting data" = "uninteresting content"),
                     "2" = list('username' = "test2",
                                "uninteresting data" = "uninteresting content"))
Column1 <- c("A","B","C")
column2 <- c("a","b","c")
mydf <- data.frame(Column1, column2)
mydf$nestedlist <- list(myNestedList)

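I want to extract all usernames for each row and append them to a new column. If a row has more than one username, the second/third/nth username should be appended separated only by a ", ". I tried something like sapply(mydf$nestedlist, `[[`, 1), but that just gave me a list of the whole "nestedlist" column.

For context: I am trying to build a directed graph for further use in NetworkX or Gephi. The data in Column1 are the nodes, and the usernames are mentions and therefore the edges. If there is another way to do this without extracting the usernames from the nested lists, that could also be a solution.

Thanks in advance for your help! :)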

akr*_*run 7

If we know the nesting level, we can use map_depth

library(purrr)
 mydf$username <- map_depth(mydf$nestedlist, 2, pluck, "username")

-output

> mydf
  Column1 column2                                                nestedlist    username
1       A       a test, uninteresting content, test2, uninteresting content test, test2
2       B       b test, uninteresting content, test2, uninteresting content test, test2
3       C       c test, uninteresting content, test2, uninteresting content test, test2
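For comparison, the same depth-2 extraction can be sketched in base R without purrr. This is only an illustration, not part of the original answer: it rebuilds the question's dummy data and assumes every second-level entry holds exactly one `username` string (the inner `vapply` would stop with an error otherwise). It also collapses each row's usernames into one ", "-separated string.

```r
# Rebuild the dummy data from the question.
myNestedList <- list("1" = list(username = "test",
                                "uninteresting data" = "uninteresting content"),
                     "2" = list(username = "test2",
                                "uninteresting data" = "uninteresting content"))
mydf <- data.frame(Column1 = c("A", "B", "C"), column2 = c("a", "b", "c"))
mydf$nestedlist <- list(myNestedList)  # the single list is recycled to all rows

# For each row, pull "username" out of every second-level entry,
# then collapse that row's usernames into a single string.
mydf$username <- vapply(
  mydf$nestedlist,
  function(row) {
    users <- vapply(row, function(entry) entry[["username"]], character(1))
    paste(users, collapse = ", ")
  },
  character(1)
)

mydf$username
```

Using `vapply` with `character(1)` rather than `sapply` makes the result type explicit, so a malformed entry fails loudly instead of silently producing a list.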

Or, if the depth is not known, use a recursive function with a condition check to find 'username'

library(rrapply)
mydf$username <- rrapply(mydf$nestedlist,  
    condition = function(x, .xname) .xname %in% 'username', how = 'prune')
> mydf
  Column1 column2                                                nestedlist    username
1       A       a test, uninteresting content, test2, uninteresting content test, test2
2       B       b test, uninteresting content, test2, uninteresting content test, test2
3       C       c test, uninteresting content, test2, uninteresting content test, test2
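To make the idea of a recursive search with a name condition concrete, here is a minimal base R sketch. The helper `collect_by_name` and the small `nested` example are hypothetical, written only for illustration; they are not part of rrapply and do not reproduce its behaviour in general.

```r
# Hypothetical helper (not from any package): walk a nested list and
# collect every character value stored under the name `key`, at any depth.
collect_by_name <- function(x, key = "username") {
  if (!is.list(x)) return(character(0))
  found <- character(0)
  nms <- names(x)
  for (i in seq_along(x)) {
    if (!is.null(nms) && identical(nms[i], key) && is.character(x[[i]])) {
      # The element name matches: keep its value.
      found <- c(found, x[[i]])
    } else {
      # Otherwise descend into the element (non-lists yield nothing).
      found <- c(found, collect_by_name(x[[i]], key))
    }
  }
  found
}

# Illustrative input with usernames at two different depths.
nested <- list("1" = list(username = "test", other = "x"),
               "2" = list(username = "test2",
                          deeper = list(username = "test3")))

paste(collect_by_name(nested), collapse = ", ")
```

Applied to the question's column, something like `vapply(mydf$nestedlist, function(row) paste(collect_by_name(row), collapse = ", "), character(1))` should give one string per row.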

If we want to paste them together, use

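library(purrr)    # invoke()
library(rrapply)
library(stringr)  # str_c()
library(dplyr)    # %>%
mydf$username <- rrapply(mydf$nestedlist,
    condition = function(x, .xname) .xname %in% 'username',
          how = 'bind') %>%
        # combine the extracted usernames with ", "
        invoke(str_c, sep=", ", .)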
 mydf
  Column1 column2                                                nestedlist    username
1       A       a test, uninteresting content, test2, uninteresting content test, test2
2       B       b test, uninteresting content, test2, uninteresting content test, test2
3       C       c test, uninteresting content, test2, uninteresting content test, test2

-structure

> str(mydf)
'data.frame':   3 obs. of  4 variables:
 $ Column1   : chr  "A" "B" "C"
 $ column2   : chr  "a" "b" "c"
 $ nestedlist:List of 3
  ..$ :List of 2
  .. ..$ 1:List of 2
  .. .. ..$ username          : chr "test"
  .. .. ..$ uninteresting data: chr "uninteresting content"
  .. ..$ 2:List of 2
  .. .. ..$ username          : chr "test2"
  .. .. ..$ uninteresting data: chr "uninteresting content"
  ..$ :List of 2
  .. ..$ 1:List of 2
  .. .. ..$ username          : chr "test"
  .. .. ..$ uninteresting data: chr "uninteresting content"
  .. ..$ 2:List of 2
  .. .. ..$ username          : chr "test2"
  .. .. ..$ uninteresting data: chr "uninteresting content"
  ..$ :List of 2
  .. ..$ 1:List of 2
  .. .. ..$ username          : chr "test"
  .. .. ..$ uninteresting data: chr "uninteresting content"
  .. ..$ 2:List of 2
  .. .. ..$ username          : chr "test2"
  .. .. ..$ uninteresting data: chr "uninteresting content"
 $ username  : chr  "test, test2" "test, test2" "test, test2"
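
For the directed-graph use case mentioned in the question, a long edge list (one row per mention) may be easier to work with than a comma-separated string. The following is a rough base R sketch under stated assumptions: the `nodes` and `mentions` objects are small stand-ins invented for illustration, each entry is assumed to carry exactly one `username`, and the column names `source` and `target` as well as the file name in the commented-out last line are arbitrary choices.

```r
# Small stand-in for the question's data: one list of mentions per node.
nodes <- c("A", "B", "C")
mentions <- list(
  list("1" = list(username = "test"), "2" = list(username = "test2")),
  list("1" = list(username = "test")),
  list()  # a node that mentions nobody
)

# Build one small data frame per node, then stack them:
# each row is a (source, target) pair, i.e. one directed edge.
edges <- do.call(rbind, unname(Map(function(node, row) {
  targets <- vapply(row, function(entry) entry[["username"]], character(1))
  data.frame(source = rep(node, length(targets)),
             target = unname(targets),
             stringsAsFactors = FALSE)
}, nodes, mentions)))
rownames(edges) <- NULL

edges

# Optionally export for use in another tool (hypothetical file name):
# write.csv(edges, "edges.csv", row.names = FALSE)
```

Nodes without mentions contribute zero rows, so they would have to be added to the graph separately if isolated nodes matter.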