Col*_*ath 3 python r nested-lists networkx dataframe
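I have a data frame with one column that contains nested lists. I am struggling to extract the usernames from these nested lists (I am quite new to this).
Dummy data: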
myNestedList <- list("1" = list('username' = "test",
                                "uninteresting data" = "uninteresting content"),
                     "2" = list('username' = "test2",
                                "uninteresting data" = "uninteresting content"))
Column1 <- c("A","B","C")
column2 <- c("a","b","c")
mydf <- data.frame(Column1, column2)
mydf$nestedlist <- list(myNestedList)
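I want to extract all usernames for each row and append them to a new column; if a row has several usernames, the second/third/n-th username should be appended, separated only by a single ",". I tried something like sapply(mydf$nestedlist, `[[`, 1), but that just gave me a list of the entire "nestedlist" column.
For context: I am trying to build a directed graph for further use in Networkx or Gephi. The data in Column1 are the nodes, and the usernames are mentions and therefore edges. If there is another way to do this without extracting the usernames from the nested lists, that could also be a solution.
Thanks in advance for your help! :)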
If we know the nesting level, we can use map_depth:
library(purrr)
mydf$username <- map_depth(mydf$nestedlist, 2, pluck, "username")
-output
> mydf
  Column1 column2                                                nestedlist    username
1       A       a test, uninteresting content, test2, uninteresting content test, test2
2       B       b test, uninteresting content, test2, uninteresting content test, test2
3       C       c test, uninteresting content, test2, uninteresting content test, test2
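For readers who would rather avoid extra packages, the known-depth case can also be sketched in base R. This is only an illustration under the assumption that every row holds a list of items that each carry a "username" element, as in the dummy data; it is not part of the original answer. It also collapses the names into one string per row, which is what the question asks for.

```r
# Rebuild the question's dummy data (base R only).
myNestedList <- list(
  "1" = list(username = "test",  "uninteresting data" = "uninteresting content"),
  "2" = list(username = "test2", "uninteresting data" = "uninteresting content")
)
mydf <- data.frame(Column1 = c("A", "B", "C"), column2 = c("a", "b", "c"))
mydf$nestedlist <- list(myNestedList)

# For each row, pull the "username" element out of every inner list,
# then collapse the names of that row into one comma-separated string.
mydf$username <- vapply(
  mydf$nestedlist,
  function(row) {
    users <- vapply(row, function(item) item[["username"]], character(1))
    paste(users, collapse = ", ")
  },
  character(1)
)
mydf$username
```

The inner vapply fails loudly if an item has no "username" or has more than one, which may or may not be the behaviour you want for real data.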
Or, if it is not known, use a recursive function with a condition check to find 'username':
library(rrapply)
mydf$username <- rrapply(mydf$nestedlist,
    condition = function(x, .xname) .xname %in% 'username', how = 'prune')
> mydf
  Column1 column2                                                nestedlist    username
1       A       a test, uninteresting content, test2, uninteresting content test, test2
2       B       b test, uninteresting content, test2, uninteresting content test, test2
3       C       c test, uninteresting content, test2, uninteresting content test, test2
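To make the idea of a recursive search by name more concrete, here is a rough base R sketch of what such a function could look like. The helper find_named and the sample list nested are hypothetical names made up for this illustration; this is not how rrapply is implemented internally.

```r
# Hypothetical helper: collect every non-list element called `target`,
# no matter how deeply it is nested.
find_named <- function(x, target = "username") {
  if (!is.list(x)) return(character(0))    # leaves contain nothing to search
  hits <- character(0)
  for (i in seq_along(x)) {
    nm <- names(x)[i]                      # NULL when the list is unnamed
    if (identical(nm, target) && !is.list(x[[i]])) {
      hits <- c(hits, as.character(x[[i]]))
    } else {
      hits <- c(hits, find_named(x[[i]], target))
    }
  }
  hits
}

# Made-up example where the usernames sit at different depths.
nested <- list(
  a = list(username = "test", other = "x"),
  b = list(inner = list(username = "test2", other = "y"))
)
found <- find_named(nested)
paste(found, collapse = ", ")
```

Applied per row, for example with vapply over mydf$nestedlist and the paste step inside, this should give one comma-separated string per row without needing to know the nesting depth in advance.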
If we want to paste them together, use
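library(rrapply)
library(purrr)    # provides invoke(); newer purrr versions deprecate it in favour of exec()
library(stringr)
library(dplyr)    # provides %>%
mydf$username <- rrapply(mydf$nestedlist,
    condition = function(x, .xname) .xname %in% 'username',
    how = 'bind') %>%
    invoke(str_c, sep = ", ", .)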
mydf
  Column1 column2                                                nestedlist    username
1       A       a test, uninteresting content, test2, uninteresting content test, test2
2       B       b test, uninteresting content, test2, uninteresting content test, test2
3       C       c test, uninteresting content, test2, uninteresting content test, test2
-structure
> str(mydf)
'data.frame': 3 obs. of 4 variables:
$ Column1 : chr "A" "B" "C"
$ column2 : chr "a" "b" "c"
$ nestedlist:List of 3
..$ :List of 2
.. ..$ 1:List of 2
.. .. ..$ username : chr "test"
.. .. ..$ uninteresting data: chr "uninteresting content"
.. ..$ 2:List of 2
.. .. ..$ username : chr "test2"
.. .. ..$ uninteresting data: chr "uninteresting content"
..$ :List of 2
.. ..$ 1:List of 2
.. .. ..$ username : chr "test"
.. .. ..$ uninteresting data: chr "uninteresting content"
.. ..$ 2:List of 2
.. .. ..$ username : chr "test2"
.. .. ..$ uninteresting data: chr "uninteresting content"
..$ :List of 2
.. ..$ 1:List of 2
.. .. ..$ username : chr "test"
.. .. ..$ uninteresting data: chr "uninteresting content"
.. ..$ 2:List of 2
.. .. ..$ username : chr "test2"
.. .. ..$ uninteresting data: chr "uninteresting content"
$ username : chr "test, test2" "test, test2" "test, test2"
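Since the stated goal is a directed graph, one possible next step is to turn the comma-separated username column into an edge list with one row per mention. The sketch below is an assumption about how that could be done in base R, not something from the original answer; the Source/Target column names are chosen because, as far as I know, Gephi's spreadsheet import looks for them, and the file name edges.csv is made up.

```r
# Values taken from the example above: nodes and their collapsed mentions.
nodes     <- c("A", "B", "C")
usernames <- c("test, test2", "test, test2", "test, test2")

# Split each string back into individual usernames (one vector per node).
mentions <- strsplit(usernames, ", ", fixed = TRUE)

# One row per (node, mentioned user) pair, i.e. one directed edge each.
edges <- data.frame(
  Source = rep(nodes, lengths(mentions)),
  Target = unlist(mentions),
  stringsAsFactors = FALSE
)
edges

# Uncomment to export for Gephi or for reading into Python:
# write.csv(edges, "edges.csv", row.names = FALSE)
```

On the Python side, a CSV like this could presumably be loaded with pandas and passed to networkx.from_pandas_edgelist with create_using=networkx.DiGraph, using "Source" and "Target" as the source and target columns.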