Here is a simple taxonomy (labels and IDs):
test_data <- data.frame(
  cat_id = c(661, 197, 228, 650, 126, 912, 949, 428),
  cat_h1 = c(rep("Animals", 5), rep("Plants", 3)),
  cat_h2 = c(rep("Mammals", 3), rep("Birds", 2), c("Wheat", "Grass", "Other")),
  cat_h3 = c("Dogs", "Dogs", "Other", "Hawks", "Other", rep(NA, 3)),
  cat_h4 = c("Big", "Little", rep(NA, 6)))
The parsed structure should match this:
list(
  Animals = list(Mammals = list(Dogs = list(Big = 661, Little = 197), Other = 228),
                 Birds   = list(Hawks = 650, Other = 126)),
  Plants  = list(Wheat = 912, Grass = 949, Other = 428))
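As a quick sanity check (a sketch, not part of the original question), flattening the target list with base R's unlist() should give exactly one named leaf per row of test_data, in row order, with the labels of each path joined by ".". The names `expected`, `flat` and `paths` are only illustrative.

```r
# Sketch: confirm the target tree has one leaf per row of test_data.
test_data <- data.frame(
  cat_id = c(661, 197, 228, 650, 126, 912, 949, 428),
  cat_h1 = c(rep("Animals", 5), rep("Plants", 3)),
  cat_h2 = c(rep("Mammals", 3), rep("Birds", 2), c("Wheat", "Grass", "Other")),
  cat_h3 = c("Dogs", "Dogs", "Other", "Hawks", "Other", rep(NA, 3)),
  cat_h4 = c("Big", "Little", rep(NA, 6)))

expected <- list(
  Animals = list(Mammals = list(Dogs = list(Big = 661, Little = 197), Other = 228),
                 Birds   = list(Hawks = 650, Other = 126)),
  Plants  = list(Wheat = 912, Grass = 949, Other = 428))

# unlist() joins nested names with "." and returns the leaf ids depth-first
flat <- unlist(expected)

# Build the same "."-joined path for each row from its non-NA labels
paths <- unname(apply(test_data[-1], 1, function(r) paste(r[!is.na(r)], collapse = ".")))

identical(names(flat), paths)             # TRUE
identical(unname(flat), test_data$cat_id) # TRUE
```

In other words, each row is one leaf, and its non-NA labels read left to right are the path to that leaf.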
If you don't mind a slight change in the ordering, here is a recursive solution that processes the data one column at a time:
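f <- function(x, d = cbind(x, NA)) {
  # d: column 1 = ids, column 2 = labels at the current level, later columns =
  # deeper levels; the all-NA column appended by the default ends every path
  c(
    # call f by branch: rows that have a label at the next level are grouped
    # by their current label; x = NA is passed by name, so each subset lands in d
    if (ncol(d) > 3) local({
      x <- d[!is.na(d[[3]]), ]
      by(x[-2], droplevels(x[2]), f, x = NA, simplify = FALSE)
    }),
    # leaf nodes: rows with no label at the next level map label -> id
    setNames(as.list(d[[1]]), d[[2]])[is.na(d[[3]])]
  )
}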
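This gives the following (the groups created by by() come out in alphabetical order, which is why Birds now precedes Mammals):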
> str(f(test_data))
List of 2
$ Animals:List of 2
..$ Birds :List of 2
.. ..$ Hawks: num 650
.. ..$ Other: num 126
..$ Mammals:List of 2
.. ..$ Dogs :List of 2
.. .. ..$ Big : num 661
.. .. ..$ Little: num 197
.. ..$ Other: num 228
$ Plants :List of 3
..$ Wheat: num 912
..$ Grass: num 949
..$ Other: num 428
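If the original sibling order matters, one alternative (a sketch, not the answer's code) is to loop over unique() labels, which keeps the order of first appearance, instead of grouping with by(). build_tree() is a hypothetical helper name, and the sketch assumes that a label is either a leaf or a branch, never both.

```r
# Sketch (hypothetical helper): build the nested list while keeping the
# order in which labels first appear, instead of by()'s sorted groups.
# Assumes a label is either a leaf or a branch, never both.
build_tree <- function(ids, labels) {
  first <- as.character(labels[[1]])
  # does each row carry a label at the next level?
  has_child <- if (ncol(labels) > 1) !is.na(labels[[2]]) else rep(FALSE, length(first))
  out <- list()
  for (lab in unique(first)) {
    rows <- which(first == lab)
    out[[lab]] <- if (any(has_child[rows])) {
      # branch: recurse on the matching rows and the remaining levels
      build_tree(ids[rows], labels[rows, -1, drop = FALSE])
    } else {
      # leaf: store the id directly
      ids[rows]
    }
  }
  out
}

test_data <- data.frame(
  cat_id = c(661, 197, 228, 650, 126, 912, 949, 428),
  cat_h1 = c(rep("Animals", 5), rep("Plants", 3)),
  cat_h2 = c(rep("Mammals", 3), rep("Birds", 2), c("Wheat", "Grass", "Other")),
  cat_h3 = c("Dogs", "Dogs", "Other", "Hawks", "Other", rep(NA, 3)),
  cat_h4 = c("Big", "Little", rep(NA, 6)))

tree <- build_tree(test_data$cat_id, test_data[-1])
str(tree)
```

Compared with the by() version, this trades the automatic grouping for an explicit loop; because unique() preserves first-appearance order, Mammals stays ahead of Birds, as in the target list.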