假设我有一个大的data.table,看起来像dt下面。
dt <- data.table(
player_1 = c("a", "b", "b", "c"),
player_1_age = c(10, 20, 20, 30),
player_2 = c("b", "a", "c", "a"),
player_2_age = c(20, 10, 30, 10)
)
# dt
# player_1 player_1_age player_2 player_2_age
# 1: a 10 b 20
# 2: b 20 a 10
# 3: b 20 c 30
# 4: c 30 a 10
Run Code Online (Sandbox Code Playgroud)
根据dt以上内容,我想创建一个data.table具有独特玩家及其年龄的人,如下所示player_dt:
# player_dt
# player age
# a 10
# b 20
# c 30
Run Code Online (Sandbox Code Playgroud)
为此,我尝试了下面的代码,但在我较大的数据集上花费的时间太长,可能是因为我正在data.table为sapply.
在检查每个值是否只有一个唯一值时player_dt,您将如何获得上述信息?playerage
# get unique players
player <- sort(unique(c(dt$player_1, dt$player_2)))
# for each player, get their age, if there is only one age value
age <- sapply(player, function(x) {
unique_values <- unique(c(
dt[player_1 == x][["player_1_age"]],
dt[player_2 == x][["player_2_age"]]))
if(length(unique_values) > 1) stop() else return(unique_values)
})
# combine to create the player_dt
player_dt <- data.table(player, age)
Run Code Online (Sandbox Code Playgroud)
我使用来自@DavidT 的数据作为输入。
dt
# player_1 player_1_age player_2 player_2_age
#1: a 10 b 20
#2: b 20 a 10
#3: b 20 c 30
#4: c 30 a 11 # <--
Run Code Online (Sandbox Code Playgroud)
TL; 博士
你可以做
nm <- names(dt)
idx <- endsWith(nm, "age")
colsAge <- nm[idx]
colsOther <- nm[!idx]
out <-
unique(melt(
dt,
measure.vars = list(colsAge, colsOther),
value.name = c("age", "player")
)[, .(age, player)])[, if (.N == 1) # credit: /sf/answers/2409956111/
.SD, by = player]
out
# player age
#1: b 20
#2: c 30
Run Code Online (Sandbox Code Playgroud)
一步步
您可以做的是同时熔化多个列 - 以结尾的列"age"和不以结尾的列。
nm <- names(dt)
idx <- endsWith(nm, "age")
colsAge <- nm[idx]
colsOther <- nm[!idx]
dt1 <- melt(dt, measure.vars = list(colsAge, colsOther), value.name = c("age", "player"))
Run Code Online (Sandbox Code Playgroud)
结果是
dt1
# variable age player
#1: 1 10 a
#2: 1 20 b
#3: 1 20 b
#4: 1 30 c
#5: 2 20 b
#6: 2 10 a
#7: 2 30 c
#8: 2 11 a
Run Code Online (Sandbox Code Playgroud)
现在我们叫unique...
out <- unique(dt1[, .(age, player)])
out
# age player
#1: 10 a
#2: 20 b
#3: 30 c
#4: 11 a
Run Code Online (Sandbox Code Playgroud)
...并过滤"player"长度等于 1 的组
out <- out[, if(.N == 1) .SD, by=player]
out
# player age
#1: b 20
#2: c 30
Run Code Online (Sandbox Code Playgroud)
鉴于 OP 的输入数据,不需要最后一步。
数据
library(data.table)
dt <- data.table(
player_1 = c("a", "b", "b", "c"),
player_1_age = c(10, 20, 20, 30),
player_2 = c("b", "a", "c", "a"),
player_2_age = c(20, 10, 30, 11)
)
Run Code Online (Sandbox Code Playgroud)
参考:https : //cran.r-project.org/web/packages/data.table/vignettes/datatable-reshape.html
| 归档时间: |
|
| 查看次数: |
202 次 |
| 最近记录: |