I have a data.table with five columns.
library(data.table)

dt <- data.table(
  city = c(rep(1,2), rep(2,2), rep(3,2), rep(4,2)),
  neighborhoods.1 = c(NA, "a", "b", "c", NA, NA, "d", "e"),
  neighborhoods.2 = c(NA, "f", "g", rep(NA,5)),
  neighborhoods.3 = c(NA, "h", rep(NA, 6)),
  irrelevantdata = c(1:8)
)
   city neighborhoods.1 neighborhoods.2 neighborhoods.3 irrelevantdata
1:    1              NA              NA              NA              1
2:    1               a               f               h              2
3:    2               b               g              NA              3
4:    2               c              NA              NA              4
5:    3              NA              NA              NA              5
6:    3              NA              NA              NA              6
7:    4               d              NA              NA              7
8:    4               e              NA              NA              8
I want to merge the first four columns into a single column.
   neighborhood
1:            1
2:      1-a-f-h
3:        2-b-g
4:          2-c
5:            3
6:            3
7:          4-d
8:          4-e
As you can see, I am dropping the NA entries and separating the remaining values with a -.
I tried this, which has obvious problems in how j is handled:
business[
,
neighborhood = paste0(
city,
if(!is.na(neighborhoods.1)) paste0("-", neighborhoods.1),
if(!is.na(neighborhoods.2)) paste0("-", neighborhoods.2),
if(!is.na(neighborhoods.3)) paste0("-", neighborhoods.3),
""
)
]
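How can I do this?
Updated to reflect that there are other columns I do not want to merge.
One option is to paste the elements of each row together using do.call, and then remove the NA elements and the extra - from the output vector with gsub.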
dt[,.(neighborhood = gsub('-NA|NA-', '',
do.call(paste, c(.SD, sep='-')))), .SDcols= city:neighborhoods.3]
Another option is to group by the row sequence, unlist the Subset of Data.table (.SD), remove the NA elements (na.omit), and paste the remaining elements together. We can specify the columns to use for this operation in .SDcols.
dt[, .(neighborhood = paste(na.omit(unlist(.SD)), collapse='-')),
   by=1:nrow(dt), .SDcols= city:neighborhoods.3]
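Since the question was updated to say that the other columns should be kept, the same row-wise idea can be sketched with := so that the result is added to dt by reference instead of being returned as a new table. This is only an illustration: the `cols` vector is a name introduced here, and the column names are typed out rather than taken from the answer above.

```r
library(data.table)

dt <- data.table(
  city = c(rep(1,2), rep(2,2), rep(3,2), rep(4,2)),
  neighborhoods.1 = c(NA, "a", "b", "c", NA, NA, "d", "e"),
  neighborhoods.2 = c(NA, "f", "g", rep(NA,5)),
  neighborhoods.3 = c(NA, "h", rep(NA, 6)),
  irrelevantdata = c(1:8)
)

# Columns to merge (illustrative name, not from the original answer)
cols <- c("city", "neighborhoods.1", "neighborhoods.2", "neighborhoods.3")

# One group per row: drop the NAs in that row, then join what is left with "-"
dt[, neighborhood := paste(na.omit(unlist(.SD)), collapse = "-"),
   by = seq_len(nrow(dt)), .SDcols = cols]

print(dt[, .(neighborhood, irrelevantdata)])
```

Grouping by row is simple to read but runs j once per row, so it may be slow on large tables.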
Another option, suggested by @Frank, is to melt a subset of the dataset (specified by the desired columns) to long format and then paste the values together by row.
mycols <- setdiff(names(dt), 'irrelevantdata')
na.omit(melt(dt[, mycols, with=FALSE][, r := .I],
             id.vars="r"))[, paste(value, collapse="-"), by=r]
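For comparison, the attempt in the question can also be repaired directly. As far as I can tell it fails for two reasons: if() only tests a single value rather than a whole column, and j needs := to assign a new column. A minimal sketch, assuming a small helper called `dash` (a hypothetical name, not part of data.table) that uses the vectorized ifelse() instead:

```r
library(data.table)

dt <- data.table(
  city = c(rep(1,2), rep(2,2), rep(3,2), rep(4,2)),
  neighborhoods.1 = c(NA, "a", "b", "c", NA, NA, "d", "e"),
  neighborhoods.2 = c(NA, "f", "g", rep(NA,5)),
  neighborhoods.3 = c(NA, "h", rep(NA, 6)),
  irrelevantdata = c(1:8)
)

# Hypothetical helper: "-x" where x is present, "" where x is NA
dash <- function(x) ifelse(is.na(x), "", paste0("-", x))

# Vectorized over whole columns, so no grouping by row is needed
dt[, neighborhood := paste0(city,
                            dash(neighborhoods.1),
                            dash(neighborhoods.2),
                            dash(neighborhoods.3))]

print(dt$neighborhood)
```

Unlike the gsub approach, this never pastes the string "NA" in the first place, so a neighborhood whose name happens to contain "NA" would be left intact.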