Convert to NA after a specific value, by row

Sot*_*tos 24 r dataframe

Imagine the following data frame:

#  ID v1 v2 v3 v4
#1  H  0  0  d  0
#2  I  0  0  0  0
#3  J  d  0  0  0
#4  K  0  0  0  d
#5  L  0  d  0  0

Each row contains either one d or none.

For each row, I want to convert everything after the d to NA. Desired result:

#  ID v1  v2  v3  v4
#1  H  0   0   d  NA
#2  I  0   0   0   0
#3  J  d  NA  NA  NA
#4  K  0   0   0   d
#5  L  0   d  NA  NA

Data

df <- data.frame(ID = LETTERS[8:12], 
                 v1 = c(0, 0, 'd', 0, 0), 
                 v2 = c(0, 0, 0, 0, 'd'), 
                 v3 = c('d', 0, 0, 0, 0), 
                 v4 = c(0, 0, 0, 'd', 0), 
      stringsAsFactors = FALSE)

Hen*_*rik 15

Using cummax:

ix = df == "d"
df[t(apply(ix, 1, cummax)) & !ix] = NA
#   ID v1   v2   v3   v4
# 1  H  0    0    d <NA>
# 2  I  0    0    0    0
# 3  J  d <NA> <NA> <NA>
# 4  K  0    0    0    d
# 5  L  0    d <NA> <NA>
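The mask used with cummax can be unpacked step by step. This is a minimal sketch assuming the df from the question; the intermediate names seen and after are illustrative and do not appear in the answer.

```r
df <- data.frame(ID = LETTERS[8:12],
                 v1 = c(0, 0, 'd', 0, 0),
                 v2 = c(0, 0, 0, 0, 'd'),
                 v3 = c('d', 0, 0, 0, 0),
                 v4 = c(0, 0, 0, 'd', 0),
                 stringsAsFactors = FALSE)

# TRUE where a cell equals "d" (logical matrix, ID column included)
ix <- df == "d"

# Running maximum along each row: switches to 1 at the "d" and stays 1 after it.
# apply() over rows returns its results as columns, hence the transpose.
seen <- t(apply(ix, 1, cummax))

# Cells at or after the "d", excluding the "d" itself
after <- seen & !ix

df[after] <- NA
df
```

Keeping the ID column in the mask is harmless here because no ID equals "d"; if that could happen, the same steps could be applied to df[-1] instead.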

For more speed, replace apply with collapse::dapply:

ix = df == "d"
df[collapse::dapply(ix, cummax, MARGIN = 1) & !ix] = NA

Or use matrixStats::rowCummaxs:

ix = df == "d"
df[matrixStats::rowCummaxs(ix) & !ix] = NA

For matrixStats versions before 0.62.0, see the previous revision of this answer.


Jaa*_*aap 9

Two alternative solutions:

# option 1
w <- which(df == "d", arr.ind = TRUE)
w <- w[w[,2] < ncol(df),]
reps <- ncol(df) - w[,2]
w <- w[rep(1:nrow(w), reps),]
w[,2] <- w[,2] + unlist(sapply(reps, seq))

df[w] <- NA

# option 2
mc <- ncol(df) - max.col(df == "d", ties.method = "first")
mc[mc >= (ncol(df) - 1)] <- 0
rr <- rep(seq_along(mc), mc)
cc <- rep(ncol(df) - mc, mc) + unlist(sapply(mc, seq)[mc > 0])

df[cbind(rr, cc)] <- NA

Both of these also give the desired result.
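Both options come down to the same final step: build a two-column matrix of (row, column) positions and assign through it. A minimal sketch of that indexing step on its own, using a small hypothetical data frame toy that is not part of the answer:

```r
# Hypothetical toy data, only to illustrate matrix indexing on a data frame
toy <- data.frame(a = c("x", "y"),
                  b = c("p", "q"),
                  c = c("m", "n"),
                  stringsAsFactors = FALSE)

# Each row of idx names one cell as (row, column)
idx <- cbind(c(1L, 2L, 2L), c(3L, 2L, 3L))

# Only the listed cells are overwritten
toy[idx] <- NA
toy
```

The work in each option is therefore in computing the row and column vectors; the assignment itself is a single indexed replacement.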


Sot*_*tos 7

My own version of a solution is:

f1 <- function(x){
  i1 <- which(x == 'd') + 1
  cond <- length(i1) > 0 && i1 <= length(x)
  if (cond){x[i1:(length(x))] <- NA;x}else{x}
}
df[-1] <- t(apply(df[-1], 1, f1))

This gives:

#  ID v1   v2   v3   v4
#1  H  0    0    d <NA>
#2  I  0    0    0    0
#3  J  d <NA> <NA> <NA>
#4  K  0    0    0    d
#5  L  0    d <NA> <NA>
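To see what f1 does to a single row, it can be called on plain character vectors. The sketch below repeats the function from the answer and covers the three cases it distinguishes; the input vectors are made up for illustration. It relies on the question's guarantee of at most one d per row, since with several d values i1 would have length greater than one and the && condition would no longer be well defined.

```r
f1 <- function(x){
  i1 <- which(x == 'd') + 1
  cond <- length(i1) > 0 && i1 <= length(x)
  if (cond){x[i1:(length(x))] <- NA;x}else{x}
}

# "d" in the middle: every later entry becomes NA
mid  <- f1(c("0", "d", "0", "0"))

# "d" in the last position: nothing comes after it, so nothing changes
last <- f1(c("0", "0", "0", "d"))

# no "d" at all: the row is returned unchanged
none <- f1(c("0", "0", "0", "0"))
```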


G. *_*eck 7

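Here are two base R one-liners.

1) Reduce. Because it operates on an entire column at a time rather than row by row, it should be particularly fast when there are many rows but few columns.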

replace(df, TRUE, Reduce(function(x, y) ifelse(x == "d", NA, y), df, accumulate = TRUE))

giving:

  ID v1   v2   v3   v4
1  H  0    0    d <NA>
2  I  0    0    0    0
3  J  d <NA> <NA> <NA>
4  K  0    0    0    d
5  L  0    d <NA> <NA>
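The Reduce approach works because the NA propagates from column to column. A small sketch with three made-up column vectors (v1, v2, v3 and the helper name step are illustrative, not taken from the answer):

```r
v1 <- c("0", "d", "0")
v2 <- c("0", "0", "d")
v3 <- c("0", "0", "0")

# x is the already-processed previous column, y is the current column.
# A "d" in x blanks the matching entry of y. An NA in x makes the test NA,
# so ifelse() returns NA there too and the blanking carries forward.
step <- function(x, y) ifelse(x == "d", NA, y)

res <- Reduce(step, list(v1, v2, v3), accumulate = TRUE)
res
```

The first element of res is v1 untouched; in the second, position 2 is NA because v1 had a d there; in the third, position 2 stays NA and position 3 becomes NA because of the d in v2.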

2) read.table. This assumes that d occurs only in cells consisting of a single d (which is the case in the example in the question).

replace(df, df!="d"&is.na(read.table(text=do.call(paste,df), comment="d", fill=NA)), NA)

giving:

  ID v1   v2   v3   v4
1  H  0    0    d <NA>
2  I  0    0    0    0
3  J  d <NA> <NA> <NA>
4  K  0    0    0    d
5  L  0    d <NA> <NA>


the*_*ail 6

Another version, using col and max.col:

df[-1][col(df[-1]) > max.col(df[-1] == "d", "last")] <- NA
df

#  ID v1   v2   v3   v4
#1  H  0    0    d <NA>
#2  I  0    0    0    0
#3  J  d <NA> <NA> <NA>
#4  K  0    0    0    d
#5  L  0    d <NA> <NA>
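The one-liner compares each cell's column number with the column of the d in its row. A sketch that splits it into named steps, assuming the df from the question; m, last_d and mask are illustrative names:

```r
df <- data.frame(ID = LETTERS[8:12],
                 v1 = c(0, 0, 'd', 0, 0),
                 v2 = c(0, 0, 0, 0, 'd'),
                 v3 = c('d', 0, 0, 0, 0),
                 v4 = c(0, 0, 0, 'd', 0),
                 stringsAsFactors = FALSE)

# TRUE where a value column holds "d" (ID column dropped)
m <- df[-1] == "d"

# Column of the "d" in each row. A row without "d" is all FALSE, a full tie,
# and "last" resolves that tie to the final column, so nothing lies beyond it.
last_d <- max.col(m, "last")

# col(m) holds each cell's own column number; last_d is recycled down the rows
mask <- col(m) > last_d

df[-1][mask] <- NA
df
```

Choosing "last" as the tie method is what makes rows without a d safe: with "first", such a row would point at column 1 and everything after it would be blanked.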