Remove duplicate rows based only on the previous row

Question

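I am trying to remove duplicate rows from a data frame, based only on the previous row. The duplicated and unique functions remove all duplicates, leaving only the unique rows, which is not what I want.

I have illustrated the problem here with a loop. I need to vectorize this because my actual data set is far too large for a loop.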

x <- c(1,1,1,1,3,3,3,4)
y <- c(1,1,1,1,3,3,3,4)
z <- c(1,2,1,1,3,2,2,4)
xy <- data.frame(x,y,z)

xy
  x y z
1 1 1 1
2 1 1 2
3 1 1 1
4 1 1 1 #this should be removed
5 3 3 3
6 3 3 2
7 3 3 2 #this should be removed
8 4 4 4

# loop that produces desired output
toRemove <- NULL
for (i in 2:nrow(xy)){
   test <- as.vector(xy[i,] == xy[i-1,])
   if (!(FALSE %in% test)){ 
      toRemove <- c(toRemove, i) #build a vector of rows to remove
   }
}
xy[-toRemove,] #exclude rows
  x y z
1 1 1 1
2 1 1 2
3 1 1 1
5 3 3 3
6 3 3 2
8 4 4 4

I have tried using dplyr's lag function, but it only works on a single column; when I try to run it across all 3 columns it does not work.

ifelse(xy[,1:3] == lag(xy[,1:3],1), NA, xy[,1:3])

Any suggestions on how to accomplish this?

Answer 1

zx8*_*754 5

It looks like we want to remove a row if it is the same as the row above:

# make an index, if cols not same as above
ix <- c(TRUE, rowSums(tail(xy, -1) == head(xy, -1)) != ncol(xy))

# filter
xy[ix, ]
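
To see why the index works, it can help to break the one-liner into steps. The following is only a sketch on the sample data from the question; the intermediate names same_as_above, n_matching and result are made up for illustration and are not part of the original answer. It also assumes the data contains no NA values, since a comparison involving NA would yield NA rather than TRUE or FALSE.

```r
# Sample data from the question
xy <- data.frame(
  x = c(1, 1, 1, 1, 3, 3, 3, 4),
  y = c(1, 1, 1, 1, 3, 3, 3, 4),
  z = c(1, 2, 1, 1, 3, 2, 2, 4)
)

# Compare rows 2..n with rows 1..(n-1), cell by cell.
# tail(xy, -1) drops the first row, head(xy, -1) drops the last row.
same_as_above <- tail(xy, -1) == head(xy, -1)

# Count how many columns match the row above.
n_matching <- rowSums(same_as_above)

# Keep a row unless every column matches the previous row.
# The first row has no predecessor, so it is always kept.
ix <- c(TRUE, n_matching != ncol(xy))

result <- xy[ix, ]
result
```

On this data the rows kept are 1, 2, 3, 5, 6 and 8, which matches the output of the loop in the question. Row 3 stays even though it equals row 1, because only the immediately preceding row is compared.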

Archived:	9 years, 5 months ago
Views:	574
Last activity:	7 years, 8 months ago