Starting from a data frame like this:
test <- data.frame('id'= rep(1:5,2), 'string'= LETTERS[1:10])
test <- test[order(test$id), ]
rownames(test) <- 1:10
> test
id string
1 1 A
2 1 F
3 2 B
4 2 G
5 3 C
6 3 H
7 4 D
8 4 I
9 5 E
10 5 J
I want to create a new data frame from the first row of each id/string pair. If sqldf accepted R code inside the query, it might look like this:
res <- sqldf("select id, min(rownames(test)), string
from test
group by id, string")
> res
id string
1 1 A
3 2 B
5 3 C
7 4 D
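9 5 E …
I am trying to take a subset of a data frame based on how often a value occurs. This is best explained with the example below. This question is closely related to "Selecting the top finite number of rows for each unique value of a column in a data frame in R". However, I want to vary the number of items that the head() command selects.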
#Sample data
input <- matrix( c(1000001,1000001,1000001,1000001,1000001,1000001,1000002,1000002,1000002,1000003,1000003,1000003,100001,100002,100003,100004,100005,100006,100002,100003,100007,100002,100003,100008,"2011-01-01","2011-01-02","2011-01-01","2011-01-04","2011-01-01","2011-01-02","2011-01-01","2011-01-04","2011-01-01","2011-01-02","2011-01-01","2011-01-04"), ncol=3)
colnames(input) <- c( "Product" , "Something" ,"Date")
input <- as.data.frame(input)
input$Date <- as.Date(input[,"Date"], "%Y-%m-%d")
#Sort based on date, I want to leave out the entries with the oldest dates.
input <- input[ with( input, order(Date)), ]
#Create number of items I want to select
table_input <- as.data.frame(table(input$Product))
table_input$twentyfive <- ceiling( table_input$Freq*0.25 )
#This next part is a very time consuming method (Have 2 mln rows, 90k different products)
first <- …
Input file:
y <- read.table(textConnection('
c1 c2 c3
1 a b -1
2 a b -1
3 a c 1
4 a b 1
5 a b -1
'), header=TRUE)
So y is:
c1 c2 c3
1 a b -1
2 a b -1
3 a c 1
4 a b 1
5 a b -1
The output should be:
c1 c2 c3
1 a b -1
3 a c 1
4 a b 1
How can I remove multiple or duplicate rows that have the same entries in all columns?
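A minimal sketch of one way to do this in base R, assuming the goal is to keep the first occurrence of each fully identical row. `duplicated()` flags rows that repeat an earlier row across all columns; the name `dedup` is only illustrative.

```r
# Rebuild the example data frame from the question
y <- read.table(text = "
  c1 c2 c3
1 a b -1
2 a b -1
3 a c 1
4 a b 1
5 a b -1
", header = TRUE)

# duplicated() marks every row that repeats an earlier row in all columns;
# negating it keeps only the first occurrence of each distinct row
dedup <- y[!duplicated(y), ]

# unique() applied to a data frame is an equivalent shortcut
dedup_alt <- unique(y)

print(dedup)
```

Both forms keep the original row names, so the surviving rows can be traced back to their positions in `y`.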
Possible duplicate:
R: find pattern in multiple columns - possibly duplicated()?
Dear all,
This is part of my dataset:
name chr start stop strand alias
60 uc003vqx.2 chr7 130835560 130891916 - PODXL
61 uc003xlp.1 chr8 38387812 38445509 - FLG
62 uc003xlu.1 chr8 38400008 38445509 - FLG
63 uc003xlv.1 chr8 38400008 38445509 - FLG
64 uc003xtz.1 chr8 61263976 61356508 - CA8
65 uc003xua.1 chr8 61283183 61356508 - CA8
66 uc010lwg.1 chr8 38387812 38445509 - FLG
67 uc010lwh.1 chr8 38387812 38445509 - FLG
68 uc010lwj.1 chr8 38387812 38445509 - FLG
I want to filter the dataset based on unique combinations of the start, stop, and alias columns. The final result must look like this:
name chr start stop …
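Since the expected result is cut off above, the following is only a sketch under an assumption: that the first row of each distinct start/stop/alias combination should be kept. The data frame name `genes` and the helper `key_cols` are hypothetical; the values are copied from the rows shown in the question.

```r
# Hypothetical reconstruction of the rows shown in the question
genes <- data.frame(
  name   = c("uc003vqx.2", "uc003xlp.1", "uc003xlu.1", "uc003xlv.1",
             "uc003xtz.1", "uc003xua.1", "uc010lwg.1", "uc010lwh.1",
             "uc010lwj.1"),
  chr    = c("chr7", rep("chr8", 8)),
  start  = c(130835560, 38387812, 38400008, 38400008, 61263976,
             61283183, 38387812, 38387812, 38387812),
  stop   = c(130891916, 38445509, 38445509, 38445509, 61356508,
             61356508, 38445509, 38445509, 38445509),
  strand = "-",
  alias  = c("PODXL", "FLG", "FLG", "FLG", "CA8", "CA8", "FLG", "FLG", "FLG"),
  row.names = 60:68
)

# Columns that define uniqueness (assumption: first occurrence wins)
key_cols <- c("start", "stop", "alias")

# duplicated() on the key columns only; the other columns are carried along
filtered <- genes[!duplicated(genes[, key_cols]), ]

print(filtered)
```

Restricting `duplicated()` to the key columns is what distinguishes this from a plain `unique(genes)`, which would keep every row here because the name column differs on each one.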