Sorting one variable in a data frame by id

Question

I have a data frame with a lot of company information, grouped by an id variable. I want to sort one of the variables separately within each id. Take this example:

df <- structure(list(id = c(110, 110, 110, 90, 90, 90, 90, 252, 252
), var1 = c(26, 21, 54, 10, 18, 9, 16, 54, 39), var2 = c(234, 
12, 43, 32, 21, 19, 16, 34, 44)), .Names = c("id", "var1", "var2"
), row.names = c(NA, -9L), class = "data.frame")

Which looks like this

df
   id var1 var2
1 110   26  234
2 110   21   12
3 110   54   43
4  90   10   32
5  90   18   21
6  90    9   19
7  90   16   16
8 252   54   34
9 252   39   44

Now I want to sort the data frame by var1 within each id. The easiest solution I can think of is to use the apply function like this:

> apply(df, 2, sort)
       id var1 var2
 [1,]  90    9   12
 [2,]  90   10   16
 [3,]  90   16   19
 [4,]  90   18   21
 [5,] 110   21   32
 [6,] 110   26   34
 [7,] 110   39   43
 [8,] 252   54   44
 [9,] 252   54  234

However, this is not the output I am seeking. The correct output should be,

   id var1 var2
1 110   21   12
2 110   26  234
3 110   54   43
4  90    9   19
5  90   10   32
6  90   16   16
7  90   18   21
8 252   39   44
9 252   54   34

That is, group by id and sort by var1 within each group, while keeping the original order of the id column.

Any idea how to sort like this?

Answer 1

mar*_*kus 6

Another base R option, using order and match:

df[with(df, order(match(id, unique(id)), var1, var2)), ]
#   id var1 var2
#2 110   21   12
#1 110   26  234
#3 110   54   43
#6  90    9   19
#4  90   10   32
#7  90   16   16
#5  90   18   21
#9 252   39   44
#8 252   54   34

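To see why this works, it can help to look at the intermediate sort key. The following is only a minimal sketch using the data from the question; the names `first_seen` and `sorted` are illustrative and not part of the original answer, and the var2 tie-breaker is left out because var1 has no ties within an id here.

```r
# Same data as in the question
df <- data.frame(
  id   = c(110, 110, 110, 90, 90, 90, 90, 252, 252),
  var1 = c(26, 21, 54, 10, 18, 9, 16, 54, 39),
  var2 = c(234, 12, 43, 32, 21, 19, 16, 34, 44)
)

# Position of each id among the distinct ids, in order of first appearance
first_seen <- match(df$id, unique(df$id))
first_seen
# [1] 1 1 1 2 2 2 2 3 3

# Order by that position first, then by var1 within each id
sorted <- df[order(first_seen, df$var1), ]
sorted$var1
# [1] 21 26 54  9 10 16 18 39 54
```

Because the first key follows the order in which each id first appears, the groups stay in their original sequence (110, 90, 252) instead of being sorted numerically.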

Answer 2

tho*_*hal 6

Note: as Moody_Mudskipper pointed out, there is no need for the tidyverse; this can also be done easily in base R:

df[order(ordered(df$id, unique(df$id)), df$var1), ]

A one-line tidyverse solution without any temporary variables:

library(tidyverse)
df %>% arrange(ordered(id, unique(id)), var1)
#    id var1 var2
# 1 110   21   12
# 2 110   26  234
# 3 110   54   43
# 4  90    9   19
# 5  90   10   32
# 6  90   16   16
# 7  90   18   21
# 8 252   39   44
# 9 252   54   34

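Explanation of why apply(df, 2, sort) does not work

What you were trying to do is sort each column independently. apply runs over the specified dimension (2 in this case, which corresponds to columns) and applies the function (sort in this case).

apply then tries to simplify the result, in this case to a matrix. So you get a matrix (not a data.frame) in which each column is sorted independently of the others. For example, this row from the apply call: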

# [1,]  90    9   12

does not even exist in the data.frame.
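This can be checked directly. The sketch below rebuilds the data from the question and inspects the result of the apply call; the name `res` is only illustrative.

```r
# Same data as in the question
df <- data.frame(
  id   = c(110, 110, 110, 90, 90, 90, 90, 252, 252),
  var1 = c(26, 21, 54, 10, 18, 9, 16, 54, 39),
  var2 = c(234, 12, 43, 32, 21, 19, 16, 34, 44)
)

# Each column is sorted on its own, and the result is a matrix
res <- apply(df, 2, sort)
is.matrix(res)
# [1] TRUE

# The first row simply pairs the three column minima ...
unname(res[1, ])
# [1] 90  9 12

# ... but no row of df holds that combination of values
any(df$id == 90 & df$var1 == 9 & df$var2 == 12)
# [1] FALSE
```

In other words, the link between the values of a single observation is lost, which is why the rows have to be reordered as a whole with order (or arrange) instead.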

In base `R`, your solution would become `df[order(factor(df$id, unique(df$id)), df$var1), ]` (I used `factor` instead of `ordered`; `ordered` is unnecessary here, longer to type, and less well known).
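A quick way to convince yourself that the two variants are interchangeable for this purpose is to compare their results. This is just a sketch on the question's data; `by_factor` and `by_ordered` are illustrative names.

```r
# Same data as in the question
df <- data.frame(
  id   = c(110, 110, 110, 90, 90, 90, 90, 252, 252),
  var1 = c(26, 21, 54, 10, 18, 9, 16, 54, 39),
  var2 = c(234, 12, 43, 32, 21, 19, 16, 34, 44)
)

# Levels are given in order of first appearance, so the factor codes
# act as the grouping key in both variants
by_factor  <- df[order(factor(df$id, unique(df$id)), df$var1), ]
by_ordered <- df[order(ordered(df$id, unique(df$id)), df$var1), ]

identical(by_factor, by_ordered)
# [1] TRUE
```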
