I have this raw dataset. Here is a sample of it:
X1 X2
1 Born 1946-05-27
2 bioguide A000370
3 Born 1979-06-19
4 bioguide A000371
5 Born 1980-04-18
6 bioguide A000367
7 Born 1958-06-12
8 bioguide A000369
9 Born 1948-03-23
10 bioguide B001291
From this, the output I want is:
Born biouguide
1 1946-05-27 A000370
2 1979-06-19 A000371
3 1980-04-18 A000367
4 1958-06-12 A000369
5 1948-03-23 B001291
In addition, here is the dput of the raw dataset:
structure(list(X1 = c("Born", "bioguide", "Born", "bioguide",
"Born", "bioguide", "Born", "bioguide", "Born", "bioguide"),
X2 = c("1946-05-27", "A000370", "1979-06-19", "A000371",
"1980-04-18", "A000367", "1958-06-12", "A000369", "1948-03-23",
"B001291")), row.names = c(NA, 10L), class = "data.frame")
Can you help me produce the desired output?
We can use pivot_wider, after adding a per-group row index (rn) so that each Born/bioguide pair is uniquely identified:
library(dplyr)
library(tidyr)
df1 %>%
group_by(X1) %>%
mutate(rn = row_number()) %>%
pivot_wider(names_from = X1, values_from = X2) %>%
select(-rn)
# A tibble: 5 x 2
# Born bioguide
# <chr> <chr>
#1 1946-05-27 A000370
#2 1979-06-19 A000371
#3 1980-04-18 A000367
#4 1958-06-12 A000369
#5 1948-03-23 B001291
Or in base R:
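To make the role of rn in the pivot_wider answer concrete, here is a minimal base R sketch of the same per-group counter. It rebuilds the data from the dput in the question; the use of ave() is my own substitution for group_by() plus row_number(), not something the answer itself uses.

```r
# Data rebuilt from the dput output in the question; the name df1 follows the answer.
df1 <- data.frame(
  X1 = rep(c("Born", "bioguide"), times = 5),
  X2 = c("1946-05-27", "A000370", "1979-06-19", "A000371",
         "1980-04-18", "A000367", "1958-06-12", "A000369",
         "1948-03-23", "B001291")
)

# Count rows within each X1 group: the first "Born" and the first "bioguide"
# both get 1, the second of each get 2, and so on.
rn <- ave(seq_len(nrow(df1)), df1$X1, FUN = seq_along)

# Rows sharing the same rn value belong to the same person.
cbind(df1, rn)
```

Without such an index, every value in X2 would map to the same (empty) key, and pivot_wider would have no way to tell which Born belongs to which bioguide.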
unstack(df1, X2 ~ X1)
One base R option could be:
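As a self-contained sketch of the unstack approach, the snippet below rebuilds the data from the dput in the question and runs the same call. The name res is only illustrative. Column order may differ by locale because the groups are sorted by label, so the columns are accessed by name here.

```r
# Data rebuilt from the dput output in the question; the name df1 follows the answer.
df1 <- data.frame(
  X1 = rep(c("Born", "bioguide"), times = 5),
  X2 = c("1946-05-27", "A000370", "1979-06-19", "A000371",
         "1980-04-18", "A000367", "1958-06-12", "A000369",
         "1948-03-23", "B001291")
)

# unstack() splits X2 by the labels in X1 and binds the groups as columns.
# It relies on the rows of each group already being in matching order.
res <- unstack(df1, X2 ~ X1)

res$Born      # the five birth dates, in original row order
res$bioguide  # the five bioguide IDs, in original row order
```

Note that if the two groups had different lengths, unstack would return a list rather than a data frame, so it is worth checking the result when the input may be incomplete.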
data.frame(Born = df[c(TRUE, FALSE), 2],
biouguide = df[c(FALSE, TRUE), 2])
Born biouguide
1 1946-05-27 A000370
2 1979-06-19 A000371
3 1980-04-18 A000367
4 1958-06-12 A000369
5 1948-03-23 B001291
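The logical-index answer assumes the rows strictly alternate, starting with Born. A possible variant, sketched below under the assumption that the labels in X1 are reliable, selects by label instead of by position and checks that both groups have the same size. The names is_born, is_bio and res are illustrative only.

```r
# Data rebuilt from the dput output in the question; the name df follows the answer.
df <- data.frame(
  X1 = rep(c("Born", "bioguide"), times = 5),
  X2 = c("1946-05-27", "A000370", "1979-06-19", "A000371",
         "1980-04-18", "A000367", "1958-06-12", "A000369",
         "1948-03-23", "B001291")
)

# Select rows by their label rather than by odd/even position.
is_born <- df$X1 == "Born"
is_bio  <- df$X1 == "bioguide"

# Guard against an unpaired record before combining the columns.
stopifnot(sum(is_born) == sum(is_bio))

# Column name biouguide is kept as spelled in the desired output.
res <- data.frame(Born = df$X2[is_born],
                  biouguide = df$X2[is_bio])
res
```

This still relies on the n-th Born and the n-th bioguide belonging to the same person; it only removes the dependence on exact row positions.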