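I have a data frame containing the names ("name") of US presidents and the years they started and ended in office (the "from" and "to" columns). Here is a sample: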
name from to
Bill Clinton 1993 2001
George W. Bush 2001 2009
Barack Obama 2009 2012
...and the output from dput:
dput(tail(presidents, 3))
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name",
"from", "to"), row.names = 42:44, class = "data.frame")
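I want to create a data frame with two columns ("name" and "year"), with one row for each year a president was in office. So I need to create a regular sequence with every year from "from" to "to". Here is my expected output: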
name year
Bill Clinton 1993
Bill Clinton 1994
...
Bill Clinton 2000
Bill Clinton 2001
George W. Bush 2001
George W. Bush 2002
...
George W. Bush 2008
George W. Bush 2009
Barack Obama 2009
Barack Obama 2010
Barack Obama 2011
Barack Obama 2012
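I know I could use data.frame(name = "Bill Clinton", year = seq(1993, 2001)) to expand things for a single president, but I can't figure out how to iterate over every president.
How do I do this? I feel like I should know this, but I'm drawing a blank.
OK, I've tried both solutions, and I'm getting an error: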
foo<-structure(list(name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"), from = c(1885, 1889, 1893), to = c(1889, 1893, 1897)), .Names = c("name", "from", "to"), row.names = 22:24, class = "data.frame")
ddply(foo, "name", summarise, year = seq(from, to))
Error in seq.default(from, to) : 'from' must be of length 1
Jos*_*ien 14
Here is a data.table solution. It has the nice (if minor) feature of leaving the presidents in the order in which they were supplied:
library(data.table)
dt <- data.table(presidents)
dt[, list(year = seq(from, to)), by = name]
# name year
# 1: Bill Clinton 1993
# 2: Bill Clinton 1994
# ...
# ...
# 21: Barack Obama 2011
# 22: Barack Obama 2012
Edit: To handle presidents with non-consecutive terms, use this instead:
dt[, list(year = seq(from, to)), by = c("name", "from")]
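To see why grouping by name alone is not enough for the Cleveland data from the question's update, here is a minimal base R sketch. The variable names from_by_name and group_sizes are mine, chosen only for illustration.

```r
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897)
)

# Grouping by name alone puts both Cleveland terms into one group,
# so `from` has length 2 there and seq(from, to) cannot work.
from_by_name <- split(foo$from, foo$name)
lengths(from_by_name)

# Grouping by name *and* from leaves exactly one row per group.
group_sizes <- table(interaction(foo$name, foo$from, drop = TRUE))
all(group_sizes == 1)
```

Any grouping that identifies a single term (here name plus from) should avoid the 'from' must be of length 1 error.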
flo*_*del 13
You can use the plyr package:
library(plyr)
ddply(presidents, "name", summarise, year = seq(from, to))
# name year
# 1 Barack Obama 2009
# 2 Barack Obama 2010
# 3 Barack Obama 2011
# 4 Barack Obama 2012
# 5 Bill Clinton 1993
# 6 Bill Clinton 1994
# [...]
If it is important that the data be sorted by year, you can use the arrange function:
df <- ddply(presidents, "name", summarise, year = seq(from, to))
arrange(df, df$year)
# name year
# 1 Bill Clinton 1993
# 2 Bill Clinton 1994
# 3 Bill Clinton 1995
# [...]
# 21 Barack Obama 2011
# 22 Barack Obama 2012
Edit 1: Following @edgester's "Update 1", a more appropriate approach is to use adply, which accounts for presidents with non-consecutive terms:
adply(foo, 1, summarise, year = seq(from, to))[c("name", "year")]
Another approach, from the tidyverse, using unnest and map2.
library(tidyverse)
presidents %>%
  unnest(year = map2(from, to, seq)) %>%
  select(-from, -to)
# name year
# 1 Bill Clinton 1993
# 2 Bill Clinton 1994
# ...
# 21 Barack Obama 2011
# 22 Barack Obama 2012
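Edit: As of tidyr v1.0.0, new variables can no longer be created as part of unnest(). Create the list column with mutate() first, then unnest it: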
presidents %>%
  mutate(year = map2(from, to, seq)) %>%
  unnest(year) %>%
  select(-from, -to)
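As a rough check (my own addition, assuming dplyr, purrr and tidyr >= 1.0.0 are installed), the same mutate() + unnest() pattern can be applied to the foo data from the question. Because map2() works row by row, the two Cleveland terms should stay separate. The name expanded is only illustrative.

```r
library(dplyr)
library(purrr)
library(tidyr)

foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897)
)

expanded <- foo %>%
  mutate(year = map2(from, to, seq)) %>%  # one list element of years per row
  unnest(year) %>%                        # one output row per year
  select(-from, -to)

nrow(expanded)  # three terms of five years each (endpoints included)
```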
Two base solutions (here d is the presidents data frame).
Using sequence:
len = d$to - d$from + 1
data.frame(name = d$name[rep(1:nrow(d), len)], year = sequence(len, d$from))
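To make the sequence() idea a bit more concrete, here is a small sketch on the three sample rows. As far as I know, the from argument of sequence() was only added in R 4.0.0, so on older versions something like unlist(Map(seq, from, to)) would be a fallback. The names idx and years are mine.

```r
from <- c(1993, 2001, 2009)
to   <- c(2001, 2009, 2012)

len <- to - from + 1              # number of years per president (endpoints included)
idx <- rep(seq_along(len), len)   # row index repeated once per year

# One run per president: len[i] consecutive values starting at from[i].
years <- sequence(len, from)

# Fallback that should give the same values on older R versions.
years_fallback <- unlist(Map(seq, from, to))
```

idx is what selects the repeated names (d$name[idx] in the answer's notation), and years lines up with it element by element.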
Using mapply:
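# SIMPLIFY = FALSE keeps a list even when all ranges have the same length
# (otherwise mapply returns a matrix and lengths(l) is wrong)
l <- mapply(`:`, d$from, d$to, SIMPLIFY = FALSE)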
data.frame(name = d$name[rep(1:nrow(d), lengths(l))], year = unlist(l))
# name year
# 1 Bill Clinton 1993
# 2 Bill Clinton 1994
# ...snip
# 8 Bill Clinton 2000
# 9 Bill Clinton 2001
# 10 George W. Bush 2001
# 11 George W. Bush 2002
# ...snip
# 17 George W. Bush 2008
# 18 George W. Bush 2009
# 19 Barack Obama 2009
# 20 Barack Obama 2010
# 21 Barack Obama 2011
# 22 Barack Obama 2012
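As @Esteis pointed out in a comment, there may well be several columns that need to be repeated once the ranges are expanded (not just "name", as in the OP). In that case, instead of repeating the values of a single column, simply repeat the rows of the whole data frame (except for the "from" and "to" columns). A simple example: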
d = data.frame(x = 1:2, y = 3:4, names = c("a", "b"),
from = c(2001, 2011), to = c(2003, 2012))
# x y names from to
# 1 1 3 a 2001 2003
# 2 2 4 b 2011 2012
len = d$to - d$from + 1
cbind(d[rep(1:nrow(d), len), setdiff(names(d), c("from", "to"))],
year = sequence(len, d$from))
# x y names year
# 1 1 3 a 2001
# 1.1 1 3 a 2002
# 1.2 1 3 a 2003
# 2 2 4 b 2011
# 2.1 2 4 b 2012
Here is a dplyr solution:
library(dplyr)
# the data
presidents <-
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name",
"from", "to"), row.names = 42:44, class = "data.frame")
# the expansion of the table
presidents %>%
  rowwise() %>%
  do(data.frame(name = .$name, year = seq(.$from, .$to, by = 1)))
# the output
# Source: local data frame [22 x 2]
# Groups: <by row>
# name year
# (chr) (dbl)
# 1 Bill Clinton 1993
# 2 Bill Clinton 1994
# 3 Bill Clinton 1995
# 4 Bill Clinton 1996
# 5 Bill Clinton 1997
# 6 Bill Clinton 1998
# 7 Bill Clinton 1999
# 8 Bill Clinton 2000
# 9 Bill Clinton 2001
# 10 George W. Bush 2001
# .. ... ...
h/t:https://stackoverflow.com/a/24804470/1036500
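For comparison, the same row-by-row idea can be sketched in base R without any packages. This is my own rough analogue of the rowwise() + do() pattern, not part of the answer above, and the names rows and expanded are only illustrative.

```r
presidents <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012)
)

# Build one small data frame per row (i.e. per term) ...
rows <- lapply(seq_len(nrow(presidents)), function(i) {
  data.frame(
    name = presidents$name[i],
    year = seq(presidents$from[i], presidents$to[i])
  )
})

# ... and stack them in the original row order.
expanded <- do.call(rbind, rows)
```

Since each row is expanded on its own, this should also cope with a president who appears more than once.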