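I have a data frame of US presidents with a "name" column and the years in which they started and ended their terms (the "from" and "to" columns). Here is a sample: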
name           from  to
Bill Clinton   1993 2001
George W. Bush 2001 2009
Barack Obama   2009 2012
...and the output from dput:
dput(tail(presidents, 3))
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name", 
"from", "to"), row.names = 42:44, class = "data.frame")
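I want to create a data frame with two columns ("name" and "year"), with one row for each year that a president was in office. So I need to create a regular sequence of years running from "from" to "to" for each president. Here is my expected output: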
name           year
Bill Clinton   1993
Bill Clinton   1994
...
Bill Clinton   2000
Bill Clinton   2001
George W. Bush 2001
George W. Bush 2002
... 
George W. Bush 2008
George W. Bush 2009
Barack Obama   2009
Barack Obama   2010
Barack Obama   2011
Barack Obama   2012
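I know I can use data.frame(name = "Bill Clinton", year = seq(1993, 2001)) to expand the row for a single president, but I can't figure out how to iterate over every president.
How do I do this? I feel I should know this, but I'm drawing a blank.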
OK, I have tried both solutions, and I am getting an error:
foo<-structure(list(name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"), from = c(1885, 1889, 1893), to = c(1889, 1893, 1897)), .Names = c("name", "from", "to"), row.names = 22:24, class = "data.frame")
ddply(foo, "name", summarise, year = seq(from, to))
Error in seq.default(from, to) : 'from' must be of length 1
Jos*_*ien 14
Here is a data.table solution. It has the nice (if minor) feature of leaving the presidents in the order in which they were supplied:
library(data.table)
dt <- data.table(presidents)
dt[, list(year = seq(from, to)), by = name]
#               name year
#  1:   Bill Clinton 1993
#  2:   Bill Clinton 1994
#  ...
#  ...
# 21:   Barack Obama 2011
# 22:   Barack Obama 2012
Edit: To handle presidents with non-consecutive terms, use this instead:
dt[, list(year = seq(from, to)), by = c("name", "from")]
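To see why the extra grouping column helps, here is a minimal sketch that uses the foo data from the question. Grouping by name alone puts both of Grover Cleveland's terms into one group, so seq() receives a "from" of length 2 and fails. Grouping by name and from gives one term per group, assuming no president has two terms starting in the same year. The variable name res is only for illustration.

```r
library(data.table)

# Example data from the question: Grover Cleveland served two non-consecutive terms.
foo <- data.frame(
  name = c("Grover Cleveland", "Benjamin Harrison", "Grover Cleveland"),
  from = c(1885, 1889, 1893),
  to   = c(1889, 1893, 1897)
)

dt <- as.data.table(foo)

# Group by (name, from) so that each group holds exactly one term;
# seq() then always receives scalars.
res <- dt[, list(year = seq(from, to)), by = c("name", "from")]

# The grouping column "from" is carried into the result; drop it if it is not wanted.
res[, from := NULL]
res
```

The result keeps the terms in their original order, so Cleveland's two terms stay on either side of Harrison's.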
flo*_*del 13
You can use the plyr package:
library(plyr)
ddply(presidents, "name", summarise, year = seq(from, to))
#              name year
# 1    Barack Obama 2009
# 2    Barack Obama 2010
# 3    Barack Obama 2011
# 4    Barack Obama 2012
# 5    Bill Clinton 1993
# 6    Bill Clinton 1994
# [...]
If it is important that the data be sorted by year, you can use the arrange function:
df <- ddply(presidents, "name", summarise, year = seq(from, to))
arrange(df, df$year)
#              name year
# 1    Bill Clinton 1993
# 2    Bill Clinton 1994
# 3    Bill Clinton 1995
# [...]
# 21   Barack Obama 2011
# 22   Barack Obama 2012
Edit 1: Following @edgester's "Update 1", a more appropriate approach is to use adply, which accounts for presidents with non-consecutive terms:
adply(foo, 1, summarise, year = seq(from, to))[c("name", "year")]
An alternative tidyverse approach using unnest and map2.
library(tidyverse)
presidents %>%
  unnest(year = map2(from, to, seq)) %>%
  select(-from, -to)
#              name  year
# 1    Bill Clinton  1993
# 2    Bill Clinton  1994
# ...
# 21   Barack Obama  2011
# 22   Barack Obama  2012
Edit: As of tidyr v1.0.0, new variables can no longer be created as part of unnest().
presidents %>%
  mutate(year = map2(from, to, seq)) %>%
  unnest(year) %>%
  select(-from, -to)
Two base solutions.
Using sequence:
len = d$to - d$from + 1
data.frame(name = d$name[rep(1:nrow(d), len)], year = sequence(len, d$from))
Using mapply:
l <- mapply(`:`, d$from, d$to, SIMPLIFY = FALSE)  # keep a list even when all ranges have the same length
data.frame(name = d$name[rep(1:nrow(d), lengths(l))], year = unlist(l))
#              name year
# 1    Bill Clinton 1993
# 2    Bill Clinton 1994
# ...snip
# 8    Bill Clinton 2000
# 9    Bill Clinton 2001
# 10 George W. Bush 2001
# 11 George W. Bush 2002
# ...snip
# 17 George W. Bush 2008
# 18 George W. Bush 2009
# 19   Barack Obama 2009
# 20   Barack Obama 2010
# 21   Barack Obama 2011
# 22   Barack Obama 2012
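For comparison, here is a minimal base R sketch of the same idea that builds one small data frame per row and stacks them. It is less compact than the two versions above, but it works row by row, so repeated names are not a problem. The names pieces and result are only for illustration.

```r
# Sample data from the question.
presidents <- data.frame(
  name = c("Bill Clinton", "George W. Bush", "Barack Obama"),
  from = c(1993, 2001, 2009),
  to   = c(2001, 2009, 2012),
  stringsAsFactors = FALSE
)

# Build one data frame per row. Map() always returns a list,
# so terms of equal length are not simplified into a matrix.
pieces <- Map(
  function(name, from, to) data.frame(name = name, year = seq(from, to)),
  presidents$name, presidents$from, presidents$to
)

# Stack the pieces and discard the row names that rbind derives from the list names.
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

For large inputs the rep/unlist versions above are likely to be faster, since this one allocates a data frame per row.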
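As @Esteis points out in a comment, there may well be several columns that need to be repeated after the range expansion (not just "name", as in the OP). In that case, instead of repeating the values of a single column, simply repeat the rows of the entire data frame (except for the "from" and "to" columns). A simple example: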
d = data.frame(x = 1:2, y = 3:4, names = c("a", "b"),
               from = c(2001, 2011), to = c(2003, 2012))
#   x y names from   to
# 1 1 3     a 2001 2003
# 2 2 4     b 2011 2012
len = d$to - d$from + 1
cbind(d[rep(1:nrow(d), len), setdiff(names(d), c("from", "to"))],
      year = sequence(len, d$from))
#     x y names year
# 1   1 3     a 2001
# 1.1 1 3     a 2002
# 1.2 1 3     a 2003
# 2   2 4     b 2011
# 2.1 2 4     b 2012
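The row-repetition idea can be wrapped in a small function. The sketch below is a hedged variant: expand_range() and its arguments are hypothetical names, not part of any package. It also avoids the second argument of sequence(), which, as far as I know, is only available from R 4.0.0 onwards, by shifting a plain 1..n count to start at each row's "from".

```r
# expand_range() is a hypothetical helper, named here only for illustration.
expand_range <- function(d, from = "from", to = "to", out = "year") {
  len  <- d[[to]] - d[[from]] + 1           # number of values in each row's range
  idx  <- rep(seq_len(nrow(d)), len)        # row i repeated len[i] times
  keep <- setdiff(names(d), c(from, to))    # every column except the range bounds
  res  <- d[idx, keep, drop = FALSE]
  # sequence(len) counts 1..len[i] for each row; shift it to start at that row's "from".
  res[[out]] <- rep(d[[from]], len) + sequence(len) - 1
  rownames(res) <- NULL
  res
}

d <- data.frame(x = 1:2, y = 3:4, names = c("a", "b"),
                from = c(2001, 2011), to = c(2003, 2012))
expanded <- expand_range(d)
expanded
```

Passing the column names as strings keeps the helper usable when the bounds are not literally called "from" and "to".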
Here is a dplyr solution:
library(dplyr)
# the data
presidents <- 
structure(list(name = c("Bill Clinton", "George W. Bush", "Barack Obama"
), from = c(1993, 2001, 2009), to = c(2001, 2009, 2012)), .Names = c("name", 
"from", "to"), row.names = 42:44, class = "data.frame")
# the expansion of the table
presidents %>%
    rowwise() %>%
    do(data.frame(name = .$name, year = seq(.$from, .$to, by = 1)))
# the output
Source: local data frame [22 x 2]
Groups: <by row>
             name  year
            (chr) (dbl)
1    Bill Clinton  1993
2    Bill Clinton  1994
3    Bill Clinton  1995
4    Bill Clinton  1996
5    Bill Clinton  1997
6    Bill Clinton  1998
7    Bill Clinton  1999
8    Bill Clinton  2000
9    Bill Clinton  2001
10 George W. Bush  2001
..            ...   ...
h/t: https://stackoverflow.com/a/24804470/1036500