R: How to tidy data with var-val pairs concatenated in a single column

use*_*672 1 r concatenation data-structures tidyr

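I have already tried to solve this here and here, and got good answers, but I realised they only partly solve what I think is a general problem: data is often organised with one column per variable (apparently the most interesting ones), followed by a final column in which several variable-value pairs have been lumped together. I have been struggling to find a general way to turn the variables in that last column into separate columns. Tidying data like this should be a job for tidyr, shouldn't it?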

require(dplyr)
require(stringr)

data <- 
      data.frame(
        shoptype=c("A","B","B"),
        city=c("bah", "bah", "slah"),
        sale=c("type cheese; price 200", "type ham; price 150","type cheese; price 100" )) %>%
      tbl_df()

> data
Source: local data frame [3 x 3]

  shoptype city                   sale
1        A  bah type cheese; price 200
2        B  bah    type ham; price 150
3        B slah type cheese; price 100

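Here we have information on a few shops in a few cities, with a concatenated column in which the variables are separated by ";" and each variable is separated from its value by a space. The desired output looks like this: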

  shoptype city   type price
1        A  bah cheese   200
2        B  bah    ham   150
3        B slah cheese   100

When every row is unique, the following works (see the linked SO questions):

require(plyr)
require(dplyr)
require(stringr)
require(tidyr)  
data %>%
  mutate(sale = str_split(as.character(sale), "; ")) %>%
  unnest(sale) %>%
  mutate(sale = str_trim(sale)) %>%
  separate(sale, into = c("var", "val")) %>%
  spread(var, val)

But if we change shoptype in the second row to "A", we get an error, like this:

data2 <- 
  data.frame(
    shoptype=c("A","A","B"),
    city=c("bah", "bah", "slah"),
    sale=c("type cheese; price 200", "type ham; price 150","type cheese; price 100" )) %>%
  tbl_df()
data2 %>%
  mutate(sale = str_split(as.character(sale), "; ")) %>%
  unnest(sale) %>%
  mutate(sale = str_trim(sale)) %>%
  separate(sale, into = c("var", "val")) %>%
  spread(var, val)
Error: Duplicate identifiers for rows (2, 4), (1, 3)
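The error seems to come from the fact that, once the pairs are split into rows, the first two shops share the same shoptype, city and var, so spread cannot tell which val belongs to which output row. A minimal sketch to confirm this, assuming tidyr::separate_rows and dplyr::count are available (the names long and dupes are only illustrative):

```r
library(dplyr)
library(tidyr)

data2 <- data.frame(
  shoptype = c("A", "A", "B"),
  city     = c("bah", "bah", "slah"),
  sale     = c("type cheese; price 200", "type ham; price 150", "type cheese; price 100"),
  stringsAsFactors = FALSE
)

# One row per "var val" pair, then split each pair into var and val
long <- data2 %>%
  separate_rows(sale, sep = ";\\s*") %>%
  separate(sale, into = c("var", "val"), sep = " ")

# Key combinations that occur more than once cannot be spread unambiguously
dupes <- long %>%
  count(shoptype, city, var) %>%
  filter(n > 1)
dupes
```

Any combination listed in dupes is one that spread would have to place in a single cell, which is what the "Duplicate identifiers" message is reporting.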

I tried to fix this with a unique ID (again, see the linked SO answers):

data2 %>%
  mutate(sale = str_split(as.character(sale), "; ")) %>%
  unnest(sale) %>%
  mutate(sale = str_trim(sale),
         v0=rownames(.)) %>%
  separate(sale, into = c("var", "val")) %>%
  spread(var, val)
Source: local data frame [6 x 5]

  shoptype city v0 price   type
1        A  bah  1    NA cheese
2        A  bah  2   200     NA
3        A  bah  3    NA    ham
4        A  bah  4   150     NA
5        B slah  5    NA cheese
6        B slah  6   100     NA

This yields data with structural missing values, and I cannot work out how to collapse it into the desired output described above.

I think I am really missing something that falls within the scope of tidyr (I hope!).

dav*_*ers 6

I don't think there is any need to use tidyr::unnest or tidyr::gather. Here is an alternative solution that focuses on stringr::str_replace and tidyr::separate:

library(dplyr)
library(stringr)
library(tidyr)

data2 %>%
  mutate(
    sale = str_replace(sale, "type ", ""),
    sale = str_replace(sale, " price ", "")
    ) %>%
  separate(sale, into = c("type", "price"), sep = ";") 

# Source: local data frame [3 x 4]

#   shoptype city   type price
# 1        A  bah cheese   200
# 2        A  bah    ham   150
# 3        B slah cheese   100
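The approach above relies on knowing the variable names ("type", "price") in advance. For the more general case, one possible sketch (not part of the original answer) is to record a row identifier before splitting, so that every var-val pair stays tied to the row it came from and spread no longer sees duplicate identifiers. The names row_id and tidy are hypothetical, and extra = "merge" is an assumption that keeps any values containing spaces in one piece:

```r
library(dplyr)
library(tidyr)

data2 <- data.frame(
  shoptype = c("A", "A", "B"),
  city     = c("bah", "bah", "slah"),
  sale     = c("type cheese; price 200", "type ham; price 150", "type cheese; price 100"),
  stringsAsFactors = FALSE
)

tidy <- data2 %>%
  # Identifier assigned BEFORE splitting, so both pairs of a row share it
  mutate(row_id = row_number()) %>%
  # One row per "var val" pair
  separate_rows(sale, sep = ";\\s*") %>%
  # Split each pair at the first space only
  separate(sale, into = c("var", "val"), sep = " ", extra = "merge") %>%
  # One column per variable; row_id keeps the rows distinct
  spread(var, val) %>%
  arrange(row_id) %>%
  select(-row_id)
tidy
```

In the question's attempt the ID was added after unnest, so each pair received its own ID and ended up on its own row. With tidyr 1.0.0 or later, pivot_wider(names_from = var, values_from = val) can be used in place of spread.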