如何在R中合并两个data.frames,引用查找表

Joh*_*n 5 merge r dataframe

我试图将两个合并data.frames在一起,基于每个被调用的公共列名称series_id.这是我的合并声明:

merge(test_growth_series_LUT,  test_growth_series, by = intersect(series_id, series_id))
Run Code Online (Sandbox Code Playgroud)

我得到的错误是

as.vector(y)出错:找不到对象'series_id'

帮助提供了这个描述,但我不明白为什么它找不到series_id.示例数据如下.

### S3 method for class 'data.frame':
   #merge(x, y, by = intersect(names(x), names(y)),
   #      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
   #      sort = TRUE, suffixes = c(".x",".y"), ...)



# Create a long data.frame to store data...
test_growth_series = data.frame ("read_day" = c(0, 3, 9, 0, 3, 9, 0, 2, 8), 
"series_id" = c("p1s1", "p1s1", "p1s1", "p1s2", "p1s2", "p1s2", "p3s4", "p3s4", "p3s4"),
"mean_od" = c(0.6, 0.9, 1.3, 0.3, 0.6, 1.0, 0.2, 0.5, 1.2),
"sd_od" = c(0.1, 0.2, 0.2, 0.1, 0.1, 0.3, 0.04, 0.1, 0.3),
"n_in_stat" = c(8, 8, 8, 8, 7, 5, 8, 7, 2)
)

# Create a name LUT
test_growth_series_LUT = data.frame ("series_id" = c("p1s1", "p1s2", "p3s4", "p4s2", "p5s2", "p6s2", "p7s4", "p8s4", "p9s4"),"description" = c("blah1", "blah2", "blah3", "blah4", "blah5", "blah6", "blah7", "blah8", "blah9")
)

> test_growth_series
  read_day series_id mean_od sd_od n_in_stat
1        0      p1s1     0.6  0.10         8
2        3      p1s1     0.9  0.20         8
3        9      p1s1     1.3  0.20         8
4        0      p1s2     0.3  0.10         8
5        3      p1s2     0.6  0.10         7
6        9      p1s2     1.0  0.30         5
7        0      p3s4     0.2  0.04         8
8        2      p3s4     0.5  0.10         7
9        8      p3s4     1.2  0.30         2
> test_growth_series_LUT
  series_id description
1      p1s1       blah1
2      p1s2       blah2
3      p3s4       blah3
4      p4s2       blah4
5      p5s2       blah5
6      p6s2       blah6
7      p7s4       blah7
8      p8s4       blah8
9      p9s4       blah9
> 



this is what I'm trying to achieve:  
> new_test_growth_series
  read_day series_id mean_od sd_od n_in_stat        description
1        0      p1s1     0.6  0.10         8        blah1
2        3      p1s1     0.9  0.20         8        blah1
3        9      p1s1     1.3  0.20         8        blah1
4        0      p1s2     0.3  0.10         8        blah2
5        3      p1s2     0.6  0.10         7        blah2
6        9      p1s2     1.0  0.30         5        blah2
7        0      p3s4     0.2  0.04         8        blah3
8        2      p3s4     0.5  0.10         7        blah3
9        8      p3s4     1.2  0.30         2        blah3
Run Code Online (Sandbox Code Playgroud)

Sha*_*ane 9

你可以这样做:

merge(test_growth_series_LUT, test_growth_series)
Run Code Online (Sandbox Code Playgroud)

它会自动匹配名称.如果需要指定列,可以这样做:

merge(test_growth_series_LUT, test_growth_series, by = "series_id")
Run Code Online (Sandbox Code Playgroud)

或者这种方式,如果您需要在两侧指定(只有在它们具有您想要匹配的不同名称时才需要):

merge(test_growth_series_LUT, test_growth_series, by.x = "series_id", by.y = "series_id")
Run Code Online (Sandbox Code Playgroud)

我建议通过转到merge(?merge)的帮助或通过调用来查看示例(并逐步浏览它们)example("merge", "base")(实际上通过它自己进行操作不太有用).

两个笔记:

  1. 你永远不需要在这里使用交叉函数.用于c()显式指定多个列名.或者使用all,all.xall.y参数指定什么样的加入你想要的.
  2. 除非已附加数据,否则在大多数情况下,您将使用引号指定列名称.否则会抱怨无法找到该名称.特别是,当您不使用引号时,名称必须位于搜索路径中.