Using match() in R with a return value for non-matches

pan*_*tts 8 merge r match no-match dataframe

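I have a larger existing data frame. For this smaller example, I want to replace some values based on the "first" column (replacing state in df1 with newstate from df2). My problem is that values come back as NA, because only some of the names have a match in the new data frame (df2).

Existing data frame: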

state = c("CA","WA","OR","AZ")
first = c("Jim","Mick","Paul","Ron")
df1 <- data.frame(first, state)

      first state
    1   Jim    CA
    2  Mick    WA
    3  Paul    OR
    4   Ron    AZ

New data frame to match against the existing data frame:

state = c("CA","WA")
newstate = c("TX", "LA")
first =c("Jim","Mick")
df2 <- data.frame(first, state, newstate)

  first state newstate
1   Jim    CA       TX
2  Mick    WA       LA

I tried using match, but it returns NA for state wherever a "first" value in the original data frame has no match in df2.

df1$state <- df2$newstate[match(df1$first, df2$first)]

  first state
1   Jim    TX
2  Mick    LA
3  Paul  <NA>
4   Ron  <NA>
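
For reference, the NA values come from the index that match() returns, not from the assignment itself. A minimal sketch with the same data (the name idx is only illustrative):

```r
df1 <- data.frame(first = c("Jim", "Mick", "Paul", "Ron"),
                  state = c("CA", "WA", "OR", "AZ"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(first = c("Jim", "Mick"),
                  state = c("CA", "WA"),
                  newstate = c("TX", "LA"),
                  stringsAsFactors = FALSE)

# match() returns NA for every name in df1 that is absent from df2
idx <- match(df1$first, df2$first)
idx
# [1]  1  2 NA NA

# Subsetting with an NA index yields NA, which then overwrites the old state
df2$newstate[idx]
# [1] "TX" "LA" NA   NA
```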

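Is there a way to ignore the non-matches, or to have a non-match return the existing value as is? This is an example of the desired result: the states for Jim and Mick are updated, while the states for Paul and Ron are left unchanged.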

      first state
    1   Jim    TX
    2  Mick    LA
    3  Paul    OR
    4   Ron    AZ

小智 9

Is this what you want? Unless you really want factors, use stringsAsFactors = FALSE in your data.frame calls. Note the use of nomatch = 0 in the match call.

> state = c("CA","WA","OR","AZ")
> first = c("Jim","Mick","Paul","Ron")
> df1 <- data.frame(first, state, stringsAsFactors = FALSE)
> state = c("CA","WA")
> newstate = c("TX", "LA")
> first =c("Jim","Mick")
> df2 <- data.frame(first, state, newstate, stringsAsFactors = FALSE)
> df1
  first state
1   Jim    CA
2  Mick    WA
3  Paul    OR
4   Ron    AZ
> df2
  first state newstate
1   Jim    CA       TX
2  Mick    WA       LA
> 
> # create an index for the matches
> indx <- match(df1$first, df2$first, nomatch = 0)
> df1$state[indx != 0] <- df2$newstate[indx]
> df1
  first state
1   Jim    TX
2  Mick    LA
3  Paul    OR
4   Ron    AZ
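
The nomatch = 0 approach works because R silently drops zero indices when subsetting, so the right-hand side has exactly as many elements as there are matched rows. A rough sketch of that behaviour, together with an alternative that keeps the default NA and tests for it explicitly (the names idx and hit are illustrative, not from the original answer):

```r
df1 <- data.frame(first = c("Jim", "Mick", "Paul", "Ron"),
                  state = c("CA", "WA", "OR", "AZ"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(first = c("Jim", "Mick"),
                  state = c("CA", "WA"),
                  newstate = c("TX", "LA"),
                  stringsAsFactors = FALSE)

# Zero indices are dropped when subsetting, so only matched rows remain
df2$newstate[c(1, 2, 0, 0)]
# [1] "TX" "LA"

# Alternative: keep the default NA for non-matches and test for it explicitly
idx <- match(df1$first, df2$first)
hit <- !is.na(idx)                        # TRUE where df1$first was found in df2
df1$state[hit] <- df2$newstate[idx[hit]]  # update matched rows only
df1$state
# [1] "TX" "LA" "OR" "AZ"
```

Both variants leave the unmatched rows (Paul and Ron) untouched; which one reads better is largely a matter of taste.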