How can I replace a character string in a data frame using dplyr?

max*_*oku 2 r dplyr

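I have a data frame in which one column contains the string "MISSING" alongside numeric values, and I want to replace that string with NA. I know I can do this outside of dplyr, but I'd like to keep it inside the dplyr tool chain.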

read.csv('data.csv', header=F) %>% 
  select(V1,V4) %>% 
  mutate(V4=replace(V4, "MISSING", "NA")) 

But this gives an error:

Error in mutate_impl(.data, dots) : 
  Column `V4` must be length 30681 (the number of rows) or one, not 30682

Data

structure(list(V1 = c("01/01/1933", "01/02/1933", "01/03/1933", 
"01/04/1933", "01/05/1933"), V4 = c("MISSING", "MISSING", "MISSING", 
"MISSING", "MISSING")), .Names = c("V1", "V4"), class = c("data.table", 
"data.frame"), row.names = c(NA, -5L), .internal.selfref = <pointer: 0x10280cf78>)

CPa*_*Pak 6

You can do this without specifying the column:

library(dplyr)
df <- df %>% replace(.=="MISSING", NA)
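For context on the error in the question: replace(x, list, values) is essentially x[list] <- values, so passing "MISSING" as the second argument treats it as an index (an element name) rather than a value to match. On an unnamed vector that appends one new element, which would explain why the column ends up one row too long. The following is a minimal sketch on made-up sample data (the data frame and the names bad and fixed are illustrative, not from the question); the column-wise fix passes a logical condition instead:

```r
library(dplyr)

# Made-up sample data for illustration only
df <- data.frame(
  V1 = c("01/01/1933", "01/02/1933", "01/03/1933"),
  V4 = c("MISSING", "1.5", "MISSING"),
  stringsAsFactors = FALSE
)

# Character index: appends a new element named "MISSING" (length 3 becomes 4)
bad <- replace(df$V4, "MISSING", "NA")
length(bad)

# Logical index: replaces the matching values in place with a real NA
fixed <- df %>% mutate(V4 = replace(V4, V4 == "MISSING", NA))
fixed$V4
```

Note that the replacement here is the NA constant, not the string "NA", so is.na() works on the result.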


ali*_*ire 5

dplyr::na_if is designed for exactly this purpose:

library(dplyr)

df <- structure(list(V1 = c("01/01/1933", "01/02/1933", "01/03/1933", "01/04/1933", "01/05/1933"), 
                     V4 = c("MISSING", "MISSING", "MISSING", "MISSING", "MISSING")), 
                .Names = c("V1", "V4"), class = "data.frame", row.names = c(NA, -5L))

df %>% mutate(V4 = na_if(V4, 'MISSING'))
#>           V1   V4
#> 1 01/01/1933 <NA>
#> 2 01/02/1933 <NA>
#> 3 01/03/1933 <NA>
#> 4 01/04/1933 <NA>
#> 5 01/05/1933 <NA>

Really, though, this is better taken care of at import time, e.g. with the na.strings parameter of read.csv and data.table::fread, or the na parameter of readr::read_csv.
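As a rough illustration of handling this on import, the sketch below writes a small made-up CSV to a temporary file (the file contents and the names path and dat are assumptions for the example) and reads it back with na.strings in base R:

```r
# Write a small made-up CSV to a temporary file for illustration
path <- tempfile(fileext = ".csv")
writeLines(
  c("01/01/1933,MISSING", "01/02/1933,0.25", "01/03/1933,MISSING"),
  path
)

# na.strings converts "MISSING" to NA while reading,
# so the second column can be parsed as numeric rather than character
dat <- read.csv(path, header = FALSE, na.strings = "MISSING")
str(dat)
```

Because the placeholder never reaches the data frame, the column arrives as numeric and no later replacement step is needed.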

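Also, your data is currently a data.table (presumably because you used fread), which has its own syntax for the [ operator. If you want to use fread but keep the result as a standard data.frame, set data.table = FALSE in fread.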
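A small sketch of the difference, assuming the data.table package is installed and again using a made-up temporary CSV (path, dt and plain are illustrative names):

```r
library(data.table)

# Made-up CSV written to a temporary file for illustration
path <- tempfile(fileext = ".csv")
writeLines(c("01/01/1933,MISSING", "01/02/1933,0.25"), path)

# Default: fread returns a data.table (which also inherits from data.frame)
dt <- fread(path, header = FALSE, na.strings = "MISSING")
class(dt)

# data.table = FALSE: fread returns a plain data.frame instead
plain <- fread(path, header = FALSE, na.strings = "MISSING", data.table = FALSE)
class(plain)
```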