Capitalizing values with dplyr

Phi*_*rez 2 r capitalize dplyr

I'm using dplyr for data cleaning. One thing I want to do is capitalize the values in certain columns.

    data$surname
    john
    Mary
    John
    mary
    ...

I think I have to use dplyr's mutate function. I also have this helper:

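    # Title-cases a single string: split on spaces, then upper-case
    # the first letter of each word and paste the words back together
    titleCase <- function(x) {
      s <- strsplit(as.character(x), " ")[[1]]
      paste(toupper(substring(s, 1, 1)), substring(s, 2),
            sep = "", collapse = " ")
    }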

But how do I combine the two? I keep getting various errors or a truncated data frame.

Thanks
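A likely cause of the trouble with titleCase(): strsplit(...)[[1]] keeps only the words of the first element, so the function returns one string no matter how many rows it gets, and mutate() recycles a length-one result to every row. A minimal sketch, assuming you want to keep that helper, is to apply it element by element with vapply(); the wrapper name titleCaseVec is hypothetical, and the data frame mirrors the sample values in the question.

```r
library(dplyr)

# The question's helper: [[1]] keeps only the FIRST element's words,
# so it always returns a single string.
titleCase <- function(x) {
  s <- strsplit(as.character(x), " ")[[1]]
  paste(toupper(substring(s, 1, 1)), substring(s, 2),
        sep = "", collapse = " ")
}

# Hypothetical wrapper: call titleCase on each element separately and
# return a character vector with the same length as x.
titleCaseVec <- function(x) {
  vapply(x, titleCase, character(1), USE.NAMES = FALSE)
}

data <- data.frame(surname = c("john", "Mary", "John", "mary"),
                   stringsAsFactors = FALSE)

single <- titleCase(data$surname)   # just "John": one value for four rows
res <- data %>% mutate(surname = titleCaseVec(surname))
res$surname                         # "John" "Mary" "John" "Mary"
```

vapply() with character(1) also guarantees each call returns exactly one string, so a helper that misbehaves fails loudly instead of silently recycling.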

Obe*_*Obe 9

A bit late to the party, but you can use the stringr package:

library(stringr)
library(dplyr)

example1 <- tibble(names = c("john" ,"Mary", "John", "mary"))

example1 %>%
mutate(names = str_to_title(names))

##  names
##  <chr>
## 1 John 
## 2 Mary 
## 3 John 
## 4 Mary    
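One difference worth knowing when choosing between str_to_title() and the sub() answer below: str_to_title() also lower-cases the rest of each word, while sub() with perl = TRUE changes only the first character. The input values here are made up to show the contrast.

```r
library(stringr)

x <- c("jOHN", "mary")

# str_to_title(): first letter of each word upper-cased,
# remaining letters lower-cased
titled <- str_to_title(x)                          # "John" "Mary"

# sub() with \\U (perl = TRUE only): upper-cases the captured first
# character and leaves the rest of the string untouched
firstOnly <- sub("(.)", "\\U\\1", x, perl = TRUE)  # "JOHN" "Mary"
```

For names that contain deliberate capitals inside a word, the sub() variant preserves them; for messy input, str_to_title() normalizes the casing.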

This still works if you want every word capitalized:

example2 <- tibble(names = c("john james" ,"Mary carey", "John Jack", "mary Harry"))

example2 %>%
mutate(names = str_to_title(names))

##  names
##  <chr>
## 1 John James 
## 2 Mary Carey 
## 3 John Jack
## 4 Mary Harry    

If you only want the first word capitalized, str_to_sentence() does that:

example2 %>%
mutate(names = str_to_sentence(names))

##  names
##  <chr>
## 1 John james 
## 2 Mary carey 
## 3 John jack
## 4 Mary harry    


akr*_*run 7

We can use sub:

sub("(.)", "\\U\\1", data$surname, perl=TRUE)
#[1] "John" "Mary" "John" "Mary"

Implemented in a dplyr workflow:

library(dplyr)
data %>%
     mutate(surname = sub("(.)", "\\U\\1", surname, perl=TRUE))
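The sub() call only changes the first character of the whole string. If every word should start with a capital while the other letters stay as they are, one option is gsub() with a word-boundary pattern. This is a sketch; the sample values, including the hyphenated name, are hypothetical.

```r
library(dplyr)

data <- data.frame(surname = c("john james", "Mary carey", "mary-ann"),
                   stringsAsFactors = FALSE)

# "\\b([a-z])" matches a lowercase letter at the start of each word;
# "\\U\\1" (perl = TRUE only) upper-cases that captured letter.
res <- data %>%
  mutate(surname = gsub("\\b([a-z])", "\\U\\1", surname, perl = TRUE))

res$surname   # "John James" "Mary Carey" "Mary-Ann"
```

Because \b also treats a hyphen as a word boundary, "mary-ann" becomes "Mary-Ann"; drop the word boundary and use sub() if that is not what you want.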

If we need to do this on multiple columns:

# mutate_each()/funs() are deprecated; across() needs dplyr >= 1.0.0
data %>%
     mutate(across(everything(), ~ sub("(.)", "\\U\\1", .x, perl = TRUE)))
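Since the question is about certain columns rather than all of them, a sketch with dplyr::across() (dplyr >= 1.0.0) can restrict the change to named columns. The age column is a hypothetical addition to show that unselected columns are left alone.

```r
library(dplyr)

data <- data.frame(surname = c("john", "Mary", "John", "mary"),
                   firstname = c("abe", "Jacob", "george", "jen"),
                   age = c(30, 41, 25, 52),   # hypothetical numeric column
                   stringsAsFactors = FALSE)

# Capitalize the first character of surname and firstname only;
# age is not selected, so it is returned unchanged.
res <- data %>%
  mutate(across(c(surname, firstname),
                ~ sub("(.)", "\\U\\1", .x, perl = TRUE)))
```

Swapping c(surname, firstname) for where(is.character) would instead target every character column.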

Just to check, on the 500,000-row data1 defined below:

res <- data1 %>%  
          mutate(surname = sub("(.)", "\\U\\1", surname, perl=TRUE))
sum(grepl("[A-Z]", substr(res$surname, 1,1)))
#[1] 500000

Data

data <- data.frame(surname=c("john", "Mary", "John", "mary"), 
firstname = c("abe", "Jacob", "george", "jen"), stringsAsFactors=FALSE)

data1 <-  data.frame(surname = sample(c("john", "Mary", "John", "mary"), 
    500000, replace=TRUE), stringsAsFactors=FALSE)


RHe*_*tel 5

There is a dedicated function you can try:

R.utils::capitalize(data$surname)

If you need it inside a dplyr pipeline, you can try this:

library(dplyr)
library(R.utils)
data %>% mutate(surname = capitalize(surname))
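If you would rather not add the R.utils dependency, a base-R helper can upper-case just the first character. This is a minimal sketch intended to behave like first-letter capitalization, not a copy of R.utils::capitalize; the name capitalize_first is hypothetical, and the NA handling is a choice made here.

```r
library(dplyr)

# Hypothetical helper: upper-case the first character, keep the rest,
# and leave missing values as NA instead of producing "NANA".
capitalize_first <- function(x) {
  out <- paste0(toupper(substr(x, 1, 1)), substring(x, 2))
  out[is.na(x)] <- NA_character_
  out
}

data <- data.frame(surname = c("john", "Mary", NA),
                   stringsAsFactors = FALSE)

res <- data %>% mutate(surname = capitalize_first(surname))
res$surname   # "John" "Mary" NA
```

substr() and substring() are vectorized, so the helper works on a whole column in one call without vapply().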