Lowercase certain words in R

tso*_*kis 4 regex r

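I need to convert certain words to lowercase. I'm working with a list of movie titles, in which prepositions and articles are normally lowercase unless they are the first word of the title. If I have the vector: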

movies = c('The Kings Of Summer', 'The Words', 'Out Of The Furnace', 'Me And Earl And The Dying Girl')

What I need is this:

movies_updated = c('The Kings of Summer', 'The Words', 'Out of the Furnace', 'Me and Earl and the Dying Girl')

Is there an elegant way to do this, rather than a long series of gsub() calls like:

movies_updated = gsub(' In ', ' in ', movies)
movies_updated = gsub(' In', ' in', movies_updated)
movies_updated = gsub(' Of ', ' of ', movies_updated)
movies_updated = gsub(' Of', ' of', movies_updated)
movies_updated = gsub(' The ', ' the ', movies_updated)
movies_updated = gsub(' The', ' the', movies_updated)

And so on.
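Even before reaching for a cleverer regex, the repetitive chain of gsub() calls can be collapsed into one loop with Reduce(). This is a minimal sketch, assuming a hypothetical small_words vector holding the capitalized forms to replace; the leading space in each pattern leaves the first word alone, and \\b stops "In" from matching the start of a longer word such as "Into".

```r
movies <- c('The Kings Of Summer', 'The Words', 'Out Of The Furnace',
            'Me And Earl And The Dying Girl')

# Hypothetical list of words to lowercase, in the form they appear in the titles
small_words <- c("In", "Of", "The", "And")

# Apply one gsub() per word, feeding each result into the next call.
# " Of\\b" only matches "Of" when it follows a space and ends a word.
movies_updated <- Reduce(
  function(x, w) gsub(paste0(" ", w, "\\b"), paste0(" ", tolower(w)), x, perl = TRUE),
  small_words,
  init = movies
)
movies_updated
```

This is still case-sensitive (it will not touch "OF"), which is one reason the answers below switch to ignore.case = TRUE.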

Kon*_*rad 9

Actually, it looks like you are interested in converting text to title case. This can be done easily with the stringi package, as follows:

> stringi::stri_trans_totitle(c('The Kings of Summer', 'The Words', 'Out of the Furnace'))
[1] "The Kings Of Summer" "The Words"           "Out Of The Furnace"

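Note that stri_trans_totitle capitalizes every word, including of and the, so on its own it does not keep the small words lowercase. An alternative is the toTitleCase function from the tools package, which leaves articles and prepositions such as of and the in lowercase unless they start the title: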

> tools::toTitleCase(c('The Kings of Summer', 'The Words', 'Out of the Furnace'))
[1] "The Kings of Summer" "The Words"           "Out of the Furnace" 
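The example above starts from titles that are already correctly cased. Applied directly to the question's original vector, a sketch like the following should lowercase the small words as well; this relies on toTitleCase's built-in list of small words, which includes and, in, of and the, and the exact list may differ between R versions.

```r
movies <- c('The Kings Of Summer', 'The Words', 'Out Of The Furnace',
            'Me And Earl And The Dying Girl')

# toTitleCase() lowercases its built-in small words (matched case-insensitively)
# everywhere except the first word, and capitalizes the remaining words.
movies_updated <- tools::toTitleCase(movies)
movies_updated
```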


r2e*_*ans 8

While I like @Konrad's answer for its brevity, I'll offer a more literal, hands-on alternative.

movies = c('The Kings Of Summer', 'The Words', 'Out Of The Furnace',
           'Me And Earl And The Dying Girl')

gr <- gregexpr("(?<!^)\\b(of|in|the)\\b", movies, ignore.case = TRUE, perl = TRUE)
mat <- regmatches(movies, gr)
regmatches(movies, gr) <- lapply(mat, tolower)
movies
# [1] "The Kings of Summer"            "The Words"                     
# [3] "Out of the Furnace"             "Me And Earl And the Dying Girl"

How the regex works:

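  • (?<!^) ensures we do not match a word at the very start of the string. Without it, the leading The of the first and second movies would be lowercased too.
  • \\b sets a word boundary, so the in in the middle of Dying is not matched. This is slightly more robust than matching on spaces, because hyphens, commas, and similar characters are not spaces but still mark the start or end of a word.
  • (of|in|the) matches any one of of, in, or the. More words can be added, separated by the pipe |; for example, adding and would also lowercase the two And in the last title.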
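To see what each piece of the pattern contributes, one can count the matches per title with and without the lookbehind and the word boundaries. This is a small sketch; count_matches is a hypothetical helper, not part of the answer above.

```r
movies <- c('The Kings Of Summer', 'The Words', 'Out Of The Furnace',
            'Me And Earl And The Dying Girl')

# Full pattern: skips the first word and only matches whole words
full <- "(?<!^)\\b(of|in|the)\\b"
# Without the lookbehind: the leading "The" is matched as well
no_lookbehind <- "\\b(of|in|the)\\b"
# Without word boundaries: "in" inside "Kings" and "Dying" is matched as well
no_boundary <- "(?<!^)(of|in|the)"

# Hypothetical helper: number of matches in each string
# (gregexpr() returns -1 when a string has no match)
count_matches <- function(pattern, x) {
  m <- gregexpr(pattern, x, ignore.case = TRUE, perl = TRUE)
  vapply(m, function(pos) sum(pos > 0), integer(1))
}

count_matches(full, movies)           # 1 0 2 1
count_matches(no_lookbehind, movies)  # 2 1 2 1
count_matches(no_boundary, movies)    # 2 0 2 2
```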

Once the words are identified, all that is left is to replace them with their lowercase versions.
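The same pattern can also be used in a single gsub() call, swapping the regmatches() step for PCRE's \L case conversion, which R supports in the replacement string when perl = TRUE. This is a minimal sketch; lowercase_small_words and its default word list are hypothetical, and the words are assumed to contain no regex metacharacters.

```r
# Hypothetical helper: lowercase the given words everywhere except at the
# start of each title, matching case-insensitively on whole words only.
lowercase_small_words <- function(titles, words = c("of", "in", "the", "and")) {
  # Build e.g. "(?<!^)\\b(of|in|the|and)\\b"
  pattern <- paste0("(?<!^)\\b(", paste(words, collapse = "|"), ")\\b")
  # "\\L\\1" inserts the captured word converted to lowercase (perl = TRUE only)
  gsub(pattern, "\\L\\1", titles, ignore.case = TRUE, perl = TRUE)
}

movies <- c('The Kings Of Summer', 'The Words', 'Out Of The Furnace',
            'Me And Earl And The Dying Girl')
movies_updated <- lowercase_small_words(movies)
movies_updated
```

Keeping the word list as an argument makes it easy to extend with further articles and prepositions without touching the regex itself.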