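I need to convert certain words to lowercase. I'm working with a list of movie titles, where prepositions and articles are normally lowercase unless they are the first word of the title. If I have the vector: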
movies = c('The Kings Of Summer', 'The Words', 'Out Of The Furnace', 'Me And Earl And The Dying Girl')
What I need is this:
movies_updated = c('The Kings of Summer', 'The Words', 'Out of the Furnace', 'Me and Earl and the Dying Girl')
Is there an elegant way to do this without a long series of gsub() calls, such as:
movies_updated = gsub(' In ', ' in ', movies)
movies_updated = gsub(' In', ' in', movies_updated)
movies_updated = gsub(' Of ', ' of ', movies_updated)
movies_updated = gsub(' Of', ' of', movies_updated)
movies_updated = gsub(' The ', ' the ', movies_updated)
movies_updated = gsub(' The', ' the', movies_updated)
and so on.
Actually, it looks like you are interested in converting your text to title case. This is easy to do with the stringi package, as shown below:
>> stringi::stri_trans_totitle(c('The Kings of Summer', 'The Words', 'Out of the Furnace'))
[1] "The Kings Of Summer" "The Words" "Out Of The Furnace"
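An alternative approach is to use the toTitleCase function from the tools package, which keeps small words such as "of" and "the" in lowercase: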
>> tools::toTitleCase(c('The Kings of Summer', 'The Words', 'Out of the Furnace'))
[1] "The Kings of Summer" "The Words" "Out of the Furnace"
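The calls above pass titles that are already in the desired form, so a quick way to see how toTitleCase handles the vector from the question is to run it on that input and compare against the target. This is only a sketch: the result may depend on the R version, so it is printed and compared rather than assumed.

```r
# Vector from the question, with every word capitalised.
movies <- c('The Kings Of Summer', 'The Words', 'Out Of The Furnace',
            'Me And Earl And The Dying Girl')

# Target from the question.
wanted <- c('The Kings of Summer', 'The Words', 'Out of the Furnace',
            'Me and Earl and the Dying Girl')

# toTitleCase is vectorised over a character vector.
result <- tools::toTitleCase(movies)
print(result)

# TRUE for each title that already matches the target.
print(result == wanted)
```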
While I like @Konrad's answer for its succinctness, I'll offer an alternative that is more literal and manual.
movies = c('The Kings Of Summer', 'The Words', 'Out Of The Furnace',
'Me And Earl And The Dying Girl')
gr <- gregexpr("(?<!^)\\b(of|in|the)\\b", movies, ignore.case = TRUE, perl = TRUE)
mat <- regmatches(movies, gr)
regmatches(movies, gr) <- lapply(mat, tolower)
movies
# [1] "The Kings of Summer" "The Words"
# [3] "Out of the Furnace" "Me And Earl And the Dying Girl"
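The tricks in the regular expression:
(?<!^) ensures we don't match a word at the beginning of the string. Without it, the leading The of the first and second movies would be lowercased.
\\b sets word boundaries, so that the in in the middle of Dying does not match. This is slightly more robust than matching spaces, since hyphens, commas, etc. are not spaces but still mark the beginning or end of a word.
(of|in|the) matches any one of of, in, or the. More patterns can be added, separated by pipes |. Note that and is not in this list, which is why And is left capitalized in the last title above.
Once the matches are identified, it is as simple as replacing them with their lowercased versions.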
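To make the word list easier to extend, the same steps can be wrapped in a small function that builds the alternation from a character vector. This is a minimal sketch: the name lower_small_words and its default word list are hypothetical, not part of the answer above, and the words are assumed to contain no regex metacharacters.

```r
# Hypothetical helper: lowercase the listed words unless they start the title.
lower_small_words <- function(titles, words = c("of", "in", "the", "and")) {
  # Build "(?<!^)\\b(of|in|the|and)\\b" from the word vector.
  pattern <- paste0("(?<!^)\\b(", paste(words, collapse = "|"), ")\\b")
  m <- gregexpr(pattern, titles, ignore.case = TRUE, perl = TRUE)
  # Replace each matched word with its lowercase form.
  regmatches(titles, m) <- lapply(regmatches(titles, m), tolower)
  titles
}

movies <- c('The Kings Of Summer', 'The Words', 'Out Of The Furnace',
            'Me And Earl And The Dying Girl')
movies_updated <- lower_small_words(movies)
print(movies_updated)
```

With "and" included in the default list, And in the last title is lowercased as well, which the three-word pattern above leaves untouched.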