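I have a vector of lowercase strings. I'd like to change them to title case, meaning the first letter of every word is capitalized. I've managed to do it with a double loop, but I'm hoping there's a more efficient and elegant way, perhaps a one-liner with gsub and a regex.
Here's some sample data, along with the double loop that works, followed by the other things I tried that didn't work.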
strings = c("first phrase", "another phrase to convert",
"and here's another one", "last-one")
# For each string in the strings vector, find the start position of
# each word (a word boundary followed by lowercase letters)
matches = gregexpr("\\b[a-z]+", strings)
# For each string in the strings vector, convert the first letter
# of each word to upper case
for (i in 1:length(strings)) {
  # Extract the position of each regex match for the string in row i
  # of the strings vector.
  match.positions = matches[[i]][1:length(matches[[i]])]
  # Convert the letter in each match position to upper case
  for (j in 1:length(match.positions)) {
    substr(strings[i], match.positions[j], match.positions[j]) =
      toupper(substr(strings[i], match.positions[j], match.positions[j]))
  }
}
This works, but it seems inordinately complicated. I only resorted to it after my attempts at more straightforward approaches failed. Here are some of the things I tried, along with the output:
# Google search suggested \\U might work, but evidently not in R
gsub("(\\b[a-z]+)", "\\U\\1" ,strings)
[1] "Ufirst Uphrase" "Uanother Uphrase Uto Uconvert"
[3] "Uand Uhere'Us Uanother Uone" "Ulast-Uone"
# I tried this on a lark, but to no avail
gsub("(\\b[a-z]+)", toupper("\\1"), strings)
[1] "first phrase" "another phrase to convert"
[3] "and here's another one" "last-one"
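The regex captures the correct positions in each string, as the call to gregexpr shows, but the replacement string clearly isn't working as intended.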
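The toupper("\\1") attempt leaves the strings unchanged because R evaluates toupper("\\1") before gsub ever runs. The string "\\1" contains no letters, so gsub receives a plain "\\1" and puts each match back as it was. One way to apply a function to the matched text is the regmatches<- replacement form in base R. The following is a minimal sketch of that swapped-in technique, not the approach used in the loop above; the names m and result are made up for illustration, and it still capitalizes the letter after an apostrophe.

```r
strings <- c("first phrase", "another phrase to convert",
             "and here's another one", "last-one")

# toupper() sees only the backreference text, which has no letters to change
identical(toupper("\\1"), "\\1")
## [1] TRUE

# Locate the first lowercase letter after every word boundary
m <- gregexpr("\\b[a-z]", strings, perl = TRUE)

# Replace each matched letter with its upper-case version
result <- strings
regmatches(result, m) <- lapply(regmatches(result, m), toupper)
result
## [1] "First Phrase"              "Another Phrase To Convert"
## [3] "And Here'S Another One"    "Last-One"
```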
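In case you can't already tell, I'm relatively new to regex and would appreciate help getting the replacement to work correctly. I'd also like to learn how to structure the regex so that it doesn't capture a letter after an apostrophe, since I don't want to change the case of those letters.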
Ben*_*ker 19
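The main problem is that you're missing perl=TRUE (your regex is also slightly wrong, although that may be a side effect of flailing around while trying to fix the first problem).
Using [:lower:] instead of [a-z] is slightly safer in case your code ends up running in some odd (sorry, Estonians) locale where z is not the last letter of the alphabet: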
re_from <- "\\b([[:lower:]])([[:lower:]]+)"
strings <- c("first phrase", "another phrase to convert",
"and here's another one", "last-one")
gsub(re_from, "\\U\\1\\L\\2" ,strings, perl=TRUE)
## [1] "First Phrase" "Another Phrase To Convert"
## [3] "And Here's Another One" "Last-One"
You may prefer to use \\E (stop capitalizing) rather than \\L (start lowercasing), depending on which rules you want to follow, e.g.:
string2 <- "using AIC for model selection"
gsub(re_from, "\\U\\1\\E\\2" ,string2, perl=TRUE)
## [1] "Using AIC For Model Selection"
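The difference between \\L and \\E only shows when the second capture group can contain uppercase letters. A minimal sketch, assuming a broader pattern that matches any letters rather than only lowercase ones (re_alpha, with_L and with_E are names made up for this illustration):

```r
# Hypothetical broader pattern: first letter, then any remaining letters
re_alpha <- "\\b([[:alpha:]])([[:alpha:]]*)"
string2 <- "using AIC for model selection"

# \\L lowercases the rest of each word, so acronyms get flattened
with_L <- gsub(re_alpha, "\\U\\1\\L\\2", string2, perl = TRUE)
with_L
## [1] "Using Aic For Model Selection"

# \\E only ends the upper-casing, leaving the rest of each word as typed
with_E <- gsub(re_alpha, "\\U\\1\\E\\2", string2, perl = TRUE)
with_E
## [1] "Using AIC For Model Selection"
```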
Without using regex, the help page for tolower has two example functions that will do this.
The more robust version is
capwords <- function(s, strict = FALSE) {
    cap <- function(s) paste(toupper(substring(s, 1, 1)),
                  {s <- substring(s, 2); if(strict) tolower(s) else s},
                             sep = "", collapse = " " )
    sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
}
capwords(c("using AIC for model selection"))
## -> [1] "Using AIC For Model Selection"
To make your regex approach (almost) work, you need to set perl = TRUE:
gsub("(\\b[a-z]{1})", "\\U\\1" ,strings, perl=TRUE)
[1] "First Phrase" "Another Phrase To Convert"
[3] "And Here'S Another One" "Last-One"
But you will probably need to deal with apostrophes slightly better:
# Split on spaces, capitalize the first character of each piece, then rejoin
sapply(lapply(strsplit(strings, ' '), gsub, pattern = '^([[:alnum:]]{1})',
              replacement = '\\U\\1', perl = TRUE),
       paste, collapse = ' ')
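A single gsub call can also skip letters that follow an apostrophe by using a negative lookbehind, which needs perl = TRUE. This is only a sketch under one assumption: a word start is taken to be a lowercase letter not preceded by a letter, a digit, or an apostrophe. The function name title_case is made up here. Unlike the split-on-spaces version, it capitalizes after a hyphen, and a word that opens with a single quote would be left alone.

```r
strings <- c("first phrase", "another phrase to convert",
             "and here's another one", "last-one")

# Hypothetical helper: upper-case a lowercase letter only when the
# character before it is not alphanumeric and not an apostrophe
title_case <- function(x) {
  gsub("(?<![[:alnum:]'])([[:lower:]])", "\\U\\1", x, perl = TRUE)
}

title_case(strings)
## [1] "First Phrase"              "Another Phrase To Convert"
## [3] "And Here's Another One"    "Last-One"
```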
A quick search of SO found /sf/answers/445574461/
There are already great answers here. Here's one that uses a convenience function from the reports package:
strings <- c("first phrase", "another phrase to convert",
"and here's another one", "last-one")
CA(strings)
## > CA(strings)
## [1] "First Phrase" "Another Phrase To Convert"
## [3] "And Here's Another One" "Last-one"
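It doesn't capitalize the "one" in "last-one", though, because doing so made no sense for my purposes.
Update: I maintain the qdapRegex package, which has a TC (title case) function that does true title case: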
TC(strings)
## [[1]]
## [1] "First Phrase"
##
## [[2]]
## [1] "Another Phrase to Convert"
##
## [[3]]
## [1] "And Here's Another One"
##
## [[4]]
## [1] "Last-One"