How do I get the first 10 words of a string in R?

use*_*390 7 csv r

I have a string in R:

x <- "The length of the word is going to be of nice use to me"

I want the first 10 words of the string above.

Also, suppose I have a CSV file in the following format:

Keyword,City(Column Header)
The length of the string should not be more than 10,New York
The Keyword should be of specific length,Los Angeles
This is an experimental basis program string,Seattle
Please help me with getting only the first ten words,Boston

I want to take only the first 10 words from the Keyword column of each row and write the result to a CSV file. Please help me with this.

Blu*_*ter 18

A regular expression (regex) answer using \w (a word character) and its negation \W:

gsub("^((\\w+\\W+){9}\\w+).*$","\\1",x)
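  1. ^ start of the string (zero-width)
  2. ((\\w+\\W+){9}\\w+) ten words separated by non-word characters
    1. (\\w+\\W+){9} a word followed by non-word characters, nine times
      1. \\w+ one or more word characters (i.e. a word)
      2. \\W+ one or more non-word characters (e.g. spaces or punctuation)
      3. {9} nine repetitions
    2. \\w+ the tenth word
  3. .* anything else, including any words that follow
  4. $ end of the string (zero-width)
  5. \\1 when the pattern matches, replace the match with the first captured group (the ten words)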
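As a quick check of the breakdown above, here is a minimal base R sketch that wraps the pattern in a helper and applies it to the string from the question. The name `first_ten_words` is made up for illustration, and `sub` is used in place of `gsub` on the assumption that the anchored pattern can match at most once per string.

```r
# Hypothetical helper name; the pattern is the one explained above.
first_ten_words <- function(s) {
  # Keep capture group 1 (the ten words) and drop the rest of the string.
  sub("^((\\w+\\W+){9}\\w+).*$", "\\1", s)
}

x <- "The length of the word is going to be of nice use to me"
first_ten_words(x)
# [1] "The length of the word is going to be of"

# A string with fewer than ten words does not match and comes back unchanged.
first_ten_words("only three words")
# [1] "only three words"
```

Because `sub` is vectorized, the same helper can be applied directly to a character column of a data frame.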

  • However, changing the regex to "^((\\w+\\W+){0,9}\\w+).*" fixes this problem. (2 upvotes)

Jub*_*les 7

How about using the word function from Hadley Wickham's stringr package?

library(stringr)
word(string = x, start = 1, end = 10, sep = fixed(" "))


mar*_*bel 5

Here is a small function that splits the string into words, takes the first ten, and pastes them back together.

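string_fun <- function(x) {
  # Split on whitespace and keep at most the first ten words;
  # head() avoids the NA padding that [1:10] produces for shorter strings
  ul <- head(unlist(strsplit(x, split = "\\s+")), 10)
  paste(ul, collapse = " ")
}

string_fun(x)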

df <- read.table(text = "Keyword,City(Column Header)
The length of the string should not be more than 10 is or are in,New York
The Keyword should be of specific length is or are in,Los Angeles
                 This is an experimental basis program string is or are in,Seattle
                 Please help me with getting only the first ten words is or are in,Boston", sep = ",", header = TRUE)

df <- as.data.frame(df)

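Using apply (each row is passed in as a character vector of both columns; the City value has no effect here only because every Keyword has at least ten words):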

df$Keyword <- apply(df[,1:2], 1, string_fun)

Edit: This is probably a more general way to use the function.

df[,1] <- as.character(df[,1])
df$Keyword <- unlist(lapply(df[,1], string_fun))

print(df)
#                      Keyword                            City.Column.Header.
# 1    The length of the string should not be more than            New York
# 2  The Keyword should be of specific length is or are         Los Angeles
# 3  This is an experimental basis program string is or             Seattle
# 4      Please help me with getting only the first ten              Boston
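
Rows 3 and 4 end up with only nine words because those input lines start with whitespace, so strsplit() yields an empty first element that counts toward the ten. Passing strip.white = TRUE to read.table() avoids this.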
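The question also asks for the truncated column to be written back to a CSV file, which none of the snippets above do. The following is a rough end-to-end sketch in base R under a few assumptions: the sample rows from the question are inlined as text, the second header is shortened to `City`, `first_n_words` is a hypothetical helper name, and `tempfile()` stands in for a real output path.

```r
# Sample data from the question, inlined so the sketch is self-contained.
# Assumption: the second column header is simplified to "City".
csv_text <- "Keyword,City
The length of the string should not be more than 10,New York
The Keyword should be of specific length,Los Angeles
This is an experimental basis program string,Seattle
Please help me with getting only the first ten words,Boston"

# strip.white = TRUE drops leading/trailing whitespace from each field.
df <- read.csv(text = csv_text, stringsAsFactors = FALSE, strip.white = TRUE)

# Hypothetical helper: first n whitespace-separated words of each element.
first_n_words <- function(s, n = 10) {
  words <- strsplit(trimws(s), "\\s+")
  vapply(words, function(w) paste(head(w, n), collapse = " "), character(1))
}

df$Keyword <- first_n_words(df$Keyword)

# Write the result; tempfile() is a placeholder for the real destination.
out_path <- tempfile(fileext = ".csv")
write.csv(df, out_path, row.names = FALSE)
```

Using `vapply` with `head` keeps the helper vectorized over the whole column and leaves rows with fewer than ten words unchanged instead of padding them with `NA`.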