I have a long string that I want to split into chunks of a fixed size, for example 10 words each:
x <- "Hrothgar, king of the Danes, or Scyldings, builds a great mead-hall, or palace, in which he hopes to feast his liegemen and to give them presents. The joy of king and retainers is, however, of short duration. Grendel, the monster, is seized with hateful jealousy. He cannot brook the sounds of joyance that reach him down in his fen-dwelling near the hall. Oft and anon he goes to the joyous building, bent on direful mischief. Thane after thane is ruthlessly carried off and devoured, while no one is found strong enough and bold enough to cope with the monster. For twelve years he persecutes Hrothgar and his vassals."
Using strsplit, I can split the string into individual words:
x1 <- unlist(strsplit(x, " "))
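strsplit() returns a list with one element per input string, which is why its result is wrapped in unlist(). Here is a minimal sketch of an equivalent call. Splitting on "\\s+" rather than " " is my own assumption, not part of the question; it also tolerates runs of spaces or line breaks. It also shows how many words the example text contains:

```r
x <- "Hrothgar, king of the Danes, or Scyldings, builds a great mead-hall, or palace, in which he hopes to feast his liegemen and to give them presents. The joy of king and retainers is, however, of short duration. Grendel, the monster, is seized with hateful jealousy. He cannot brook the sounds of joyance that reach him down in his fen-dwelling near the hall. Oft and anon he goes to the joyous building, bent on direful mischief. Thane after thane is ruthlessly carried off and devoured, while no one is found strong enough and bold enough to cope with the monster. For twelve years he persecutes Hrothgar and his vassals."

# strsplit() returns a list; [[1]] takes the words of the first (and only) string.
# "\\s+" splits on runs of whitespace (an assumption; the question splits on " ").
x1 <- strsplit(x, "\\s+")[[1]]
length(x1)   # 108 words
head(x1, 3)  # "Hrothgar," "king" "of"
```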
Using paste, I can then join every 10 words back together:
paste(x1[1:10], collapse = " ")
paste(x1[11:20], collapse = " ")
...
paste(x1[101:110], collapse = " ")
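Indexing past the end of a vector in R returns NA rather than raising an error. As a result, the last call above, x1[101:110], ends in "NA NA": the text has 108 words, so only 8 of those 10 positions exist. A quick check:

```r
x <- "Hrothgar, king of the Danes, or Scyldings, builds a great mead-hall, or palace, in which he hopes to feast his liegemen and to give them presents. The joy of king and retainers is, however, of short duration. Grendel, the monster, is seized with hateful jealousy. He cannot brook the sounds of joyance that reach him down in his fen-dwelling near the hall. Oft and anon he goes to the joyous building, bent on direful mischief. Thane after thane is ruthlessly carried off and devoured, while no one is found strong enough and bold enough to cope with the monster. For twelve years he persecutes Hrothgar and his vassals."
x1 <- unlist(strsplit(x, " "))

# Number of chunks of (up to) 10 words needed to cover all words
n_chunks <- ceiling(length(x1) / 10)  # 11

# Positions 109 and 110 do not exist, so they come back as NA,
# which paste() turns into the text "NA"
last <- paste(x1[101:110], collapse = " ")
last  # "twelve years he persecutes Hrothgar and his vassals. NA NA"
```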
However, this is tedious, so I tried lapply and seq:
lapply(x1, function(x) paste(x[seq(1,100,10)], collapse = " "))
Run Code Online (Sandbox Code Playgroud)
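The attempt above iterates over x1 itself, so inside the anonymous function x is a single word (a length-1 vector). Indexing it with seq(1, 100, 10) keeps that word at position 1 and returns NA for the other nine positions, and you get one such string per word instead of one per chunk. This small sketch reproduces that behaviour:

```r
x <- "Hrothgar, king of the Danes, or Scyldings, builds a great mead-hall, or palace, in which he hopes to feast his liegemen and to give them presents. The joy of king and retainers is, however, of short duration. Grendel, the monster, is seized with hateful jealousy. He cannot brook the sounds of joyance that reach him down in his fen-dwelling near the hall. Oft and anon he goes to the joyous building, bent on direful mischief. Thane after thane is ruthlessly carried off and devoured, while no one is found strong enough and bold enough to cope with the monster. For twelve years he persecutes Hrothgar and his vassals."
x1 <- unlist(strsplit(x, " "))

# Same call as in the question: the function receives ONE word at a time
res <- lapply(x1, function(x) paste(x[seq(1, 100, 10)], collapse = " "))

length(res)  # 108: one list element per word, not per chunk
res[[1]]     # "Hrothgar, NA NA NA NA NA NA NA NA NA"
```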
But the result is not what I want. What I want is something like this:
[1] "Hrothgar, king of the Danes, or Scyldings, builds a great"
[2] "mead-hall, or palace, in which he hopes to feast his"
[3] "liegemen and to give them presents. The joy of king"
[4] "and retainers is, however, of short duration. Grendel, the monster,"
[5] "is seized with hateful jealousy. He cannot brook the sounds"
...
[11] "twelve years he persecutes Hrothgar and his vassals. NA NA"
I'm open to any solution, but would especially appreciate one in base R.
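If you want exactly the output shown above, including the NA padding in the last element, one base R option is to pad the word vector with NA up to a multiple of 10 and fill a 10-row matrix column by column, so that each column holds one chunk. This is a sketch of my own, not taken from the answers:

```r
x <- "Hrothgar, king of the Danes, or Scyldings, builds a great mead-hall, or palace, in which he hopes to feast his liegemen and to give them presents. The joy of king and retainers is, however, of short duration. Grendel, the monster, is seized with hateful jealousy. He cannot brook the sounds of joyance that reach him down in his fen-dwelling near the hall. Oft and anon he goes to the joyous building, bent on direful mischief. Thane after thane is ruthlessly carried off and devoured, while no one is found strong enough and bold enough to cope with the monster. For twelve years he persecutes Hrothgar and his vassals."
x1 <- unlist(strsplit(x, " "))
n <- 10

# Growing a vector's length pads it with NA (here from 108 to 110 elements)
length(x1) <- n * ceiling(length(x1) / n)

# matrix() fills column-wise, so column j holds words (j - 1) * n + 1 to j * n;
# paste() turns the NA padding into the text "NA"
chunks <- apply(matrix(x1, nrow = n), 2, paste, collapse = " ")
chunks[11]  # "twelve years he persecutes Hrothgar and his vassals. NA NA"
```

If you would rather not have the literal "NA" text, the answers below avoid the padding altogether.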
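Another option using only base R: use a regex to capture (\\1) groups of 10 words (runs of word characters or hyphens, ending at a word boundary \b) together with the spaces, commas and periods that follow them. Then append an "unusual" string (here "XXX") to the end of each group, so that the result can afterwards be split on that string. Putting a space before that string in the strsplit pattern avoids a trailing space at the end of each piece: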
unlist(strsplit(gsub("(((\\w|-)+\\b[ ,.]*){10})", "\\1XXX", x), " XXX"))
# [1] "Hrothgar, king of the Danes, or Scyldings, builds a great"
# [2] "mead-hall, or palace, in which he hopes to feast his"
# [3] "liegemen and to give them presents. The joy of king"
# [4] "and retainers is, however, of short duration. Grendel, the monster,"
# [5] "is seized with hateful jealousy. He cannot brook the sounds"
# [6] "of joyance that reach him down in his fen-dwelling near"
# [7] "the hall. Oft and anon he goes to the joyous"
# [8] "building, bent on direful mischief. Thane after thane is ruthlessly"
# [9] "carried off and devoured, while no one is found strong"
#[10] "enough and bold enough to cope with the monster. For"
#[11] "twelve years he persecutes Hrothgar and his vassals."
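The same regex approach can be wrapped in a small helper so the chunk size becomes a parameter. This is a sketch: the name split_every_regex and its arguments are hypothetical. It assumes the marker never occurs in the text and is plain alphanumeric. Like the pattern above, it treats only spaces, commas and periods as separators after a word. Making the space before the marker optional (" ?XXX") also handles a last chunk that ends exactly at the end of the string, where no space precedes the marker:

```r
x <- "Hrothgar, king of the Danes, or Scyldings, builds a great mead-hall, or palace, in which he hopes to feast his liegemen and to give them presents. The joy of king and retainers is, however, of short duration. Grendel, the monster, is seized with hateful jealousy. He cannot brook the sounds of joyance that reach him down in his fen-dwelling near the hall. Oft and anon he goes to the joyous building, bent on direful mischief. Thane after thane is ruthlessly carried off and devoured, while no one is found strong enough and bold enough to cope with the monster. For twelve years he persecutes Hrothgar and his vassals."

# Hypothetical helper generalising the regex answer to any chunk size n.
# Assumes `marker` is plain alphanumeric text that never appears in `x`.
split_every_regex <- function(x, n = 10, marker = "XXX") {
  # n repetitions of: word (letters/digits/_/hyphen), boundary, then " ", "," or "."
  pattern <- sprintf("(((\\w|-)+\\b[ ,.]*){%d})", n)
  marked <- gsub(pattern, paste0("\\1", marker), x)
  # " ?" also removes a marker placed right at the end of the string
  unlist(strsplit(marked, paste0(" ?", marker)))
}

split_every_regex(x, 10)  # the same 11 chunks as above
split_every_regex(x, 9)   # 108 words divide evenly into 12 chunks of 9
```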
You can create a sequence of start indices and paste together the corresponding words of x1:
sapply(seq(1, length(x1), 10), function(i)
paste0(x1[i:min(i + 9, length(x1))], collapse = " "))
# [1] "Hrothgar, king of the Danes, or Scyldings, builds a great"
# [2] "mead-hall, or palace, in which he hopes to feast his"
# [3] "liegemen and to give them presents. The joy of king"
# [4] "and retainers is, however, of short duration. Grendel, the monster,"
# [5] "is seized with hateful jealousy. He cannot brook the sounds"
# [6] "of joyance that reach him down in his fen-dwelling near"
# [7] "the hall. Oft and anon he goes to the joyous"
# [8] "building, bent on direful mischief. Thane after thane is ruthlessly"
# [9] "carried off and devoured, while no one is found strong"
#[10] "enough and bold enough to cope with the monster. For"
#[11] "twelve years he persecutes Hrothgar and his vassals."
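Here is another base R variant, shown as a sketch rather than as part of the answer above. It assigns every word a chunk number with ceiling(seq_along(words) / n), lets split() do the grouping, and uses vapply() so the result is guaranteed to be a character vector. The function name chunk_words is hypothetical:

```r
x <- "Hrothgar, king of the Danes, or Scyldings, builds a great mead-hall, or palace, in which he hopes to feast his liegemen and to give them presents. The joy of king and retainers is, however, of short duration. Grendel, the monster, is seized with hateful jealousy. He cannot brook the sounds of joyance that reach him down in his fen-dwelling near the hall. Oft and anon he goes to the joyous building, bent on direful mischief. Thane after thane is ruthlessly carried off and devoured, while no one is found strong enough and bold enough to cope with the monster. For twelve years he persecutes Hrothgar and his vassals."

# Hypothetical helper: split a string into chunks of n words.
chunk_words <- function(x, n = 10) {
  words <- unlist(strsplit(x, " "))
  # Word i goes to chunk ceiling(i / n): words 1-10 -> 1, 11-20 -> 2, ...
  groups <- split(words, ceiling(seq_along(words) / n))
  # Collapse each group into one string; unname() drops the "1", "2", ... names
  unname(vapply(groups, paste, character(1), collapse = " "))
}

chunk_words(x)
```

Because the last group simply holds the leftover words, no NA padding appears, matching the sapply output above.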