Split a string into an array of strings of 1-3 words each, depending on word length

Kam*_*ski 6 javascript

I have the following input string:

Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia ...

The splitting rules, shown by example:

[
     "Lorem ipsum dolor",  // A: Three words <6 letters
     "sit amet",           // B: Two words <6 letters if next word >=6 letters
     "consectetur",        // C: One word >=6 letters if next word >=6 letters
     "adipiscing elit",    // D: Two words: first >=6, second <6 letters
     "sed doeiusmod",      // E: Two words: first <6, second >=6 letters
     "tempor",             // rule C
     "incididunt ut",      // rule D
     "Duis aute irure",    // rule A
     "dolor in",           // rule B
     "reprehenderit in",   // rule D
     "esse cillum",        // rule E
     "dolor eu fugia",     // rule A
     ...
]

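So, as you can see, each string in the array contains a minimum of one and a maximum of three words. I tried to implement this myself, but my attempt does not work. How can this be done?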

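Update

Boundary condition: if the last word or words do not match any rule, just add them as the last array element (but two long words must never end up together in one string).

Summary and interesting conclusion

This question received 8 nice answers, and some of the discussion around them was about self-describing (or self-explanatory) code. Self-describing code is code whose exact behavior a person who has not read the question can easily state after a first look. Sadly, none of the answers provides such code, so this question is an example suggesting that self-describing code may be a myth.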

Cer*_*nce 5

One option is to first create an array of rules, for example:

const rules = [
  // [# of words to splice if all conditions met, condition for word1, condition for word2, condition for word3...]
  [3, 'less', 'less', 'less'],
  // the above means: splice 3 words if the next 3 words' lengths are <6, <6, <6
  [2, 'less', 'less', 'eqmore'],
  // the above means: splice 2 words if the next 3 words' lengths are <6, <6, >=6
  [1, 'eqmore', 'eqmore'],
  [2, 'eqmore', 'less'],
  [2, 'less', 'eqmore']
];

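Then iterate over the rules array, find the rule that matches, take the number of words to splice from that rule, and push them to the output array: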

const s = "Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";

const words = s.split(' ');
const output = [];
// 'less' means fewer than 6 letters; anything else ('eqmore') means 6 or more
const verify = (cond, word) => cond === 'less' ? word.length < 6 : word.length >= 6;
while (words.length) {
  // first rule whose conditions all hold for the upcoming words
  const [wordCount] = rules.find(
    ([wordCount, ...conds]) => conds.every((cond, i) => verify(cond, words[i]))
  );
  output.push(words.splice(0, wordCount).join(' '));
}
console.log(output);

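Of course, the .find call assumes that every input string always has a matching rule at each splice position.

To add an extra rule that sends any word not matched by the earlier rules to the output, put [1] at the bottom of the rules array: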

const rules = [
  [3, 'less', 'less', 'less'],
  [2, 'less', 'less', 'eqmore'],
  [1, 'eqmore', 'eqmore'],
  [2, 'eqmore', 'less'],
  [2, 'less', 'eqmore'],
  [1] // no conditions, so this rule matches whenever the ones above do not
];
const s = "Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";

const words = s.split(' ');
const output = [];
const verify = (cond, word) => cond === 'less' ? word.length < 6 : word.length >= 6;
while (words.length) {
  const [wordCount] = rules.find(
    ([wordCount, ...conds]) => conds.every((cond, i) => verify(cond, words[i]))
  );
  output.push(words.splice(0, wordCount).join(' '));
}
console.log(output);
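The rule table can also be wrapped in a reusable function. The following is only a sketch under stated assumptions, not the answer's own code: the names `splitByRules` and `LIMIT` are hypothetical, and it assumes that a word past the end of the input should fail every condition, so the `[1]` fallback is used near the end of the string instead of reading `.length` of `undefined`.

```javascript
const LIMIT = 6; // hypothetical name for the short/long threshold used in the question

const rules = [
  [3, 'less', 'less', 'less'],
  [2, 'less', 'less', 'eqmore'],
  [1, 'eqmore', 'eqmore'],
  [2, 'eqmore', 'less'],
  [2, 'less', 'eqmore'],
  [1] // fallback: no conditions, so it always matches
];

// Assumption: a missing word (undefined) satisfies no condition.
const verify = (cond, word) =>
  word !== undefined &&
  (cond === 'less' ? word.length < LIMIT : word.length >= LIMIT);

function splitByRules(text) {
  const words = text.split(/\s+/).filter(Boolean); // ignore repeated spaces
  const output = [];
  while (words.length) {
    // skip the word count when checking conditions, keep it from the match
    const [wordCount] = rules.find(
      ([, ...conds]) => conds.every((cond, i) => verify(cond, words[i]))
    );
    output.push(words.splice(0, wordCount).join(' '));
  }
  return output;
}

console.log(splitByRules('sed doeiusmod tempor incididunt ut'));
// → [ 'sed doeiusmod', 'tempor', 'incididunt ut' ]
```

With this guard, leftover words that match none of the five main rules are emitted one at a time, which is a simplification of the boundary condition stated in the question.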


geo*_*org 5

You can express the rules as abbreviated regular expressions, build a real regular expression from them, and apply it to the input:

const text = "Lorem ipsum, dolor. sit amet? consectetur,   adipiscing,  elit! sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia bla?";

// S = short word (<6 letters), L = long word (>=6 letters)
const rules = ['(SSS)', '(SS(?=L))', '(L(?=L))', '(SL)', '(LS)', '(.+)'];

const regex = new RegExp(
    rules
        .join('|')
        .replace(/S/g, '\\w{1,5}\\W+')
        .replace(/L/g, '\\w{6,}\\W+')
    , 'g');

console.log(text.match(regex));

If the rules do not change, the regex construction step only needs to run once.

Note that this also handles punctuation in a reasonable way.
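To illustrate building the regex only once, here is a minimal sketch that compiles the abbreviated rules up front and returns a function that can be reused. The name `buildSplitter` is hypothetical, and trimming the trailing separators from each chunk is an added step that the answer's code does not perform.

```javascript
// Hypothetical helper: compile the abbreviated rules once, then reuse the regex.
function buildSplitter(rules) {
  const source = rules
    .join('|')
    .replace(/S/g, '\\w{1,5}\\W+') // S = short word plus trailing separators
    .replace(/L/g, '\\w{6,}\\W+'); // L = long word plus trailing separators
  const regex = new RegExp(source, 'g');
  // match() with a global regex returns null when nothing matches
  return text => (text.match(regex) || []).map(chunk => chunk.trim());
}

const split = buildSplitter(['(SSS)', '(SS(?=L))', '(L(?=L))', '(SL)', '(LS)', '(.+)']);

console.log(split('Lorem ipsum, dolor. sit amet? consectetur, adipiscing, elit!'));
// → [ 'Lorem ipsum, dolor.', 'sit amet?', 'consectetur,', 'adipiscing, elit!' ]
```

Because `String.prototype.match` starts a global regex from the beginning on every call, the same compiled splitter can be applied to many inputs.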


Bol*_*Key 5

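If we define words of length <6 as having size 1 and words of length >=6 as having size 2, we can restate the rules as "if adding the next word would make the current row's total size >= 4, start a new row".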

function wordSize(word) {
  if (word.length < 6) 
    return 1;
  return 2;
}
let s = "Lorem ipsum dolor sit amet consectetur adipiscing elit sed doeiusmod tempor incididunt ut Duis aute irure dolor in reprehenderit in esse cillum dolor eu fugia";
var result = [];
var words = s.split(" ");
var row = [];
for (var i = 0; i < words.length; ++i) {
  if (row.reduce((s, w) => s + wordSize(w), 0) + wordSize(words[i]) >= 4) {
    result.push(row);
    row = [];
  }
  row.push(words[i]);
}
result.push(row);
result = result.map(a => a.join(" "));
console.log(result);
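The size idea can be checked against the question's rules: three short words sum to 3, a short-long or long-short pair sums to 3, while short-short-long and long-long both reach 4 and therefore force a break. As a rough sketch of the same approach in function form (the name `splitBySize` and the running `rowSize` counter are assumptions, not part of the answer):

```javascript
// Size 1 for short words (<6 letters), size 2 for long words (>=6 letters).
const wordSize = word => (word.length < 6 ? 1 : 2);

// Hypothetical wrapper around the "size budget" idea.
function splitBySize(text) {
  const rows = [];
  let row = [];
  let rowSize = 0; // running total instead of re-summing the row each time
  for (const word of text.split(' ')) {
    const size = wordSize(word);
    if (rowSize + size >= 4) {
      // the next word would reach the budget, so close the current row
      rows.push(row.join(' '));
      row = [];
      rowSize = 0;
    }
    row.push(word);
    rowSize += size;
  }
  if (row.length) rows.push(row.join(' '));
  return rows;
}

console.log(splitBySize('Duis aute irure dolor in reprehenderit in esse cillum'));
// → [ 'Duis aute irure', 'dolor in', 'reprehenderit in', 'esse cillum' ]
```

Keeping a running total avoids calling `reduce` over the row for every word, although with rows of at most three words the difference is mostly one of readability.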