How can I efficiently split a sentence into fixed-length chunks without breaking words?

1 javascript algorithm

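Input: "This process was continued for several years for the deaf child does not here in a month or even in two or three years the numberless items and expressions using the simplest daily intercourse little hearing child learns from these constant rotation and imitation the conversation he hears in his home simulates is mine and suggest topics and called forth the spontaneous expression of his own thoughts."

CHUNK_SIZE: 200 (say, each chunk is at most 200 characters long).

Output:

["This process was continued for several years for the deaf child does not here in a month or even in two or three years the numberless items and expressions using the simplest daily intercourse little",

"hearing child learns from these constant rotation and imitation the conversation he hears in his home simulates is mine and suggest topics and called forth the spontaneous expression of his own",

"thoughts."]

I know one way is to count the length and check whether I am breaking any word, and so on, but I have been told that this is very inefficient and inadvisable, so I am asking for help here.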

Cer*_*nce 5

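One option is to use a regular expression that greedily matches up to 200 characters and then backtracks until the last matched character is followed by a space or by the end of the string: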

const str = "This process was continued for several years for the deaf child does not here in a month or even in two or three years the numberless items and expressions using the simplest daily intercourse little hearing child learns from these constant rotation and imitation the conversation he hears in his home simulates is mine and suggest topics and called forth the spontaneous expression of his own thoughts.";
const chunks = str.match(/.{1,200}(?= |$)/g);
console.log(chunks);
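
To make the backtracking easier to follow, here is a minimal sketch of the same pattern on a much shorter input. The sample string and the chunk size of 10 are chosen only for illustration; they are not from the question.

```javascript
// Illustration only: a short sample string and a chunk size of 10.
const sample = "the quick brown fox jumps";

// Greedily take up to 10 characters, then back off until the next
// character is a space or the end of the string.
const sampleChunks = sample.match(/.{1,10}(?= |$)/g);
console.log(sampleChunks);
// [ 'the quick', ' brown fox', ' jumps' ]
```

The lookahead does not consume the space, so each chunk after the first starts with the space it stopped at. For this sample, joining the chunks with an empty string gives back the original text.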

If you also want to exclude leading/trailing spaces, add \S to the start and end of the match:

const str = "This process was continued for several years for the deaf child does not here in a month or even in two or three years the numberless items and expressions using the simplest daily intercourse little hearing child learns from these constant rotation and imitation the conversation he hears in his home simulates is mine and suggest topics and called forth the spontaneous expression of his own thoughts.";
const chunks = str.match(/\S.{1,198}\S(?= |$)/g);
console.log(chunks);
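
As a rough check of what the \S anchors change, the sketch below runs this variant on a short made-up string with a chunk size of 10, so the middle quantifier becomes {1,8}. Both the string and the size are assumptions for illustration.

```javascript
// Illustration only: chunk size 10, so 10 - 2 = 8 characters may sit
// between the leading and trailing non-space characters.
const sample = "the quick brown fox jumps";
const trimmedChunks = sample.match(/\S.{1,8}\S(?= |$)/g);
console.log(trimmedChunks);
// [ 'the quick', 'brown fox', 'jumps' ]
```

Each chunk now starts and ends on a non-space character. The separating spaces are skipped rather than matched, so for this sample joining the chunks with a single space reproduces the input.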

To use a variable for the chunk size:

const chunkSize = 200;
const str = "This process was continued for several years for the deaf child does not here in a month or even in two or three years the numberless items and expressions using the simplest daily intercourse little hearing child learns from these constant rotation and imitation the conversation he hears in his home simulates is mine and suggest topics and called forth the spontaneous expression of his own thoughts.";
const chunks = str.match(new RegExp(String.raw`\S.{1,${chunkSize - 2}}\S(?= |$)`, 'g'));
console.log(chunks);
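
If the chunk size comes from outside, it may be worth wrapping the pattern in a small function. The following is only a sketch: the name `splitIntoChunks` and the guard on `chunkSize` are my own additions, not part of the approach above. It relies on two facts: this pattern needs at least three characters per chunk, and `match` with the `g` flag returns `null` when nothing matches.

```javascript
// Hypothetical helper; the name and the guard are assumptions for this sketch.
function splitIntoChunks(text, chunkSize) {
  // \S.{1,n}\S matches at least three characters, so chunkSize - 2 must be >= 1.
  if (!Number.isInteger(chunkSize) || chunkSize < 3) {
    throw new RangeError("chunkSize must be an integer of at least 3");
  }
  const pattern = new RegExp(String.raw`\S.{1,${chunkSize - 2}}\S(?= |$)`, "g");
  // match() returns null when there is no match; normalise that to [].
  return text.match(pattern) || [];
}

console.log(splitIntoChunks("the quick brown fox jumps", 10));
// [ 'the quick', 'brown fox', 'jumps' ]
console.log(splitIntoChunks("", 10));
// []
```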

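The patterns above require every chunk to be at least three characters long. If you also need to allow for a chunk that is only one or two characters long, make everything after the first \S optional: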

const chunkSize = 200;
const str = "This process was continued for several years for the deaf child does not here in a month or even in two or three years the numberless items and expressions using the simplest daily intercourse little hearing child learns from these constant rotation and imitation the conversation he hears in his home simulates is mine and suggest topics and called forth the spontaneous expression of his own thoughts.";
const chunks = str.match(new RegExp(String.raw`\S(?:.{0,${chunkSize - 2}}\S)?(?= |$)`, 'g'));
console.log(chunks);
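
To see the difference between the two patterns, the sketch below compares them on input made of very short words. The helper names `strictPattern` and `loosePattern` and the sample strings are made up for this example. The last call also shows a limitation that, as far as I can tell, applies to all of the patterns here: a word longer than the chunk size cannot fit in any chunk, so its leading part is silently skipped.

```javascript
// Hypothetical helpers for comparison; the names are made up for this sketch.
const strictPattern = (size) =>
  new RegExp(String.raw`\S.{1,${size - 2}}\S(?= |$)`, "g");
const loosePattern = (size) =>
  new RegExp(String.raw`\S(?:.{0,${size - 2}}\S)?(?= |$)`, "g");

// With a chunk size of 3, the strict pattern only finds the 3-character word.
const shortWords = "a bc def";
const strictResult = shortWords.match(strictPattern(3));
const looseResult = shortWords.match(loosePattern(3));
console.log(strictResult); // [ 'def' ]
console.log(looseResult);  // [ 'a', 'bc', 'def' ]

// A 10-character word with a chunk size of 5: "abcde" is not in the result.
const longWordResult = "abcdefghij kl".match(loosePattern(5));
console.log(longWordResult); // [ 'fghij', 'kl' ]
```

If losing text matters for your input, compare the total length of the chunks against the input, or handle overlong words separately before matching.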