Split a string and include the regular expression matches

Ton*_*Nam 1 javascript regex parsing

I am parsing some text with JavaScript. Suppose I have a string:

"hello world <1> this is some random text <3> foo <12>"

I need to put the following substrings into an array:

myArray[0] = "hello world ";
myArray[1] = "<1>";
myArray[2] = " this is some random text ";
myArray[3] = "<3>";
myArray[4] = " foo ";
myArray[5] = "<12>";

Note that I split the string whenever a <"number"> sequence is encountered.

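I tried splitting the string with the regular expression /<\d{1,3}>/, but when I do that I lose the <"number"> sequences. In other words, I end up with "hello world ", " this is some random text " and " foo ". Note that I lose the strings "<1>", "<3>" and "<12>", which I would like to keep. How can I solve this?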
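For reference, a minimal reproduction of the behavior described above. The variable names `text` and `parts` are only illustrative, and the snippet assumes a standards-compliant engine such as a current browser or Node.js:

```javascript
// Splitting on a pattern WITHOUT a capturing group discards the separators.
const text = "hello world <1> this is some random text <3> foo <12>";
const parts = text.split(/<\d{1,3}>/);

// The <1>, <3> and <12> markers are gone; the final "" appears because
// the string ends with a separator.
console.log(parts);
// ["hello world ", " this is some random text ", " foo ", ""]
```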

小智 11

You need to capture the sequence in order to keep it.

var str = "hello world <1> this is some random text <3> foo <12>";

// The capturing group makes split() include each matched separator in the result.
str.split(/(<\d{1,3}>)/);

// ["hello world ", "<1>", " this is some random text ", "<3>", " foo ", "<12>", ""]
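The result above ends with an empty string because the input ends with a separator, while the array asked for in the question has only six entries. One possible way to drop it, as a sketch rather than part of the original answer, is to filter out empty pieces. The names `input` and `pieces` are illustrative:

```javascript
const input = "hello world <1> this is some random text <3> foo <12>";

// Split with a capturing group so the separators are kept,
// then remove any empty strings from the result.
const pieces = input
  .split(/(<\d{1,3}>)/)
  .filter(function (piece) {
    return piece !== "";
  });

console.log(pieces);
// ["hello world ", "<1>", " this is some random text ", "<3>", " foo ", "<12>"]
```

Be aware that this filter also removes the empty strings that would appear when the input starts with a separator or contains two adjacent separators, which may or may not be what you want.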

If capturing groups cause problems in some browsers, you can do it manually:

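var str = "hello world <1> this is some random text <3> foo <12>",
    re = /<\d{1,3}>/g,
    result = [],
    match,
    last_idx = 0;

// With the g flag, each exec() call resumes searching at re.lastIndex.
while ( ( match = re.exec( str ) ) !== null ) {
   // Push the text before the match, then the match itself.
   result.push( str.slice( last_idx, match.index ), match[0] );

   last_idx = re.lastIndex;
}
// Push whatever follows the last match (an empty string for this input).
result.push( str.slice( last_idx ) );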

  • Note that, according to [MDN](https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/String/Split#Description), not all browsers support capturing groups with `.split()` (and of course it doesn't say which ones don't). (2 upvotes)
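If that compatibility concern applies to you, one option is to test for the behavior once and fall back to the manual loop. The sketch below is only an illustration: `manualSplit` and `splitKeepingMatches` are hypothetical helper names, and it assumes the pattern has no flags, contains no capturing groups of its own, and never matches the empty string.

```javascript
// Hypothetical helper: manual split that keeps each matched separator.
function manualSplit(str, pattern) {
  var re = new RegExp(pattern.source, "g"),
      result = [],
      lastIdx = 0,
      match;

  while ((match = re.exec(str)) !== null) {
    // Text before the match, then the match itself.
    result.push(str.slice(lastIdx, match.index), match[0]);
    lastIdx = re.lastIndex;
    // Avoid an infinite loop if the pattern matches an empty string.
    if (match[0].length === 0) {
      re.lastIndex++;
    }
  }
  // Whatever follows the last match.
  result.push(str.slice(lastIdx));
  return result;
}

// Hypothetical helper: use split() with a capturing group when the engine
// supports it, otherwise fall back to manualSplit().
function splitKeepingMatches(str, pattern) {
  // Feature test: a compliant engine returns ["a", "1", "b"] here.
  var supportsCapture = "a1b".split(/(\d)/).length === 3;
  if (supportsCapture) {
    return str.split(new RegExp("(" + pattern.source + ")"));
  }
  return manualSplit(str, pattern);
}

console.log(splitKeepingMatches("foo <12> bar", /<\d{1,3}>/));
// ["foo ", "<12>", " bar"]
```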