Tokenizing a string with regular expressions in JavaScript

Naw*_*waz 4 javascript regex string tokenize stringtokenizer

Suppose I have a long string that contains newlines and tabs:

var x = "This is a long string.\n\t This is another one on next line.";

How can I split this string into tokens using a regular expression?

I don't want to use .split(' '), because I want to learn JavaScript's regular expressions.

A more complex string might look like this:

var y = "This @is a #long $string. Alright, lets split this.";

Now I want to extract only the valid words from these strings, without special characters or punctuation. In other words, I want these:

var xwords = ["This", "is", "a", "long", "string", "This", "is", "another", "one", "on", "next", "line"];

var ywords = ["This", "is", "a", "long", "string", "Alright", "lets", "split", "this"];
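For comparison, here is a minimal sketch of what .split(' ') actually produces for x, next to split() called with a regular-expression separator instead. The variable names bySpace and byWhitespace are only illustrative. Neither version strips the punctuation, so a plain split is not enough to get xwords.

```javascript
var x = "This is a long string.\n\t This is another one on next line.";

// Splitting on a single space leaves the "\n\t" glued to "string.".
var bySpace = x.split(' ');
console.log(JSON.stringify(bySpace));
// ["This","is","a","long","string.\n\t","This","is","another","one","on","next","line."]

// split() also accepts a regex: \s+ treats any run of whitespace
// (spaces, tabs, newlines) as one separator. Punctuation is still attached.
var byWhitespace = x.split(/\s+/);
console.log(JSON.stringify(byWhitespace));
// ["This","is","a","long","string.","This","is","another","one","on","next","line."]
```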

Ale*_*tov 8

Here is a jsfiddle example of what you asked for: http://jsfiddle.net/ayezutov/BjXw5/1/

Basically, the code is very simple:

var y = "This @is a #long $string. Alright, lets split this.";
// [^\s]+ matches a run of one or more non-whitespace characters;
// the g flag makes match() return every such run, not just the first one.
var regex = /[^\s]+/g;

var match = y.match(regex) || []; // match() returns null when nothing matches
for (var i = 0; i < match.length; i++)
{
    document.write(match[i]);
    document.write('<br>');
}
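To see what /[^\s]+/g returns without writing into the page, here is a small sketch that collects the tokens into an array; tokenize is a hypothetical helper name, and JSON.stringify is used only to print the arrays compactly. Note that symbols and punctuation stay attached to the words ("@is", "$string.", "Alright,"), because the pattern only treats whitespace as a separator.

```javascript
// Hypothetical helper: return the runs of non-whitespace characters in str.
function tokenize(str) {
  // With the g flag, match() returns all matches, or null if there are none.
  return str.match(/[^\s]+/g) || [];
}

var x = "This is a long string.\n\t This is another one on next line.";
var y = "This @is a #long $string. Alright, lets split this.";

console.log(JSON.stringify(tokenize(x)));
// ["This","is","a","long","string.","This","is","another","one","on","next","line."]
console.log(JSON.stringify(tokenize(y)));
// ["This","@is","a","#long","$string.","Alright,","lets","split","this."]
```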

Update: to drop punctuation as well, extend the list of separator characters: http://jsfiddle.net/ayezutov/BjXw5/2/

var regex = /[^\s\.,!?]+/g;
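A quick check of the extended separator class on y, as a sketch: the trailing periods and the comma are gone, but @, # and $ are not in the separator list, so they remain attached. Inside a character class the dot needs no escaping, so [^\s.,!?] behaves the same as [^\s\.,!?].

```javascript
var y = "This @is a #long $string. Alright, lets split this.";

// Anything that is not whitespace or one of . , ! ? forms a token.
var separators = /[^\s.,!?]+/g;
var tokens = y.match(separators) || [];

console.log(JSON.stringify(tokens));
// ["This","@is","a","#long","$string","Alright","lets","split","this"]
```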

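Update 2: to keep only word characters, match \w+ instead (note that \w covers ASCII letters, digits and the underscore, not just letters): http://jsfiddle.net/ayezutov/BjXw5/3/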

var regex = /\w+/g;
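A sketch confirming that /\w+/g yields exactly the xwords and ywords arrays from the question. Since \w is equivalent to [A-Za-z0-9_], it keeps digits and underscores and stops at non-ASCII letters. The letterWords variant is an alternative not taken from the answer: it assumes an ES2018+ engine (Unicode property escapes, which require the u flag) and matches letters only. The helper names words and letterWords are hypothetical.

```javascript
// \w+ : runs of ASCII word characters [A-Za-z0-9_].
function words(str) {
  return str.match(/\w+/g) || [];
}

// Letters only, including non-ASCII ones (assumes ES2018+ for \p{L} with the u flag).
function letterWords(str) {
  return str.match(/\p{L}+/gu) || [];
}

var x = "This is a long string.\n\t This is another one on next line.";
var y = "This @is a #long $string. Alright, lets split this.";
var sample = "route_66 caf\u00e9"; // "café" with a precomposed é

console.log(JSON.stringify(words(x)));
// ["This","is","a","long","string","This","is","another","one","on","next","line"]
console.log(JSON.stringify(words(y)));
// ["This","is","a","long","string","Alright","lets","split","this"]
console.log(JSON.stringify(words(sample)));       // ["route_66","caf"]
console.log(JSON.stringify(letterWords(sample))); // ["route","café"]
```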