Jak*_*ake 4 javascript regex arrays jquery object
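So, I'm new to programming, but I'm trying to learn JavaScript. I'm currently working on a project in which I'm trying to parse a large text file (Shakespeare's 154 sonnets, found here) into an array of objects, with the data structured as follows: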
var obj = {
    'property 1': [ 'value 1',
                    'value 2',
                  ],
    'property 2': [ 'value 1',
                    'value 2',
                  ],
    // ...
};
and so on, where the Roman numerals are the object's property names and each line of a sonnet is one value in that property's array.
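One way to build exactly this structure is to walk the text line by line and use a small regex only to recognise the numeral headings. This is a minimal sketch, assuming each sonnet starts with a line that contains nothing but its Roman numeral (optionally surrounded by whitespace) and that blank lines can be ignored; the name `parseSonnets` is hypothetical.

```javascript
// Hypothetical helper: turn raw sonnet text into { numeral: [lines] }.
// Assumes a heading line holds only a Roman numeral; blank lines are skipped.
function parseSonnets(rawText) {
  var headingRx = /^\s*([CDLXVIM]+)\s*$/; // a line that is only a Roman numeral
  var sonnets = {};
  var current = null; // numeral of the sonnet currently being filled

  rawText.split(/\r?\n/).forEach(function (line) {
    var m = headingRx.exec(line);
    if (m) {
      current = m[1];          // start a new property named after the numeral
      sonnets[current] = [];
    } else if (current !== null && line.trim() !== "") {
      sonnets[current].push(line.trim()); // one array entry per sonnet line
    }
  });
  return sonnets;
}

var sample = "VII\nLo! in the orient when the gracious light\n" +
             "Lifts up his burning head, each under eye\n\n" +
             "VIII\nMusic to hear, why hear'st thou music sadly?";
var obj = parseSonnets(sample);
console.log(Object.keys(obj)); // [ 'VII', 'VIII' ]
console.log(obj.VIII[0]);      // Music to hear, why hear'st thou music sadly?
```

Keeping the regex per line avoids a single large pattern that has to track line starts and ends across the whole file.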
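I have to use regular expressions to parse the text file. So far I've been looking for the right regex to split up the text, but I'm not sure I'm approaching the problem the right way. Ultimately I want to create a dropdown menu in which each item in the list is a sonnet.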
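For the dropdown, an object shaped like the one above can be turned into `<option>` markup with plain string building. A minimal sketch: the helper names `escapeHtml` and `buildSonnetOptions` and the element id `#sonnet-select` are hypothetical, and each option's value is assumed to be the numeral while its label shows the numeral plus the first line.

```javascript
// Escape the characters that matter inside HTML text and attribute values.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
          .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Hypothetical helper: one <option> per sonnet, in the object's key order.
function buildSonnetOptions(sonnets) {
  return Object.keys(sonnets).map(function (numeral) {
    var firstLine = sonnets[numeral][0] || "";
    return '<option value="' + escapeHtml(numeral) + '">' +
           escapeHtml(numeral + ": " + firstLine) + "</option>";
  }).join("\n");
}

var html = buildSonnetOptions({
  "VII": ["Lo! in the orient when the gracious light"],
  "VIII": ["Music to hear, why hear'st thou music sadly?"]
});
console.log(html);
// With jQuery loaded, the markup could then be inserted with, for example:
// $("#sonnet-select").html(html);   // "#sonnet-select" is a hypothetical id
```

On `change`, the selected option's value can be used as the key to look up the full sonnet in the object.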
Edit: I'm actually now getting the source text from this URL: http://pizzaboys.biz/xxx/sonnets.php
and doing the same thing as above, except that instead of doing a $.get I've put the text into a variable...
I tried this:
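$(document).ready(function(){
    var data = new SonnetizerArray();
});

function SonnetizerArray(){
    this.data = [];
    var rawText = "text from above link";
    // Split wherever a line holds nothing but a Roman numeral (a sonnet heading)
    var rx = /^\s*[CDILVXM]+\s*$/m;
    var array_of_sonnets = rawText.split(rx);
    for (var i = 0; i < array_of_sonnets.length; i++){
        // Break each sonnet body into its lines and drop the blank ones
        var s = array_of_sonnets[i].split("\n").filter(function (line) {
            return line.trim() !== "";
        });
        if (s.length > 0) this.data.push(s);
    }
}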
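This regular expression splits the text into each Roman numeral and its body; the body can then be split on newlines (\n). It relies on the multiline (m) and dot-all (s) flags, so that ^ matches at the start of every line and . can cross line breaks. JavaScript does not support \Z; use (?![\s\S]) there instead.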
^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$).*?(?:^.*?)(^.*?)(?=^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)|\Z)

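Group 0 holds the entire match, group 1 the Roman numeral, group 2 the body, and group 3 the numeral of the following sonnet (captured inside the lookahead).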
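In JavaScript the two steps (match, then split each body on newlines) could look like the minimal sketch below, which builds the object shape from the question. It assumes every numeral line is preceded by whitespace such as a blank line, because the pattern begins with ^\s+, and it needs an engine with the ES2018 s flag. The names `sonnetRx` and `sonnetsToObject` are hypothetical.

```javascript
// m: ^ matches at every line start; s: . also matches newlines (ES2018+);
// g: exec() advances through the string; (?![\s\S]) stands in for \Z.
var sonnetRx = /^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$).*?(?:^.*?)(^.*?)(?=^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)|(?![\s\S]))/gms;

function sonnetsToObject(text) {
  var obj = {};
  var m;
  sonnetRx.lastIndex = 0; // the regex is global and shared, so reset it first
  while ((m = sonnetRx.exec(text)) !== null) {
    // m[1] is the numeral, m[2] the body; keep only non-blank lines
    obj[m[1]] = m[2].split(/\r?\n/).filter(function (line) {
      return line.trim() !== "";
    });
  }
  return obj;
}

var text = "\nVII\nLo! in the orient when the gracious light\n" +
           "Lifts up his burning head, each under eye\n\n" +
           "VIII\nMusic to hear, why hear'st thou music sadly?\n";
var sonnets = sonnetsToObject(text);
console.log(Object.keys(sonnets)); // [ 'VII', 'VIII' ]
```

Because the lookahead does not consume the next heading, each exec() call resumes exactly at the start of the following sonnet.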
Sample text taken from your link:
VII
Lo! in the orient when the gracious light
Lifts up his burning head, each under eye
Doth homage to his new-appearing sight,
VIII
Music to hear, why hear'st thou music sadly?
Sweets with sweets war not, joy delights in joy:
Why lov'st thou that which thou receiv'st not gladly,
Or else receiv'st with pleasure thine annoy?
IX
Is it for fear to wet a widow's eye,
That thou consum'st thy self in single life?
Ah! if thou issueless shalt hap to die,
The world will wail thee like a makeless wife;
Sample code:
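<script type="text/javascript">
// g: exec() advances through the string (without it, exec() would return the first match forever)
// m: ^ matches at every line start; s: . also matches newlines
// (?![\s\S]) replaces \Z, which JavaScript does not support
var re = /^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$).*?(?:^.*?)(^.*?)(?=^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)|(?![\s\S]))/gms;
var sourcestring = "source string to match with pattern";
var results = [];
var i = 0;
for (var matches = re.exec(sourcestring); matches !== null; matches = re.exec(sourcestring)) {
    results[i] = matches;
    for (var j = 0; j < matches.length; j++) {
        alert("results[" + i + "][" + j + "] = " + results[i][j]);
    }
    i++;
}
</script>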
Sample output:
$matches Array:
(
[0] => Array
(
[0] => VII
Lo! in the orient when the gracious light
Lifts up his burning head, each under eye
Doth homage to his new-appearing sight,
[1] =>
VIII
Music to hear, why hear'st thou music sadly?
Sweets with sweets war not, joy delights in joy:
Why lov'st thou that which thou receiv'st not gladly,
Or else receiv'st with pleasure thine annoy?
[2] =>
IX
Is it for fear to wet a widow's eye,
That thou consum'st thy self in single life?
Ah! if thou issueless shalt hap to die,
The world will wail thee like a makeless wife;
)
[1] => Array
(
[0] => VII
[1] => VIII
[2] => IX
)
[2] => Array
(
[0] =>
Lo! in the orient when the gracious light
Lifts up his burning head, each under eye
Doth homage to his new-appearing sight,
[1] =>
Music to hear, why hear'st thou music sadly?
Sweets with sweets war not, joy delights in joy:
Why lov'st thou that which thou receiv'st not gladly,
Or else receiv'st with pleasure thine annoy?
[2] =>
Is it for fear to wet a widow's eye,
That thou consum'st thy self in single life?
Ah! if thou issueless shalt hap to die,
The world will wail thee like a makeless wife;
)
[3] => Array
(
[0] => VIII
[1] => IX
[2] =>
)
)
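The expression above only checks that the numeral consists of Roman-numeral characters; it does not verify that it is a valid numeral. If you also need to check that the Roman numeral is well-formed, you can use the following expression instead. Its numeral part adds three capture groups, so the body moves from group 2 to group 5.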
^\s+\b(M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3}))\b(?:\r|\n|$).*?(?:^.*?)(^.*?)(?=^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)|\Z)
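The numeral part of that expression can also be used on its own to check a single heading. A minimal sketch, anchoring it with ^ and $ so it must cover the whole string; the function name `isValidRoman` is hypothetical. Every part of the pattern is optional, so the empty string is rejected separately.

```javascript
// Numeral part of the validating expression, anchored to the whole string.
var validRomanRx = /^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$/;

// Hypothetical helper: true only for a non-empty, well-formed Roman numeral.
function isValidRoman(s) {
  return s.length > 0 && validRomanRx.test(s);
}

console.log(isValidRoman("CLIV")); // true
console.log(isValidRoman("IIII")); // false
console.log(isValidRoman("VX"));   // false
```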
