String parsing help

Question

I have a string like the following:

$string = "
<paragraph>apples are red...</paragraph>
<paragraph>john is a boy..</paragraph>
<paragraph>this is dummy text......</paragraph>
";

I want to split this string into an array containing the text between the <paragraph></paragraph> tags. Something like this:

$string = "
<paragraph>apples are red...</paragraph>
<paragraph>john is a boy..</paragraph>
<paragraph>this is dummy text......</paragraph>
";

$paragraphs = splitParagraphs($string);
/* $paragraphs now contains:
   $paragraphs[0] = apples are red...
   $paragraphs[1] = john is a boy..
   $paragraphs[2] = this is dummy text......
*/

Any ideas?

P.S. It should be case-insensitive: <paragraph>, <PARAGRAPH>, and <Paragraph> should all be handled the same way.

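Edit: This is not XML. There is a lot in here that would break XML structure, so I can't use SimpleXML or similar tools. I need a regular expression to parse it.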

Answer 1

Mar*_*ers 5

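If this were actually XML, I would agree with the other answers. But if it is not valid XML and only looks vaguely like XML, you should not try to parse it with an XML parser. Instead, you can use a regular expression: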

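$matches = array();
// Colons delimit the pattern; (.*?) lazily captures the text between the tags.
preg_match_all(":<paragraph>(.*?)</paragraph>:is", $string, $matches);
// $matches[1] holds the contents of the first capture group for every match.
$result = $matches[1];
print_r($result);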

Output:

Array
(
    [0] => apples are red...
    [1] => john is a boy..
    [2] => this is dummy text......
)

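Note that the i modifier makes the match case-insensitive, and the s modifier lets . match newline characters, so the text inside a paragraph can span several lines. Any text that is not inside paragraph tags is ignored.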
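
For reference, here is a minimal sketch of how the same pattern could be wrapped in the splitParagraphs() function that the question calls. The function body, the trim() step, and the mixed-case sample input are assumptions added for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper, named after the call shown in the question.
function splitParagraphs(string $text): array
{
    // i: match the tags case-insensitively; s: let . match newlines.
    preg_match_all(':<paragraph>(.*?)</paragraph>:is', $text, $matches);

    // $matches[1] holds the text captured between each pair of tags.
    // Fall back to an empty array if the regex engine reported an error.
    $paragraphs = $matches[1] ?? [];

    // Assumption: surrounding whitespace is not wanted, so trim each entry.
    return array_map('trim', $paragraphs);
}

// Sample input with mixed-case tags and a paragraph that spans two lines.
$string = "
<paragraph>apples are red...</paragraph>
<PARAGRAPH>john is a boy..</PARAGRAPH>
<Paragraph>this is dummy
text......</Paragraph>
";

print_r(splitParagraphs($string));
```

Because the quantifier is lazy, each match stops at the nearest closing tag, so neighbouring paragraphs are not merged into one entry. Nested paragraph tags are not handled by this sketch.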
