hak*_*kre 22 php regex xml domdocument
I have some chunks of XML returned by DOMDocument::saveXML(). They are already nicely indented, with two spaces per level, like so:
<?xml version="1.0"?>
<root>
  <error>
    <a>eee</a>
    <b>sd</b>
  </error>
</root>
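Since DOMDocument cannot be configured (AFAIK) to use a different indentation character, I thought I could run a regular expression over the output and change the indentation by replacing every pair of two leading spaces with a tab. This can be done with a callback function (Demo):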
$xml_string = $doc->saveXML();
function callback($m)
{
    $spaces = strlen($m[0]);
    $tabs = $spaces / 2;
    return str_repeat("\t", $tabs);
}
$xml_string = preg_replace_callback('/^(?:[ ]{2})+/um', 'callback', $xml_string);
I'm now wondering whether it is possible to do this without a callback function (and without the e modifier (EVAL)). Any regex wizards with an idea?
Qta*_*tax 24
You can use \G:
preg_replace('/^  |\G  /m', "\t", $string);
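To see why this works: \G matches only at the position where the previous match ended. Once the `^` branch has consumed the first pair of spaces on a line, the `\G` branch can keep consuming the pairs directly behind it, but it can never reach spaces further into the line. Below is a minimal sketch with a made-up sample string; the quantifier `{2}` is written out only so that the two spaces are easy to see.

```php
<?php
// Hypothetical sample: two levels of two-space indentation, plus a
// double space inside the text content that must stay untouched.
$xml = "<root>\n  <error>\n    <a>x  y</a>\n  </error>\n</root>\n";

// "^ {2}"  starts a run at the beginning of a line (m modifier).
// "\G {2}" continues the run only directly after the previous match.
$tabbed = preg_replace('/^ {2}|\G {2}/m', "\t", $xml);

echo $tabbed;
```

The double space between "x" and "y" survives because neither branch can match there: it is not at the start of a line, and the previous match did not end at that position.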
I did some benchmarks and got the following results on Win32 with PHP 5.2 and 5.4:
>php -v
PHP 5.2.17 (cli) (built: Jan 6 2011 17:28:41)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2010 Zend Technologies
>php -n test.php
XML length: 21100
Iterations: 1000
callback: 2.3627231121063
\G: 1.4221360683441
while: 3.0971200466156
/e: 7.8781840801239
>php -v
PHP 5.4.0 (cli) (built: Feb 29 2012 19:06:50)
Copyright (c) 1997-2012 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2012 Zend Technologies
>php -n test.php
XML length: 21100
Iterations: 1000
callback: 1.3771259784698
\G: 1.4414191246033
while: 2.7389969825745
/e: 5.5516891479492
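Surprisingly, the callback is faster than \G on PHP 5.4 (although this seems to depend on the data; \G is faster in some other cases).
For \G, the pattern /^  |\G  /m was used; it is a bit faster than /(?:^|\G)  /m.
/(?>^|\G)  /m is even slower than /(?:^|\G)  /m.
The /u, /S and /X switches did not affect the performance of \G noticeably.
The while replacement is the fastest when the depth is low (up to about 4 indentation levels, i.e. 8 spaces, in my test), but it gets slower as the depth increases.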
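The three \G variants compared above differ only in how the `^` / `\G` alternative is grouped, so they should produce identical output and differ only in speed. A small sketch to check that on a made-up sample (the array keys are just illustrative labels):

```php
<?php
// Hypothetical sample with nested two-space indentation and an
// interior double space that none of the patterns may touch.
$xml = "<a>\n  <b>\n    <c>1  2</c>\n  </b>\n</a>\n";

$patterns = array(
    'alternation'   => '/^ {2}|\G {2}/m',   // plain top-level alternation
    'non-capturing' => '/(?:^|\G) {2}/m',   // non-capturing group
    'atomic'        => '/(?>^|\G) {2}/m',   // atomic group
);

$results = array();
foreach ($patterns as $name => $re) {
    $results[$name] = preg_replace($re, "\t", $xml);
}

// True when every variant produced exactly the same string.
var_dump(count(array_unique($results)) === 1);
```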
The following code was used:
<?php
$base_iter = 1000;
$xml_string = str_repeat(<<<_STR_
<?xml version="1.0"?>
<root>
<error>
<a> eee </a>
<b> sd </b>
<c>
deep
deeper still
deepest !
</c>
</error>
</root>
_STR_
, 100);
//*** while ***
$re = '%# Match leading spaces following leading tabs.
^ # Anchor to start of line.
(\t*) # $1: Preserve any/all leading tabs.
[ ]{2} # Match "n" spaces.
%mx';
function conv_indent_while($xml_string) {
global $re;
while(preg_match($re, $xml_string))
$xml_string = preg_replace($re, "$1\t", $xml_string);
return $xml_string;
}
//*** \G ****
function conv_indent_g($string){
return preg_replace('/^  |\G  /m', "\t", $string);
}
//*** callback ***
function callback($m)
{
$spaces = strlen($m[0]);
$tabs = $spaces / 2;
return str_repeat("\t", $tabs);
}
function conv_indent_callback($str){
return preg_replace_callback('/^(?:[ ]{2})+/m', 'callback', $str);
}
//*** callback /e ***
function conv_indent_e($str){
return preg_replace('/^(?:  )+/me', 'str_repeat("\t", strlen("$0")/2)', $str);
}
//*** tests
function test2() {
global $base_iter;
global $xml_string;
$t = microtime(true);
for($i = 0; $i < $base_iter; ++$i){
$s = conv_indent_while($xml_string);
if(strlen($s) >= strlen($xml_string))
exit("strlen invalid 2");
}
return (microtime(true) - $t);
}
function test1() {
global $base_iter;
global $xml_string;
$t = microtime(true);
for($i = 0; $i < $base_iter; ++$i){
$s = conv_indent_g($xml_string);
if(strlen($s) >= strlen($xml_string))
exit("strlen invalid 1");
}
return (microtime(true) - $t);
}
function test0(){
global $base_iter;
global $xml_string;
$t = microtime(true);
for($i = 0; $i < $base_iter; ++$i){
$s = conv_indent_callback($xml_string);
if(strlen($s) >= strlen($xml_string))
exit("strlen invalid 0");
}
return (microtime(true) - $t);
}
function test3(){
global $base_iter;
global $xml_string;
$t = microtime(true);
for($i = 0; $i < $base_iter; ++$i){
$s = conv_indent_e($xml_string);
if(strlen($s) >= strlen($xml_string))
exit("strlen invalid 02");
}
return (microtime(true) - $t);
}
echo 'XML length: ' . strlen($xml_string) . "\n";
echo 'Iterations: ' . $base_iter . "\n";
echo 'callback: ' . test0() . "\n";
echo '\G: ' . test1() . "\n";
echo 'while: ' . test2() . "\n";
echo '/e: ' . test3() . "\n";
?>
The following simplistic solution comes to mind first:
$xml_string = str_replace('  ', "\t", $xml_string);
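As a quick illustration of what this one-liner does to spaces that are not indentation, here is a small sketch on a made-up sample string:

```php
<?php
// Hypothetical sample: one indented line whose text content also
// contains a double space.
$xml = "<root>\n  <a>x  y</a>\n</root>\n";

// str_replace() has no notion of "start of line", so every pair of
// spaces is replaced, wherever it occurs.
$naive = str_replace('  ', "\t", $xml);

// True: the double space between "x" and "y" became a tab as well.
var_dump(strpos($naive, "x\ty") !== false);
```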
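But I assume you want to limit the replacement to leading whitespace only. For that case, your current solution looks pretty clean to me. That said, you can do it without a callback or the e modifier, but you need to run it recursively (repeatedly, until nothing is left to replace) to get the job done, like so: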
$re = '%# Match leading spaces following leading tabs.
^ # Anchor to start of line.
(\t*) # $1: Preserve any/all leading tabs.
[ ]{2} # Match "n" spaces.
%umx';
while(preg_match($re, $xml_string))
    $xml_string = preg_replace($re, "$1\t", $xml_string);
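Each pass of this loop converts only one indentation level per line: the pattern keeps the tabs already in place and turns the next pair of spaces into a tab. The number of passes therefore equals the deepest indentation level in the input. A minimal sketch, using a compact form of the same pattern and a made-up one-line sample:

```php
<?php
// Same idea as the commented pattern above, written without the x modifier.
$re = '/^(\t*)[ ]{2}/m';

// Hypothetical sample: six leading spaces, i.e. three indentation levels.
$line = str_repeat(' ', 6) . "<a>x</a>";

$passes = 0;
while (preg_match($re, $line)) {
    // Keep the tabs captured in $1 and turn the next two spaces into a tab.
    $line = preg_replace($re, "\$1\t", $line);
    $passes++;
}

printf("%d passes\n", $passes);   // prints "3 passes"
```

Every pass rescans the whole string, so the cost of this approach grows with the indentation depth.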
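Surprisingly, my testing shows this to be nearly twice as fast as the callback method. (I would have guessed the opposite.)
Note that Qtax has an elegant solution that works just fine (I gave it my +1). However, my benchmarks show it to be slower than the original callback method. I think this is because the expression /(?:^|\G)  /um does not allow the regex engine to take advantage of the "anchor at the beginning of the pattern" internal optimization. The RE engine is forced to test the pattern against every position in the target string. With pattern expressions that begin with the ^ anchor, the RE engine only needs to check at the beginning of each line, which allows it to match much faster.
Great question! +1
I must apologize, because the performance statements I made above are wrong. I ran the regexes against only one (non-representative) test file, whose leading whitespace consisted mostly of tabs. When tested against a more realistic file with lots of leading spaces, my recursive method above performs significantly slower than the other two methods.
In case anyone is interested, here is the benchmark script I used to measure the performance of each regex: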
test.php
<?php // test.php 20120308_1200
require_once('inc/benchmark.inc.php');
// -------------------------------------------------------
// Test 1: Recursive method. (ridgerunner)
function tabify_leading_spaces_1($xml_string) {
$re = '%# Match leading spaces following leading tabs.
^ # Anchor to start of line.
(\t*) # $1: Any/all leading tabs.
[ ]{2} # Match "n" spaces.
%umx';
while(preg_match($re, $xml_string))
$xml_string = preg_replace($re, "$1\t", $xml_string);
return $xml_string;
}
// -------------------------------------------------------
// Test 2: Original callback method. (hakre)
function tabify_leading_spaces_2($xml_string) {
return preg_replace_callback('/^(?:[ ]{2})+/um', '_callback', $xml_string);
}
function _callback($m) {
$spaces = strlen($m[0]);
$tabs = $spaces / 2;
return str_repeat("\t", $tabs);
}
// -------------------------------------------------------
// Test 3: Qtax's elegantly simple \G method. (Qtax)
function tabify_leading_spaces_3($xml_string) {
return preg_replace('/(?:^|\G)  /um', "\t", $xml_string);
}
// -------------------------------------------------------
// Verify we get the same results from all methods.
$data = file_get_contents('testdata.txt');
$data1 = tabify_leading_spaces_1($data);
$data2 = tabify_leading_spaces_2($data);
$data3 = tabify_leading_spaces_3($data);
if ($data1 == $data2 && $data2 == $data3) {
echo ("GOOD: Same results.\n");
} else {
exit("BAD: Different results.\n");
}
// Measure and print the function execution times.
$time1 = benchmark_12('tabify_leading_spaces_1', $data, 2, true);
$time2 = benchmark_12('tabify_leading_spaces_2', $data, 2, true);
$time3 = benchmark_12('tabify_leading_spaces_3', $data, 2, true);
?>
The above script uses the following handy little benchmarking function that I wrote some time ago:
benchmark.inc.php
<?php // benchmark.inc.php
/*----------------------------------------------------------------------------
function benchmark_12($funcname, $p1, $reptime = 1.0, $verbose = false, $p2 = NULL) {}
By: Jeff Roberson
Created: 2010-03-17
Last edited: 2012-03-08
Discussion:
This function measures the time required to execute a given function by
calling it as many times as possible within an allowed period == $reptime.
A first pass determines a rough measurement of function execution time
by increasing the $nreps count by a factor of 10 - (i.e. 1, 10, 100, ...),
until an $nreps value is found which takes more than 0.01 secs to finish.
A second pass uses the value determined in the first pass to compute the
number of reps that can be performed within the allotted $reptime seconds.
The second pass then measures the time required to call the function the
computed number of times (which should take about $reptime seconds). The
average function execution time is then computed by dividing the total
measured elapsed time by the number of reps performed in that time, and
then all the pertinent values are returned to the caller in an array.
Note that this function is limited to measuring only those functions
having either one or two arguments that are passed by value and
not by reference. This is why the name of this function ends with "12".
Variations of this function can be easily cloned which can have more
than two parameters.
Parameters:
$funcname: String containing name of function to be measured. The
function to be measured must take one or two parameters.
$p1: First argument to be passed to $funcname function.
$reptime Target number of seconds allowed for benchmark test.
(float) (Default=1.0)
$verbose Boolean value determines if results are printed.
(bool) (Default=false)
$p2: Second (optional) argument to be passed to $funcname function.
Return value:
$result[] Array containing measured and computed values:
$result['funcname'] : $funcname - Name of function measured.
$result['msg'] : $msg - String with formatted results.
$result['nreps'] : $nreps - Number of function calls made.
$result['time_total'] : $time - Seconds to call function $nreps times.
$result['time_func'] : $t_func - Seconds to call function once.
$result['result'] : $result - Last value returned by function.
Variables:
$time: Float epoch time (secs since 1/1/1970) or benchmark elapsed secs.
$i: Integer loop counter.
$nreps Number of times function called in benchmark measurement loops.
----------------------------------------------------------------------------*/
function benchmark_12($funcname, $p1, $reptime = 1.0, $verbose = false, $p2 = NULL) {
if (!function_exists($funcname)) {
exit("\n[benchmark1] Error: function \"{$funcname}()\" does not exist.\n");
}
if (!isset($p2)) { // Case 1: function takes one parameter ($p1).
// Pass 1: Measure order of magnitude number of calls needed to exceed 10 milliseconds.
for ($time = 0.0, $n = 1; $time < 0.01; $n *= 10) { // Exponentially increase $nreps.
$time = microtime(true); // Mark start time. (sec since 1970).
for ($i = 0; $i < $n; ++$i) { // Loop $n times. ($n = 1, 10, 100...)
$result = ($funcname($p1)); // Call the function over and over...
}
$time = microtime(true) - $time; // Mark stop time. Compute elapsed secs.
$nreps = $n; // Number of reps just measured.
}
$t_func = $time / $nreps; // Function execution time in sec (rough).
// Pass 2: Measure time required to perform $nreps function calls (in about $reptime sec).
if ($t_func < $reptime) { // If pass 1 time was not pathetically slow...
$nreps = (int)($reptime / $t_func); // Figure $nreps calls to add up to $reptime.
$time = microtime(true); // Mark start time. (sec since 1970).
for ($i = 0; $i < $nreps; ++$i) { // Loop $nreps times (should take $reptime).
$result = ($funcname($p1)); // Call the function over and over...
}
$time = microtime(true) - $time; // Mark stop time. Compute elapsed secs.
$t_func = $time / $nreps; // Average function execution time in sec.
}
} else { // Case 2: function takes two parameters ($p1 and $p2).
// Pass 1: Measure order of magnitude number of calls needed to exceed 10 milliseconds.
for ($time = 0.0, $n = 1; $time < 0.01; $n *= 10) { // Exponentially increase $nreps.
$time = microtime(true); // Mark start time. (sec since 1970).
for ($i = 0; $i < $n; ++$i) { // Loop $n times. ($n = 1, 10, 100...)
$result = ($funcname($p1, $p2)); // Call the function over and over...
}
$time = microtime(true) - $time; // Mark stop time. Compute elapsed secs.
$nreps = $n; // Number of reps just measured.
}
$t_func = $time / $nreps; // Function execution time in sec (rough).
// Pass 2: Measure time required to perform $nreps function calls (in about $reptime sec).
if ($t_func < $reptime) { // If pass 1 time was not pathetically slow...
$nreps = (int)($reptime / $t_func); // Figure $nreps calls to add up to $reptime.
$time = microtime(true); // Mark start time. (sec since 1970).
for ($i = 0; $i < $nreps; ++$i) { // Loop $nreps times (should take $reptime).
$result = ($funcname($p1, $p2)); // Call the function over and over...
}
$time = microtime(true) - $time; // Mark stop time. Compute elapsed secs.
$t_func = $time / $nreps; // Average function execution time in sec.
}
}
$msg = sprintf("%s() Nreps:%7d Time:%7.3f s Function time: %.6f sec\n",
$funcname, $nreps, $time, $t_func);
if ($verbose) echo($msg);
return array('funcname' => $funcname, 'msg' => $msg, 'nreps' => $nreps,
'time_total' => $time, 'time_func' => $t_func, 'result' => $result);
}
?>
When I run test.php with benchmark.inc.php as shown above, these are the results I get:
GOOD: Same results.
tabify_leading_spaces_1() Nreps: 1756 Time: 2.041 s Function time: 0.001162 sec
tabify_leading_spaces_2() Nreps: 1738 Time: 1.886 s Function time: 0.001085 sec
tabify_leading_spaces_3() Nreps: 2161 Time: 2.044 s Function time: 0.000946 sec
Bottom line: I would recommend using Qtax's method.
Thanks Qtax!