HTML parsing with Simple HTML DOM parser

And*_*eda 2 php parsing

I'm using Simple HTML DOM parser to parse some HTML.

I have HTML like this:

<span class="UIStory_Message">
    Yeah, elixir of life!<br/>
   <a href="asdfasdf">
      <span>asdfsdfasdfsdf</span>
       <wbr/>
       <span class="word_break"/>
       61193133389&ref=nf
   </a>
</span>

My code is:

$storyMessageNodes    = $story->find('span.UIStory_Message');
// find() returns an array of matches, so take the first one
$storyMessage         = strip_tags($storyMessageNodes[0]->innertext);

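I want to get only the text that sits directly inside the span "UIStory_Message", i.e. "Yeah, elixir of life!".

But the code above gives me all of the text inside the span, i.e. "Yeah, elixir of life! asdfsdfasdfsdf 61193133389&ref=nf".

How can I write the code so that it returns only "Yeah, elixir of life!"?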

rav*_*ren 5

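I have written a method for getting rid of unwanted elements in fetched DOM nodes. I've contacted the author, but the last build of Simple HTML DOM has been out for two years now, so I doubt he'll include it in the distribution. Here it is: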

/**
 * remove specified nodes from selected dom
 *
 * @param string $selector
 * @param int|array $limit (optional) possible values include:
 *   + positive integer - remove first denoted number of elements
 *   + negative integer - remove last denoted number of elements
 *   + array of ones and zeroes - remove the respective matches that equal to one
 *
 * eg.
 *   // will remove first two images found in node
 *   $dom->removeNodes('img',2);
 *
 *   // will remove last two images found in node
 *   $dom->removeNodes('img',-2);
 *
 *   // will remove all but the third images found in node
 *   $dom->removeNodes('img',array(1,1,0,1));
 *
 * [!!!] if there are more matches found than elements in array, the last array member will be used for processing
 *
 * eg.
 *   // will remove second and every following image
 *   $dom->removeNodes('img',array(0,1));
 *
 *   // will remove only the second image
 *   $dom->removeNodes('img',array(0,1,0));
 *
 * @return simple_html_dom_node
 */
public function removeNodes($selector, $limit = NULL)
{
    $elements = $this->find($selector);
    if ( empty($elements) ) return $this;


    if ( isset($limit) && is_int( $limit ) && $limit < 0 ) {
        $limit = abs( $limit );
        $elements = array_reverse( $elements );
    }

    foreach ( $elements as $element ) {

        if ( isset($limit) ) {

            if ( is_array( $limit ) ) {
                $current = current( $limit );
                if ( next( $limit ) === FALSE ) {
                    end( $limit );
                }
                if ( !$current ) {
                    continue;
                }
            } else {
                if ( --$limit === -1 ) {
                    return $this;
                }
            }
        }

        $element->outertext = '';

    }

    return $this;
}

Put it in the simple_html_dom_node class, or in a class that extends it. In the asker's case you would use it like this:

$storyMessageNodes = $story->find('span.UIStory_Message');
$storyMessage = $storyMessageNodes[0]->removeNodes('a')->plaintext;
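If patching the library is not an option, the same result can probably be had by collecting only the text nodes that are direct children of the span. The sketch below is an alternative approach, not the method above: it swaps Simple HTML DOM for PHP's bundled DOM extension (DOMDocument and DOMXPath) so that it runs on its own. The helper name directText() is made up for illustration, and the sample markup is a slightly simplified copy of the HTML from the question.

```php
<?php
// Alternative sketch using PHP's built-in DOM extension, not Simple HTML DOM.
// directText() is a hypothetical helper, not part of either library.

function directText(DOMNode $node): string
{
    $text = '';
    foreach ($node->childNodes as $child) {
        // Keep only text nodes that are direct children; skip <br>, <a>, etc.
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    return trim($text);
}

// Simplified version of the markup from the question.
$html = <<<'HTML'
<span class="UIStory_Message">
    Yeah, elixir of life!<br/>
    <a href="asdfasdf"><span>asdfsdfasdfsdf</span><wbr/>61193133389&amp;ref=nf</a>
</span>
HTML;

// Collect parser warnings internally (libxml complains about tags like <wbr>).
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$span  = $xpath->query('//span[@class="UIStory_Message"]')->item(0);

$storyMessage = directText($span);
echo $storyMessage, PHP_EOL; // prints: Yeah, elixir of life!
```

Because only direct child text nodes are read, everything nested inside the link is ignored without having to remove any elements first. The exact-match test on @class assumes the span carries a single class, as it does in the question.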