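I've been trying to parse a very large XML file with PHP and XMLReader, but I can't seem to get the results I'm looking for. Basically, I'm searching through a huge amount of information, and if a <headend> block contains a certain zip code, I'd like to return that bit of XML; otherwise the search should continue until it finds that zip code. Essentially, I'll be breaking this big file down into just a few small chunks, so instead of looking through thousands or millions of groups of information, I'd be looking at maybe 10 or 20.
Here is some of the XML, annotated with what I'd like to do: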
//search through xml
<lineups country="USA">
//cache TX02217 as a variable
<headend headendId="TX02217">
//cache Grande Gables at The Terrace as a variable
<name>Grande Gables at The Terrace</name>
//cache Grande Communications as a variable
<mso msoId="17541">Grande Communications</mso>
<marketIds>
<marketId type="DMA">635</marketId>
</marketIds>
//check to see if any of the postal codes are equal to $pc variable that will be set in the php
<postalCodes>
<postalCode>11111</postalCode>
<postalCode>22222</postalCode>
<postalCode>33333</postalCode>
<postalCode>78746</postalCode>
</postalCodes>
//cache Austin to a variable
<location>Austin</location>
<lineup>
//cache all prgSvcID's to an array i.e. 20014, 10722
<station prgSvcId="20014">
//cache all channels to an array i.e. 002, 003
<chan effDate="2006-01-16" tier="1">002</chan>
</station>
<station prgSvcId="10722">
<chan effDate="2006-01-16" tier="1">003</chan>
</station>
</lineup>
<areasServed>
<area>
//cache community to a variable $community
<community>Thorndale</community>
<county code="45331" size="D">Milam</county>
//cache state to a variable i.e. TX
<state>TX</state>
</area>
<area>
<community>Thrall</community>
<county code="45491" size="B">Williamson</county>
<state>TX</state>
</area>
</areasServed>
</headend>
//if any of the postal codes matched $pc
//echo back the xml from <headend> to </headend>
//if none of the postal codes matched $pc
//clear variables and move to next <headend>
<headend>
etc
etc
etc
</headend>
<headend>
etc
etc
etc
</headend>
<headend>
etc
etc
etc
</headend>
</lineups>
PHP:
<?php
$pc = "78746";
$xmlfile="myFile.xml";
$reader = new XMLReader();
$reader->open($xmlfile);
while ($reader->read()) {
//search to see if groups contain $pc and echo info
}
I know I'm probably making this harder than it should be, but I'm a bit overwhelmed trying to manipulate such a large file. Any help is appreciated.
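To get more flexibility out of XMLReader, I normally create my own iterators that work on the XMLReader object and provide the steps I need.
This starts with a simple iteration over all nodes and, optionally, an iteration over only the elements with a specific name. Let's call the latter XMLElementIterator; it takes the reader and the element name as parameters.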
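The idea behind such an element iterator can be sketched without any library as a generator wrapped around XMLReader. This is only a minimal sketch and not the implementation from the gist: iterateElements() is a hypothetical helper name, and the inline XML is a shortened stand-in for the real file.

```php
<?php
// Hypothetical helper (not part of the gist): yield every element called
// $name as a small SimpleXMLElement while streaming through the document.
function iterateElements(XMLReader $reader, string $name): Generator
{
    $more = $reader->read();
    while ($more) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $name) {
            // Expand only the current element, never the whole document.
            yield simplexml_load_string($reader->readOuterXml());
            // Skip the subtree that was just handed out.
            $more = $reader->next();
        } else {
            $more = $reader->read();
        }
    }
}

// Shortened sample data, assumed for illustration only.
$xml = '<lineups><headend headendId="A"><name>One</name></headend>'
     . '<headend headendId="B"><name>Two</name></headend></lineups>';

$reader = new XMLReader();
$reader->XML($xml);

$ids = [];
foreach (iterateElements($reader, 'headend') as $headend) {
    $ids[] = (string) $headend['headendId'];
}
$reader->close();
```

Because each matching element is expanded on its own, memory use depends on the size of one element rather than on the size of the file.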
For your scenario, I'd create an iterator that returns a SimpleXMLElement for the current element and only takes <headend> elements:
require('xmlreader-iterators.php'); // https://gist.github.com/hakre/5147685
class HeadendIterator extends XMLElementIterator {
const ELEMENT_NAME = 'headend';
public function __construct(XMLReader $reader) {
parent::__construct($reader, self::ELEMENT_NAME);
}
/**
* @return SimpleXMLElement
*/
public function current() {
return simplexml_load_string($this->reader->readOuterXml());
}
}
Equipped with this iterator, the rest of your job is mostly a piece of cake. First open the 10 gigabyte file:
$pc = "78746";
$xmlfile = '../data/lineups.xml';
$reader = new XMLReader();
$reader->open($xmlfile);
Then check whether the <headend> element contains the information and, if so, display the data / XML:
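XMLReader::open() returns false when the file cannot be opened, so with a file this large it may be worth failing early instead of iterating over nothing. A minimal sketch, assuming a hypothetical helper named openXmlReader():

```php
<?php
// Hypothetical helper: open a file for streaming and fail loudly on errors.
function openXmlReader(string $path): XMLReader
{
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read XML file: $path");
    }
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("XMLReader could not open: $path");
    }
    return $reader;
}
```

It would replace the two lines that create and open the reader, for example $reader = openXmlReader($xmlfile);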
foreach (new HeadendIterator($reader) as $headend) {
/* @var $headend SimpleXMLElement */
if (!$headend->xpath("/*/postalCodes/postalCode[. = '$pc']")) {
continue;
}
echo 'Found, name: ', $headend->name, "\n";
echo "==========================================\n";
$headend->asXML('php://stdout');
}
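This is exactly what you're looking to achieve: iterate over the large document (in a memory-friendly way) until you find the element(s) you're interested in. You then process that concrete element, and only that element, as XML; XMLReader::readOuterXml() is a nice tool for that.
Example output: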
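The XPath check interpolates $pc directly into the expression, which is fine for a fixed value but breaks if the value ever contains a quote. As an alternative sketch, the comparison can be done on plain strings; hasPostalCode() is a hypothetical helper name, and the sample element below is shortened for illustration.

```php
<?php
// Hypothetical helper: compare postal codes as plain strings instead of
// building an XPath expression from user-supplied input.
function hasPostalCode(SimpleXMLElement $headend, string $pc): bool
{
    // A relative path is evaluated against the <headend> element itself.
    $codes = $headend->xpath('postalCodes/postalCode') ?: [];
    foreach ($codes as $code) {
        // Keep the values as strings so leading zeros are preserved.
        if (trim((string) $code) === $pc) {
            return true;
        }
    }
    return false;
}

// Shortened sample element, assumed for illustration only.
$sample = simplexml_load_string(
    '<headend headendId="TX02217"><postalCodes>'
    . '<postalCode>11111</postalCode><postalCode>78746</postalCode>'
    . '</postalCodes></headend>'
);
$found = hasPostalCode($sample, '78746');
```

Inside the foreach loop this would take the place of the xpath() condition, and adding a break after the output would stop reading the file at the first match.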
Found, name: Grande Gables at The Terrace
==========================================
<?xml version="1.0"?>
<headend headendId="TX02217">
<name>Grande Gables at The Terrace</name>
<mso msoId="17541">Grande Communications</mso>
<marketIds>
<marketId type="DMA">635</marketId>
</marketIds>
<postalCodes>
<postalCode>11111</postalCode>
<postalCode>22222</postalCode>
<postalCode>33333</postalCode>
<postalCode>78746</postalCode>
</postalCodes>
<location>Austin</location>
<lineup>
<station prgSvcId="20014">
<chan effDate="2006-01-16" tier="1">002</chan>
</station>
<station prgSvcId="10722">
<chan effDate="2006-01-16" tier="1">003</chan>
</station>
</lineup>
<areasServed>
<area>
<community>Thorndale</community>
<county code="45331" size="D">Milam</county>
<state>TX</state>
</area>
<area>
<community>Thrall</community>
<county code="45491" size="B">Williamson</county>
<state>TX</state>
</area>
</areasServed>
</headend>
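The question also asks to cache individual values (headend ID, name, MSO, location, station IDs, channels, community, state) in variables. With the matched element available as a SimpleXMLElement, that could look roughly like the sketch below. summarizeHeadend() and the array keys are hypothetical names chosen for illustration, and the sample element is a shortened copy of the one shown above.

```php
<?php
// Hypothetical helper: pull the values the question wants to cache
// out of a matched <headend> element.
function summarizeHeadend(SimpleXMLElement $headend): array
{
    // Turn a list of XPath result nodes into plain strings.
    $strings = static function (array $nodes): array {
        return array_map('strval', $nodes);
    };

    return [
        'headendId'   => (string) $headend['headendId'],
        'name'        => (string) $headend->name,
        'mso'         => (string) $headend->mso,
        'location'    => (string) $headend->location,
        'prgSvcIds'   => $strings($headend->xpath('lineup/station/@prgSvcId')),
        'channels'    => $strings($headend->xpath('lineup/station/chan')),
        'communities' => $strings($headend->xpath('areasServed/area/community')),
        'states'      => $strings($headend->xpath('areasServed/area/state')),
    ];
}

// Shortened sample element, assumed for illustration only.
$sample = simplexml_load_string(
    '<headend headendId="TX02217"><name>Grande Gables at The Terrace</name>'
    . '<mso msoId="17541">Grande Communications</mso><location>Austin</location>'
    . '<lineup><station prgSvcId="20014"><chan tier="1">002</chan></station>'
    . '<station prgSvcId="10722"><chan tier="1">003</chan></station></lineup>'
    . '<areasServed><area><community>Thorndale</community><state>TX</state></area>'
    . '<area><community>Thrall</community><state>TX</state></area></areasServed>'
    . '</headend>'
);
$info = summarizeHeadend($sample);
```

Casting to string matters here: channel numbers such as 002 would lose their leading zeros if they were treated as integers.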