Nac*_*cho 5 html php xml encoding utf-8
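I have a large XML file (> 15 MB) that I have to read and parse, storing some of its values in a database. My problem is that the XML arrives in different encodings (UTF-8, ISO-8859-1).
UTF-8 works without problems. ISO-8859-1, however, gives me a lot of trouble: the tags contain special characters, and XMLReader and readOuterXML() fail to parse them correctly.
I tried this, with no luck: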
$xml = new XMLReader;
$xml->open($import_file,'ISO-8859-1');
I also tried:
XML (simplified):
<?xml version="1.0" encoding="ISO-8859-1"?>
<data>
<id><![CDATA[5531]]></id>
<baños><![CDATA[0]]></baños>
</data>
None of these helped.
Wis*_*Kik -1
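You can try reading the XML file into a string first, converting the special characters, and then passing the resulting XML string to XMLReader.

Here is the code: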

<?php
header("Content-Type: text/plain; charset=ISO-8859-1");

// Replace accented Latin, Cyrillic and Hebrew characters with ASCII approximations.
function normalizeChars($s){
    // Keys are UTF-8 byte sequences; they are double-quoted so that PHP
    // interprets the \x escapes. Note that '&' is rewritten too, which also
    // affects entity references such as &amp;.
    $replace = array(
        '&' => 'and', '@' => 'at', "\xc2\xa9" => 'c', "\xc2\xae" => 'r', "\xc3\x80" => 'a',
        "\xc3\x81" => 'a', "\xc3\x82" => 'a', "\xc3\x84" => 'a', "\xc3\x85" => 'a', "\xc3\x86" => 'ae', "\xc3\x87" => 'c',
        "\xc3\x88" => 'e', "\xc3\x89" => 'e', "\xc3\x8b" => 'e', "\xc3\x8c" => 'i', "\xc3\x8d" => 'i', "\xc3\x8e" => 'i',
        "\xc3\x8f" => 'i', "\xc3\x92" => 'o', "\xc3\x93" => 'o', "\xc3\x94" => 'o', "\xc3\x95" => 'o', "\xc3\x96" => 'o',
        "\xc3\x98" => 'o', "\xc3\x99" => 'u', "\xc3\x9a" => 'u', "\xc3\x9b" => 'u', "\xc3\x9c" => 'u', "\xc3\x9d" => 'y',
        "\xc3\x9f" => 'ss', "\xc3\xa0" => 'a', "\xc3\xa1" => 'a', "\xc3\xa2" => 'a', "\xc3\xa4" => 'a', "\xc3\xa5" => 'a',
        "\xc3\xa6" => 'ae', "\xc3\xa7" => 'c', "\xc3\xa8" => 'e', "\xc3\xa9" => 'e', "\xc3\xaa" => 'e', "\xc3\xab" => 'e',
        "\xc3\xac" => 'i', "\xc3\xad" => 'i', "\xc3\xae" => 'i', "\xc3\xaf" => 'i', "\xc3\xb2" => 'o', "\xc3\xb3" => 'o',
        "\xc3\xb4" => 'o', "\xc3\xb5" => 'o', "\xc3\xb6" => 'o', "\xc3\xb8" => 'o', "\xc3\xb9" => 'u', "\xc3\xba" => 'u',
        "\xc3\xbb" => 'u', "\xc3\xbc" => 'u', "\xc3\xbd" => 'y', "\xc3\xbe" => 'p', "\xc3\xbf" => 'y', "\xc4\x80" => 'a',
        "\xc4\x81" => 'a', "\xc4\x82" => 'a', "\xc4\x83" => 'a', "\xc4\x84" => 'a', "\xc4\x85" => 'a', "\xc4\x86" => 'c',
        "\xc4\x87" => 'c', "\xc4\x88" => 'c', "\xc4\x89" => 'c', "\xc4\x8a" => 'c', "\xc4\x8b" => 'c', "\xc4\x8c" => 'c',
        "\xc4\x8d" => 'c', "\xc4\x8e" => 'd', "\xc4\x8f" => 'd', "\xc4\x90" => 'd', "\xc4\x91" => 'd', "\xc4\x92" => 'e',
        "\xc4\x93" => 'e', "\xc4\x94" => 'e', "\xc4\x95" => 'e', "\xc4\x96" => 'e', "\xc4\x97" => 'e', "\xc4\x98" => 'e',
        "\xc4\x99" => 'e', "\xc4\x9a" => 'e', "\xc4\x9b" => 'e', "\xc4\x9c" => 'g', "\xc4\x9d" => 'g', "\xc4\x9e" => 'g',
        "\xc4\x9f" => 'g', "\xc4\xa0" => 'g', "\xc4\xa1" => 'g', "\xc4\xa2" => 'g', "\xc4\xa3" => 'g', "\xc4\xa4" => 'h',
        "\xc4\xa5" => 'h', "\xc4\xa6" => 'h', "\xc4\xa7" => 'h', "\xc4\xa8" => 'i', "\xc4\xa9" => 'i', "\xc4\xaa" => 'i',
        "\xc4\xab" => 'i', "\xc4\xac" => 'i', "\xc4\xad" => 'i', "\xc4\xae" => 'i', "\xc4\xaf" => 'i', "\xc4\xb0" => 'i',
        "\xc4\xb1" => 'i', "\xc4\xb2" => 'ij', "\xc4\xb3" => 'ij', "\xc4\xb4" => 'j', "\xc4\xb5" => 'j', "\xc4\xb6" => 'k',
        "\xc4\xb7" => 'k', "\xc4\xb8" => 'k', "\xc4\xb9" => 'l', "\xc4\xba" => 'l', "\xc4\xbb" => 'l', "\xc4\xbc" => 'l',
        "\xc4\xbd" => 'l', "\xc4\xbe" => 'l', "\xc4\xbf" => 'l', "\xc5\x80" => 'l', "\xc5\x81" => 'l', "\xc5\x82" => 'l',
        "\xc5\x83" => 'n', "\xc5\x84" => 'n', "\xc5\x85" => 'n', "\xc5\x86" => 'n', "\xc5\x87" => 'n', "\xc5\x88" => 'n',
        "\xc5\x89" => 'n', "\xc5\x8a" => 'n', "\xc5\x8b" => 'n', "\xc5\x8c" => 'o', "\xc5\x8d" => 'o', "\xc5\x8e" => 'o',
        "\xc5\x8f" => 'o', "\xc5\x90" => 'o', "\xc5\x91" => 'o', "\xc5\x92" => 'oe', "\xc5\x93" => 'oe', "\xc5\x94" => 'r',
        "\xc5\x95" => 'r', "\xc5\x96" => 'r', "\xc5\x97" => 'r', "\xc5\x98" => 'r', "\xc5\x99" => 'r', "\xc5\x9a" => 's',
        "\xc5\x9b" => 's', "\xc5\x9c" => 's', "\xc5\x9d" => 's', "\xc5\x9e" => 's', "\xc5\x9f" => 's', "\xc5\xa0" => 's',
        "\xc5\xa1" => 's', "\xc5\xa2" => 't', "\xc5\xa3" => 't', "\xc5\xa4" => 't', "\xc5\xa5" => 't', "\xc5\xa6" => 't',
        "\xc5\xa7" => 't', "\xc5\xa8" => 'u', "\xc5\xa9" => 'u', "\xc5\xaa" => 'u', "\xc5\xab" => 'u', "\xc5\xac" => 'u',
        "\xc5\xad" => 'u', "\xc5\xae" => 'u', "\xc5\xaf" => 'u', "\xc5\xb0" => 'u', "\xc5\xb1" => 'u', "\xc5\xb2" => 'u',
        "\xc5\xb3" => 'u', "\xc5\xb4" => 'w', "\xc5\xb5" => 'w', "\xc5\xb6" => 'y', "\xc5\xb7" => 'y', "\xc5\xb8" => 'y',
        "\xc5\xb9" => 'z', "\xc5\xba" => 'z', "\xc5\xbb" => 'z', "\xc5\xbc" => 'z', "\xc5\xbd" => 'z', "\xc5\xbe" => 'z',
        "\xc5\xbf" => 'z', "\xc6\x8f" => 'e', "\xc6\x92" => 'f', "\xc6\xa0" => 'o', "\xc6\xa1" => 'o', "\xc6\xaf" => 'u',
        "\xc6\xb0" => 'u', "\xc7\x8d" => 'a', "\xc7\x8e" => 'a', "\xc7\x8f" => 'i', "\xc7\x90" => 'i', "\xc7\x91" => 'o',
        "\xc7\x92" => 'o', "\xc7\x93" => 'u', "\xc7\x94" => 'u', "\xc7\x95" => 'u', "\xc7\x96" => 'u', "\xc7\x97" => 'u',
        "\xc7\x98" => 'u', "\xc7\x99" => 'u', "\xc7\x9a" => 'u', "\xc7\x9b" => 'u', "\xc7\x9c" => 'u', "\xc7\xba" => 'a',
        "\xc7\xbb" => 'a', "\xc7\xbc" => 'ae', "\xc7\xbd" => 'ae', "\xc7\xbe" => 'o', "\xc7\xbf" => 'o', "\xc9\x99" => 'e',
        "\xd0\x81" => 'jo', "\xd0\x84" => 'e', "\xd0\x86" => 'i', "\xd0\x87" => 'i', "\xd0\x90" => 'a', "\xd0\x91" => 'b',
        "\xd0\x92" => 'v', "\xd0\x93" => 'g', "\xd0\x94" => 'd', "\xd0\x95" => 'e', "\xd0\x96" => 'zh', "\xd0\x97" => 'z',
        "\xd0\x98" => 'i', "\xd0\x99" => 'j', "\xd0\x9a" => 'k', "\xd0\x9b" => 'l', "\xd0\x9c" => 'm', "\xd0\x9d" => 'n',
        "\xd0\x9e" => 'o', "\xd0\x9f" => 'p', "\xd0\xa0" => 'r', "\xd0\xa1" => 's', "\xd0\xa2" => 't', "\xd0\xa3" => 'u',
        "\xd0\xa4" => 'f', "\xd0\xa5" => 'h', "\xd0\xa6" => 'c', "\xd0\xa7" => 'ch', "\xd0\xa8" => 'sh', "\xd0\xa9" => 'sch',
        "\xd0\xaa" => '-', "\xd0\xab" => 'y', "\xd0\xac" => '-', "\xd0\xad" => 'je', "\xd0\xae" => 'ju', "\xd0\xaf" => 'ja',
        "\xd0\xb0" => 'a', "\xd0\xb1" => 'b', "\xd0\xb2" => 'v', "\xd0\xb3" => 'g', "\xd0\xb4" => 'd', "\xd0\xb5" => 'e',
        "\xd0\xb6" => 'zh', "\xd0\xb7" => 'z', "\xd0\xb8" => 'i', "\xd0\xb9" => 'j', "\xd0\xba" => 'k', "\xd0\xbb" => 'l',
        "\xd0\xbc" => 'm', "\xd0\xbd" => 'n', "\xd0\xbe" => 'o', "\xd0\xbf" => 'p', "\xd1\x80" => 'r', "\xd1\x81" => 's',
        "\xd1\x82" => 't', "\xd1\x83" => 'u', "\xd1\x84" => 'f', "\xd1\x85" => 'h', "\xd1\x86" => 'c', "\xd1\x87" => 'ch',
        "\xd1\x88" => 'sh', "\xd1\x89" => 'sch', "\xd1\x8a" => '-', "\xd1\x8b" => 'y', "\xd1\x8c" => '-', "\xd1\x8d" => 'je',
        "\xd1\x8e" => 'ju', "\xd1\x8f" => 'ja', "\xd1\x91" => 'jo', "\xd1\x94" => 'e', "\xd1\x96" => 'i', "\xd1\x97" => 'i',
        "\xd2\x90" => 'g', "\xd2\x91" => 'g', "\xd7\x90" => 'a', "\xd7\x91" => 'b', "\xd7\x92" => 'g', "\xd7\x93" => 'd',
        "\xd7\x94" => 'h', "\xd7\x95" => 'v', "\xd7\x96" => 'z', "\xd7\x97" => 'h', "\xd7\x98" => 't', "\xd7\x99" => 'i',
        "\xd7\x9a" => 'k', "\xd7\x9b" => 'k', "\xd7\x9c" => 'l', "\xd7\x9d" => 'm', "\xd7\x9e" => 'm', "\xd7\x9f" => 'n',
        "\xd7\xa0" => 'n', "\xd7\xa1" => 's', "\xd7\xa2" => 'e', "\xd7\xa3" => 'p', "\xd7\xa4" => 'p', "\xd7\xa5" => 'C',
        "\xd7\xa6" => 'c', "\xd7\xa7" => 'q', "\xd7\xa8" => 'r', "\xd7\xa9" => 'w', "\xd7\xaa" => 't', "\xe2\x84\xa2" => 'tm',
        "\xc3\xb1" => 'n',
    );
    return strtr($s, $replace);
}

$path_to_file = '';
// Read the whole file into memory, normalize it, then hand the string to XMLReader.
$xml_text = @file_get_contents($path_to_file);
if(!empty($xml_text)){
    $xml_text = normalizeChars($xml_text);
    $xml = new XMLReader();
    $xml->XML($xml_text);
}
?>

On the other hand, if you are after performance, you should try SimpleXML and DOMDocument, as described in the following StackOverflow question: /sf/answers/128472711/
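
For illustration only, here is one common way to combine XMLReader with DOMDocument and SimpleXML; it is a sketch and is not taken from the linked answer. XMLReader streams through the document, and each record is expanded into a small SimpleXML object so that only one record is held as a tree at a time. The `<items>`/`<item>` element names and the inline sample string are hypothetical.

```php
<?php
// Hypothetical sample document; the element names are illustrative only.
$xml_text = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<items><item><id>5531</id></item><item><id>5532</id></item></items>';

$reader = new XMLReader();
$reader->XML($xml_text);

$ids = array();

// Advance the cursor to the first <item> element.
while ($reader->read() && $reader->name !== 'item') {
}

// Handle one <item> at a time, then jump to the next sibling <item>.
while ($reader->name === 'item') {
    $doc  = new DOMDocument();
    // expand() returns the current node as a DOMNode; import it into a
    // fresh document and wrap it in SimpleXML for convenient access.
    $item = simplexml_import_dom($doc->importNode($reader->expand(), true));
    $ids[] = (string) $item->id;
    $reader->next('item');
}
$reader->close();

print_r($ids);
```

For a real import, `$reader->open($import_file)` would replace the inline string, and the body of the second loop is where the database insert would go.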

Edit:

I added header("Content-Type: text/plain; charset=ISO-8859-1") because strtr only works with ISO-8859-1.
I tried it with the XML string provided by the OP and it worked fine. If any characters are missing, feel free to add them to the array.
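
As a rough sketch of how the encoding interacts with the replacement table, assuming the input really is ISO-8859-1 and the mbstring extension is available: `strtr` with an array compares raw bytes, so UTF-8 keys such as `"\xc3\xb1"` can only match once the text has been converted to UTF-8. The sample string and the one-entry table below are illustrative, not part of the original answer.

```php
<?php
// Hypothetical ISO-8859-1 input: \xF1 is the single byte for "ñ" in that encoding.
$latin1 = "<ba\xF1os>0</ba\xF1os>";

// Convert to UTF-8 so that multi-byte keys such as "\xc3\xb1" can match.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// strtr() with an array replaces byte sequences; it is not charset-aware.
$ascii = strtr($utf8, array("\xc3\xb1" => 'n'));

echo $ascii, "\n"; // <banos>0</banos>
```

If the file is already UTF-8, the conversion step should be skipped; converting twice would corrupt the accented characters.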