Var*_*wda 1 xml text-processing
My files contain data that is not cleanly laid out. For example:
<?xml version="1.0" encoding="UTF-8" ?><ns0:collection
xmlns:ns0="http://namspace/Service/1.0"><Record>
.
.</Record></ns0:collection>
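I have to merge N such files into a single file. To do that, I need to remove the closing tag </ns0:collection> from the first file, remove <?xml version="1.0" encoding="UTF-8" ?><ns0:collection xmlns:ns0="http://namspace/Service/1.0"> and </ns0:collection> from the files in between, and remove <?xml version="1.0" encoding="UTF-8" ?><ns0:collection xmlns:ns0="http://namspace/Service/1.0"> from the last file, then merge them all together. I tried the sed command below on the first file, but it produced nothing: "merged.xml" is empty.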
sed '/<\/ns0:collection>/d' $file1 > merged.xml
Any suggestions?
You didn't specify that you could only use sed, so if you have access to xml_grep (see "Merge multiple XML files from command line", second answer), I would recommend it: it does most of the heavy lifting for you, and a simple merge job like this can be done in one command:
xml_grep --cond Record --wrap "ns0:collection" --descr 'xmlns:ns0="http://namespace/Service/1.0"' --encoding "UTF-8" *.xml
Test files:
test.xml
<?xml version="1.0" encoding="UTF-8" ?><ns0:collection
xmlns:ns0="http://namespace/Service/1.0"><Record>
Test
</Record></ns0:collection>
test1.xml
<?xml version="1.0" encoding="UTF-8" ?><ns0:collection
xmlns:ns0="http://namespace/Service/1.0"><Record>
Test 1<a>a</a><b c="c">d</b>
</Record></ns0:collection>
Result
<?xml version="1.0" encoding="UTF-8" ?>
<ns0:collection xmlns:ns0="http://namespace/Service/1.0">
<Record>
Test 1<a>a</a><b c="c">d</b></Record><Record>
Test
</Record>
</ns0:collection>
I prefer to use XML-aware tools when dealing with XML files, because the chance of messing up the structure with sed and friends is quite high, and you can easily end up with a malformed XML document!