I have an XML documentation file. Part of the file looks like this:
<attr>
  <attrlabl>COUNTY</attrlabl>
  <attrdef>County abbreviation</attrdef>
  <attrtype>Text</attrtype>
  <attwidth>1</attwidth>
  <atnumdec>0</atnumdec>
  <attrdomv>
    <edom>
      <edomv>C</edomv>
      <edomvd>Clackamas County</edomvd>
      <edomvds/>
    </edom>
    <edom>
      <edomv>M</edomv>
      <edomvd>Multnomah County</edomvd>
      <edomvds/>
    </edom>
    <edom>
      <edomv>W</edomv>
      <edomvd>Washington County</edomvd>
      <edomvds/>
    </edom>
  </attrdomv>
</attr>
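From this XML file, I want to create an R data frame with the columns attrlabl, attrdef, attrtype, and attrdomv. Note that the attrdomv column should contain all levels of the categorical variable. The data frame should look like this: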
attrlabl  attrdef              attrtype  attrdomv
COUNTY    County abbreviation  Text      C Clackamas County; M Multnomah County; W Washington County
I have some incomplete code, shown below:
doc <- xmlParse("taxlots.shp.xml")
dataDictionary <- xmlToDataFrame(getNodeSet(doc,"//attrlabl"))
Could you complete my R code? I appreciate any help!
pla*_*pus 10
Assuming this is the correct taxlots.shp.xml file:
<attr>
  <attrlabl>COUNTY</attrlabl>
  <attrdef>County abbreviation</attrdef>
  <attrtype>Text</attrtype>
  <attwidth>1</attwidth>
  <atnumdec>0</atnumdec>
  <attrdomv>
    <edom>
      <edomv>C</edomv>
      <edomvd>Clackamas County</edomvd>
      <edomvds/>
    </edom>
    <edom>
      <edomv>M</edomv>
      <edomvd>Multnomah County</edomvd>
      <edomvds/>
    </edom>
    <edom>
      <edomv>W</edomv>
      <edomvd>Washington County</edomvd>
      <edomvds/>
    </edom>
  </attrdomv>
</attr>
You are almost there:
library(XML)
doc <- xmlParse("taxlots.shp.xml")
xmlToDataFrame(nodes = getNodeSet(doc, "//attr"))[c("attrlabl", "attrdef", "attrtype", "attrdomv")]
  attrlabl             attrdef attrtype                                             attrdomv
1   COUNTY County abbreviation     Text CClackamas CountyMMultnomah CountyWWashington County
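However, the last field is not in the format you want, because the text of all the nested edom elements gets concatenated without separators. Getting the format you asked for takes a few extra steps: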
# One row per <edom> element: the code (edomv) and its description (edomvd)
step1 <- xmlToDataFrame(nodes = getNodeSet(doc, "//attrdomv/edom"))
step1
  edomv            edomvd edomvds
1     C  Clackamas County
2     M  Multnomah County
3     W Washington County
# Join each code with its description, then collapse all levels into one string
step2 <- paste(paste(step1$edomv, step1$edomvd, sep = " "), collapse = "; ")
step2
[1] "C Clackamas County; M Multnomah County; W Washington County"
# Attach the collapsed string to the three simple columns
cbind(xmlToDataFrame(nodes = getNodeSet(doc, "//attr"))[c("attrlabl", "attrdef", "attrtype")],
      attrdomv = step2)
  attrlabl             attrdef attrtype                                                    attrdomv
1   COUNTY County abbreviation     Text C Clackamas County; M Multnomah County; W Washington County
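The steps above handle a single attr element. If the real file contains several of them (an assumption, since only one is shown in the question), the same idea can be applied per node so that each row only collects its own edom levels. This is a rough sketch rather than a tested solution for your file: the sample XML, the wrapping attrs element, the second AREA attribute, and the helper names child_text and attr_to_row are all made up for illustration.

```r
library(XML)

# Hypothetical sample: two <attr> entries, the second one without <attrdomv>.
xml_text <- '<attrs>
  <attr>
    <attrlabl>COUNTY</attrlabl>
    <attrdef>County abbreviation</attrdef>
    <attrtype>Text</attrtype>
    <attrdomv>
      <edom><edomv>C</edomv><edomvd>Clackamas County</edomvd><edomvds/></edom>
      <edom><edomv>M</edomv><edomvd>Multnomah County</edomvd><edomvds/></edom>
    </attrdomv>
  </attr>
  <attr>
    <attrlabl>AREA</attrlabl>
    <attrdef>Lot area</attrdef>
    <attrtype>Number</attrtype>
  </attr>
</attrs>'

doc <- xmlParse(xml_text, asText = TRUE)

# Text of the first child element with the given name, or NA if it is absent.
child_text <- function(node, name) {
  hits <- getNodeSet(node, paste0("./", name))
  if (length(hits) == 0) NA_character_ else xmlValue(hits[[1]])
}

# Build one data-frame row for a single <attr> node.
attr_to_row <- function(node) {
  # "code description" for every <edom> that belongs to this <attr> only
  pairs <- vapply(
    getNodeSet(node, "./attrdomv/edom"),
    function(e) paste(child_text(e, "edomv"), child_text(e, "edomvd")),
    character(1)
  )
  data.frame(
    attrlabl = child_text(node, "attrlabl"),
    attrdef  = child_text(node, "attrdef"),
    attrtype = child_text(node, "attrtype"),
    attrdomv = if (length(pairs) == 0) NA_character_ else paste(pairs, collapse = "; "),
    stringsAsFactors = FALSE
  )
}

# Stack the per-node rows into one data frame.
dataDictionary <- do.call(rbind, lapply(getNodeSet(doc, "//attr"), attr_to_row))
dataDictionary
```

The relative XPath ("./attrdomv/edom") is what keeps the levels of one attribute from leaking into another; the absolute "//attrdomv/edom" used earlier would pool every edom in the document. Attributes without an attrdomv block end up with NA in that column.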