I have some data shaped like this:
<people>
<person first="Mary" last="Jane" sex="F" />
<person first="Susan" last="Smith" sex="F" height="168" />
<person last="Black" first="Joseph" sex="M" />
<person first="Jessica" last="Jones" sex="F" />
</people>
I want a data frame that looks like this:
    first  last sex height
1    Mary  Jane   F     NA
2   Susan Smith   F    168
3  Joseph Black   M     NA
4 Jessica Jones   F     NA
Here is what I have so far:
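library(XML)
xpeople <- xmlRoot(xmlParse(xml))   # root <people> node
lst <- xmlApply(xpeople, xmlAttrs)  # one named character vector per <person>
names(lst) <- seq_along(lst)        # seq_along() is safe even for an empty list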
But I can't for the life of me figure out how to get the list into a data frame. I can make the list "square" (i.e. fill in the gaps) and then put it into a data frame:
lst <- xmlApply(xpeople, function(node) {
  attrs <- xmlAttrs(node)
  # Fill in the missing attribute so every element has the same fields
  if (!("height" %in% names(attrs))) {
    attrs[["height"]] <- NA
  }
  attrs
})
df <- as.data.frame(lst)
But I have the following question:
How do I get the data frame into the right shape?
txt <- '<people>
<person first="Mary" last="Jane" sex="F" />
<person first="Susan" last="Smith" sex="F" height="168" />
<person last="Black" first="Joseph" sex="M" />
<person first="Jessica" last="Jones" sex="F" />
</people>'
library(XML) # for xmlTreeParse
library(data.table) # for rbindlist(...)
xml <- xmlTreeParse(txt, asText=TRUE, useInternalNodes = TRUE)
# Turn each <person> node's attributes into a list, then bind the lists row-wise
rbindlist(lapply(xml["//person"], function(x) as.list(xmlAttrs(x))), fill = TRUE)
#      first  last sex height
# 1:    Mary  Jane   F     NA
# 2:   Susan Smith   F    168
# 3:  Joseph Black   M     NA
# 4: Jessica Jones   F     NA
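You need as.list(xmlAttrs(...)) rather than just xmlAttrs(...) because rbindlist(...) expects each element of its input to be a list, not a vector. With fill=TRUE, attributes missing from a node (here, height) are filled in with NA.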
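If you would rather not depend on data.table, the same "match by name and fill with NA" idea can be sketched in base R. This is only a minimal illustration, not part of the answer above: the `attrs` list is hand-written to stand in for what xmlAttrs() returns for each person node, and the names `cols`, `rows`, `m` and `df` are made up for the example.

```r
# Named character vectors standing in for the per-node result of xmlAttrs()
# (hand-written here so the sketch does not need the XML package).
attrs <- list(
  c(first = "Mary",    last = "Jane",    sex = "F"),
  c(first = "Susan",   last = "Smith",   sex = "F", height = "168"),
  c(last  = "Black",   first = "Joseph", sex = "M"),
  c(first = "Jessica", last = "Jones",   sex = "F")
)

# Union of all attribute names, in order of first appearance.
cols <- unique(unlist(lapply(attrs, names)))

# Index every vector by the full set of names: this reorders the fields
# consistently and yields NA for any name the vector does not have.
rows <- lapply(attrs, function(a) unname(a[cols]))

# Stack the rows into a character matrix, then convert to a data frame.
m <- do.call(rbind, rows)
colnames(m) <- cols
df <- as.data.frame(m, stringsAsFactors = FALSE)

# All columns are character at this point; convert numeric ones explicitly.
df$height <- as.numeric(df$height)
```

Indexing by name is what keeps Joseph's row correct even though his attributes appear in a different order in the XML.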