Extracting data from XML in R

Sas*_*sha 3 xml r

I need to extract certain data from an XML document like the one below (simplified for brevity):

<Doc name="Doc1">
    <Lists Count="1">
        <List Name="List1">
            <Points Count="3">
                <Point Id="1">
                    <Tags Count ="1">"a"</Tags>
                    <Point Position="1"  /> 
                </Point>
                <Point Id="2">
                    <Point Position="2"  /> 
                </Point>
                <Point Id="3">
                    <Tags Count="1">"c"</Tags>
                    <Point Position="3"  /> 
                </Point>
            </Points>
        </List>
    </Lists>
</Doc>

The output should be a data frame that matches the tag and position to each point Id:

    Point  Tag Position
1     1    a        1
2     2 <NA>        2
3     3    c        3

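I am new to XML and I'm using the xml2 package. So far I can extract each variable on its own, but because some points may have no Tags data, I can't find a way to match the three values to one another.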

> library(xml2)
> xml_data<-read_xml(...)
> xml_data %>% xml_find_all("//Point") %>% xml_attr("Id")
[1] "1" "2" "3"
> xml_data %>% xml_find_all("//Vertical") %>% xml_attr("Position")
[1] "1" "2" "3"
> xml_data %>% xml_find_all("//Tags") %>% xml_text()
[1] "\"a\"" "\"c\""

hrb*_*str 5

purrr and xml2 work well together:

library(xml2)
library(purrr)

txt <- '<Doc name="Doc1">
    <Lists Count="1">
        <List Name="List1">
            <Points Count="3">
                <Point Id="1">
                    <Tags Count ="1">"a"</Tags>
                    <Point Position="1"  /> 
                </Point>
                <Point Id="2">
                    <Point Position="2"  /> 
                </Point>
                <Point Id="3">
                    <Tags Count="1">"c"</Tags>
                    <Point Position="3"  /> 
                </Point>
            </Points>
        </List>
    </Lists>
</Doc>'

doc <- read_xml(txt)
xml_find_all(doc, ".//Points/Point") %>% 
  map_df(function(x) {
    list(
      Point=xml_attr(x, "Id"),
      Tag=xml_find_first(x, ".//Tags") %>%  xml_text() %>%  gsub('^"|"$', "", .),
      Position=xml_find_first(x, ".//Point") %>% xml_attr("Position")
    )
  })
## # A tibble: 3 × 3
##   Point   Tag Position
##   <chr> <chr>    <chr>
## 1     1     a        1
## 2     2  <NA>        2
## 3     3     c        3
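The approach above works because each outer `<Point>` is handled on its own, so a missing `<Tags>` child becomes an `NA` in that row instead of shortening the result. As a rough sketch of the same idea without purrr, and assuming the simplified XML from the question, `xml_find_first()` can also be applied to the whole node set, since it returns one result per input node. The variable names `points`, `tags`, and `result` are only illustrative.

```r
library(xml2)

txt <- '<Doc name="Doc1">
    <Lists Count="1">
        <List Name="List1">
            <Points Count="3">
                <Point Id="1">
                    <Tags Count ="1">"a"</Tags>
                    <Point Position="1"  />
                </Point>
                <Point Id="2">
                    <Point Position="2"  />
                </Point>
                <Point Id="3">
                    <Tags Count="1">"c"</Tags>
                    <Point Position="3"  />
                </Point>
            </Points>
        </List>
    </Lists>
</Doc>'

doc <- read_xml(txt)

# Select only the outer <Point> elements (direct children of <Points>).
points <- xml_find_all(doc, ".//Points/Point")

# xml_find_first() keeps one entry per input node; a point without <Tags>
# yields a missing node, for which xml_text() returns NA.
tags <- xml_text(xml_find_first(points, "./Tags"))

result <- data.frame(
  Point    = xml_attr(points, "Id"),
  Tag      = gsub('^"|"$', "", tags),   # strip the literal quotes around the tag
  Position = xml_attr(xml_find_first(points, "./Point"), "Position"),
  stringsAsFactors = FALSE
)

print(result)
```

All three columns come back as character vectors, as in the tibble above. If numeric ids or positions are needed, wrapping those columns in `as.integer()` is one option.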