kba*_*ang 0 python lxml namespaces xml-parsing
I need to read the value inside a specific tag with lxml. The XML document looks like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<web-app xmlns="http://java.sun.com/xml/ns/j2ee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd"
version="2.4">
<display-name>Community Bank</display-name>
<description>WebGoat for Cigital</description>
<context-param>
<param-name>PropertiesPath</param-name>
<param-value>/WEB-INF/properties.txt</param-value>
<description>This is the path to the properties file from the servlet root</description>
</context-param>
<servlet>
<servlet-name>Index</servlet-name>
<servlet-class>com.cigital.boi.servlet.index</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>Index</servlet-name>
<url-pattern>/index</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>Index</servlet-name>
<url-pattern>/index.html</url-pattern>
</servlet-mapping>
I want to read com.cigital.boi.servlet.index.
I used this code to read everything under servlet:
from lxml import etree
# handle: path or open file object for the XML above
context = etree.parse(handle)
servlets = context.xpath('//servlet')
print(servlets)
The returned list is empty. More information: iterating over the parsed tree, I found these elements:
<Element {http://java.sun.com/xml/ns/j2ee}servlet-name at 2ad19e6eca48>
<Element {http://java.sun.com/xml/ns/j2ee}servlet-class at 2ad19e6ecaf8>
I suspect the output is an empty list because I did not include the namespace in my search. Please suggest how to read "com.cigital.boi.servlet.index" from the servlet-class tag.
Try the following:
from lxml import etree
context = etree.parse(handle)
# local-name() compares only the tag name and ignores its namespace
print(next(x.text for x in context.xpath('.//*[local-name()="servlet-class"]')))
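To see why the unqualified search matches nothing, here is a minimal sketch on a trimmed copy of the XML from the question. It assumes only the ElementTree-style `findall`/`findtext` calls with the `{uri}name` tag form, so it falls back to the standard library when lxml is not installed; the names `WEB_XML` and `J2EE_NS` are made up for illustration.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib module is an acceptable stand-in for this sketch,
    # since only ElementTree-compatible calls are used below.
    import xml.etree.ElementTree as etree

# Trimmed copy of the document from the question (XML declaration omitted).
WEB_XML = """
<web-app xmlns="http://java.sun.com/xml/ns/j2ee" version="2.4">
  <servlet>
    <servlet-name>Index</servlet-name>
    <servlet-class>com.cigital.boi.servlet.index</servlet-class>
  </servlet>
</web-app>
"""

# Value of the default xmlns attribute on the root element.
J2EE_NS = "http://java.sun.com/xml/ns/j2ee"

root = etree.fromstring(WEB_XML.strip())

# Children inherit the default namespace, so the real tag is "{uri}servlet".
unqualified = root.findall(".//servlet")
qualified = root.findall(".//{%s}servlet" % J2EE_NS)

# findtext returns the text of the first matching element.
servlet_class = root.findtext(".//{%s}servlet-class" % J2EE_NS)

print(len(unqualified), len(qualified), servlet_class)
```

The unqualified path finds no elements, while the qualified one finds the single servlet entry. The `local-name()` test gets the same result by ignoring the namespace part of each tag.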
Alternative:
from lxml import etree
context = etree.parse(handle)
nsmap = context.getroot().nsmap.copy()
# XPath cannot use the default (None) prefix, so bind that namespace to an explicit one
nsmap['ns'] = nsmap.pop(None)
print(next(x.text for x in context.xpath('.//ns:servlet-class', namespaces=nsmap)))
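The prefix-map approach also works when you need more than one element. The sketch below is only an illustration: the prefix `j2ee` is an arbitrary choice, the names `NS`, `classes` and `routes` are hypothetical, and it uses `findall`/`findtext` with a `namespaces` mapping, which both lxml and the standard library accept.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib, which supports the same calls used here.
    import xml.etree.ElementTree as etree

# Trimmed copy of the document from the question (XML declaration omitted).
WEB_XML = """
<web-app xmlns="http://java.sun.com/xml/ns/j2ee" version="2.4">
  <servlet>
    <servlet-name>Index</servlet-name>
    <servlet-class>com.cigital.boi.servlet.index</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>Index</servlet-name>
    <url-pattern>/index</url-pattern>
  </servlet-mapping>
  <servlet-mapping>
    <servlet-name>Index</servlet-name>
    <url-pattern>/index.html</url-pattern>
  </servlet-mapping>
</web-app>
"""

# "j2ee" is an arbitrary prefix; only the URI has to match the document.
NS = {"j2ee": "http://java.sun.com/xml/ns/j2ee"}

root = etree.fromstring(WEB_XML.strip())

# Map each servlet name to its class.
classes = {}
for servlet in root.findall("j2ee:servlet", namespaces=NS):
    name = servlet.findtext("j2ee:servlet-name", namespaces=NS)
    classes[name] = servlet.findtext("j2ee:servlet-class", namespaces=NS)

# Map each URL pattern to the class of the servlet it points at.
routes = {}
for mapping in root.findall("j2ee:servlet-mapping", namespaces=NS):
    name = mapping.findtext("j2ee:servlet-name", namespaces=NS)
    pattern = mapping.findtext("j2ee:url-pattern", namespaces=NS)
    routes[pattern] = classes.get(name)

print(classes)
print(routes)
```

Hard-coding the prefix map avoids relying on the root element's `nsmap`, at the cost of repeating the namespace URI in your code.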