I have the following two code snippets that do the same thing, except that one compiles the expression and the other just evaluates it.
//1st option - compile and run
//make the XPath object compile the XPath expression
XPathExpression expr = xpath.compile("/inventory/book[3]/preceding-sibling::book[1]");
//evaluate the XPath expression
Object result = expr.evaluate(doc, XPathConstants.NODESET);
nodes = (NodeList) result;
//print the output
System.out.println("1st option:");
for (int i = 0; i < nodes.getLength(); i++) {
    System.out.println("i: " + i);
    System.out.println("*******");
    System.out.println(nodeToString(nodes.item(i)));
    System.out.println("*******");
}
//2nd option - evaluate an XPath expression without compiling
Object result2 = xpath.evaluate("/inventory/book[3]/preceding-sibling::book[1]",doc,XPathConstants.NODESET);
System.out.println("2nd option:");
nodes = (NodeList) result2;
//print the output
for (int i …
I have an XForm document:
<?xml version="1.0" encoding="UTF-8"?>
<h:html xmlns:h="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:jr="http://openrosa.org/javarosa">
  <h:head>
    <h:title>Summary</h:title>
    <model>
      <instance>
        <data vaultType="nsp_inspection.4.1">
          <metadata vaultType="metadata.1.1">
            <form_start_time type="dateTime" />
            <form_end_time type="dateTime" />
            <device_id type="string" />
            <username type="string" />
          </metadata>
          <date type="date" />
          <monitor type="string" />
        </data>
      </instance>
    </model>
  </h:head>
I want to use XPath and JDOM to select the data element from the XForm.
XPath xpath = XPath.newInstance("h:html/h:head/h:title/");
seems to work fine and selects the title element, but
XPath xpath = XPath.newInstance("h:html/h:head/model");
does not select the model element. I guess this has something to do with namespaces.
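A likely cause is the default namespace declaration `xmlns="http://www.w3.org/1999/xhtml"` on the root element: `model` carries no prefix in the document, yet it still belongs to the XHTML namespace, while an unprefixed name in an XPath 1.0 expression matches only elements in no namespace. The sketch below illustrates this with Python's standard `xml.etree.ElementTree` rather than JDOM, so it shows the namespace behaviour and not the JDOM API. The document is a trimmed copy of the XForm, with closing tags added so that it parses.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Trimmed copy of the XForm from the question (closing tags added so it parses).
xform = """<h:html xmlns:h="http://www.w3.org/1999/xhtml"
        xmlns="http://www.w3.org/1999/xhtml">
  <h:head>
    <h:title>Summary</h:title>
    <model>
      <instance/>
    </model>
  </h:head>
</h:html>"""

root = ET.fromstring(xform)
ns = {"h": XHTML_NS}

# Paths are relative to the root element (h:html).
title = root.find("h:head/h:title", ns)

# An unprefixed name is looked up in no namespace, so nothing is found.
model_unprefixed = root.find("h:head/model", ns)

# Using a prefix bound to the default namespace URI makes the lookup succeed.
model_prefixed = root.find("h:head/h:model", ns)

print(title.text)
print(model_unprefixed)
print(model_prefixed.tag)
```

In JDOM the equivalent idea would be to write the path as `h:html/h:head/h:model`, with the `h` prefix bound to the XHTML namespace URI for that XPath object.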
Document structure:
<program>
  <projectionDay>
    <projection/>
    <projection/>
  </projectionDay>
  <projectionDay>
    <projection/>
    <projection/>
  </projectionDay>
</program>
I want to select the first and the last projection (in the whole document).
This returns them:
/descendant::projection[position() = 1 or position() = last()]
This returns the first and the last projection within each projectionDay:
//projection[position() = 1 or position() = last()]
Why does this happen?
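The difference comes from how `//` expands: `//projection[...]` is shorthand for `/descendant-or-self::node()/child::projection[...]`, so the predicate is applied on the child axis, once per parent, and `position()` and `last()` count among the projection children of each projectionDay. With `/descendant::projection[...]` the predicate is applied to a single document-wide list. The sketch below shows the per-parent behaviour with Python's `xml.etree.ElementTree`. That module supports only a subset of XPath (no `descendant::` axis, no `or` inside a predicate), so the per-parent query is split in two and the document-wide selection is done by plain iteration. The `id` attributes are added purely for illustration.

```python
import xml.etree.ElementTree as ET

# Same structure as in the question; the id attributes are hypothetical.
doc = ET.fromstring("""
<program>
  <projectionDay>
    <projection id="a"/>
    <projection id="b"/>
  </projectionDay>
  <projectionDay>
    <projection id="c"/>
    <projection id="d"/>
  </projectionDay>
</program>""")

# Positional predicates after // are evaluated relative to each parent.
first_per_day = [p.get("id") for p in doc.findall(".//projection[1]")]
last_per_day = [p.get("id") for p in doc.findall(".//projection[last()]")]

# Document-wide first and last: take them from the list of all descendants.
all_projections = list(doc.iter("projection"))
first_overall = all_projections[0].get("id")
last_overall = all_projections[-1].get("id")

print(first_per_day, last_per_day)
print(first_overall, last_overall)
```

In a full XPath 1.0 engine, `(//projection)[position() = 1 or position() = last()]` should behave like the `/descendant::` form, because the parentheses make the predicate apply to the whole node-set.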
I have the following HTML document:
<div class="books">
  <div class="book">
    <div>
      there are many deep nested elements here, somewhere there will be one span with some text e.g. 'mybooktext' within these
      <div>
        <div>
          <div>
            <span>mybooktext</span>
          </div>
        </div>
      </div>
    </div>
    <div>
      there are also many nested elements here, somewhere there will be a link with a class called 'mylinkclass' within these. (this is the element i want to find)
      <div>
        <div>
          <a class="mylinkclass">Bla</a>
        </div>
      </div>
    </div>
  </div>
  <div class="book">
    <div>
      there are many deep nested elements …
I am scraping a website and have written a spider in Scrapy. I am able to extract the product prices using this:
hxs.select('//div[@class="product_list"]//div[@class="product_list_offerprice"]/text()').extract()
in the Scrapy shell.
But when I try to do the same thing in the spider, it returns an empty list.
Here is my spider code:
from eScraper.items import EscraperItem
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.spiders import CrawlSpider
#------------------------------------------------------------------------------
class ESpider(CrawlSpider):
    name = "ashikamallSpider"
    allowed_domains = ["ashikamall.com"]
    URLSList = []
    for n in range(1, 51):
        URLSList.append('http://ashikamall.com/products.aspx?id=222&page=' + str(n))
    start_urls = URLSList
    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@class="product_list"]')
        items = []
        for site in sites:
            item = EscraperItem()
            item['productDesc'] = ""
            item['productSite'] = "http://1click1call.com/"
            item['productTitle'] = site.select('div[@class="product_list_name"]/h3/text()').extract()
            item['productPrice'] = site.select('div[@class="product_list_offerprice"]/text()').extract()
            item['productURL'] = "http://ashikamall.com/" + site.select('div[@class="product_list_image"]/a/@href').extract()[0].encode('utf-8')
            item['productImage'] …
My code looks like:
file = Nokogiri::XML(File.open('file.xml'))
test = file.xpath("//title") #all <title> elements in xml file
Then, when I try:
puts test.uniq
I get the following error:
undefined method `uniq' for #<Nokogiri::XML::NodeSet:0x000000011b8bf8>
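Is test not an array? If not, how do I turn it into one?
Otherwise, how can I get only the unique values from test?
I am using a combination of Selenium and Python 2.7 (the browser is Firefox, in case that matters). I am new to XPath, but it seems very useful for getting WebElements.
I have the following HTML file that I am parsing: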
<html>
  <head></head>
  <body>
    ..
    <div id="childItem">
      <ul>
        <li class="listItem"><img/><span>text1</span></li>
        <li class="listItem"><img/><span>text2</span></li>
        ...
        <li class="listItem"><img/><span>textN</span></li>
      </ul>
    </div>
  </body>
</html>
Now I can get a list of all the li elements with the following code:
root = element.find_element_by_xpath('..')
child = root.find_element_by_id('childDiv')
list = child.find_elements_by_css_selector('div.childDiv > ul > li.listItem')
I would like to know how to do this with an XPath expression. I have tried several expressions, but the simplest is:
list = child.find_element_by_xpath('li[@class="listItem"]')
But I always get the error:
selenium.common.exceptions.NoSuchElementException: Message: u'Unable to locate element: {"method":"xpath","selector":"li[@class=\\"listItem\\"]"}';
Since I have a workaround (the first three lines), this is not critical for me, but I would like to know what I am doing wrong.
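One probable explanation: a relative XPath such as `li[@class="listItem"]` only looks at direct children of the context element, and the li elements sit one level further down, inside ul. The sketch below reproduces that with Python's `xml.etree.ElementTree` instead of Selenium, so it is an illustration of the path semantics and not of the WebDriver API. The markup is trimmed to three items so that it parses as XML, and it uses the id `childItem` shown in the HTML (the Selenium code in the question refers to `childDiv`).

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the page from the question.
page = """<html>
  <head></head>
  <body>
    <div id="childItem">
      <ul>
        <li class="listItem"><img/><span>text1</span></li>
        <li class="listItem"><img/><span>text2</span></li>
        <li class="listItem"><img/><span>textN</span></li>
      </ul>
    </div>
  </body>
</html>"""

root = ET.fromstring(page)
child = root.find(".//div[@id='childItem']")

# li is not a direct child of the div, so this matches nothing.
direct = child.findall("li[@class='listItem']")

# Either spell out the intermediate ul step ...
via_ul = child.findall("ul/li[@class='listItem']")

# ... or search all descendants of the context element.
anywhere = child.findall(".//li[@class='listItem']")

print(len(direct), len(via_ul), len(anywhere))
print([li.find("span").text for li in anywhere])
```

With Selenium, the same paths (`ul/li[@class="listItem"]` or `.//li[@class="listItem"]`) would presumably go to `find_elements_by_xpath` (plural) to get a list, since `find_element_by_xpath` returns a single element.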
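I need to use Capybara to read the text value of the span tag inside the selected radio button.
I have a list of radio buttons, each followed by text and a count in parentheses.
For example: a radio button with Thank You (82)
I want to read the count, 82, shown in parentheses for the selected radio button.
I am using the following code, but it does not work: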
value=page.find(".cardFilterItemSelection[checked='checked'] + span.itemCount").text
I also tried XPath, but did not get anything:
value=page.find(:xpath,"//input[@class = 'cardFilterItemSelection' and @checked = 'checked']/span[@class = 'itemCount']/text()")
How can this be done?
<label id="thankyou_label" for="thankyou_radio" class="itemName radio">
  <input checked="checked" tagtype="Occasion" value="Thank You" id="thankyou_radio" name="occasionGroup" class="cardFilterItemSelection" type="radio">
  <span class="occasion_display_name">
    Thank You
  </span>
  <span class="itemCount">
    (82)
  </span>
</label>
<label id="spring_label" class="itemName radio" for="spring_radio">
  <input id="spring_radio" class="cardFilterItemSelection" type="radio" name="occasionGroup" value="Spring" tagtype="Occasion">
  <span class="occasion_display_name">
    Spring
  </span>
  <span class="itemCount">
    (0)
  </span>
</label>
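Judging from this markup, span.itemCount is a sibling of the input rather than a child (an input element cannot contain other elements), and it is not the immediately following sibling either, because span.occasion_display_name sits in between. That would explain why both `input/span` in the XPath and the `+` combinator in the CSS selector find nothing. Candidates to try are the general sibling combinator (`~`) in CSS, or `following-sibling::span[@class='itemCount']` in XPath. The sketch below shows the sibling relationship in Python's `xml.etree.ElementTree`, not in Capybara. ElementTree has no `following-sibling` axis, so it goes through the parent label. The input tags are self-closed and the `tagtype` attribute is dropped so that the fragment parses as XML, and the wrapping `form` element is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Simplified copy of the markup; <form> is a hypothetical wrapper element.
markup = """<form>
  <label id="thankyou_label" for="thankyou_radio" class="itemName radio">
    <input checked="checked" value="Thank You" id="thankyou_radio"
           name="occasionGroup" class="cardFilterItemSelection" type="radio"/>
    <span class="occasion_display_name">Thank You</span>
    <span class="itemCount">(82)</span>
  </label>
  <label id="spring_label" class="itemName radio" for="spring_radio">
    <input id="spring_radio" class="cardFilterItemSelection" type="radio"
           name="occasionGroup" value="Spring"/>
    <span class="occasion_display_name">Spring</span>
    <span class="itemCount">(0)</span>
  </label>
</form>"""

root = ET.fromstring(markup)

count = None
for label in root.iter("label"):
    # Only the label whose radio button carries checked="checked" qualifies.
    if label.find("input[@checked='checked']") is not None:
        # The count span is a child of the label, i.e. a sibling of the input.
        raw = label.find("span[@class='itemCount']").text
        # Strip the parentheses by extracting the digits.
        count = int(re.search(r"\d+", raw).group())
        break

print(count)
```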
I want to check whether an XML document contains a node with the value "Hotel Hafen Hamburg".
But I get an error:
SimpleXMLElement::xpath(): Invalid predicate on line 25
You can view the XML here:
http://de.sourcepod.com/dkdtrb22-19748
So far I have written the following code:
$apiUmgebungUrl = "xml.xml";
$xml_umgebung = simplexml_load_file($apiUmgebungUrl);
echo $nameexist = $xml_umgebung->xpath('boolean(//result/name[@Hotel Hafen Hamburg');
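Two things stand out in that expression: the predicate bracket and the `boolean(` parenthesis are never closed, and `@` selects an attribute, not the element's text. A predicate that compares the element's string value would look like `//result/name[. = "Hotel Hafen Hamburg"]`. The sketch below tries that predicate with Python's `xml.etree.ElementTree` rather than PHP. The XML is a hypothetical stand-in, since the real file is only linked; only the result/name nesting is taken from the XPath in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real XML is only linked in the question.
sample = """<results>
  <result><name>Hotel Hafen Hamburg</name></result>
  <result><name>Some Other Place</name></result>
</results>"""

root = ET.fromstring(sample)

# [.='...'] compares the complete text content of each name element.
matches = root.findall(".//result/name[.='Hotel Hafen Hamburg']")
name_exists = len(matches) > 0

# A value that is not present yields an empty list.
missing = root.findall(".//result/name[.='No Such Hotel']")

print(name_exists, len(missing))
```

As far as I know, `SimpleXMLElement::xpath()` returns an array of matching nodes, so in PHP the existence check is usually a test for a non-empty result rather than a `boolean(...)` wrapper.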
Please see the following HTML snippet:
<tr class="clickable">
  <td id="7b8ee8f9-b66f-4fba-83c1-4cf2827130b5" class="clickable">
    <a class="editLink" href="#">Single</a>
  </td>
  <td class="clickable">£14.00</td>
</tr>
I am trying to assert the value of td[2] when td[1] contains "Single". I have tried all sorts of variations of:
//td[2][(contains(text(),'£14.00'))]/../td[1][(contains(text(),'Single'))]
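I have used similar notation successfully elsewhere, but to no avail here... I think it is due to the nested element in td[1], but I am not sure.
Can someone enlighten me as to my mistake? :)
Cheers!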
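The suspicion about the nested element looks right. In td[1] the word "Single" lives inside the a element, so `text()` (the direct text-node children of the td) holds only whitespace, and `contains(text(), 'Single')` tests just the first of those nodes. Comparing against the string value instead, for example `//tr[contains(td[1], 'Single')]/td[2]`, should take the nested text into account. The sketch below shows the difference between direct text and string value with Python's `xml.etree.ElementTree`. That module has no `contains()` function, so the check is done in Python code.

```python
import xml.etree.ElementTree as ET

row = ET.fromstring("""<tr class="clickable">
  <td id="7b8ee8f9-b66f-4fba-83c1-4cf2827130b5" class="clickable">
    <a class="editLink" href="#">Single</a>
  </td>
  <td class="clickable">£14.00</td>
</tr>""")

first, second = row.findall("td")

# Direct text of td[1]: only the whitespace before the <a> element.
direct_text = first.text

# String value of td[1]: the text of all descendants, concatenated.
string_value = "".join(first.itertext())

print(repr(direct_text.strip()))
print("Single" in string_value)

# Read td[2] only when td[1] (including nested text) mentions "Single".
price = second.text.strip() if "Single" in string_value else None
print(price == "£14.00")
```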