XPath to select an element if the preceding element contains matching text() - Python, Scrapy

Question

chh*_*plo 3 python xpath web-crawler scrapy

I want to extract an element if the text() of the preceding element matches a specific condition. For example,

<html>
<div>
<table class="layouttab">
    <tbody>
    <tr>
        <td scope="row" class="srb">General information:&nbsp;&nbsp;</td>
        <td>(xxx) yyy-zzzz</td>
    </tr>
    <tr>
        <td scope="row" class="srb">Website:&nbsp;&nbsp;</td>
        <td><a href="http://xyz.edu" target="_blank">http://www.xyz.edu</a>
        </td>
    </tr>
    <tr>
        <td scope="row" class="srb">Type:&nbsp;&nbsp;</td>
        <td>4-year, Private for-profit</td>
    </tr>
    <tr>
        <td scope="row" class="srb">Awards offered:&nbsp;&nbsp;</td>
        <td>Less than one year certificate<br>One but less than two years certificate<br>Associate's degree<br>Bachelor's
            degree
        </td>
    </tr>
    <tr>
        <td scope="row" class="srb">Campus setting:&nbsp;&nbsp;</td>
        <td>City: Small</td>
    </tr>
    <tr>
        <td scope="row" class="srb">Related Institutions:</td>
        <td><a href="?q=xyz">xyz-New York</a>
            (Parent):
            <ul>
                <li style="list-style:circle">Berkeley College - Westchester Campus</li>
            </ul>
        </td>
    </tr>
    </tbody>
</table>
</div>
</html>

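Now, I want to extract the URL if the preceding element has "Website:" in its text(). I am using Python 2.x with Scrapy 0.14. I am able to extract data from a single element using something like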

 item['Header_Type']= site.select('div/table[@class="layouttab"]/tr[3]/td[2]/text()').extract()

But this approach fails if the Website row is missing: tr[3] shifts up, so I get "Type" in the Website field and "Awards offered" in the Type field.

Is there a specific construct in XPath for something like this,

'div/table[@class="layouttab"]/tr/td[2] {if td[1] has text = "Website"}

Thanks in advance.

Answer 1

小智 5

For Python and Scrapy, you can use the following to select the "Type" field; it works for me.

item['Header_Type']= site.select('div[1]/table[@class="layouttab"]/tr/td[contains(text(),"Type")]/following-sibling::td[1]/text()').extract()
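
The same following-sibling pattern can be applied to the "Website" row the question asks about. Below is a minimal sketch, not the asker's actual spider: it assumes lxml is installed (Scrapy depends on it) and uses lxml directly instead of Scrapy's selector so it can run standalone. `SAMPLE` is a trimmed copy of the question's table, and `value_for_label` is a hypothetical helper name introduced only for illustration.

```python
from lxml import html

# Trimmed copy of the table from the question (assumption: no <tbody> in the raw source).
SAMPLE = """
<html><body><div>
<table class="layouttab">
  <tr><td scope="row" class="srb">General information:&nbsp;&nbsp;</td><td>(xxx) yyy-zzzz</td></tr>
  <tr><td scope="row" class="srb">Website:&nbsp;&nbsp;</td>
      <td><a href="http://xyz.edu" target="_blank">http://www.xyz.edu</a></td></tr>
  <tr><td scope="row" class="srb">Type:&nbsp;&nbsp;</td><td>4-year, Private for-profit</td></tr>
</table>
</div></body></html>
"""


def value_for_label(tree, label, tail="text()"):
    """Hypothetical helper: find the <td> whose text contains `label`,
    then return `tail` evaluated on the <td> immediately after it."""
    xpath = (
        '//table[@class="layouttab"]//tr/td[contains(text(), $label)]'
        "/following-sibling::td[1]/" + tail
    )
    # lxml binds $label from the keyword argument, so no string escaping is needed.
    return tree.xpath(xpath, label=label)


tree = html.document_fromstring(SAMPLE)

# The URL lives in the href attribute of the link inside the next cell.
print(value_for_label(tree, "Website", "a/@href"))

# The lookup is by label, not by row position, so it does not shift
# when another row is missing.
print(value_for_label(tree, "Type"))

# A label that is absent yields an empty list instead of a wrong value.
print(value_for_label(tree, "Awards offered"))
```

In a Scrapy spider the same XPath string (with the label written inline, since Scrapy 0.14 is not known here to support `$label` variables) should work inside `site.select(...)`. Note that `contains()` matches substrings, so a short label such as "Type" would also match any other label that happens to contain that word.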

Archived:	13 years, 7 months ago
Views:	8024
Last activity:	13 years, 7 months ago