How to get the direct parent node using Scrapy in Python?

Question


Sim*_*mon 4
Tags: python, xpath, web-crawler, parent-child, scrapy

I'm new to Scrapy. I want to scrape some data from the web, and the HTML documents I get look like the ones below.

DOM style 1:
<div class="user-info">
    <p class="user-name">
        something in p tag
    </p>
    text data I want
</div>

DOM style 2:
<div class="user-info">
    <div>
        <p class="user-img">
            something in p tag
        </p>
        something in div tag
    </div>
    <div>
        <p class="user-name">
            something in p tag
        </p>
        text data I want
    </div>
</div>

Run Code Online (Sandbox Code Playgroud)

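I want to extract "text data I want". Right now I can get it with CSS or XPath selectors by checking which of the two structures is present, but I'd like to know a better way. For example, I could first select p.user-name with CSS, then get its parent, and then take text() from that parent div. The data I want is always the text() of the direct parent div of p.user-name, so the question is: how can I get the direct parent of p.user-name?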

Answer 1

Gra*_*rus 13

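With XPath you can traverse the XML tree in every direction (parent, siblings, children, etc.), which CSS does not support: CSS selectors can only move down the tree or forward to following siblings, never up to a parent.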
In your case, you can get a node's parent with XPath's .. (parent) abbreviation:

//p[@class='user-name']/../text()

Run Code Online (Sandbox Code Playgroud)

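Explanation:
//p[@class='user-name'] - finds <p> nodes whose class attribute is exactly user-name.
/.. - selects the parent of each of those nodes.
/text() - selects the text nodes that are direct children of the current node (here, the parent).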

This XPath should work for both of the cases you described.

Archived: 8 years, 6 months ago
Views: 5734
Last activity: 8 years, 6 months ago