Shu*_* B. 2 csv scrapy web-scraping
I want to extract text from the following HTML:
<li>
<a test="test" href="abc.html" id="11">Click Here</a>
"for further reference"
</li>
I'm trying to do it with the following extract command:
response.css("article div#section-2 li::text").extract()
But it returns only the "for further reference" line. The expected output is "Click Here for further reference" as a single string. How can I do this? And how can I modify it to do the same if the following patterns are present:
There are at least a couple of ways to do this.
Let's first build a test selector that mimics your response:
>>> import scrapy
>>> response = scrapy.Selector(text="""<li>
... <a test="test" href="abc.html" id="11">Click Here</a>
... "for further reference"
... </li>""")
The first option is a slight change to your CSS selector: look at all text descendants, not only the direct text children (note the space between li and the ::text pseudo-element):
# this is your CSS selector,
# which only gives the direct child text nodes of the selected LI
>>> response.css("li::text").extract()
[u'\n ', u'\n "for further reference"\n']
# notice the extra space between "li" and "::text"
#                   |
#                   v
>>> response.css("li ::text").extract()
[u'\n ', u'Click Here', u'\n "for further reference"\n']
# using Python's join() to concatenate and build the full sentence
>>> ''.join(response.css("li ::text").extract())
u'\n Click Here\n "for further reference"\n'
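The joined string above still carries the newlines and indentation from the markup. A minimal sketch of one way to tidy it in plain Python, assuming you only want runs of whitespace collapsed to single spaces; `join_text_nodes` is a hypothetical helper name, not part of Scrapy:

```python
def join_text_nodes(nodes):
    """Concatenate extracted text nodes and collapse runs of whitespace."""
    # str.split() with no arguments splits on any whitespace and drops
    # empty pieces, so leading/trailing whitespace disappears as well.
    return " ".join("".join(nodes).split())


# The list returned by response.css("li ::text").extract() above
nodes = ['\n ', 'Click Here', '\n "for further reference"\n']
print(join_text_nodes(nodes))
# → Click Here "for further reference"
```

The double quotes survive because they are literal characters in the HTML text; if they are unwanted, something like `.replace('"', '')` on the result would drop them.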
Another option is to chain your .css() call with XPath 1.0 string() or normalize-space() in a subsequent .xpath() call:
>>> response.css("li").xpath('string()').extract()
[u'\n Click Here\n "for further reference"\n']
>>> response.css("li").xpath('normalize-space()').extract()
[u'Click Here "for further reference"']
# calling `.extract_first()` gives you a string directly, not a list of 1 string
>>> response.css("li").xpath('normalize-space()').extract_first()
u'Click Here "for further reference"'