Posts by Lar*_* M.

Strip \n \t \r in scrapy

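I'm trying to strip \r \n \t characters in a scrapy spider and then produce a JSON file.

I have a "description" object that is full of newlines, and it doesn't do what I want: match each description to its title.

I tried map(unicode.strip()), but it didn't really work. Being new to scrapy, I don't know whether there is another, simpler way, or how map with unicode actually works.

Here is my code: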

def parse(self, response):
    for sel in response.xpath('//div[@class="d-grid-main"]'):
        item = xItem()
        item['TITLE'] = sel.xpath('xpath').extract()
        item['DESCRIPTION'] = map(unicode.strip, sel.xpath('//p[@class="class-name"]/text()').extract())


I also tried:

item['DESCRIPTION'] = str(sel.xpath('//p[@class="class-name"]/text()').extract()).strip()


But it raised an error. What is the best way to do this?
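One possible approach, sketched in plain Python so it runs without Scrapy: `.extract()` returns a list of strings, so each string can be stripped on its own and the empty leftovers dropped. The helper names `clean_fragments` and `join_fragments` are made up for this illustration and are not part of Scrapy. Separately, an XPath that starts with `//` searches the whole document even when it is called on `sel`; a relative `.//p[...]` is probably what is needed to pair each description with its title.

```python
def clean_fragments(fragments):
    """Strip leading/trailing whitespace from each string and drop empty ones."""
    stripped = (fragment.strip() for fragment in fragments)
    return [text for text in stripped if text]


def join_fragments(fragments):
    """Merge all fragments into one string, collapsing every whitespace run."""
    return " ".join(" ".join(fragments).split())


# Sample data standing in for what .extract() might return (assumed, not real output).
raw = ["\r\n\t First line \n", "\n\t\r \n", "  second line\t\n"]

print(clean_fragments(raw))  # ['First line', 'second line']
print(join_fragments(raw))   # First line second line
```

In the spider this might be used as `item['DESCRIPTION'] = clean_fragments(sel.xpath('.//p[@class="class-name"]/text()').extract())`, assuming the relative XPath matches the page structure.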

python unicode scrapy

Lar*_* M.

2016 02-09

18
votes

2
answers

10k
views

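Get video info from a list of playlists with youtube-dl

I'm trying to use youtube-dl to get some information from a list of YouTube playlists. I wrote this code, but what it returns is the playlist information rather than the video information (for example, the playlist title instead of the titles of the videos in the playlist). I don't understand why.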

input_file = open("url")
for video in input_file:
    print(video)
    ydl_opts = {
        'ignoreerrors': True
    }
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        info_dict = ydl.extract_info(video, download=False)
        for i in info_dict:
            video_thumbnail = info_dict.get("thumbnail"),
            video_id = info_dict.get("id"),
            video_title = info_dict.get("title"),
            video_description = info_dict.get("description"),
            video_duration = info_dict.get("duration")


Any help would be appreciated.
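A likely explanation, offered as a sketch rather than a definitive answer: for a playlist URL, `extract_info` returns a dict describing the playlist itself, and the per-video dicts sit under its `"entries"` key (with `ignoreerrors` enabled, failed videos can show up there as `None`). Looping over `info_dict` only iterates its keys, so `info_dict.get("title")` keeps returning the playlist title. The helpers `iter_videos`, `summarize`, and `fetch_videos` below are hypothetical names, and the sample dict is invented so the example runs without network access.

```python
def iter_videos(info_dict):
    """Yield per-video dicts from an extract_info() result, skipping failures."""
    if info_dict is None:
        return
    entries = info_dict.get("entries")
    if entries is None:
        # No "entries" key: treat the dict as a single video.
        yield info_dict
        return
    for entry in entries:
        # Recurse so nested playlists are flattened as well.
        yield from iter_videos(entry)


def summarize(video):
    """Pick the fields the question asks about; missing ones become None."""
    keys = ("id", "title", "description", "duration", "thumbnail")
    return {key: video.get(key) for key in keys}


def fetch_videos(url):
    """Fetch metadata for one URL (needs the third-party youtube_dl package)."""
    import youtube_dl  # imported lazily so the helpers above work without it

    with youtube_dl.YoutubeDL({"ignoreerrors": True}) as ydl:
        info = ydl.extract_info(url.strip(), download=False)
    return [summarize(video) for video in iter_videos(info)]


# Invented sample shaped like a playlist result, for demonstration only.
fake_playlist = {
    "title": "My playlist",
    "entries": [{"id": "a1", "title": "First"}, None, {"id": "b2", "title": "Second"}],
}
print([video["title"] for video in iter_videos(fake_playlist)])  # ['First', 'Second']
```

The `url.strip()` call is there because lines read from a file keep their trailing newline.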

youtube python-3.x youtube-dl

Lar*_* M.

lucky-day

5
votes

2
answers

8586
views

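Parsing local HTML with Python (lxml)

I'm trying to parse a local HTML file with lxml, but I get an error and I don't know why (sorry in advance for the bad code, I'm new to this).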

from lxml import etree, html
from StringIO import StringIO

parser = etree.HTMLParser()
doc = etree.parse(StringIO("test1.html"), parser)
tree = html.fromstring(doc)
CCE = tree.xpath('//div[@data-reactid]/div[@class="browse-summary"]/h1')
URL = tree.xpath('//a[@class="rc-OfferingCard"]/@href')

print 'CCE:', CCE
print 'URL:', URL


Here is the error:

  File "test.py", line 8, in <module>
    tree = html.fromstring(doc)
  File "/usr/lib/python2.7/dist-packages/lxml/html/__init__.py", line 703, in fromstring
    is_full_html = _looks_like_full_html_unicode(html)
TypeError: expected string or buffer
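Two things seem to be going on here, shown in a hedged Python 3 sketch. First, `StringIO("test1.html")` wraps the literal text `test1.html`, not the contents of that file. Second, `etree.parse` already returns a parsed tree, while `html.fromstring` expects a string, which matches the `TypeError`. The function `parse_local_html` is a hypothetical helper; it relies on `lxml.html.parse`, which accepts a file path directly.

```python
import io

# StringIO turns a *string* into a file-like object; it never opens a file.
fake_file = io.StringIO("test1.html")
print(repr(fake_file.read()))  # 'test1.html'


def parse_local_html(path):
    """Parse an HTML file on disk and return its root element."""
    from lxml import html  # third-party; imported lazily

    # html.parse() reads the file itself, so no StringIO or fromstring is needed.
    return html.parse(path).getroot()


# Hypothetical usage, assuming test1.html exists next to the script:
# root = parse_local_html("test1.html")
# urls = root.xpath('//a[@class="rc-OfferingCard"]/@href')
# print('URL:', urls)
```

The XPath calls from the question can then be run on the returned root element unchanged.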


python lxml html-xml-utils

Lar*_* M.

lucky-day

4
votes

1
answer

2978
views