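OK, the question is in the title. Is it possible? I can use CSS to make the links look like flags, which is easy, but there is also text (English, French, German, etc.). Anyway, I would appreciate any kind of help.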
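I'm trying to write a parsing script with Python/Scrapy. How do I get rid of the [] and u' that end up around the strings in the output file?
Right now I have this code: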
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.utils.markup import remove_tags
from googleparser.items import GoogleparserItem
import sys

class GoogleparserSpider(BaseSpider):
    name = "google.com"
    allowed_domains = ["google.com"]
    start_urls = [
        "http://www.google.com/search?q=this+is+first+test&num=20&hl=uk&start=0",
        "http://www.google.com/search?q=this+is+second+test&num=20&hl=uk&start=0"
    ]

    def parse(self, response):
        print "===START======================================================="
        hxs = HtmlXPathSelector(response)
        qqq = hxs.select('/html/head/title/text()').extract()
        print qqq
        print "---DATA--------------------------------------------------------"
        sites = hxs.select('/html/body/div[5]/div[3]/div/div/div/ol/li/h3')
        i = 1
        items = []
        for site in sites:
            try:
                item = GoogleparserItem()
                # extract() returns a list of strings, not a single string
                title1 = site.select('a').extract()
                # str() of a list gives its repr, e.g. "[u'<a ...>...</a>']"
                title2=str(title1)
                title=remove_tags(title2)
                link=site.select('a/@href').extract()
                item['num'] = i
                item['title'] = title
                item['link'] = …
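The [ ] and u' come from calling str() on the whole list that extract() returns: that produces the list's repr (brackets, quotes and, under Python 2, the u prefix), not the text inside it. A minimal sketch, assuming extract() returns a list of HTML fragments as in the spider above, is to work on the list's elements (join them, or take the first one) before stripping tags. The strip_tags helper is a hypothetical, simplified regex stand-in for Scrapy's remove_tags, used only so the example runs without Scrapy installed.

```python
import re

# Hypothetical, simplified stand-in for scrapy.utils.markup.remove_tags:
# it just deletes anything that looks like an HTML tag.
def strip_tags(html):
    return re.sub(r"<[^>]+>", "", html)

# Simulated result of site.select('a').extract(): a list of HTML strings
# (under Python 2 these would be unicode objects, printed as u'...').
title1 = [u'<a href="http://example.com/">This is <b>first</b> test</a>']

# What the spider does now: str() of the list gives its repr,
# so the brackets and quotes survive tag stripping.
bad = strip_tags(str(title1))

# Instead, operate on the elements: join them
# (or use title1[0] if exactly one match is expected).
good = strip_tags(u"".join(title1))

print(bad)
print(good)
```

The same applies to link, since site.select('a/@href').extract() also returns a list; take its element(s) rather than storing or str()-ing the list itself.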