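I have made improvements based on alexce's suggestion below. What I need is shown in the picture below. However, each row/line should be one review: date, rating, review text and link.
I need the item processor to process every review on every page.
Currently, TakeFirst() only takes the first review on each page. So for 10 pages I get only 10 lines/rows, as in the picture below.

The spider code is as follows: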
import scrapy
from amazon.items import AmazonItem


class AmazonSpider(scrapy.Spider):
    name = "amazon"
    allowed_domains = ['amazon.co.uk']
    # Note: as shown here the URL contains no {} placeholder,
    # so .format(page) leaves it unchanged.
    start_urls = [
        'http://www.amazon.co.uk/product-reviews/B0042EU3A2/'.format(page) for page in xrange(1,114)
    ]

    def parse(self, response):
        # One selector per review cell; each field is extracted relative to it.
        for sel in response.xpath('//*[@id="productReviews"]//tr/td[1]'):
            item = AmazonItem()
            item['rating'] = sel.xpath('div/div[2]/span[1]/span/@title').extract()
            item['date'] = sel.xpath('div/div[2]/span[2]/nobr/text()').extract()
            item['review'] = sel.xpath('div/div[6]/text()').extract()
            item['link'] = sel.xpath('div/div[7]/div[2]/div/div[1]/span[3]/a/@href').extract()
            yield item
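The "one row per review" behaviour asked for above can be sketched without Scrapy. This is only a minimal illustration: the markup, the class names and the helpers `first` and `parse_reviews` are hypothetical stand-ins, and the standard-library `xml.etree.ElementTree` is used in place of Scrapy's selectors so that the snippet runs on its own. The idea is to loop over each review node and take the first match per field relative to that node, rather than once per page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for one page with two reviews.
PAGE = """
<table id="productReviews">
  <tr><td>
    <div class="review">
      <span class="rating" title="5.0 out of 5 stars"/>
      <span class="date">1 Jan 2015</span>
      <div class="text">Works well.</div>
      <a href="/review/R1">Permalink</a>
    </div>
    <div class="review">
      <span class="rating" title="2.0 out of 5 stars"/>
      <span class="date">3 Feb 2015</span>
      <div class="text">Stopped working.</div>
      <a href="/review/R2">Permalink</a>
    </div>
  </td></tr>
</table>
"""


def first(values):
    """Return the first non-empty value, or None (same idea as TakeFirst())."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


def parse_reviews(page_source):
    """Yield one dict per review node instead of one per page."""
    root = ET.fromstring(page_source)
    for review in root.iterfind(".//div[@class='review']"):
        # Every lookup is relative to the current review node.
        yield {
            "rating": first(s.get("title") for s in review.iterfind("span[@class='rating']")),
            "date": first(s.text for s in review.iterfind("span[@class='date']")),
            "review": first(d.text for d in review.iterfind("div[@class='text']")),
            "link": first(a.get("href") for a in review.iterfind("a")),
        }


rows = list(parse_reviews(PAGE))
for row in rows:
    print(row)
```

In a Scrapy spider the same shape would presumably mean selecting one node per review and applying the first-value step per item, not per page; the exact XPath for the real page is not known here.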

I am trying to experiment with gensim doc2vec using the following code. As far as I understand from the tutorials, it should work. However, it gives AttributeError: 'list' object has no attribute 'words'.
from gensim.models.doc2vec import LabeledSentence, Doc2Vec
document = LabeledSentence(words=['some', 'words', 'here'], tags=['SENT_1'])
model = Doc2Vec(document, size = 100, window = 300, min_count = 10, workers=4)
So what am I doing wrong? Please help. Thank you. I am using Python 3.5 and gensim 0.12.4.
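One plausible reading of this error can be reproduced without gensim. The sketch below assumes that `LabeledSentence` behaves like a namedtuple with fields `words` and `tags`, and that the model iterates over its first argument expecting each element to be a document. The `LabeledSentence` defined here and the helper `collect_words` are stand-ins for illustration, not gensim's own code.

```python
from collections import namedtuple

# Stand-in for gensim's LabeledSentence (assumed: a namedtuple of words and tags).
LabeledSentence = namedtuple("LabeledSentence", "words tags")


def collect_words(documents):
    """Mimic a consumer that iterates over documents and reads .words on each."""
    return [doc.words for doc in documents]


document = LabeledSentence(words=["some", "words", "here"], tags=["SENT_1"])

# Passing the single document: iteration yields its fields (two plain lists),
# so the first element is a list, which has no .words attribute.
try:
    collect_words(document)
    error = None
except AttributeError as exc:
    error = str(exc)
print(error)

# Passing a list containing the document: each element is a LabeledSentence.
words = collect_words([document])
print(words)
```

If that reading is right, the analogous change in the question would be to pass `[document]` rather than `document` as the first argument to `Doc2Vec`. Separately, with a single three-word document, `min_count = 10` would likely discard every word, so a smaller value may be needed for a toy example.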
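
I get an in-place operation error when I try to index a leaf variable in order to update the gradient with a customized shrink function. I have not been able to work around it. Any help is highly appreciated!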
import torch.nn as nn
import torch
import numpy as np
from torch.autograd import Variable, Function
# hyper parameters
batch_size = 100 # batch size of images
ld = 0.2 # sparse penalty
lr = 0.1 # learning rate
x = Variable(torch.from_numpy(np.random.normal(0,1,(batch_size,10,10))), requires_grad=False) # original
# depends on size of the dictionary, number of atoms.
D = Variable(torch.from_numpy(np.random.normal(0,1,(500,10,10))), requires_grad=True)
# hx sparse representation
ht = Variable(torch.from_numpy(np.random.normal(0,1,(batch_size,500,1,1))), requires_grad=True)
# Dictionary loss function
loss = nn.MSELoss()
# customized shrink function to update …