I want to write to a CSV file in Scrapy:
for rss in rsslinks:
    item = AppleItem()
    item['reference_link'] = response.url
    base_url = get_base_url(response)
    item['rss_link'] = urljoin_rfc(base_url, rss)
    #item['rss_link'] = rss
    items.append(item)
    #items.append("\n")
f = open(filename, 'a+')  # filename is apple.com.csv
for item in items:
    f.write("%s\n" % item)
My output looks like this:
{'reference_link': 'http://www.apple.com/'
'rss_link': 'http://www.apple.com/rss '
{'reference_link': 'http://www.apple.com/rss/'
'rss_link': 'http://ax.itunes.apple.com/WebObjects/MZStore.woa/wpa/MRSS/newreleases/limit=10/rss.xml'}
{'reference_link': 'http://www.apple.com/rss/'
'rss_link': 'http://ax.itunes.apple.com/WebObjects/MZStore.woa/wpa/MRSS/newreleases/limit=25/rss.xml'}
What I want is this format:
reference_link rss_link
http://www.apple.com/ http://www.apple.com/rss/
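The output above is the string form of each item, which is why every line looks like a dict. A minimal sketch of one way to get a header row followed by one row per item, assuming the standard library csv module and plain dicts standing in for the AppleItem objects (the helper name write_items and the FIELDS list are made up for illustration):

```python
import csv
import io

FIELDS = ["reference_link", "rss_link"]  # column order for the header row


def write_items(handle, items, delimiter=","):
    """Write dict-like items as CSV rows under a single header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter=delimiter)
    writer.writeheader()
    for item in items:
        writer.writerow(dict(item))  # dict() also accepts a Scrapy Item


# Usage with an in-memory buffer; a real run would pass
# open("apple.com.csv", "w", newline="") instead.
buffer = io.StringIO()
write_items(buffer, [{"reference_link": "http://www.apple.com/",
                      "rss_link": "http://www.apple.com/rss/"}])
print(buffer.getvalue())
```

Passing delimiter=" " or delimiter="\t" gives a layout closer to the one shown above. Note that opening the file in append mode, as in the question, would repeat the header on every run.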
I am using Python 3.4 and I am trying to install the Fuzzy module:
https://pypi.python.org/pypi/Fuzzy.
Since the page says it only works with Python 2, I tried to convert it using Cython. These are the steps I followed:
python-config --cflags -c fuzzy.c -o fuzzy.o
python-config --libs
When I try to import fuzzy I get an error:
dynamic module does not define init function (PyInit_fuzzy)
What is the problem? Is it caused by a conflict between Python 2 and Python 3? How can I solve this?
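PyInit_fuzzy is the module init symbol that Python 3 looks for, while an extension compiled against Python 2 headers exports initfuzzy instead. One possible cause, offered as a guess rather than a diagnosis, is that python-config on this machine belongs to a Python 2 installation (Python 3 installs usually ship a separate python3-config). A small sketch for checking which headers and extension suffix the interpreter doing the import expects; build_info is a made-up helper name:

```python
import sys
import sysconfig


def build_info():
    """Collect the build settings of the interpreter running this script."""
    return {
        "version": "%d.%d" % sys.version_info[:2],
        "include_dir": sysconfig.get_paths()["include"],
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
    }


for key, value in build_info().items():
    print(key, "=", value)
```

If the include directory printed here differs from the one python-config --cflags reports, the module was probably built for a different interpreter than the one importing it.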
I am trying to remove words shorter than 4 characters from a string.
I use this regex:
re.sub(' \w{1,3} ', ' ', c)
This removes some of the words, but it fails when 2-3 words shorter than 4 characters appear in a row. For example:
I am in a bank.
It gives me:
I in bank.
How can I solve this?
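The pattern consumes the space on both sides of a match, so the next short word no longer has a leading space to match against. A minimal sketch of one alternative, assuming word boundaries (\b) are an acceptable definition of a word here; drop_short_words is a made-up helper name:

```python
import re


def drop_short_words(text, min_len=4):
    """Remove words shorter than min_len, then tidy the leftover spaces."""
    pattern = r"\b\w{1,%d}\b" % (min_len - 1)
    stripped = re.sub(pattern, "", text)
    return re.sub(r"\s+", " ", stripped).strip()


print(drop_short_words("I am in a bank."))
```

Because \b matches a position rather than a character, adjacent short words do not compete for the same space.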
Suppose I have the following model:
class Location(models.Model):
    continent = models.CharField(max_length=20)
    country = models.ForeignKey(Country)
I need to create a dependent dropdown so that when I select a continent, I get all the countries that belong to that continent. How can I do this?
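Whatever the front end looks like, the second dropdown needs a lookup from a continent to its countries. A framework-free sketch of that lookup, assuming plain (continent, country) tuples stand in for Location rows; the function name and the sample data are made up. In Django the rows would typically come from a queryset such as Location.objects.filter(continent=...), and a view would return the list as JSON:

```python
import json
from collections import defaultdict


def countries_by_continent(rows):
    """Group country names under their continent, sorted for display."""
    grouped = defaultdict(set)
    for continent, country in rows:
        grouped[continent].add(country)
    return {continent: sorted(names) for continent, names in grouped.items()}


# Hypothetical sample data standing in for Location rows.
rows = [("Europe", "France"), ("Asia", "Japan"), ("Europe", "Spain")]
lookup = countries_by_continent(rows)
print(json.dumps(lookup["Europe"]))  # payload for the second dropdown
```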
Here is my custom_filters.py file:
from scrapy.dupefilter import RFPDupeFilter

class SeenURLFilter(RFPDupeFilter):
    def __init__(self, path=None):
        self.urls_seen = set()
        RFPDupeFilter.__init__(self, path)

    def request_seen(self, request):
        if request.url in self.urls_seen:
            return True
        else:
            self.urls_seen.add(request.url)
I added the following line:
DUPEFILTER_CLASS = 'crawl_website.custom_filters.SeenURLFilter'
to settings.py.
When I check the generated CSV file, the same URL appears multiple times. Is something wrong?
I am trying to bulk insert documents, but the bulk insert never inserts more than 84 documents. It gives me this error:
in insert pymongo.errors.InvalidOperation: cannot do an empty bulk insert
Is it possible to do the bulk insert in batches, for example 50 documents per insert?
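The error message says an empty sequence reached the insert call, so any batching scheme has to skip empty batches. A minimal sketch of batching in plain Python, assuming the actual insert is supplied as a callable (with pymongo that might be collection.insert_many, which is an assumption about the setup, not something shown in the question); batches and insert_in_batches are made-up names:

```python
def batches(docs, size=50):
    """Yield lists of at most `size` documents; never yields an empty list."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover documents that did not fill a whole batch
        yield batch


def insert_in_batches(docs, insert_fn, size=50):
    """Call insert_fn once per batch and return how many docs were sent."""
    total = 0
    for batch in batches(docs, size):
        insert_fn(batch)
        total += len(batch)
    return total


# Usage with a list standing in for the database collection.
stored = []
count = insert_in_batches(({"n": i} for i in range(120)), stored.extend)
print(count)
```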
This is my set:
set(['description', 'title'])
I need to write this to a CSV file as 2 columns. My code:
cw = csv.writer(open("hello.csv",'w'))
cw.writerows(cols)
It raises an error:
cw.writerow(cols)
_csv.Error: sequence expected
Appending the list to the CSV file:
cw.writerow(list(cols))
for row in data:
    cw.writerow([str(row.get(k, 'N/A')) for k in cols])
I found a way to fix this: open the file in wb mode instead of w mode.
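The wb fix applies to Python 2. In Python 3 the csv module expects a text-mode file opened with newline="", and a set still has to be turned into an ordered sequence before it can be written as a row. A minimal sketch under those assumptions; write_table and the sample row are made up for illustration:

```python
import csv
import io


def write_table(handle, cols, data):
    """Write a header built from the set `cols`, then one row per dict."""
    header = sorted(cols)  # a set has no order, so fix one explicitly
    writer = csv.writer(handle)
    writer.writerow(header)
    for row in data:
        writer.writerow([str(row.get(k, "N/A")) for k in header])


# Usage with an in-memory buffer; a real run would pass
# open("hello.csv", "w", newline="") instead.
buffer = io.StringIO()
write_table(buffer, {"description", "title"}, [{"title": "Hello"}])
print(buffer.getvalue())
```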
I need to replace the whitespace between two numbers with a comma:
15.30 396.90 => 15.30,396.90
In PHP I use this:
'/(?<=\d)\s+(?=\d)/', ','
How do I do this in Python?
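Python's re module supports the same lookbehind and lookahead syntax, so the pattern can be reused as-is in a raw string. A minimal sketch; join_numbers is a made-up helper name:

```python
import re


def join_numbers(text):
    """Replace whitespace that sits between two digits with a comma."""
    return re.sub(r"(?<=\d)\s+(?=\d)", ",", text)


print(join_numbers("15.30 396.90"))
```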
I have two dates:
Sat, 15 Mar 2014 19:47:17 +0000
2014-03-12 19:50:22.159411+00:00
I need to compare these two dates, but I get the error:
TypeError: can't compare datetime.datetime to unicode
How do I convert one of them?
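The error suggests that one value is still a string, so it has to be parsed into a datetime before the comparison. A minimal sketch for Python 3.7 or later, assuming the first value is an RFC 2822 style string such as "Sat, 15 Mar 2014 19:47:17 +0000" and the second is in ISO format; both assumptions are about the input, not confirmed by the question:

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

# Parse the RFC 2822 style string into a timezone-aware datetime.
first = parsedate_to_datetime("Sat, 15 Mar 2014 19:47:17 +0000")

# Parse the ISO style string; it also carries a UTC offset.
second = datetime.fromisoformat("2014-03-12 19:50:22.159411+00:00")

# Both values are timezone-aware, so they can be compared directly.
print(first > second)
```

Mixing a timezone-aware and a naive datetime in a comparison raises a TypeError, so both sides should carry an offset or neither should.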
I am trying to use the cut command with Control-A as the delimiter to copy some columns from file1 to file2.
This is what I tried:
cut -d^A -f2-8 a.dat > b.dat
If my record looks like this:
A^AB^AC^AD^AE^AF^AG^AH^A$
My command gives:
AB^AC^AD^AE^AF^AG^AH
Is my command wrong, or am I specifying the delimiter the wrong way?
It leaves the A of Control-A at the start.
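In the command shown, ^A appears to be typed as the two characters ^ and A, whereas the delimiter in the data is the single control character 0x01. In bash the real character can usually be passed with $'\001' quoting. For comparison, a minimal Python sketch of the same field selection, assuming fields 2 to 8 are wanted and the trailing $ in the record is only an end-of-line marker; cut_fields is a made-up helper name:

```python
DELIM = "\x01"  # Control-A as a single character


def cut_fields(line, start=2, end=8, delim=DELIM):
    """Return fields start..end (1-based, inclusive), like cut -f2-8."""
    fields = line.rstrip("\n").split(delim)
    return delim.join(fields[start - 1:end])


# A record shaped like the one in the question: A^AB^A...H^A
record = DELIM.join("ABCDEFGH") + DELIM
print(repr(cut_fields(record)))
```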
Here is my code:
from skimage import io,color
filename = io.imread("input00.jpg")
img = color.rgb2gray(filename,as_grey=True)
io.imshow(img)
io.show()
It raises an error on line 2 saying:
AttributeError: 'builtin_function_or_method' object has no attribute 'iterkeys'
Traceback:
Traceback (most recent call last):
File "readImg.py", line 2, in <module>
filename = io.imread("input00.jpg")
File "/Library/Python/2.7/site-packages/skimage/io/_io.py", line 97, in imread
img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
File "/Library/Python/2.7/site-packages/skimage/io/manage_plugins.py", line 209, in call_plugin
return func(*args, **kwargs)
File "/Library/Python/2.7/site-packages/matplotlib-1.4.x-py2.7-macosx-10.9-intel.egg/matplotlib/pyplot.py", line 2198, in imread
return _imread(*args, **kwargs)
File "/Library/Python/2.7/site-packages/matplotlib-1.4.x-py2.7-macosx-10.9-intel.egg/matplotlib/image.py", line 1249, in imread
'more images' % list(six.iterkeys(handlers.keys)))
File "/Library/Python/2.7/site-packages/six-1.7.2-py2.7.egg/six.py", line …
I want to install django-rq in my virtual environment, but I ran into this error:
pip install django-rq
Collecting rq>=0.3.4 (from django-rq)
Using cached rq-0.5.1-py2.py3-none-any.whl
Hash of the package https://pypi.python.org/packages/py2.py3/r/rq/rq-0.5.1-py2.py3-none-any.whl#md5=45418bdc995c394b4293180a4c29cb88 (from https://pypi.python.org/simple/rq/) (e9d365b19b099235441599de78b25042) doesn't match the expected hash 45418bdc995c394b4293180a4c29cb88!
Bad md5 hash for package https://pypi.python.org/packages/py2.py3/r/rq/rq-0.5.1-py2.py3-none-any.whl#md5=45418bdc995c394b4293180a4c29cb88 (from https://pypi.python.org/simple/rq/)
I tried installing it outside the virtual environment and it worked perfectly, but I am not sure why it does not install inside the virtual environment.
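The log says the downloaded wheel's MD5 differs from the one in the package URL. One possible explanation, offered only as a guess, is a corrupted cached download; retrying with pip's --no-cache-dir option may be worth a try. To check a downloaded file by hand, a minimal sketch with hashlib; md5_of and matches are made-up helper names:

```python
import hashlib
import io


def md5_of(handle, chunk_size=65536):
    """Return the hex MD5 digest of a binary file-like object."""
    digest = hashlib.md5()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches(handle, expected):
    """Compare the file's digest with the hash given after #md5=."""
    return md5_of(handle) == expected.lower()


# Usage with in-memory bytes; a real check would pass open(path, "rb").
print(md5_of(io.BytesIO(b"hello")))
```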