I have two tables:
manu_table
product_id, manufacturer
1, ford
2, ford
3, toyota
product_table
product_id, score
1, 80
2, 60
3, 40
I want to store the highest-scoring product_id for each manufacturer in a summary table:
summary_table
manufacturer, max_score
ford, 1
toyota, 3
So far I have:
UPDATE summary_table st
SET max_score = (
    SELECT product_id
    FROM (
        SELECT manufacturer, product_id, max(score) as ms
        FROM manu_table
        LEFT JOIN product_table USING (product_id)
        group by product_id) t)
WHERE st.manufacturer = manu_table.manufacturer;
I'm having trouble with this... any help is much appreciated.
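One possible approach, sketched here under assumptions, is a correlated subquery that picks the top-scoring product_id for the manufacturer of each summary row. The sketch runs the statement against an in-memory SQLite database through Python's `sqlite3` module so that it is self-contained. The question does not say which database is in use, so the exact syntax may need adjusting. `make_sample_db` and `update_summary` are hypothetical helper names, and the sample rows are the ones shown above.

```python
import sqlite3

# Correlated subquery: for each summary row, take the product_id with the
# highest score among that manufacturer's products. If two products tie,
# which one is returned is not specified.
UPDATE_SQL = """
UPDATE summary_table
SET max_score = (
    SELECT m.product_id
    FROM manu_table AS m
    JOIN product_table AS p ON p.product_id = m.product_id
    WHERE m.manufacturer = summary_table.manufacturer
    ORDER BY p.score DESC
    LIMIT 1
)
"""


def make_sample_db():
    """Build an in-memory database holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE manu_table (product_id INTEGER, manufacturer TEXT);
        CREATE TABLE product_table (product_id INTEGER, score INTEGER);
        CREATE TABLE summary_table (manufacturer TEXT, max_score INTEGER);
        INSERT INTO manu_table VALUES (1, 'ford'), (2, 'ford'), (3, 'toyota');
        INSERT INTO product_table VALUES (1, 80), (2, 60), (3, 40);
        INSERT INTO summary_table (manufacturer) VALUES ('ford'), ('toyota');
    """)
    return conn


def update_summary(conn):
    """Run the UPDATE and return the summary rows ordered by manufacturer."""
    conn.execute(UPDATE_SQL)
    conn.commit()
    return conn.execute(
        "SELECT manufacturer, max_score FROM summary_table ORDER BY manufacturer"
    ).fetchall()
```

Calling `update_summary(make_sample_db())` returns one row per manufacturer, with the product_id stored in the `max_score` column as in the desired output. The sketch assumes summary_table already contains a row for every manufacturer; if it does not, an INSERT ... SELECT would be needed instead of an UPDATE.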
cur.execute("""
    SELECT
        title,
        body,
        date  -- this pgsql column is of type date
    FROM
        table
    WHERE id = %s;""", (id,))
response = cur.fetchall()
print response
As an example, this gives me:
[('sample title', 'sample body', datetime.date(2012, 8, 5))]
This can't be passed to something like json.dumps, so I have to do this:
processed = []
for row in response:
    processed.append({'title' : row[0],
                      'body' : row[1],
                      'date' : str(row[2])
                      })
This feels like bad form. Does anyone know a better way to handle it?
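One common way to avoid converting each field by hand, sketched here as an illustration, is to pass a `default` callable to `json.dumps`; the standard library calls it for any value it cannot serialize on its own. `json_default` and `rows_to_json` are hypothetical names, and the column list is assumed to match the order of the SELECT.

```python
import datetime
import json


def json_default(value):
    """Fallback used by json.dumps for types it cannot encode itself."""
    if isinstance(value, (datetime.date, datetime.datetime)):
        # ISO 8601 text, e.g. '2012-08-05' for a date
        return value.isoformat()
    raise TypeError(
        "Object of type %s is not JSON serializable" % type(value).__name__
    )


def rows_to_json(rows, columns):
    """Pair each row tuple with its column names and serialize the list."""
    records = [dict(zip(columns, row)) for row in rows]
    return json.dumps(records, default=json_default)
```

With the sample result above, `rows_to_json(response, ('title', 'body', 'date'))` would produce a JSON array of objects in which the date appears as an ISO 8601 string. Other non-serializable types would still raise `TypeError` unless `json_default` is extended to handle them.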
Here is the code in question (a very simple crawler). The file is a list of URLs, usually more than 1,000.
import sys, gevent
from gevent import monkey
from gevent.pool import Pool
import httplib, socket
from urlparse import urlparse
from time import time
pool = Pool(100)
monkey.patch_all(thread=False)
count = 0
size = 0
failures = 0
global_timeout = 5
socket.setdefaulttimeout(global_timeout)
def process(ourl, mode = 'GET'):
    global size, failures, global_timeout, count
    try:
        url = urlparse(ourl)
        start = time()
        conn = httplib.HTTPConnection(url.netloc, timeout = global_timeout)
        conn.request(mode, ourl)
        res = conn.getresponse()
        req = res.read()
        end = time()
        bytes = len(req) …