Querying SQLite with multiple parameters at once and handling missing values

Ran*_*ies 4 python sqlite list

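Is it possible to do something like this in a single SQL query? Perhaps by supplying a list as the input parameter? The dates I want are consecutive, but not all of them exist in the database. If a date does not exist, the result should be None.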

dates = [dt.datetime(2008,1,1), dt.datetime(2008,1,2), dt.datetime(2008,1,3), dt.datetime(2008,1,4), dt.datetime(2008,1,5)]
id = "361-442"
result = []
for date in dates:
    curs.execute('''SELECT price, date FROM prices where date = ? AND id = ?''', (date, id))
    query = curs.fetchall()
    if not query:
        # No row for this date: record None for the price
        result.append([None, date])
    else:
        result.append(query)

unu*_*tbu 5

To do all the work in sqlite, you can use a LEFT JOIN to fill in the missing prices with None:

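# Build one "SELECT '<date>' date" per wanted date and glue them with UNION ALL.
# The id filter belongs in the ON clause: a WHERE p.id = ? would throw away
# the NULL rows that the LEFT JOIN produces for missing dates.
sql = '''
SELECT p.price, t.date
FROM ( {t} ) t
LEFT JOIN price p
ON p.date = t.date
AND p.id = ?
'''.format(t=' UNION ALL '.join('SELECT {d!r} date'.format(d=str(d)) for d in dates))

cursor.execute(sql, [id])
result = cursor.fetchall()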
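
As a self-contained illustration of the LEFT JOIN idea, here is a minimal sketch against an in-memory database. The schema, the sample rows, and the names conn, cur, wanted and item_id are assumptions made for the example, not part of the original answer. It also swaps the string-formatted dates for bound "?" parameters, which avoids quoting problems but does not lift the compound SELECT limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: dates are stored as ISO-8601 text.
cur.execute("CREATE TABLE price (id TEXT, date TEXT, price REAL)")
cur.executemany(
    "INSERT INTO price VALUES (?, ?, ?)",
    [
        ("361-442", "2008-01-01", 10.0),
        ("361-442", "2008-01-03", 12.5),
        ("999-000", "2008-01-02", 99.0),  # different id, must not match
    ],
)

dates = ["2008-01-01", "2008-01-02", "2008-01-03"]
item_id = "361-442"

# One "SELECT ? AS date" per wanted date, joined with UNION ALL.
wanted = " UNION ALL ".join("SELECT ? AS date" for _ in dates)
sql = f"""
    SELECT p.price, t.date
    FROM ({wanted}) t
    LEFT JOIN price p
      ON p.date = t.date AND p.id = ?
    ORDER BY t.date
"""

# Parameters are bound in order of appearance: the dates first, then the id.
result = cur.execute(sql, dates + [item_id]).fetchall()
print(result)
```

The date with no matching row for the requested id comes back with None as its price.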

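However, this solution requires building a (potentially) huge string in Python in order to create a temporary table of all the desired dates. Not only is it slow (including the time sqlite needs to build the temporary table), it is also brittle: if len(dates) is greater than 500, sqlite raises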

OperationalError: too many terms in compound SELECT

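If you already have all the desired dates in another table, you may be able to get around this problem. You could then replace the ugly "UNION ALL" SQL with something like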

SELECT p.price, t.date
FROM ( SELECT date from dates ) t
LEFT JOIN price p
ON p.date = t.date
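
If no such table exists yet, one way to get it might be to load the wanted dates into a temporary table first. The sketch below assumes ISO-8601 text dates, a hypothetical price schema, and the illustrative names conn, cur, wanted and rows. It is only meant to show the shape of the approach, and it filters on id in the ON clause so that missing dates are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables: price holds the data, dates holds every wanted date.
cur.execute("CREATE TABLE price (id TEXT, date TEXT, price REAL)")
cur.execute("CREATE TEMP TABLE dates (date TEXT PRIMARY KEY)")

cur.executemany(
    "INSERT INTO price VALUES (?, ?, ?)",
    [("361-442", "2008-01-02", 11.0), ("361-442", "2008-01-04", 13.0)],
)

# Five consecutive days, inserted with one executemany call.
wanted = [f"2008-01-0{day}" for day in range(1, 6)]
cur.executemany("INSERT INTO dates VALUES (?)", [(d,) for d in wanted])

rows = cur.execute(
    """
    SELECT p.price, t.date
    FROM dates t
    LEFT JOIN price p
      ON p.date = t.date AND p.id = ?
    ORDER BY t.date
    """,
    ("361-442",),
).fetchall()
print(rows)
```

Because the dates are inserted as rows rather than spliced into the SQL text, the limit on terms in a compound SELECT does not come into play here.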

Although this is an improvement, my timeit tests (see below) show that doing part of the work in Python is still faster:


Doing part of the work in Python:

If you know the dates are consecutive and can therefore be expressed as a range, then:

curs.execute('''
    SELECT date, price
    FROM prices
    WHERE date <= ?
        AND date >= ?
        AND id = ?''', (max(dates), min(dates), id))

Otherwise, if the dates are arbitrary, then:

sql = '''
    SELECT date, price
    FROM prices
    WHERE date IN ({s})
        AND id = ?'''.format(s=','.join(['?'] * len(dates)))
curs.execute(sql, dates + [id])
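
To make the placeholder construction concrete, here is a small runnable sketch. The prices schema, the sample rows, and the names placeholders, found and item_id are assumptions for illustration. Note that SQLite caps the number of bound parameters per statement, so a very long list of dates may need to be split into chunks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema with ISO-8601 text dates.
cur.execute("CREATE TABLE prices (id TEXT, date TEXT, price REAL)")
cur.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("361-442", "2008-01-01", 10.0),
        ("361-442", "2008-01-07", 13.0),  # not requested
        ("361-442", "2008-01-09", 14.0),
        ("999-000", "2008-01-05", 50.0),  # different id
    ],
)

dates = ["2008-01-01", "2008-01-05", "2008-01-09"]
item_id = "361-442"

# One "?" per date, e.g. "?,?,?" for three dates.
placeholders = ",".join(["?"] * len(dates))
sql = f"""
    SELECT date, price
    FROM prices
    WHERE date IN ({placeholders})
      AND id = ?
    ORDER BY date
"""

found = cur.execute(sql, dates + [item_id]).fetchall()
print(found)
```

Only the placeholders are formatted into the SQL string; the date values themselves are still passed as bound parameters.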

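To form result as a list with None inserted for the missing prices, you can build a dict out of the (date, price) pairs and use the dict.get() method to supply a default value of None when the date key is missing: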

result = dict(curs.fetchall())
result = [(result.get(d, None), d) for d in dates]

Note that in order to form a dict mapping dates to prices, I swapped the order of date and price in the SQL query.
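
One detail worth checking when trying this out: the dict keys are whatever type the cursor returns for the date column, so they must match the type of the values used for the lookup. The sketch below sidesteps the issue by assuming the dates are stored as ISO-8601 text and by looking them up with matching strings. The schema and the names keys and by_date are illustrative. Passing strings explicitly also avoids relying on the sqlite3 module's default datetime adapters, which are deprecated as of Python 3.12.

```python
import datetime as dt
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema with ISO-8601 text dates.
cur.execute("CREATE TABLE prices (id TEXT, date TEXT, price REAL)")
cur.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [("361-442", "2008-01-01", 10.0), ("361-442", "2008-01-03", 12.5)],
)

start = dt.date(2008, 1, 1)
dates = [start + dt.timedelta(days=i) for i in range(4)]
keys = [d.isoformat() for d in dates]  # same text format as stored

# ISO-8601 strings sort chronologically, so a range comparison works on text.
cur.execute(
    """
    SELECT date, price
    FROM prices
    WHERE date >= ? AND date <= ? AND id = ?
    """,
    (min(keys), max(keys), "361-442"),
)

by_date = dict(cur.fetchall())  # maps date text to price
result = [(by_date.get(k), d) for k, d in zip(keys, dates)]
print(result)
```

Each entry in result pairs the price (or None) with the original date object, in the order of the requested dates.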


Timing tests:

I compared these three functions:

import datetime
import random

def using_sqlite_union():
    sql = '''
        SELECT p.price, t.date
        FROM ( {t} ) t
        LEFT JOIN price p
        ON p.date = t.date
    '''.format(t = ' UNION ALL '.join('SELECT {d!r} date'.format(d = str(d))
                                      for d in dates))
    cursor.execute(sql)
    return cursor.fetchall()

def using_sqlite_dates():
    sql = '''
        SELECT p.price, t.date
        FROM ( SELECT date from dates ) t
        LEFT JOIN price p
        ON p.date = t.date
    '''
    cursor.execute(sql)
    return cursor.fetchall()

def using_python_dict():
    cursor.execute('''
        SELECT date, price
        FROM price
        WHERE date <= ?
            AND date >= ?
            ''', (max(dates), min(dates)))

    result = dict(cursor.fetchall())
    result = [(result.get(d,None), d) for d in dates]
    return result

N = 500
m = 10
omit = random.sample(range(N), m)
dates = [ datetime.date(2000, 1, 1)+datetime.timedelta(days = i) for i in range(N) ]
rows = [ (d, random.random()) for i, d in enumerate(dates) if i not in omit ]

rows defines the data inserted into the price table.
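
The answer does not show how the tables were created, so the following is only a guess at what the setup might have looked like. The column types, the in-memory database, and the choice to store dates as ISO-8601 text are all assumptions. It reuses the names cursor, dates and rows from the functions above.

```python
import datetime
import random
import sqlite3

N = 500  # total number of dates
m = 10   # number of dates left without a price

omit = set(random.sample(range(N), m))
dates = [datetime.date(2000, 1, 1) + datetime.timedelta(days=i) for i in range(N)]

# Store dates as ISO-8601 text (an assumption, see the note above).
rows = [(d.isoformat(), random.random())
        for i, d in enumerate(dates) if i not in omit]

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE price (date TEXT PRIMARY KEY, price REAL)")
cursor.execute("CREATE TABLE dates (date TEXT PRIMARY KEY)")

cursor.executemany("INSERT INTO price VALUES (?, ?)", rows)
cursor.executemany("INSERT INTO dates VALUES (?)",
                   [(d.isoformat(),) for d in dates])
connection.commit()
```

With this setup the price table holds N - m rows and the dates table holds all N dates, so a LEFT JOIN from dates to price should yield m rows with a NULL price.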


Timeit test results:

Running timeit like this:

python -mtimeit -s'import timeit_sqlite_union as t' 't.using_python_dict()'

yielded these benchmarks:

+--------------------+--------------------+
|  using_python_dict | 1.47 msec per loop |
| using_sqlite_dates | 3.39 msec per loop |
| using_sqlite_union | 5.69 msec per loop |
+--------------------+--------------------+

using_python_dict is about 2.3 times faster than using_sqlite_dates. Even when the total number of dates is increased to 10000, the speed ratio stays roughly the same:

+--------------------+--------------------+
|  using_python_dict | 32.5 msec per loop |
| using_sqlite_dates | 81.5 msec per loop |
+--------------------+--------------------+

Conclusion: moving all the work into sqlite is not necessarily faster.