Python runs slowly on fetchone, hangs on fetchall

IAm*_*ale 6 python sqlite

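I'm writing a script that runs a SELECT query against a database and parses the ~33,000 records it returns. Unfortunately, I'm running into problems at the cursor.fetchone() / cursor.fetchall() stage.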

I first tried iterating through the cursor one record at a time, like so:

# Run through every record, extract the kanji, then query for FK and weight
printStatus("Starting weight calculations")
while True:
    # Get the next row in the cursor
    row = cursor.fetchone()
    if row is None:
        break

    # TODO: Determine if there's any kanji in row[2]

    weight = float((row[3] + row[4]))/2
    printStatus("Weight: " + str(weight))

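Based on the output of printStatus (which prints a timestamp plus whatever string is passed to it), the script took about 1 second to process each row. This led me to believe that the query was being re-run on every loop iteration (with a LIMIT 1 or something), since the same query takes about 1 second to run once in SQLiteStudio and return all 33,000 rows. I calculated that, at this rate, it would take around 7 hours to get through all 33,000 records.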

Rather than sit through that, I tried using cursor.fetchall() instead:

results = cursor.fetchall()

# Run through every record, extract the kanji, then query for FK and weight
printStatus("Starting weight calculations")
for row in results:
    # TODO: Determine if there's any kanji in row[2]

    weight = float((row[3] + row[4]))/2
    printStatus("Weight: " + str(weight))

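Unfortunately, when it reached the cursor.fetchall() line, the Python executable locked up at 25% CPU and about 6MB of RAM. I left the script running for about 10 minutes, but nothing happened.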

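Is ~33,000 returned rows (about 5MB of data) too much for Python to grab at once? Am I stuck iterating through them one at a time? Or is there something I can do to speed things up?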

EDIT: Here's some console output:

12:56:26.019: Adding new column 'weight' and related index to r_ele
12:56:26.019: Querying database
12:56:28.079: Starting weight calculations
12:56:28.079: Weight: 1.0
12:56:28.079: Weight: 0.5
12:56:28.080: Weight: 0.5
12:56:28.338: Weight: 1.0
12:56:28.339: Weight: 3.0
12:56:28.843: Weight: 1.5
12:56:28.844: Weight: 1.0
12:56:28.844: Weight: 0.5
12:56:28.844: Weight: 0.5
12:56:28.845: Weight: 0.5
12:56:29.351: Weight: 0.5
12:56:29.855: Weight: 0.5
12:56:29.856: Weight: 1.0
12:56:30.371: Weight: 0.5
12:56:30.885: Weight: 0.5
12:56:31.146: Weight: 0.5
12:56:31.650: Weight: 1.0
12:56:32.432: Weight: 0.5
12:56:32.951: Weight: 0.5
12:56:32.951: Weight: 0.5
12:56:32.952: Weight: 1.0
12:56:33.454: Weight: 0.5
12:56:33.455: Weight: 0.5
12:56:33.455: Weight: 1.0
12:56:33.716: Weight: 0.5
12:56:33.716: Weight: 1.0

Here's the SQL query:

//...snip (it wasn't the culprit)...

EXPLAIN QUERY PLAN output from SQLiteStudio:

0   0   0   SCAN TABLE r_ele AS re USING COVERING INDEX r_ele_fk (~500000 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 1
1   0   0   SEARCH TABLE re_pri USING INDEX re_pri_fk (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 2
2   0   0   SEARCH TABLE ke_pri USING INDEX ke_pri_fk (fk=?) (~10 rows)
2   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 3
3   0   0   SEARCH TABLE k_ele USING AUTOMATIC COVERING INDEX (value=?) (~7 rows)
3   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 4
4   0   0   SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 5
5   0   0   SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 6
6   0   0   SEARCH TABLE re_pri USING INDEX re_pri_fk (fk=?) (~10 rows)
0   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 7
7   0   0   SEARCH TABLE ke_pri USING INDEX ke_pri_fk (fk=?) (~10 rows)
7   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 8
8   0   0   SEARCH TABLE k_ele USING AUTOMATIC COVERING INDEX (value=?) (~7 rows)
8   0   0   EXECUTE CORRELATED SCALAR SUBQUERY 9
9   0   0   SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows)

CL.*_*CL. 4

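SQLite computes result records on the fly. fetchone is slow because it has to execute all of the subqueries for each record in r_ele. fetchall is even slower because it takes just as long as calling fetchone for every record.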
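The on-the-fly behavior can be observed with a small, self-contained sketch. Everything in it is a stand-in and not taken from the question: the table `t`, the function name `expensive`, and the call counter are hypothetical, and the registered Python function only plays the role of a costly per-row subquery.

```python
import sqlite3

# Hypothetical in-memory table standing in for r_ele.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(100)])

# Counts how often SQLite evaluates the per-row expression.
counter = {"calls": 0}

def expensive(x):
    """Stand-in for a costly correlated subquery evaluated once per row."""
    counter["calls"] += 1
    return x * 2

conn.create_function("expensive", 1, expensive)

cursor = conn.execute("SELECT expensive(id) FROM t")

# Fetching a single row only evaluates the expression for the rows
# SQLite has stepped through so far, not for the whole table.
first = cursor.fetchone()
calls_after_one = counter["calls"]

# fetchall has to step through every remaining row, so the
# expression ends up evaluated once for each row in the table.
rest = cursor.fetchall()
calls_after_all = counter["calls"]

print("calls after fetchone:", calls_after_one)
print("calls after fetchall:", calls_after_all)
```

If this behaves as expected, the count after fetchone stays far below the row count, while the count after fetchall equals it. That would suggest the per-row cost, not the fetch method, is what needs fixing.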

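SQLite 3.7.13 estimates that all the lookups on the value column would be horribly slow, and therefore creates a temporary index for this query (the AUTOMATIC COVERING INDEX entries in the query plan). You should create a permanent index so that SQLite 3.6.21 can use it as well: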

CREATE INDEX idx_k_ele_value ON k_ele(value);

If this does not help, update to a Python with a newer SQLite version, or use another database library that has a newer SQLite version built in, such as APSW.
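To see which SQLite version your Python is actually linked against, and whether a permanent index on value gets picked up, something along these lines may help. It is only a sketch: the k_ele schema below is a guess (the question does not show it), the data is made up, and the SELECT is a simplified lookup, not the snipped query.

```python
import sqlite3

# Version of the SQLite library this Python build is linked against
# (this is not the version of the sqlite3 module itself).
print("SQLite library version:", sqlite3.sqlite_version)

# Hypothetical stand-in for the real database; the actual k_ele
# schema is not shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE k_ele (id INTEGER PRIMARY KEY, fk INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO k_ele (fk, value) VALUES (?, ?)",
    [(i, "v%d" % i) for i in range(1000)],
)

# The permanent index suggested above.
conn.execute("CREATE INDEX IF NOT EXISTS idx_k_ele_value ON k_ele(value)")

# Ask SQLite how it would run a lookup on value.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT fk FROM k_ele WHERE value = ?", ("v42",)
).fetchall()
plan_text = " ".join(str(col) for row in plan for col in row)
print(plan_text)
```

If the printed plan names idx_k_ele_value, the lookup no longer depends on an automatic index. Running the same check against the real database file and the real query would be the more telling test.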