MySQL: Fill a table within a stored procedure efficiently

xtr*_*trm 3 mysql database multithreading stored-procedures concat

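I am testing the performance of a MySQL server by filling a table with more than 200 million records. The stored procedure is very slow at generating the big SQL string. Any help or comment is very welcome.

System info:

  • Database: MySQL 5.6.10, InnoDB (test database).
  • Processor: AMD Phenom II 1090T X6 (six cores), 3910 MHz per core.
  • RAM: 16 GB DDR3 1600 MHz CL8.
  • HD: Windows 7 64-bit SP1 on an SSD, MySQL installed on the SSD, logs written to a mechanical hard disk.

The stored procedure builds a single INSERT SQL query containing all the values to be inserted into the table.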

DELIMITER $$
USE `test`$$

DROP PROCEDURE IF EXISTS `inputRowsNoRandom`$$

CREATE DEFINER=`root`@`localhost` PROCEDURE `inputRowsNoRandom`(IN NumRows BIGINT)
BEGIN
    /* BUILD INSERT STATEMENT WITH A LOT OF ROWS TO INSERT */
    DECLARE i BIGINT;
    DECLARE nMax BIGINT;
    DECLARE squery LONGTEXT;
    DECLARE svalues LONGTEXT;

    SET i = 1;
    SET nMax = NumRows + 1;
    SET squery = 'INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, DATE) VALUES ';
    SET svalues = '("1", "a1", 100, 1, 500000, "2013-06-14 12:40:45"),';

    WHILE i < nMax DO
        SET squery = CONCAT(squery, svalues);
        SET i = i + 1;
    END WHILE;

    /*SELECT squery;*/
    SET squery = LEFT(squery, CHAR_LENGTH(squery) - 1);
    SET squery = CONCAT(squery, ";");
    SELECT squery;

    /* EXECUTE INSERT SENTENCE */
    /*START TRANSACTION;*/
    /*PREPARE stmt FROM squery;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    */

    /*COMMIT;*/
END$$
DELIMITER ;


Results:

  1. Concatenating 20000 strings takes about 45 seconds to process:

CALL test.inputRowsNoRandom(20000);

  2. Concatenating 100000 strings takes about +5/12 minutes O_O:

CALL test.inputRowsNoRandom(100000);

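Result (ordered by duration):

State            Duration (summed) in sec    Percentage
freeing items    0.00005                     50.00000
starting         0.00002                     20.00000
executing        0.00001                     10.00000
init             0.00001                     10.00000
cleaning up      0.00001                     10.00000
Total            0.00010                     100.00000

Change of status variables due to execution of the query:

Variable          Value    Description
Bytes_received    21       Bytes sent from the client to the server
Bytes_sent        97       Bytes sent from the server to the client
Com_select        1        Number of SELECT statements that have been executed
Questions         1        Number of statements executed by the server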

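Tests:
I have tested different MySQL configurations with 12 to 64 threads, with the cache turned on and off, and with the logs moved to another physical disk...
I also tested using TEXT, INT...

Additional information:


Questions: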

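  • Is there something wrong in the code? If I send 100000 strings to build the final SQL string, the result of SELECT squery; is a NULL string. What is happening? (There must be an error in there, but I don't see it.)
  • Can I improve the code in any way to speed things up?
  • I have read that some operations in stored procedures can be really slow. Should I generate the file in C/Java/PHP... and send it to mysql?

    mysql -u mysqluser -p databasename <numbers.sql

  • MySQL seems to use only one core for a single SQL query. Would nginx or other database systems (multithreaded DBs, Cassandra, Redis, MongoDB...) achieve better performance with stored procedures and be able to use more than one CPU for one query? (My single query uses only 20% of total CPU with about 150 threads.)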
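
On the NULL result and the external-file idea: one plausible explanation, not confirmed anywhere in this thread, is that MySQL string functions such as CONCAT return NULL (with a warning) once the result would be longer than max_allowed_packet. Each value tuple in the procedure is 51 characters including its comma, so 100000 of them come to about 5.1 MB, which is above the 4 MiB that I believe is the 5.6 default, while 20000 of them (about 1 MB) still fit. The sketch below uses Python instead of C/Java/PHP and writes the file as several multi-row INSERT statements, each kept under that assumed limit. The function names, the 4 MiB figure and the batch size are assumptions for illustration; check the real value with SHOW VARIABLES LIKE 'max_allowed_packet'.

```python
"""Sketch: write a .sql file of batched multi-row INSERTs for the mysql client."""

# Assumption: 4 MiB, believed to be the MySQL 5.6 default for max_allowed_packet.
ASSUMED_MAX_PACKET = 4 * 1024 * 1024

INSERT_HEAD = (
    "INSERT INTO `entity_versionable` "
    "(fk_entity, str1, str2, bool1, double1, date) VALUES "
)
# Same constant row as in the question, without the trailing comma.
ROW = '("1", "a1", 100, 1, 500000, "2013-06-14 12:40:45")'


def statement_size(rows_per_statement):
    """Approximate length in bytes of one INSERT carrying that many rows."""
    separators = rows_per_statement - 1  # commas between the tuples
    # +1 accounts for the closing semicolon
    return len(INSERT_HEAD) + rows_per_statement * len(ROW) + separators + 1


def write_sql_file(path, total_rows, rows_per_statement=10000):
    """Write total_rows rows as several INSERT statements.

    Returns the number of statements written.
    """
    if statement_size(rows_per_statement) >= ASSUMED_MAX_PACKET:
        raise ValueError("batch too large for the assumed max_allowed_packet")
    statements = 0
    remaining = total_rows
    with open(path, "w", encoding="ascii") as out:
        while remaining > 0:
            n = min(rows_per_statement, remaining)
            out.write(INSERT_HEAD + ",".join([ROW] * n) + ";\n")
            remaining -= n
            statements += 1
    return statements


if __name__ == "__main__":
    # Compare the two row counts from the question against the assumed limit.
    for rows in (20000, 100000):
        size = statement_size(rows)
        print(rows, size, size > ASSUMED_MAX_PACKET)
```

Calling write_sql_file("numbers.sql", 1000000) would produce a file that can be loaded with the mysql command shown in the question. At 200 million rows the file grows to roughly 10 GB, so whether this beats a server-side approach is something to measure rather than assume.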

Update:

pet*_*erm 5

Don't use loops in an RDBMS, especially not at that scale.

Try filling your table with 1m rows quickly with a single query:

INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
SELECT 1, 'a1', 100, 1, 500000, '2013-06-14 12:40:45'
  FROM
(
select a.N + b.N * 10 + c.N * 100 + d.N * 1000 + e.N * 10000 + f.N * 100000 + 1 N
from (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) b
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) c
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) d
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) e
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) f
) t

It took ~8 sec to complete on my box (MacBook Pro, 16 GB RAM, 2.6 GHz Intel Core i7):

Query OK, 1000000 rows affected (7.63 sec)
Records: 1000000  Duplicates: 0  Warnings: 0
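
In case the trick is not obvious: the derived table t in the query is a row generator. Six copies of the digits 0..9 are cross joined, and a.N + b.N * 10 + ... + f.N * 100000 + 1 maps every digit combination to one distinct number from 1 to 1000000. Here is the same arithmetic as a small Python sketch; the function name and the digits parameter are mine, for illustration only.

```python
from itertools import product


def cross_join_numbers(digits=6):
    """Yield the numbers produced by cross joining `digits` copies of 0..9.

    Mirrors a.N + b.N * 10 + c.N * 100 + ... + 1 from the query: each tuple
    of digits is read as a base-10 number, least significant digit first.
    """
    for combo in product(range(10), repeat=digits):
        yield sum(d * 10 ** place for place, d in enumerate(combo)) + 1


if __name__ == "__main__":
    numbers = list(cross_join_numbers(3))  # three digit tables: 10**3 rows
    print(len(numbers), min(numbers), max(numbers))
    print(len(set(numbers)) == len(numbers))  # every generated number is distinct
```

With six digit tables, as in the query, the generator yields 10**6 values, which is why the INSERT ... SELECT adds exactly 1000000 rows. The query does not even use N; it only needs the row count.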

UPDATE1 Now a version of the stored procedure that uses a prepared statement:

DELIMITER $$
CREATE PROCEDURE `inputRowsNoRandom`(IN NumRows INT)
BEGIN
    DECLARE i INT DEFAULT 0;

    PREPARE stmt 
       FROM 'INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
             VALUES(?, ?, ?, ?, ?, ?)';
    SET @v1 = 1, @v2 = 'a1', @v3 = 100, @v4 = 1, @v5 = 500000, @v6 = '2013-06-14 12:40:45';

    WHILE i < NumRows DO
        EXECUTE stmt USING @v1, @v2, @v3, @v4, @v5, @v6;
        SET i = i + 1;
    END WHILE;

    DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;

It completes in ~3 min:

mysql> CALL inputRowsNoRandom(1000000);
Query OK, 0 rows affected (2 min 51.57 sec)

Feel the difference: 8 sec vs. 3 min.

UPDATE2 To speed things up, we can use transactions explicitly and commit the insertions in batches. So here is an improved version of the SP.

DELIMITER $$
CREATE PROCEDURE inputRowsNoRandom1(IN NumRows BIGINT, IN BatchSize INT)
BEGIN
    DECLARE i BIGINT DEFAULT 0; -- same type as NumRows

    PREPARE stmt 
       FROM 'INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
             VALUES(?, ?, ?, ?, ?, ?)';
    SET @v1 = 1, @v2 = 'a1', @v3 = 100, @v4 = 1, @v5 = 500000, @v6 = '2013-06-14 12:40:45';

    START TRANSACTION;
    WHILE i < NumRows DO
        EXECUTE stmt USING @v1, @v2, @v3, @v4, @v5, @v6;
        SET i = i + 1;
        IF i % BatchSize = 0 THEN 
            COMMIT;
            START TRANSACTION;
        END IF;
    END WHILE;
    COMMIT;
    DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;

Results with different batch sizes:

mysql> CALL inputRowsNoRandom1(1000000,1000);
Query OK, 0 rows affected (27.25 sec)

mysql> CALL inputRowsNoRandom1(1000000,10000);
Query OK, 0 rows affected (26.76 sec)

mysql> CALL inputRowsNoRandom1(1000000,100000);
Query OK, 0 rows affected (26.43 sec)

You can see the difference for yourself. It is still about 3 times slower than the cross join.
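
To put the timings from this answer side by side, here is the arithmetic as a short Python sketch. The durations are the ones printed by the mysql client above. The extrapolation to the 200 million rows mentioned in the question assumes linear scaling, which is only an assumption and was not measured.

```python
# Durations in seconds reported above for inserting 1,000,000 rows.
ROWS = 1_000_000
TIMINGS = {
    "cross join INSERT ... SELECT": 7.63,
    "prepared statement, no explicit transaction": 2 * 60 + 51.57,
    "prepared statement, batches of 100000": 26.43,
}
TARGET_ROWS = 200_000_000  # table size mentioned in the question


def rows_per_second(seconds, rows=ROWS):
    """Average insert throughput for one run."""
    return rows / seconds


def extrapolated_minutes(seconds, target=TARGET_ROWS, rows=ROWS):
    """Naive linear extrapolation; real runs may scale worse as the table grows."""
    return seconds * (target / rows) / 60


if __name__ == "__main__":
    baseline = TIMINGS["cross join INSERT ... SELECT"]
    for name, secs in TIMINGS.items():
        print(
            f"{name}: {rows_per_second(secs):,.0f} rows/s, "
            f"{secs / baseline:.1f}x the cross join time, "
            f"~{extrapolated_minutes(secs):.0f} min for 200M rows"
        )
```

The ratio for the batched version comes out at roughly 3.5, in line with the "about 3 times" remark.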