MySQL 8.0 升级后特定查询性能不佳

Wor*_*els 8 php mysql prepared-statement mysql-5.7 mysql-8.0

编辑:我在 Python 中看到与 PHP 相同的行为。好像和MySQL有关。

我们正在尝试从 MySQL 5.7 升级到 8.0。我们的代码库使用 PHP MySQLi 来查询 MySQL 服务器。在我们的测试设置中,我们发现绑定大量参数的某些查询的性能较差(慢 50 倍)。我们希望看到 MySQL 8.0 的运行时间与 5.7 相似。下面是示例表结构和故障查询。

CREATE TABLE IF NOT EXISTS `a` (
  `id` int NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE KEY `name` (`name`) USING BTREE,
  KEY `name_id` (`id`,`name`) USING BTREE
);

CREATE TABLE IF NOT EXISTS `b` (
  `id` int NOT NULL AUTO_INCREMENT,
  `a_id` int NOT NULL,
  `value` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE KEY `uniquevalue` (`a_id`,`value`) USING BTREE,
  KEY `a_id` (`a_id`) USING BTREE,
  KEY `v` (`value`) USING BTREE,
  CONSTRAINT `b_ibfk_1` FOREIGN KEY (`a_id`) REFERENCES `a` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
);

CREATE TABLE IF NOT EXISTS `c` (
  `product` varchar(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
  `b_id` int NOT NULL,
  PRIMARY KEY (`product`,`b_id`) USING BTREE,
  KEY `b_id` (`b_id`),
  KEY `product` (`product`),
  CONSTRAINT `c_ibfk_2` FOREIGN KEY (`b_id`) REFERENCES `b` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
);
Run Code Online (Sandbox Code Playgroud)
-- example trouble query
SELECT c.product, a.name, b.value
FROM b
INNER JOIN a ON b.a_id = a.id AND a.name IN ('1be6f9eb563f3bf85c78b4219bf09de9')
-- this hash is from the dataset (linked below) but it should match a record in the 'a' table that has an associated record in the 'b' table that in turn has an associated record in the 'c' table
INNER JOIN c on c.b_id = b.id and c.product IN (?, ?, ?...) -- ... meaning dynamic number of parameters
Run Code Online (Sandbox Code Playgroud)

如果将查询修改为只返回一条记录(限制1),查询仍然很慢。所以这与返回的数据量无关。如果查询以非参数化方式运行(使用字符串连接),则查询运行时间在所有环境中都是可接受的。添加的参数越多,查询速度就越慢(线性)。对于 7,000 个绑定参数,查询在 MySQL 5.7 中运行时间为 100 - 150 毫秒,在 MySQL 8.0.28 中运行时间约为 10 秒。我们在 PHP 7.4 和 8.0 中看到相同的结果。我们使用 MySQLi 或 PDO 看到相同的结果。

这告诉我这与参数绑定有关。我启用了分析并检查了查询结果。查询的大部分时间 (~95%) 花费在执行步骤,而不是参数绑定步骤。另外,我看到 mysql 8 进程 CPU 在查询运行时被固定。我对这个很困惑。

这里是MySQL 8.0的解释。

ID 选择类型 桌子 分区 类型 可能的键 钥匙 密钥长度 参考 过滤的 额外的
1 简单的 A 常量 主要,名称,name_id 姓名 1022 常量 1 100 使用索引
1 简单的 C 参考 主要,b_id,产品 产品 152 常量 1 100 使用索引
1 简单的 等式引用 主要,唯一值,a_id 基本的 4 默认Web.c.b_id 1 5 使用地点

这里是MySQL 5.7的解释。

ID 选择类型 桌子 分区 类型 可能的键 钥匙 密钥长度 参考 过滤的 额外的
1 简单的 A 常量 主要,名称,name_id 姓名 第257章 常量 1 100 使用索引
1 简单的 C 参考 主要,b_id,产品 基本的 152 常量 1 100 使用索引
1 简单的 等式引用 主要,唯一值,a_id 基本的 4 默认Web.c.b_id 1 5 使用地点

这两种解释之间存在一些差异,但同样,这个问题仅发生在 PHP 中的准备好的语句中。

下面是一些演示该问题的 php 代码。编写此代码是为了针对我在下面的 Google Drive 链接中提供的数据集。我还将 MySQL 变量包含在 CSV 中。

<?php
// Modify these to fit your DB connection.
const HOST = '127.0.0.1';
const USER = 'root';
const PASS = 'localtest';
const DB_NAME = 'TestDatabase';

// As the number of parameters increases, time increases linearly.
// We're seeing ~10 seconds with 7000 params with this data.
const NUM_PARAMS = 7000;

function rand_string($length = 10) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $charactersLength = strlen($characters);
    $randomString = '';
    for ($i = 0; $i < $length; $i++) {
        $randomString .= $characters[rand(0, $charactersLength - 1)];
    }
    return $randomString;
}

function sql_question_marks($count, $sets = 1) {
    return substr(str_repeat(",(".substr(str_repeat(",?", $count), 1).")", $sets), 1);
}

function unsecure_concat($params) {
    return "('" . implode("','", $params) . "')";
}

$params = [];
$param_types = '';
for ($i = 0; $i < NUM_PARAMS; $i++) {
    $params[] = rand_string();
    $param_types .= 's';
}

$big_query = <<<SQL
    SELECT c.product, a.name, b.value
    FROM b
    INNER JOIN a ON b.a_id = a.id AND a.name IN ('1be6f9eb563f3bf85c78b4219bf09de9')
    INNER JOIN c on c.b_id = b.id and c.product IN
SQL . sql_question_marks(count($params));

$non_parameterized = <<<SQL
    SELECT c.product, a.name, b.value
    FROM b
    INNER JOIN a ON b.a_id = a.id AND a.name IN ('1be6f9eb563f3bf85c78b4219bf09de9')
    INNER JOIN c on c.b_id = b.id and c.product IN
SQL . unsecure_concat($params);

$connection = new mysqli(HOST, USER, PASS, DB_NAME);

$q = $connection->prepare($big_query);
$q->bind_param($param_types, ...$params);
$start_time = hrtime(true);
$q->execute(); // This one shows the issue...100-250 ms execution time in MySQL 5.7 and ~10 seconds with 8.0.
$end_time = hrtime(true);

$total_time = ($end_time - $start_time) / 1000000000; // convert to seconds

echo 'The total time for parameterized query is ' . $total_time . ' seconds.';

$q->get_result(); // not concerned with results.

$q = $connection->prepare($big_query . ' LIMIT 1');
$q->bind_param($param_types, ...$params);
$start_time = hrtime(true);
$q->execute(); // This one also shows the issue...100-250 ms execution time in MySQL 5.7 and ~10 seconds with 8.0.
$end_time = hrtime(true);

$total_time = ($end_time - $start_time) / 1000000000; // convert to seconds

echo '<br>The total time for parameterized query with limit 1 is ' . $total_time . ' seconds.';

$q->get_result(); // not concerned with results 

$q = $connection->prepare($non_parameterized);
$start_time = hrtime(true);
$q->execute(); // Same execution time in 5.7 and 8.0.
$end_time = hrtime(true);

$total_time = ($end_time - $start_time) / 1000000000; // convert to seconds

echo '<br>The total time for non-parameterized query is ' . $total_time . ' seconds.';
Run Code Online (Sandbox Code Playgroud)

您可以在此处下载示例数据: https://drive.google.com/file/d/111T7g1NowfWO_uZ2AhT9jdj4LiSNck8u/view ?usp=sharing

编辑:这是带有 7,000 个绑定参数的 JSON 解释。

{
    "EXPLAIN": {
        "query_block": {
            "select_id": 1,
            "cost_info": {
                "query_cost": "456.60"
            },
            "nested_loop": [
                {
                    "table": {
                        "table_name": "a",
                        "access_type": "const",
                        "possible_keys": [
                            "PRIMARY",
                            "name",
                            "name_id"
                        ],
                        "key": "name",
                        "used_key_parts": [
                            "name"
                        ],
                        "key_length": "257",
                        "ref": [
                            "const"
                        ],
                        "rows_examined_per_scan": 1,
                        "rows_produced_per_join": 1,
                        "filtered": "100.00",
                        "using_index": true,
                        "cost_info": {
                            "read_cost": "0.00",
                            "eval_cost": "0.10",
                            "prefix_cost": "0.00",
                            "data_read_per_join": "264"
                        },
                        "used_columns": [
                            "id",
                            "name"
                        ]
                    }
                },
                {
                    "table": {
                        "table_name": "b",
                        "access_type": "ref",
                        "possible_keys": [
                            "PRIMARY",
                            "uniquevalue",
                            "a_id"
                        ],
                        "key": "uniquevalue",
                        "used_key_parts": [
                            "a_id"
                        ],
                        "key_length": "4",
                        "ref": [
                            "const"
                        ],
                        "rows_examined_per_scan": 87,
                        "rows_produced_per_join": 87,
                        "filtered": "100.00",
                        "using_index": true,
                        "cost_info": {
                            "read_cost": "8.44",
                            "eval_cost": "8.70",
                            "prefix_cost": "17.14",
                            "data_read_per_join": "65K"
                        },
                        "used_columns": [
                            "id",
                            "a_id",
                            "value"
                        ]
                    }
                },
                {
                    "table": {
                        "table_name": "c",
                        "access_type": "ref",
                        "possible_keys": [
                            "PRIMARY",
                            "b_id",
                            "product"
                        ],
                        "key": "b_id",
                        "used_key_parts": [
                            "b_id"
                        ],
                        "key_length": "4",
                        "ref": [
                            "TestDatabase.b.id"
                        ],
                        "rows_examined_per_scan": 35,
                        "rows_produced_per_join": 564,
                        "filtered": "18.28",
                        "using_index": true,
                        "cost_info": {
                            "read_cost": "130.53",
                            "eval_cost": "56.47",
                            "prefix_cost": "456.60",
                            "data_read_per_join": "88K"
                        },
                        "used_columns": [
                            "product",
                            "b_id"
                        ],
                        "attached_condition": "" // i've omitted the condition since it breaks the SO char limit, it contains 7,000 random character strings at 10 length each
                    }
                }
            ]
        }
    }
}
Run Code Online (Sandbox Code Playgroud)

Ric*_*mes -2

b听起来像是一个键值表,这是一种低效的反模式。但我今天的观点是,“正常化”name会使情况变得更糟。表是c多对多映射表吗?

因此,摆脱表a并简单地将表放入name表中b

您应该删除一些冗余索引。

  • 在一个表中PRIMARY KEY(x),基本上没有必要INDEX(x, ...)
  • 使用INDEX(e, f), or UNIQUE(e, f) , there is no need to also have INDEX(e)`。

核心价值

CREATE TABLE FooAttributes (
    foo_id INT UNSIGNED NOT NULL,  -- link to main table
    key VARCHAR(..) NOT NULL,
    val ..., 
    PRIMARY KEY(foo_id, key),
    INDEX(key)    -- if needed
    INDEX(val)    -- if needed
) ENGINE=InnoDB;
Run Code Online (Sandbox Code Playgroud)

笔记:

  • 标准化“关键”值会减慢处理速度,但不会节省太多空间。
  • PK 旨在快速访问所需的行,因此,
  • 没有传统的id AUTO_INCREMENT.
  • val当您需要字符串或数字时,没有干净的解决方案。