Mar*_*tin 25
Tags: mysql, performance, optimization, eav, query-performance
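I have spent hours googling, reading up and looking for a solution, with no luck. I found some similar questions here, but none that match this case.
My tables:
Situation:
I am trying to select all person IDs (person_id) from certain locations (location.attribute_value BETWEEN 3000 AND 7000), with a certain gender (gender.attribute_value = 1), born in certain years (bornyear.attribute_value BETWEEN 1980 AND 2000) and with certain eye colors (eyecolor.attribute_value IN (2,3)).
This is my query, which takes 3 to 4 minutes. I would like to optimize it: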
SELECT person.person_id
FROM person
-- the WHERE filters on the joined tables reject NULLs, so these LEFT JOINs behave like inner joins
LEFT JOIN attribute location ON location.attribute_type_id = 1 AND location.person_id = person.person_id
LEFT JOIN attribute gender ON gender.attribute_type_id = 2 AND gender.person_id = person.person_id
LEFT JOIN attribute bornyear ON bornyear.attribute_type_id = 3 AND bornyear.person_id = person.person_id
LEFT JOIN attribute eyecolor ON eyecolor.attribute_type_id = 4 AND eyecolor.person_id = person.person_id
WHERE 1
AND location.attribute_value BETWEEN 3000 AND 7000
AND gender.attribute_value = 1
AND bornyear.attribute_value BETWEEN 1980 AND 2000
AND eyecolor.attribute_value IN (2,3)
LIMIT 100000;
Result:
+-----------+
| person_id |
+-----------+
| 233 |
| 605 |
| ... |
| 8702599 |
| 8703617 |
+-----------+
100000 rows in set (3 min 42.77 sec)
EXPLAIN EXTENDED:
+----+-------------+----------+--------+---------------------------------------------+-----------------+---------+--------------------------+---------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+--------+---------------------------------------------+-----------------+---------+--------------------------+---------+----------+--------------------------+
| 1 | SIMPLE | bornyear | range | attribute_type_id,attribute_value,person_id | attribute_value | 5 | NULL | 1265229 | 100.00 | Using where |
| 1 | SIMPLE | location | ref | attribute_type_id,attribute_value,person_id | person_id | 5 | test1.bornyear.person_id | 4 | 100.00 | Using where |
| 1 | SIMPLE | eyecolor | ref | attribute_type_id,attribute_value,person_id | person_id | 5 | test1.bornyear.person_id | 4 | 100.00 | Using where |
| 1 | SIMPLE | gender | ref | attribute_type_id,attribute_value,person_id | person_id | 5 | test1.eyecolor.person_id | 4 | 100.00 | Using where |
| 1 | SIMPLE | person | eq_ref | PRIMARY | PRIMARY | 4 | test1.location.person_id | 1 | 100.00 | Using where; Using index |
+----+-------------+----------+--------+---------------------------------------------+-----------------+---------+--------------------------+---------+----------+--------------------------+
5 rows in set, 1 warning (0.02 sec)
Profile:
+------------------------------+-----------+
| Status | Duration |
+------------------------------+-----------+
| Sending data | 3.069452 |
| Waiting for query cache lock | 0.000017 |
| Sending data | 2.968915 |
| Waiting for query cache lock | 0.000019 |
| Sending data | 3.042468 |
| Waiting for query cache lock | 0.000043 |
| Sending data | 3.264984 |
| Waiting for query cache lock | 0.000017 |
| Sending data | 2.823919 |
| Waiting for query cache lock | 0.000038 |
| Sending data | 2.863903 |
| Waiting for query cache lock | 0.000014 |
| Sending data | 2.971079 |
| Waiting for query cache lock | 0.000020 |
| Sending data | 3.053197 |
| Waiting for query cache lock | 0.000087 |
| Sending data | 3.099053 |
| Waiting for query cache lock | 0.000035 |
| Sending data | 3.064186 |
| Waiting for query cache lock | 0.000017 |
| Sending data | 2.939404 |
| Waiting for query cache lock | 0.000018 |
| Sending data | 3.440288 |
| Waiting for query cache lock | 0.000086 |
| Sending data | 3.115798 |
| Waiting for query cache lock | 0.000068 |
| Sending data | 3.075427 |
| Waiting for query cache lock | 0.000072 |
| Sending data | 3.658319 |
| Waiting for query cache lock | 0.000061 |
| Sending data | 3.335427 |
| Waiting for query cache lock | 0.000049 |
| Sending data | 3.319430 |
| Waiting for query cache lock | 0.000061 |
| Sending data | 3.496563 |
| Waiting for query cache lock | 0.000029 |
| Sending data | 3.017041 |
| Waiting for query cache lock | 0.000032 |
| Sending data | 3.132841 |
| Waiting for query cache lock | 0.000050 |
| Sending data | 2.901310 |
| Waiting for query cache lock | 0.000016 |
| Sending data | 3.107269 |
| Waiting for query cache lock | 0.000062 |
| Sending data | 2.937373 |
| Waiting for query cache lock | 0.000016 |
| Sending data | 3.097082 |
| Waiting for query cache lock | 0.000261 |
| Sending data | 3.026108 |
| Waiting for query cache lock | 0.000026 |
| Sending data | 3.089760 |
| Waiting for query cache lock | 0.000041 |
| Sending data | 3.012763 |
| Waiting for query cache lock | 0.000021 |
| Sending data | 3.069694 |
| Waiting for query cache lock | 0.000046 |
| Sending data | 3.591908 |
| Waiting for query cache lock | 0.000060 |
| Sending data | 3.526693 |
| Waiting for query cache lock | 0.000076 |
| Sending data | 3.772659 |
| Waiting for query cache lock | 0.000069 |
| Sending data | 3.346089 |
| Waiting for query cache lock | 0.000245 |
| Sending data | 3.300460 |
| Waiting for query cache lock | 0.000019 |
| Sending data | 3.135361 |
| Waiting for query cache lock | 0.000021 |
| Sending data | 2.909447 |
| Waiting for query cache lock | 0.000039 |
| Sending data | 3.337561 |
| Waiting for query cache lock | 0.000140 |
| Sending data | 3.138180 |
| Waiting for query cache lock | 0.000090 |
| Sending data | 3.060687 |
| Waiting for query cache lock | 0.000085 |
| Sending data | 2.938677 |
| Waiting for query cache lock | 0.000041 |
| Sending data | 2.977974 |
| Waiting for query cache lock | 0.000872 |
| Sending data | 2.918640 |
| Waiting for query cache lock | 0.000036 |
| Sending data | 2.975842 |
| Waiting for query cache lock | 0.000051 |
| Sending data | 2.918988 |
| Waiting for query cache lock | 0.000021 |
| Sending data | 2.943810 |
| Waiting for query cache lock | 0.000061 |
| Sending data | 3.330211 |
| Waiting for query cache lock | 0.000025 |
| Sending data | 3.411236 |
| Waiting for query cache lock | 0.000023 |
| Sending data | 23.339035 |
| end | 0.000807 |
| query end | 0.000023 |
| closing tables | 0.000325 |
| freeing items | 0.001217 |
| logging slow query | 0.000007 |
| logging slow query | 0.000011 |
| cleaning up | 0.000104 |
+------------------------------+-----------+
100 rows in set (0.00 sec)
Table structure:
CREATE TABLE `attribute` (
`attribute_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`attribute_type_id` int(11) unsigned DEFAULT NULL,
`attribute_value` int(6) DEFAULT NULL,
`person_id` int(11) unsigned DEFAULT NULL,
PRIMARY KEY (`attribute_id`),
KEY `attribute_type_id` (`attribute_type_id`),
KEY `attribute_value` (`attribute_value`),
KEY `person_id` (`person_id`)
) ENGINE=MyISAM AUTO_INCREMENT=40000001 DEFAULT CHARSET=utf8;
CREATE TABLE `person` (
`person_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`person_name` text CHARACTER SET latin1,
PRIMARY KEY (`person_id`)
) ENGINE=MyISAM AUTO_INCREMENT=20000001 DEFAULT CHARSET=utf8;
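To make the shape of the problem concrete, here is a scaled-down sketch of the two tables and the four-way self-join, using Python's built-in sqlite3 module as a stand-in for MySQL. This is an assumption for illustration only: the query logic carries over, but nothing here says anything about MyISAM performance. The four sample people are made up.

```python
import sqlite3

# SQLite stands in for the MyISAM tables from the question (assumption:
# only the query shape matters here, not storage-engine behavior).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,
        person_name TEXT
    );
    CREATE TABLE attribute (
        attribute_id      INTEGER PRIMARY KEY,
        attribute_type_id INTEGER,  -- 1=location, 2=gender, 3=born year, 4=eye color
        attribute_value   INTEGER,
        person_id         INTEGER
    );
""")

# Hypothetical sample rows: (person_id, location, gender, born_year, eye_color)
people = [
    (1, 3500, 1, 1985, 2),  # passes every filter
    (2, 3500, 2, 1985, 2),  # wrong gender
    (3, 8000, 1, 1990, 3),  # location out of range
    (4, 6999, 1, 2000, 3),  # passes every filter
]
for pid, loc, gender, born, eye in people:
    conn.execute("INSERT INTO person VALUES (?, ?)", (pid, f"person {pid}"))
    # Each person becomes four EAV rows, one per attribute type.
    conn.executemany(
        "INSERT INTO attribute (attribute_type_id, attribute_value, person_id) VALUES (?, ?, ?)",
        [(1, loc, pid), (2, gender, pid), (3, born, pid), (4, eye, pid)],
    )

# One self-join per attribute type; each filter lives on its own alias,
# so every candidate person needs four separate attribute lookups.
EAV_QUERY = """
    SELECT person.person_id
    FROM person
    JOIN attribute location ON location.attribute_type_id = 1 AND location.person_id = person.person_id
    JOIN attribute gender   ON gender.attribute_type_id   = 2 AND gender.person_id   = person.person_id
    JOIN attribute bornyear ON bornyear.attribute_type_id = 3 AND bornyear.person_id = person.person_id
    JOIN attribute eyecolor ON eyecolor.attribute_type_id = 4 AND eyecolor.person_id = person.person_id
    WHERE location.attribute_value BETWEEN 3000 AND 7000
      AND gender.attribute_value = 1
      AND bornyear.attribute_value BETWEEN 1980 AND 2000
      AND eyecolor.attribute_value IN (2, 3)
    ORDER BY person.person_id
"""
matches = [row[0] for row in conn.execute(EAV_QUERY)]
print(matches)  # [1, 4]
```

With one row per attribute, every extra filter costs another join against the same large table, which is the pattern behind the 1,265,229-row range scan in the EXPLAIN output above.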
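The query was run on a DigitalOcean virtual server with an SSD and 1 GB of RAM.
I suspect there may be a problem with the database design. Do you have any suggestions for designing this situation better? Or should I just tune the SELECT above?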
Ric*_*mes 12
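Pick some of the attributes and put them directly into person. Index them in a few combinations, using composite indexes rather than single-column indexes.
That is basically the only way out of EAV-sucks-at-performance, which is where you are.
More discussion here: http://mysql.rjweb.org/doc.php/eav, including a suggestion to use JSON instead of a key-value table.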
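A minimal sketch of the "move attributes into person and use a composite index" idea, again with sqlite3 standing in for MySQL. The column names (location, gender, born_year, eye_color) and the index name idx_gender_eye_born are hypothetical, and the column order in the index (equality column first, then the IN list, then the range) is a common rule of thumb for B-tree indexes, not something taken from this answer.

```python
import sqlite3

# Hypothetical "wide" person table: the four filter attributes become columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,
        person_name TEXT,
        location    INTEGER,
        gender      INTEGER,
        born_year   INTEGER,
        eye_color   INTEGER
    );
    -- Composite index: equality column, then IN-list column, then range column.
    CREATE INDEX idx_gender_eye_born ON person (gender, eye_color, born_year);
""")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "a", 3500, 1, 1985, 2),
        (2, "b", 3500, 2, 1985, 2),
        (3, "c", 8000, 1, 1990, 3),
        (4, "d", 6999, 1, 2000, 3),
    ],
)

QUERY = """
    SELECT person_id FROM person
    WHERE gender = 1
      AND eye_color IN (2, 3)
      AND born_year BETWEEN 1980 AND 2000
      AND location BETWEEN 3000 AND 7000
"""
# No joins: one row per person, narrowed through a single index.
matches = sorted(row[0] for row in conn.execute(QUERY))
# EXPLAIN QUERY PLAN rows have the human-readable detail in column 3.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))
print(matches)  # [1, 4]
print(plan)
```

The printed plan should name idx_gender_eye_born; in MySQL the equivalent check would be EXPLAIN showing the composite index in the key column. Which combinations to index depends on which filters are actually used together.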
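I hope I have found an adequate solution. It was inspired by this article.
Setup: set ft_min_word_len=1 (for MyISAM) in the [mysqld] section and innodb_ft_min_token_size=1 (for InnoDB) in the my.cnf file, then restart the MySQL service.
Test example: SELECT * FROM person_index WHERE MATCH(attribute_1) AGAINST("123 456 789" IN BOOLEAN MODE) LIMIT 1000, where 123, 456 and 789 are IDs that a person should have associated in attribute_1. This query takes under 1 second.
Step 1. Create a table with full-text indexes. InnoDB supports full-text indexes only as of MySQL 5.6, so on 5.5 or earlier you should use MyISAM. MyISAM is sometimes even faster than InnoDB for full-text search.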
CREATE TABLE `person_attribute_ft` (
`person_id` int(11) NOT NULL,
`attr_1` text,
`attr_2` text,
`attr_3` text,
`attr_4` text,
PRIMARY KEY (`person_id`),
FULLTEXT KEY `attr_1` (`attr_1`),
FULLTEXT KEY `attr_2` (`attr_2`),
FULLTEXT KEY `attr_3` (`attr_3`),
FULLTEXT KEY `attr_4` (`attr_4`),
FULLTEXT KEY `attr_12` (`attr_1`,`attr_2`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
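The ft_min_word_len / innodb_ft_min_token_size change at the top of this answer matters because their defaults are 4 and 3 respectively, so single-digit values such as gender 1 or eye color 2 would never be indexed. The sketch below is a deliberately simplified model of that length cutoff (the function name indexed_tokens is hypothetical; stopwords, maximum length and parser details are ignored).

```python
import re


def indexed_tokens(text: str, min_len: int) -> set[str]:
    """Simplified full-text tokenizer: split on non-word characters and
    drop tokens shorter than min_len, as the minimum-length settings do."""
    return {tok for tok in re.split(r"\W+", text) if len(tok) >= min_len}


# One row of person_attribute_ft, as text columns.
row = {"attr_1": "3002", "attr_2": "1", "attr_3": "1982", "attr_4": "2"}

default_myisam = {col: indexed_tokens(val, 4) for col, val in row.items()}  # ft_min_word_len default
tuned = {col: indexed_tokens(val, 1) for col, val in row.items()}           # after setting it to 1
print(default_myisam)  # attr_2 and attr_4 end up with no tokens at all
print(tuned)
```

Under the default cutoff, MATCH(attr_2) AGAINST ("1" IN BOOLEAN MODE) would have nothing to find, which is why the minimum length has to be lowered before the indexes are built.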
Step 2. Insert the data from the EAV (entity-attribute-value) table. For the example in the question, this can be done with one simple SQL statement:
INSERT IGNORE INTO `person_attribute_ft`
SELECT
p.person_id,
-- one space-separated list of values per attribute type; GROUP_CONCAT output is cut at group_concat_max_len (1024 bytes by default)
(SELECT GROUP_CONCAT(a.attribute_value SEPARATOR ' ') FROM attribute a WHERE a.attribute_type_id = 1 AND a.person_id = p.person_id) attr_1,
(SELECT GROUP_CONCAT(a.attribute_value SEPARATOR ' ') FROM attribute a WHERE a.attribute_type_id = 2 AND a.person_id = p.person_id) attr_2,
(SELECT GROUP_CONCAT(a.attribute_value SEPARATOR ' ') FROM attribute a WHERE a.attribute_type_id = 3 AND a.person_id = p.person_id) attr_3,
(SELECT GROUP_CONCAT(a.attribute_value SEPARATOR ' ') FROM attribute a WHERE a.attribute_type_id = 4 AND a.person_id = p.person_id) attr_4
FROM person p;
The result should look like this:
mysql> select * from person_attribute_ft limit 10;
+-----------+--------+--------+--------+--------+
| person_id | attr_1 | attr_2 | attr_3 | attr_4 |
+-----------+--------+--------+--------+--------+
| 1 | 541 | 2 | 1927 | 3 |
| 2 | 2862 | 2 | 1939 | 4 |
| 3 | 6573 | 2 | 1904 | 2 |
| 4 | 2432 | 1 | 2005 | 2 |
| 5 | 2208 | 1 | 1995 | 4 |
| 6 | 8388 | 2 | 1973 | 1 |
| 7 | 107 | 2 | 1909 | 4 |
| 8 | 5161 | 1 | 2005 | 1 |
| 9 | 8022 | 2 | 1953 | 4 |
| 10 | 4801 | 2 | 1900 | 3 |
+-----------+--------+--------+--------+--------+
10 rows in set (0.00 sec)
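The Step 2 flattening can be tried without a MySQL server. This sketch mirrors the correlated-subquery shape of that INSERT in sqlite3, where group_concat(X, ' ') plays the role of MySQL's GROUP_CONCAT(X SEPARATOR ' ') and INSERT OR IGNORE plays the role of INSERT IGNORE. Person 1 reuses the first sample row shown above; the second location for person 2 is hypothetical, added to show a multi-valued attribute. No full-text index is built here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY);
    CREATE TABLE attribute (
        attribute_type_id INTEGER,
        attribute_value   INTEGER,
        person_id         INTEGER
    );
    CREATE TABLE person_attribute_ft (
        person_id INTEGER PRIMARY KEY,
        attr_1 TEXT, attr_2 TEXT, attr_3 TEXT, attr_4 TEXT
    );
    INSERT INTO person VALUES (1), (2);
    -- (attribute_type_id, attribute_value, person_id); person 2 has two locations.
    INSERT INTO attribute VALUES
        (1, 541, 1), (2, 2, 1), (3, 1927, 1), (4, 3, 1),
        (1, 2862, 2), (1, 3001, 2), (2, 2, 2), (3, 1939, 2), (4, 4, 2);
""")

# One correlated subquery per attribute type, each producing a
# space-separated string of that person's values for that type.
conn.execute("""
    INSERT OR IGNORE INTO person_attribute_ft
    SELECT p.person_id,
        (SELECT group_concat(a.attribute_value, ' ') FROM attribute a
          WHERE a.attribute_type_id = 1 AND a.person_id = p.person_id),
        (SELECT group_concat(a.attribute_value, ' ') FROM attribute a
          WHERE a.attribute_type_id = 2 AND a.person_id = p.person_id),
        (SELECT group_concat(a.attribute_value, ' ') FROM attribute a
          WHERE a.attribute_type_id = 3 AND a.person_id = p.person_id),
        (SELECT group_concat(a.attribute_value, ' ') FROM attribute a
          WHERE a.attribute_type_id = 4 AND a.person_id = p.person_id)
    FROM person p
""")
rows = conn.execute("SELECT * FROM person_attribute_ft ORDER BY person_id").fetchall()
for row in rows:
    print(row)
```

The order of values inside a concatenated string is not guaranteed, which does not matter for full-text matching since each value is a separate token.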
Step 3. Query the table like this:
mysql> SELECT SQL_NO_CACHE *
-> FROM `person_attribute_ft`
-> WHERE 1 AND MATCH(attr_1) AGAINST ("3000 3001 3002 3003 3004 3005 3006 3007" IN BOOLEAN MODE)
-> AND MATCH(attr_2) AGAINST ("1" IN BOOLEAN MODE)
-> AND MATCH(attr_3) AGAINST ("1980 1981 1982 1983 1984" IN BOOLEAN MODE)
-> AND MATCH(attr_4) AGAINST ("2,3" IN BOOLEAN MODE)
-> LIMIT 10000;
+-----------+--------+--------+--------+--------+
| person_id | attr_1 | attr_2 | attr_3 | attr_4 |
+-----------+--------+--------+--------+--------+
| 12131 | 3002 | 1 | 1982 | 2 |
| 51315 | 3007 | 1 | 1984 | 2 |
| 147283 | 3001 | 1 | 1984 | 2 |
| 350086 | 3005 | 1 | 1982 | 3 |
| 423907 | 3004 | 1 | 1982 | 3 |
... many rows ...
| 9423907 | 3004 | 1 | 1982 | 3 |
| 9461892 | 3007 | 1 | 1982 | 2 |
| 9516361 | 3006 | 1 | 1980 | 2 |
| 9813933 | 3005 | 1 | 1982 | 2 |
| 9986892 | 3003 | 1 | 1981 | 2 |
+-----------+--------+--------+--------+--------+
90 rows in set (0.17 sec)
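A full-text index has no BETWEEN, so the Step 3 query spells each range out as a list of terms; in boolean mode, terms without operators are optional, so a row matches if it contains any of them. A small helper for building those term strings might look like this (the function names are hypothetical, not part of MySQL):

```python
def against_terms(values) -> str:
    """Space-separated term list for MATCH ... AGAINST (... IN BOOLEAN MODE)."""
    return " ".join(str(v) for v in values)


def range_terms(low: int, high: int) -> str:
    """Expand an inclusive integer range into individual full-text terms."""
    return against_terms(range(low, high + 1))


print(range_terms(3000, 3007))                # 3000 3001 3002 3003 3004 3005 3006 3007
print(range_terms(1980, 1984))                # 1980 1981 1982 1983 1984
print(len(range_terms(3000, 7000).split()))  # 4001
```

The 0.17 sec timing above was measured with the eight-term and five-term lists; the question's full ranges (3000 to 7000 and 1980 to 2000) would need 4001 and 21 terms.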
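The query selects all rows with:
- 3000, 3001, 3002, 3003, 3004, 3005, 3006 or 3007 in attr_1,
- 1 in attr_2 (this column represents gender, so in a solution tailored to this case it should be a smallint(1) with a simple index, etc.),
- 1980, 1981, 1982, 1983 or 1984 in attr_3,
- 2 or 3 in attr_4.
Conclusion: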
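I know this solution is not perfect or ideal in many situations, but it can be used as a good alternative to the EAV table design.
I hope it helps someone.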