Irf*_*gwb 14 mysql database database-administration
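I need help optimizing ORDER BY and COUNT queries. I have tables with millions of rows (about 3 million).
I have to join 4 tables and fetch the records. When I run a simple query it takes only milliseconds to complete, but when I try to count or order by with the LEFT JOINed tables, it hangs indefinitely.
Please see the cases below.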
CPU Number of virtual cores: 4
Memory(RAM): 16 GiB
Network Performance: High
tbl_customers - #Rows: 20 million
tbl_customers_address - #Rows: 25 million
tbl_shop_setting - #Rows: 50k
aio_customer_tracking - #Rows: 5k
CREATE TABLE `tbl_customers` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`shopify_customer_id` BIGINT(20) UNSIGNED NOT NULL,
`shop_id` BIGINT(20) UNSIGNED NOT NULL,
`email` VARCHAR(225) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`accepts_marketing` TINYINT(1) NULL DEFAULT NULL,
`first_name` VARCHAR(50) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`last_name` VARCHAR(50) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`last_order_id` BIGINT(20) NULL DEFAULT NULL,
`total_spent` DECIMAL(12,2) NULL DEFAULT NULL,
`phone` VARCHAR(20) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`verified_email` TINYINT(4) NULL DEFAULT NULL,
`updated_at` DATETIME NULL DEFAULT NULL,
`created_at` DATETIME NULL DEFAULT NULL,
`date_updated` DATETIME NULL DEFAULT NULL,
`date_created` DATETIME NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `shopify_customer_id_unique` (`shopify_customer_id`),
INDEX `email` (`email`),
INDEX `shopify_customer_id` (`shopify_customer_id`),
INDEX `shop_id` (`shop_id`)
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB;
CREATE TABLE `tbl_customers_address` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`customer_id` BIGINT(20) NULL DEFAULT NULL,
`shopify_address_id` BIGINT(20) NULL DEFAULT NULL,
`shopify_customer_id` BIGINT(20) NULL DEFAULT NULL,
`first_name` VARCHAR(50) NULL DEFAULT NULL,
`last_name` VARCHAR(50) NULL DEFAULT NULL,
`company` VARCHAR(50) NULL DEFAULT NULL,
`address1` VARCHAR(250) NULL DEFAULT NULL,
`address2` VARCHAR(250) NULL DEFAULT NULL,
`city` VARCHAR(50) NULL DEFAULT NULL,
`province` VARCHAR(50) NULL DEFAULT NULL,
`country` VARCHAR(50) NULL DEFAULT NULL,
`zip` VARCHAR(15) NULL DEFAULT NULL,
`phone` VARCHAR(20) NULL DEFAULT NULL,
`name` VARCHAR(50) NULL DEFAULT NULL,
`province_code` VARCHAR(5) NULL DEFAULT NULL,
`country_code` VARCHAR(5) NULL DEFAULT NULL,
`country_name` VARCHAR(50) NULL DEFAULT NULL,
`longitude` VARCHAR(250) NULL DEFAULT NULL,
`latitude` VARCHAR(250) NULL DEFAULT NULL,
`default` TINYINT(1) NULL DEFAULT NULL,
`is_geo_fetched` TINYINT(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
INDEX `customer_id` (`customer_id`),
INDEX `shopify_address_id` (`shopify_address_id`),
INDEX `shopify_customer_id` (`shopify_customer_id`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;
CREATE TABLE `tbl_shop_setting` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`shop_name` VARCHAR(300) NOT NULL COLLATE 'latin1_swedish_ci',
PRIMARY KEY (`id`)
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB;
CREATE TABLE `aio_customer_tracking` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`shopify_customer_id` BIGINT(20) UNSIGNED NOT NULL,
`email` VARCHAR(255) NULL DEFAULT NULL,
`shop_id` BIGINT(20) UNSIGNED NOT NULL,
`domain` VARCHAR(255) NULL DEFAULT NULL,
`web_session_count` INT(11) NOT NULL,
`last_seen_date` DATETIME NULL DEFAULT NULL,
`last_contact_date` DATETIME NULL DEFAULT NULL,
`last_email_open` DATETIME NULL DEFAULT NULL,
`created_date` DATETIME NOT NULL,
`is_geo_fetched` TINYINT(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
INDEX `shopify_customer_id` (`shopify_customer_id`),
INDEX `email` (`email`),
INDEX `shopify_customer_id_shop_id` (`shopify_customer_id`, `shop_id`),
INDEX `last_seen_date` (`last_seen_date`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;
1. Running: The query below fetches the records by joining all 4 tables. It takes only 0.300 ms.
SELECT `c`.first_name,`c`.last_name,`c`.email, `t`.`last_seen_date`, `t`.`last_contact_date`, `ssh`.`shop_name`, ca.`company`, ca.`address1`, ca.`address2`, ca.`city`, ca.`province`, ca.`country`, ca.`zip`, ca.`province_code`, ca.`country_code`
FROM `tbl_customers` AS `c`
JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN `tbl_customers_address` as ca ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
LIMIT 20
2. Not running: When I simply try to get the count of these rows, the query gets stuck. I waited 10 minutes and it was still running.
SELECT
COUNT(DISTINCT c.shopify_customer_id) -- what makes #2 different
FROM `tbl_customers` AS `c`
JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN `tbl_customers_address` as ca ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
LIMIT 20
3. Not running: Here I simply add one ORDER BY clause to query #1 and it gets stuck. I waited 10 minutes and it was still running. I studied some articles on query optimization and tried indexing, RIGHT JOIN, etc., but it still does not work.
SELECT `c`.first_name,`c`.last_name,`c`.email, `t`.`last_seen_date`, `t`.`last_contact_date`, `ssh`.`shop_name`, ca.`company`, ca.`address1`, ca.`address2`, ca.`city`, ca.`province`, ca.`country`, ca.`zip`, ca.`province_code`, ca.`country_code`
FROM `tbl_customers` AS `c`
JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN `tbl_customers_address` as ca ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
ORDER BY `t`.`last_seen_date` -- what makes #3 different
LIMIT 20
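Any suggestions for optimizing the queries or the table structure are welcome.
The tbl_customers table contains customer information, the tbl_customers_address table contains the customers' addresses (one customer may have multiple addresses), and the aio_customer_tracking table contains the customers' visit records, where last_seen_date is the visit date.
Now, I simply want to fetch and count customers along with their addresses and visit information. Also, I may order by any column from these 3 tables. In my example I order by last_seen_date (the default order). I hope this explanation helps in understanding what I am trying to do.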
In query #1, but not the other two, the optimizer can use
UNIQUE INDEX `shopify_customer_id_unique` (`shopify_customer_id`)
to cut the query short for
GROUP BY c.shopify_customer_id
LIMIT 20
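This is because it can stop after 20 items of the index. The query is not ultra-fast because the derived table (subquery t) hits about 51K rows.
Query #2 may be slow because the optimizer fails to notice and remove the redundant DISTINCT. Instead, it may think that it cannot stop after 20 rows.
Query #3 must go entirely through table c to get every shopify_customer_id group. This is because the ORDER BY prevents short-circuiting to the LIMIT 20.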
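One way to check this reasoning is to look at the execution plan. The sketch below simply prefixes the asker's query #3 (with a shortened select list) with EXPLAIN; it assumes the same sql_mode under which the original queries ran. Running the same statement without the ORDER BY line gives the plan for query #1 for comparison.

```sql
-- Sketch: plan for query #3 (select list shortened for readability).
-- Remove the ORDER BY line to see the plan for query #1 instead.
EXPLAIN
SELECT c.first_name, c.last_name, c.email, t.last_seen_date
FROM tbl_customers AS c
JOIN tbl_shop_setting AS ssh ON c.shop_id = ssh.id
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date
           FROM aio_customer_tracking
           GROUP BY shopify_customer_id) AS t
       ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN tbl_customers_address AS ca
       ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
ORDER BY t.last_seen_date
LIMIT 20;
```

In the Extra column, "Using temporary" and "Using filesort" generally indicate that the full result is materialized and sorted before the LIMIT is applied, which would be consistent with the explanation above.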
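The columns in a GROUP BY must include all the non-aggregated columns in the SELECT, except for any that are uniquely determined by the GROUP BY columns. Since you have said that there can be multiple addresses for a single shopify_customer_id, fetching ca.address1 is not proper in connection with GROUP BY shopify_customer_id. Similarly, the subquery seems to be improper with regard to last_seen_date, last_contact_date.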
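As a sketch of one way to make the subquery well defined, each non-grouped column can be wrapped in an aggregate. This assumes that the most recent dates are what is wanted, which the question does not state, so MAX() here is an illustrative choice rather than the asker's requirement.

```sql
-- Sketch: one well-defined row per customer from the tracking table.
-- Assumption: "latest" visit/contact is the intended value (MAX).
SELECT c.shopify_customer_id, c.first_name, c.last_name, c.email,
       t.last_seen_date, t.last_contact_date
FROM tbl_customers AS c
LEFT JOIN (
    SELECT shopify_customer_id,
           MAX(last_seen_date)    AS last_seen_date,
           MAX(last_contact_date) AS last_contact_date
    FROM aio_customer_tracking
    GROUP BY shopify_customer_id
) AS t ON t.shopify_customer_id = c.shopify_customer_id
LIMIT 20;
```

Because shopify_customer_id is unique in tbl_customers and the derived table returns at most one row per customer, the outer GROUP BY is no longer needed in this form.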
In aio_customer_tracking, this change (to a "covering" index) may help a little:
INDEX (`shopify_customer_id`)
to
INDEX (`shopify_customer_id`, `last_seen_date`, `last_contact_date`)
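Applied to the existing table, the index change could look roughly like the following. The new index name shopify_customer_id_seen_contact is made up for illustration; the dropped index name comes from the CREATE TABLE statement in the question.

```sql
-- Sketch: replace the single-column index with a covering one.
-- The new index name is hypothetical; choose any name you like.
ALTER TABLE aio_customer_tracking
    DROP INDEX shopify_customer_id,
    ADD INDEX shopify_customer_id_seen_contact
        (shopify_customer_id, last_seen_date, last_contact_date);
```

With all three columns in the index, the derived table t can potentially be answered from the index alone, without reading the table rows.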
Dissecting the goals
"Now, I simply want to ... count the customers"
To count the customers, do this, but don't try to combine it with the "fetching":
SELECT COUNT(*) FROM tbl_customers;
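"Now, I simply want to fetch ... the customers ..."
"tbl_customers - #Rows: 20 million."
Surely you don't want to fetch 20 million rows! I don't want to think about how to attempt that. Please clarify. And I won't accept paginating through that many rows. Perhaps there is a WHERE clause? The WHERE clause is (usually) the most important part of optimization!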
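"Now, I simply want to fetch customers along with their addresses and visit information."
Assuming the WHERE filters down to a "few" customers, then JOINing to another table to get "any" address and "any" visit info may be problematic and/or inefficient. Asking for the "first" or "last" instead of "any" won't be any easier, but it might be more meaningful.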
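A rough sketch of what "filter first, then pick a specific address and visit" might look like is shown below. The filter c.shop_id = 123 is purely hypothetical, since the question does not give a WHERE clause, and the query assumes each customer has at most one address flagged as default, which the schema does not enforce.

```sql
-- Sketch only: hypothetical filter, default address, and latest visit.
SELECT c.first_name, c.last_name, c.email,
       ca.address1, ca.city, ca.country,
       ( SELECT MAX(t.last_seen_date)
         FROM aio_customer_tracking AS t
         WHERE t.shopify_customer_id = c.shopify_customer_id
       ) AS last_seen_date
FROM tbl_customers AS c
LEFT JOIN tbl_customers_address AS ca
       ON ca.shopify_customer_id = c.shopify_customer_id
      AND ca.`default` = 1           -- assumes one default address per customer
WHERE c.shop_id = 123                -- hypothetical filter, not from the question
LIMIT 20;
```

Here "last" visit replaces "any" visit, so the result no longer depends on which row the server happens to pick.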
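May I suggest that your UI first find a few customers, and then, if the user wants, go to another page with all the addresses and all the visits. Or can the visits number in the hundreds or more?
"Also, I may order by any column from these 3 tables. In my example I order by last_seen_date (the default order)."
Let's focus on optimizing the WHERE, and then add last_seen_date at the end of whatever index is used.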
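To illustrate the idea of putting the WHERE column first and last_seen_date last, here is a minimal sketch against aio_customer_tracking. The equality filter on shop_id and the index name shop_id_last_seen are assumptions made for the example; the real filter columns would take their place.

```sql
-- Sketch: index = (WHERE column(s), then the ORDER BY column).
-- Both the filter and the index name are hypothetical.
ALTER TABLE aio_customer_tracking
    ADD INDEX shop_id_last_seen (shop_id, last_seen_date);

SELECT shopify_customer_id, last_seen_date, last_contact_date
FROM aio_customer_tracking
WHERE shop_id = 123
ORDER BY last_seen_date DESC
LIMIT 20;
```

With an equality test on the leading column, the rows for that shop are already ordered by last_seen_date inside the index, so the server can typically read 20 entries and stop instead of sorting the whole set.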