Bha*_*gav 10 mysql database database-design data-modeling
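I need advice on a data model for my use case. I have two parameters to store: A, for something of type T, and B, for something of type U (which is a set of T). Say every object of type T has two attributes, p1 and p2. Then A = (count of t with p1) / ((count of t with p1) + (count of t with p2))
B = (A1 + A2 + ...) over its set of T / (number of T in U).
Now, whenever an object of type T is added or modified, I have to handle storing and updating A and B (almost instantly).
I have decided to handle the computation of A by maintaining a table like (T id, count of p1, count of p2), so that whenever a count changes I only update the second or third column and can compute A on the fly. But I am confused about how to optimize the computation of B. My initial idea was to write a trigger on the table above so that whenever something is updated, B is recomputed for that U object, but I think this will give me poor performance as I scale. Any suggestions on what I can do here?
Example: say U is a city with many blocks (T). Each block has, say, p1 non-veg restaurants and p2 veg restaurants. So A for each block is p1 / (p1 + p2), and B for each city is (A1 + A2 + ...) / count(blocks) over the blocks in that city. How do I store the initially computed A and B for all objects so that, as p1 and p2 keep changing, A and B are updated almost instantly?
Adding metrics to make the required solution clearer:
Latency should be ~100 ms, i.e. A and B should be available within that time after p1/p2 change.
Write frequency will be spiky: either 100 or 1000 writes at once, or 3-5.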
Using your city/block example, your schema might look like:
CREATE TABLE cities (
`city_id` SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
`country_id` TINYINT UNSIGNED NOT NULL,
`zip` VARCHAR(50) NOT NULL,
`name` VARCHAR(100) NOT NULL,
PRIMARY KEY (`city_id`)
);
CREATE TABLE blocks (
`block_id` MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
`city_id` SMALLINT UNSIGNED NOT NULL,
`p1` SMALLINT UNSIGNED NOT NULL DEFAULT '0',
`p2` SMALLINT UNSIGNED NOT NULL DEFAULT '1',
PRIMARY KEY (`block_id`),
FOREIGN KEY (`city_id`) REFERENCES `cities` (`city_id`)
);
Your query for a given city (city_id = 123) would be:
Query 1
SELECT AVG(p1/(p1+p2)) AS B
FROM blocks b
WHERE b.city_id = 123
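Note: AVG(x) = SUM(x) / COUNT(x)
Now, if you are worried about performance, you should first define some expected numbers, such as how many cities and how many blocks per city you expect.
Once those numbers are defined, you can generate dummy data and run performance tests against it.
Here is an example with 1000 cities and 100K blocks (100 blocks per city on average):
First create a helper table with 100K sequence numbers: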
CREATE TABLE IF NOT EXISTS seq100k
SELECT NULL AS seq
FROM information_schema.COLUMNS c1
JOIN information_schema.COLUMNS c2
JOIN information_schema.COLUMNS c3
LIMIT 100000;
ALTER TABLE seq100k CHANGE COLUMN seq seq MEDIUMINT UNSIGNED AUTO_INCREMENT PRIMARY KEY;
With MariaDB, you can use the Sequence plugin instead.
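As a brief illustration of that alternative: MariaDB's Sequence storage engine exposes virtual tables named after the range they produce, so the helper table may not be needed there. This is a sketch that assumes the engine is installed and enabled on your server.

```sql
-- MariaDB only: seq_1_to_100000 is a virtual table provided by the
-- Sequence engine; it yields the integers 1..100000 in a column named seq.
SELECT seq
FROM seq_1_to_100000
LIMIT 5;
```

In the data generation statements, `seq_1_to_100000` could then stand in for `seq100k`.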
Generate the data:
DROP TABLE IF EXISTS blocks;
DROP TABLE IF EXISTS cities;
CREATE TABLE cities (
`city_id` SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
`country_id` TINYINT UNSIGNED NOT NULL,
`zip` VARCHAR(50) NOT NULL,
`name` VARCHAR(100) NOT NULL,
PRIMARY KEY (`city_id`)
)
SELECT seq AS city_id
, floor(rand(1)*10+1) as country_id
, floor(rand(2)*99999+1) as zip
, rand(3) as name
FROM seq100k
LIMIT 1000;
CREATE TABLE blocks (
`block_id` MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
`city_id` SMALLINT UNSIGNED NOT NULL,
`p1` SMALLINT UNSIGNED NOT NULL DEFAULT '0',
`p2` SMALLINT UNSIGNED NOT NULL DEFAULT '1',
PRIMARY KEY (`block_id`),
FOREIGN KEY (`city_id`) REFERENCES `cities` (`city_id`)
)
SELECT seq AS block_id
, floor(rand(4)*1000+1) as city_id
, floor(rand(5)*11) as p1
, floor(rand(6)*20+1) as p2
FROM seq100k
LIMIT 100000;
Now you can run your queries. Note that I will not report exact run times. If you need exact numbers, you should use profiling.
Running Query 1, my GUI (HeidiSQL) shows 0.000 sec, which I call "almost instant".
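For readers who want more precise timings than a GUI shows, here is a minimal sketch using the session profiler. It assumes a server where `SHOW PROFILES` is still available. Recent MySQL versions mark it as deprecated in favor of the Performance Schema, so treat this as one option rather than the recommended one.

```sql
-- Enable profiling for the current session only.
SET profiling = 1;

-- Run the statement to be measured (Query 1 from above).
SELECT AVG(p1/(p1+p2)) AS B
FROM blocks b
WHERE b.city_id = 123;

-- List the recorded statements with their durations in seconds.
SHOW PROFILES;

SET profiling = 0;
```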
You might also want to run a query like this:
Query 2
SELECT b.city_id, AVG(p1/(p1+p2)) AS B
FROM blocks b
GROUP BY b.city_id
ORDER BY B DESC
LIMIT 10
HeidiSQL shows 0.078 sec.
With a covering index
ALTER TABLE `blocks`
DROP INDEX `city_id`,
ADD INDEX `city_id` (`city_id`, `p1`, `p2`);
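you can reduce the run time to 0.031 sec. If that is still not fast enough, you should think about a caching strategy. One way (besides caching at the application level) is to use triggers to maintain a new column in the cities table (let's just call it B):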
ALTER TABLE `cities` ADD COLUMN `B` FLOAT NULL DEFAULT NULL AFTER `name`;
Define the update trigger:
DROP TRIGGER IF EXISTS `blocks_after_update`;
DELIMITER //
CREATE TRIGGER `blocks_after_update` AFTER UPDATE ON `blocks` FOR EACH ROW BEGIN
if new.p1 <> old.p1 or new.p2 <> old.p2 then
update cities c
set c.B = (
select avg(p1/(p1+p2))
from blocks b
where b.city_id = new.city_id
)
where c.city_id = new.city_id;
end if;
END//
DELIMITER ;
Update test:
Query 3
UPDATE blocks b SET p2 = p2 + 100 WHERE 1=1;
UPDATE blocks b SET p2 = p2 - 100 WHERE 1=1;
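This query runs in 2.500 sec without the trigger and in 60 sec with the trigger. That may look like a lot of overhead, but consider that we update 100K rows twice, which means an average of 60K msec / 200K updates = 0.3 msec/update.
Now you can get the same result as Query 2 with
Query 4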
SELECT c.city_id, c.B
FROM cities c
ORDER BY c.B DESC
LIMIT 10
"almost instantly" (0.000 sec).
If needed, you can still optimize the trigger by using an additional column block_count in the cities table (which also needs to be maintained by triggers).
Add the column:
ALTER TABLE `cities`
ADD COLUMN `block_count` MEDIUMINT UNSIGNED NOT NULL DEFAULT '0' AFTER `B`;
Initial data:
UPDATE cities c SET c.block_count = (
SELECT COUNT(*)
FROM blocks b
WHERE b.city_id = c.city_id
)
WHERE 1=1;
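An incremental trigger adjusts B relative to its current value, so B has to hold a correct average before such a trigger is installed. If B has not been populated yet (for example, because the first trigger never fired for every city), a one-off backfill along these lines might be used. This is a sketch that reuses the expression from Query 1.

```sql
-- One-off backfill: set B to the current average of A over each city's blocks.
-- Cities without any blocks keep B = NULL, because AVG over no rows is NULL.
UPDATE cities c SET c.B = (
    SELECT AVG(b.p1/(b.p1+b.p2))
    FROM blocks b
    WHERE b.city_id = c.city_id
)
WHERE 1=1;
```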
Rewrite the trigger:
DROP TRIGGER IF EXISTS `blocks_after_update`;
DELIMITER //
CREATE TRIGGER `blocks_after_update` AFTER UPDATE ON `blocks` FOR EACH ROW BEGIN
declare old_A, new_A double;
if new.p1 <> old.p1 or new.p2 <> old.p2 then
set old_A = old.p1/(old.p1+old.p2);
set new_A = new.p1/(new.p1+new.p2);
update cities c
set c.B = (c.B * c.block_count - old_A + new_A) / c.block_count
where c.city_id = new.city_id;
end if;
END//
DELIMITER ;
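The update expression in this trigger follows directly from the definition of the average. As a short derivation, with n standing for block_count and A_k for the block being changed (symbols introduced here only for illustration):

```latex
B = \frac{1}{n}\sum_{i=1}^{n} A_i
\quad\Longrightarrow\quad
B' = \frac{1}{n}\Bigl(\sum_{i=1}^{n} A_i - A_k^{\text{old}} + A_k^{\text{new}}\Bigr)
   = \frac{n\,B - A_k^{\text{old}} + A_k^{\text{new}}}{n}
```

So the trigger only touches one row in cities and never has to rescan the city's blocks.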
With this trigger, Query 3 now runs in 8.5 sec. Compared with 2.5 sec without a trigger, that is an overhead of 0.03 msec per update.
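One thing to keep in mind with the incremental approach: B is a FLOAT that is adjusted in place on every update, so small rounding errors may accumulate over time. A consistency check along the following lines could be run occasionally. This is a sketch, and the tolerance of 0.0001 is an arbitrary assumption, not a value from the measurements above.

```sql
-- List cities whose cached B is missing or differs from a freshly computed average.
SELECT c.city_id, c.B AS cached_B, x.B AS computed_B
FROM cities c
JOIN (
    SELECT b.city_id, AVG(b.p1/(b.p1+b.p2)) AS B
    FROM blocks b
    GROUP BY b.city_id
) x ON x.city_id = c.city_id
WHERE c.B IS NULL
   OR ABS(c.B - x.B) > 0.0001;  -- tolerance is an arbitrary assumption
```

If this returns rows, recomputing B for those cities with the AVG expression from Query 1 would bring the cache back in line.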
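Note that you will also need to define INSERT and DELETE triggers, and you will need to add more logic (e.g. to handle a change of city_id in an update). But it is also possible that you will not need any triggers at all.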
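To make that concrete, here is one possible shape for the INSERT and DELETE triggers, written as a sketch rather than a tested implementation. The trigger names `blocks_after_insert` and `blocks_after_delete` are made up for this example. It assumes p1 + p2 > 0 for every block (as in the generated data), that block_count is in sync with the blocks table, and that MySQL evaluates the assignments of a single-table UPDATE from left to right, so B is computed from the old block_count.

```sql
DROP TRIGGER IF EXISTS `blocks_after_insert`;
DROP TRIGGER IF EXISTS `blocks_after_delete`;
DELIMITER //
CREATE TRIGGER `blocks_after_insert` AFTER INSERT ON `blocks` FOR EACH ROW BEGIN
    declare new_A double;
    set new_A = new.p1/(new.p1+new.p2);
    -- B is assigned first, so it still sees the old block_count.
    -- COALESCE covers a city whose B is NULL because it had no blocks yet.
    update cities c
    set c.B = (coalesce(c.B, 0) * c.block_count + new_A) / (c.block_count + 1),
        c.block_count = c.block_count + 1
    where c.city_id = new.city_id;
END//
CREATE TRIGGER `blocks_after_delete` AFTER DELETE ON `blocks` FOR EACH ROW BEGIN
    declare old_A double;
    set old_A = old.p1/(old.p1+old.p2);
    -- When the last block of a city is removed, B goes back to NULL.
    update cities c
    set c.B = if(c.block_count > 1,
                 (c.B * c.block_count - old_A) / (c.block_count - 1),
                 NULL),
        c.block_count = c.block_count - 1
    where c.city_id = old.city_id;
END//
DELIMITER ;
```

A change of city_id in an UPDATE is not covered here. One option would be to treat it in the update trigger as a removal from the old city followed by an addition to the new one, using the same two expressions.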