Flo*_*lus 19 mysql performance database-design uniqueidentifier
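I have been using UUIDs in my systems for a while now, for a variety of reasons ranging from logging to delayed correlation. The formats I used changed as I became less naive: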
VARCHAR(255)
VARCHAR(36)
CHAR(36)
BINARY(16)
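It was when I reached the last one, BINARY(16), that I started comparing performance against a basic auto-increment integer. The tests and results are shown below, but if you just want the summary: INT AUTOINCREMENT and BINARY(16) RANDOM show identical performance on data ranges up to 200,000 rows (the database was pre-populated before the tests).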
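To make the difference between the text formats and BINARY(16) concrete, here is a minimal sketch in Python (not part of my PHP benchmark). The UUID value is just an arbitrary example; the point is only that the textual form is 36 characters while the raw form is 16 bytes.

```python
import uuid

# Arbitrary example value; any UUID behaves the same way.
u = uuid.UUID("6ccd780c-baba-1026-9564-5b8c656024db")

text_form = str(u)      # what VARCHAR(255), VARCHAR(36) and CHAR(36) would hold
binary_form = u.bytes   # what BINARY(16) would hold

print(len(text_form))    # number of characters in the dashed hex form
print(len(binary_form))  # number of raw bytes

# The raw bytes carry the same information as the text form.
restored = uuid.UUID(bytes=binary_form)
```

The round trip through `uuid.UUID(bytes=...)` shows that nothing is lost by dropping the dashes and hex encoding.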
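I was initially sceptical about using UUIDs as primary keys, and indeed I still am, but I see potential here to create a flexible database that can use both. Many people stress the advantages of one or the other; which disadvantages are cancelled out by using both data types?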
PRIMARY INT
UNIQUE BINARY(16)
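The use case for this kind of setup would be the traditional primary key for inter-table relationships, with the unique identifier used for inter-system relationships.
What I am essentially trying to discover is the difference in efficiency between the two approaches. Apart from the fourfold disk space used by the key (which is probably largely negligible once additional data is added), they appear to me to be the same.
Schema: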
-- phpMyAdmin SQL Dump
-- version 4.0.10deb1
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Generation Time: Sep 22, 2015 at 10:54 AM
-- Server version: 5.5.44-0ubuntu0.14.04.1
-- PHP Version: 5.5.29-1+deb.sury.org~trusty+3
SET SQL_MODE = "NO_AUTO_VALUE_ON_ZERO";
SET time_zone = "+00:00";
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
--
-- Database: `test`
--
-- --------------------------------------------------------
--
-- Table structure for table `with_2id`
--
CREATE TABLE `with_2id` (
`guidl` bigint(20) NOT NULL,
`guidr` bigint(20) NOT NULL,
`data` varchar(255) NOT NULL,
PRIMARY KEY (`guidl`,`guidr`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- --------------------------------------------------------
--
-- Table structure for table `with_guid`
--
CREATE TABLE `with_guid` (
`guid` binary(16) NOT NULL,
`data` varchar(255) NOT NULL,
PRIMARY KEY (`guid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- --------------------------------------------------------
--
-- Table structure for table `with_id`
--
CREATE TABLE `with_id` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`data` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=197687 ;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
Insert benchmark:
function benchmark_insert(PDO $pdo, $runs)
{
    $data = 'Sample Data';

    $insert1 = $pdo->prepare("INSERT INTO with_id (data) VALUES (:data)");
    $insert1->bindParam(':data', $data);

    $insert2 = $pdo->prepare("INSERT INTO with_guid (guid, data) VALUES (:guid, :data)");
    $insert2->bindParam(':guid', $guid);
    $insert2->bindParam(':data', $data);

    $insert3 = $pdo->prepare("INSERT INTO with_2id (guidl, guidr, data) VALUES (:guidl, :guidr, :data)");
    $insert3->bindParam(':guidl', $guidl);
    $insert3->bindParam(':guidr', $guidr);
    $insert3->bindParam(':data', $data);

    $benchmark = array();

    $time = time();
    for ($i = 0; $i < $runs; $i++) {
        $insert1->execute();
    }
    $benchmark[1] = 'INC ID: ' . (time() - $time);

    $time = time();
    for ($i = 0; $i < $runs; $i++) {
        $guid = openssl_random_pseudo_bytes(16);
        $insert2->execute();
    }
    $benchmark[2] = 'GUID: ' . (time() - $time);

    $time = time();
    for ($i = 0; $i < $runs; $i++) {
        $guid = openssl_random_pseudo_bytes(16);
        $guidl = unpack('q', substr($guid, 0, 8))[1];
        $guidr = unpack('q', substr($guid, 8, 8))[1];
        $insert3->execute();
    }
    $benchmark[3] = 'SPLIT GUID: ' . (time() - $time);

    echo 'INSERTION' . PHP_EOL;
    echo '=============================' . PHP_EOL;
    echo $benchmark[1] . PHP_EOL;
    echo $benchmark[2] . PHP_EOL;
    echo $benchmark[3] . PHP_EOL . PHP_EOL;
}
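The "SPLIT GUID" variant above cuts the 16 random bytes into two signed 64-bit integers for the (guidl, guidr) composite key. As a rough illustration of that step outside PHP, here is a small Python sketch; the names split_guid and join_guid are made up for this example, and "=qq" is used as the closest equivalent of PHP's 'q' (signed 64-bit, machine byte order).

```python
import os
import struct

def split_guid(guid):
    """Split 16 raw bytes into two signed 64-bit integers (native byte order)."""
    guidl, guidr = struct.unpack("=qq", guid)
    return guidl, guidr

def join_guid(guidl, guidr):
    """Inverse of split_guid: pack the two halves back into 16 raw bytes."""
    return struct.pack("=qq", guidl, guidr)

guid = os.urandom(16)            # stands in for openssl_random_pseudo_bytes(16)
guidl, guidr = split_guid(guid)
roundtrip = join_guid(guidl, guidr)
```

Because the halves are signed, either one can be negative, which is why the columns in with_2id are plain (signed) bigint.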
Select benchmark:
function benchmark_select(PDO $pdo, $runs) {
    $select1 = $pdo->prepare("SELECT * FROM with_id WHERE id = :id");
    $select1->bindParam(':id', $id);

    $select2 = $pdo->prepare("SELECT * FROM with_guid WHERE guid = :guid");
    $select2->bindParam(':guid', $guid);

    $select3 = $pdo->prepare("SELECT * FROM with_2id WHERE guidl = :guidl AND guidr = :guidr");
    $select3->bindParam(':guidl', $guidl);
    $select3->bindParam(':guidr', $guidr);

    $keys = array();
    for ($i = 0; $i < $runs; $i++) {
        $kguid = openssl_random_pseudo_bytes(16);
        $kguidl = unpack('q', substr($kguid, 0, 8))[1];
        $kguidr = unpack('q', substr($kguid, 8, 8))[1];
        $kid = mt_rand(0, $runs);
        $keys[] = array(
            'guid' => $kguid,
            'guidl' => $kguidl,
            'guidr' => $kguidr,
            'id' => $kid
        );
    }

    $benchmark = array();

    $time = time();
    foreach ($keys as $key) {
        $id = $key['id'];
        $select1->execute();
        $row = $select1->fetch(PDO::FETCH_ASSOC);
    }
    $benchmark[1] = 'INC ID: ' . (time() - $time);

    $time = time();
    foreach ($keys as $key) {
        $guid = $key['guid'];
        $select2->execute();
        $row = $select2->fetch(PDO::FETCH_ASSOC);
    }
    $benchmark[2] = 'GUID: ' . (time() - $time);

    $time = time();
    foreach ($keys as $key) {
        $guidl = $key['guidl'];
        $guidr = $key['guidr'];
        $select3->execute();
        $row = $select3->fetch(PDO::FETCH_ASSOC);
    }
    $benchmark[3] = 'SPLIT GUID: ' . (time() - $time);

    echo 'SELECTION' . PHP_EOL;
    echo '=============================' . PHP_EOL;
    echo $benchmark[1] . PHP_EOL;
    echo $benchmark[2] . PHP_EOL;
    echo $benchmark[3] . PHP_EOL . PHP_EOL;
}
Tests:
$pdo = new PDO('mysql:host=localhost;dbname=test', 'root', '');
benchmark_insert($pdo, 1000);
benchmark_select($pdo, 100000);
Results (elapsed time in seconds):
INSERTION
=============================
INC ID: 3
GUID: 2
SPLIT GUID: 3
SELECTION
=============================
INC ID: 5
GUID: 5
SPLIT GUID: 6
Ric*_*mes 17
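UUIDs are a performance disaster for very large tables. (200K rows is not "very large".)
Your #3 (CHAR(36)) is really bad when the CHARACTER SET is utf8: CHAR(36) then occupies 108 bytes! Update: there are ROW_FORMATs for which this will stay 36.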
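The 108 comes from reserving the maximum byte width for every character. A quick sketch of that arithmetic in Python, assuming the usual maximum widths for MySQL's latin1, utf8 (utf8mb3) and utf8mb4; the helper name is made up for illustration:

```python
# Maximum bytes per character for some MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

def char_column_max_bytes(length, charset):
    """Worst-case byte size of a fixed-width CHAR(length) column."""
    return length * MAX_BYTES_PER_CHAR[charset]

for charset in MAX_BYTES_PER_CHAR:
    print(charset, char_column_max_bytes(36, charset))
```

Compare any of these with the 16 bytes of BINARY(16).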
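UUIDs (GUIDs) are very "random". Using them as either a UNIQUE or a PRIMARY key on large tables is very inefficient. This is because you have to jump around the table/index each time you INSERT a new UUID or SELECT by UUID. When the table/index is too large to fit in cache (see innodb_buffer_pool_size, which must be smaller than RAM, typically 70% of it), the "next" UUID may not be cached, hence a slow disk hit. When the table/index is 20 times as big as the cache, only 1/20th (5%) of hits are cached; you are I/O-bound. Generalization: the inefficiency applies to any "random" access: UUID / MD5 / RAND() / etc.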
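The 1/20 figure can be checked with a toy model. The sketch below is only an illustration, not a model of InnoDB: it assumes an LRU cache of index pages, an index 20 times the size of the cache, and made-up constants (2000 pages, 25 keys per page). Random keys land on a uniformly random page; sequential keys keep filling the same page before moving on.

```python
import random
from collections import OrderedDict

def hit_ratio(pages, cache_pages):
    """Fraction of page accesses served from an LRU cache of `cache_pages` pages."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for page in pages:
        total += 1
        if page in cache:
            hits += 1
            cache.move_to_end(page)          # mark as most recently used
        else:
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)    # evict least recently used page
    return hits / total

TOTAL_PAGES = 2000     # hypothetical index size, in pages
CACHE_PAGES = 100      # cache holds 1/20 of the index
KEYS_PER_PAGE = 25     # hypothetical number of keys per page
ACCESSES = TOTAL_PAGES * KEYS_PER_PAGE

rng = random.Random(42)
random_pages = [rng.randrange(TOTAL_PAGES) for _ in range(ACCESSES)]
sequential_pages = [i // KEYS_PER_PAGE for i in range(ACCESSES)]

random_ratio = hit_ratio(random_pages, CACHE_PAGES)
sequential_ratio = hit_ratio(sequential_pages, CACHE_PAGES)
print("random keys:    ", round(random_ratio, 3))
print("sequential keys:", round(sequential_ratio, 3))
```

In this model the random pattern hovers around the cache-to-index ratio, while the sequential pattern only misses once per page.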
So, don't use UUIDs unless
More on UUIDs: http://mysql.rjweb.org/doc.php/uuid (It includes functions for converting between standard 36-char UUIDs and BINARY(16).) Update: MySQL 8.0 has a builtin function for this.
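For readers who want to see what such a conversion does, here is a hedged sketch in Python. It is not the code from the linked page; the function names are only chosen to echo MySQL 8.0's UUID_TO_BIN() and BIN_TO_UUID(), and the optional swap mimics, as I understand it, their swap flag, which exchanges the time-low and time-high groups so that UUIDv1 values sort roughly by creation time.

```python
import uuid

def uuid_to_bin(text, swap=False):
    """Convert a 36-char UUID string to 16 raw bytes, optionally swapping time parts."""
    h = uuid.UUID(text).hex              # 32 hex digits, dashes removed
    if swap:
        # Layout: time_low(8) time_mid(4) time_hi(4) rest(16)
        # becomes: time_hi, time_mid, time_low, rest
        h = h[12:16] + h[8:12] + h[0:8] + h[16:]
    return bytes.fromhex(h)

def bin_to_uuid(raw, swap=False):
    """Inverse of uuid_to_bin: 16 raw bytes back to the dashed 36-char form."""
    h = raw.hex()
    if swap:
        # Swapped layout: time_hi(4) time_mid(4) time_low(8) rest(16)
        h = h[8:16] + h[4:8] + h[0:4] + h[16:]
    return str(uuid.UUID(hex=h))

example = "6ccd780c-baba-1026-9564-5b8c656024db"   # arbitrary example value
plain = uuid_to_bin(example)
swapped = uuid_to_bin(example, swap=True)
print(plain.hex())
print(swapped.hex())
```

The swap only helps for time-based (version 1) UUIDs; for fully random values such as the 16 random bytes in the question's benchmark, rearranging the bits does not make them any less random.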
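Having both a UNIQUE AUTO_INCREMENT and a UNIQUE UUID in the same table is a waste.
When an INSERT occurs, all unique/primary keys must be checked for duplicates.
Either unique key is sufficient for InnoDB's requirement of having a PRIMARY KEY.
BINARY(16) (16 bytes) is somewhat bulky (an argument against making it the PK), but not that bad. For comparison: INT UNSIGNED is 4 bytes with a range of about 4 billion. BIGINT is 8 bytes.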
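For reference, the sizes and ranges work out as follows. This is plain arithmetic (an n-byte key has 2^(8n) distinct values), sketched in Python:

```python
# Key width in bytes for the types discussed above.
KEY_BYTES = {"INT UNSIGNED": 4, "BIGINT": 8, "BINARY(16)": 16}

def distinct_values(num_bytes):
    """Number of distinct values representable in num_bytes bytes."""
    return 2 ** (8 * num_bytes)

for name, size in KEY_BYTES.items():
    print(f"{name}: {size} bytes, {distinct_values(size):,} distinct values")
```

So INT UNSIGNED tops out just under 4.3 billion, and a BINARY(16) key is four times as wide as an INT key.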
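The italicized updates etc. were added in September 2017; nothing critical changed.
BINARY. MySQL 8 even has a function to do that. In addition, it (and my blog) rearranges the bits so that UUIDv1 has the temporal characteristics of an auto_increment.
小智 6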
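'Rick James' said in the accepted answer:
"Having both a UNIQUE AUTO_INCREMENT and a UNIQUE UUID in the same table is a waste".
But this test (which I ran on my own machine) shows something different.
For example: in test (T2) I created a table with an (INT AUTOINCREMENT) PRIMARY key, a UNIQUE BINARY(16) column and another field for a title, then inserted more than 1.6 million rows, and performance stayed very good. In another test (T3) I did the same thing but with the BINARY(16) column as the primary key and no auto-increment id, and inserts became slow after only 300,000 rows.
Here are my test results: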
T1:
char(32) UNIQUE with auto increment int_id
after: 1,600,000
10 sec for inserting 1000 rows
select + (4.0)
size:500mb
T2:
binary(16) UNIQUE with auto increment int_id
after: 1,600,000
1 sec for inserting 1000 rows
select +++ (0.4)
size:350mb
T3:
binary(16) UNIQUE without auto increment int_id
after: 350,000
5 sec for inserting 1000 rows
select ++ (0.3)
size:118mb (~ for 1,600,000 will be 530mb)
T4:
auto increment int_id without binary(16) UNIQUE
++++
T5:
uuid_short() int_id without binary(16) UNIQUE
+++++*
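So binary(16) UNIQUE with an auto-increment int_id is better than binary(16) UNIQUE without an auto-increment int_id.
Update:
I ran the same test again and recorded more details. Here is the full code and a comparison of the results for (T2) and (T3), as described above.
(T2) create tbl2 (mysql):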
CREATE TABLE test.tbl2 (
int_id INT(11) NOT NULL AUTO_INCREMENT,
rec_id BINARY(16) NOT NULL,
src_id BINARY(16) DEFAULT NULL,
rec_title VARCHAR(255) DEFAULT NULL,
PRIMARY KEY (int_id),
INDEX IDX_tbl1_src_id (src_id),
UNIQUE INDEX rec_id (rec_id)
)
ENGINE = INNODB
CHARACTER SET utf8
COLLATE utf8_general_ci;
(T3) create tbl3 (mysql):
CREATE TABLE test.tbl3 (
rec_id BINARY(16) NOT NULL,
src_id BINARY(16) DEFAULT NULL,
rec_title VARCHAR(255) DEFAULT NULL,
PRIMARY KEY (rec_id),
INDEX IDX_tbl1_src_id (src_id)
)
ENGINE = INNODB
CHARACTER SET utf8
COLLATE utf8_general_ci;
Here is the full test code, which inserts 600,000 records into tbl2 or tbl3 (vb.net code):
Public Class Form1
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim res As String = ""
Dim i As Integer = 0
Dim ii As Integer = 0
Dim iii As Integer = 0
Using cn As New SqlClient.SqlConnection
cn.ConnectionString = "Data Source=.\sql2008;Integrated Security=True;User Instance=False;MultipleActiveResultSets=True;Initial Catalog=sourcedb;"
cn.Open()
Using cmd As New SqlClient.SqlCommand
cmd.Connection = cn
cmd.CommandTimeout = 0
cmd.CommandText = "select recID, srcID, rectitle from textstbl order by ID ASC"
Using dr As SqlClient.SqlDataReader = cmd.ExecuteReader
Using mysqlcn As New MySql.Data.MySqlClient.MySqlConnection
mysqlcn.ConnectionString = "User Id=root;Host=localhost;Character Set=utf8;Pwd=1111;Database=test"
mysqlcn.Open()
Using MyCommand As New MySql.Data.MySqlClient.MySqlCommand
MyCommand.Connection = mysqlcn
MyCommand.CommandText = "insert into tbl3 (rec_id, src_id, rec_title) values (UNHEX(@rec_id), UNHEX(@src_id), @rec_title);"
Dim MParm1(2) As MySql.Data.MySqlClient.MySqlParameter
MParm1(0) = New MySql.Data.MySqlClient.MySqlParameter("@rec_id", MySql.Data.MySqlClient.MySqlDbType.String)
MParm1(1) = New MySql.Data.MySqlClient.MySqlParameter("@src_id", MySql.Data.MySqlClient.MySqlDbType.String)
MParm1(2) = New MySql.Data.MySqlClient.MySqlParameter("@rec_title", MySql.Data.MySqlClient.MySqlDbType.VarChar)
MyCommand.Parameters.AddRange(MParm1)
MyCommand.CommandTimeout = 0
Dim mytransaction As MySql.Data.MySqlClient.MySqlTransaction = mysqlcn.BeginTransaction()
MyCommand.Transaction = mytransaction
Dim sw As New Stopwatch
sw.Start()
While dr.Read
MParm1(0).Value = dr.GetValue(0).ToString.Replace("-", "")
MParm1(1).Value = EmptyStringToNullValue(dr.GetValue(1).ToString.Replace("-", ""))
MParm1(2).Value = gettitle(dr.GetValue(2).ToString)
MyCommand.ExecuteNonQuery()
i += 1
ii += 1
iii += 1
If i >= 1000 Then
i = 0
Dim ts As TimeSpan = sw.Elapsed
Me.Text = ii.ToString & " / " & ts.TotalSeconds
Select Case ii
Case 10000, 50000, 100000, 200000, 300000, 400000, 500000, 600000, 700000, 800000, 900000, 1000000
res &= "On " & FormatNumber(ii, 0) & ": last inserting 1000 records take: " & ts.TotalSeconds.ToString & " second." & vbCrLf
End Select
If ii >= 600000 Then GoTo 100
sw.Restart()
End If
If iii >= 5000 Then
iii = 0
mytransaction.Commit()
mytransaction = mysqlcn.BeginTransaction()
sw.Restart()
End If
End While
100:
mytransaction.Commit()
End Using
End Using
End Using
End Using
End Using
TextBox1.Text = res
MsgBox("Ok!")
End Sub
Public Function EmptyStringToNullValue(MyValue As Object) As Object
'On Error Resume Next
If MyValue Is Nothing Then Return DBNull.Value
If String.IsNullOrEmpty(MyValue.ToString.Trim) Then
Return DBNull.Value
Else
Return MyValue
End If
End Function
Private Function gettitle(p1 As String) As String
If p1.Length > 255 Then
Return p1.Substring(0, 255)
Else
Return p1
End If
End Function
End Class
Results for (T2):
On 10,000: last inserting 1000 records take: 0.13709 second.
On 50,000: last inserting 1000 records take: 0.1772109 second.
On 100,000: last inserting 1000 records take: 0.1291394 second.
On 200,000: last inserting 1000 records take: 0.5793488 second.
On 300,000: last inserting 1000 records take: 0.1296427 second.
On 400,000: last inserting 1000 records take: 0.6938583 second.
On 500,000: last inserting 1000 records take: 0.2317799 second.
On 600,000: last inserting 1000 records take: 0.1271072 second.
~3 Minutes ONLY! to insert 600,000 records.
table size: 128 mb.
Results for (T3):
On 10,000: last inserting 1000 records take: 0.1669595 second.
On 50,000: last inserting 1000 records take: 0.4198369 second.
On 100,000: last inserting 1000 records take: 0.1318155 second.
On 200,000: last inserting 1000 records take: 0.1979358 second.
On 300,000: last inserting 1000 records take: 1.5127482 second.
On 400,000: last inserting 1000 records take: 7.2757161 second.
On 500,000: last inserting 1000 records take: 14.3960671 second.
On 600,000: last inserting 1000 records take: 14.9412401 second.
~40 Minutes! to insert 600,000 records.
table size: 164 mb.
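To put the two result sets side by side, the small Python sketch below divides the (T3) batch times by the (T2) batch times at each checkpoint. The numbers are copied from the results above; nothing else is measured here.

```python
# Seconds to insert the most recent 1000 rows, keyed by total rows inserted.
T2 = {10000: 0.13709, 50000: 0.1772109, 100000: 0.1291394, 200000: 0.5793488,
      300000: 0.1296427, 400000: 0.6938583, 500000: 0.2317799, 600000: 0.1271072}
T3 = {10000: 0.1669595, 50000: 0.4198369, 100000: 0.1318155, 200000: 0.1979358,
      300000: 0.15127482 * 10, 400000: 7.2757161, 500000: 14.3960671, 600000: 14.9412401}

def slowdown(rows):
    """How many times slower a 1000-row batch was in T3 than in T2."""
    return T3[rows] / T2[rows]

for rows in sorted(T2):
    print(f"{rows:>7,} rows: T3/T2 = {slowdown(rows):.1f}x, "
          f"T3 rate = {1000 / T3[rows]:.0f} rows/s")
```

Up to about 200,000 rows the two tables are in the same range; from 300,000 rows on, the table with the BINARY(16) primary key falls further and further behind.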