Why is the size of a MySQL MyISAM table the same after removing some data from a VARCHAR column?

Les*_*zek 5 mysql varchar myisam size-reduction

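I need to reduce the size of a MySQL database. I recoded some information by stripping ';' and ':' from the sources column (a reduction of about 10%). After doing this, the table size is exactly the same as before. How is that possible? I am using the MyISAM engine.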

BTW: Unfortunately, I cannot compress the tables with myisampack.

mysql> INSERT INTO test SELECT protid1, protid2, CS, REPLACE(REPLACE(sources, ':', ''), ';', '') FROM homologs_9606; 
Query OK, 41917131 rows affected (4 min 11.30 sec)
Records: 41917131  Duplicates: 0  Warnings: 0

mysql> select TABLE_NAME name, ROUND(TABLE_ROWS/1e6, 3) 'million rows', ROUND(DATA_LENGTH/power(2,30), 3) 'data GB', ROUND(INDEX_LENGTH/power(2,30), 3) 'index GB' from information_schema.TABLES WHERE TABLE_NAME IN ('homologs_9606', 'test') ORDER BY TABLE_ROWS DESC LIMIT 10;
+---------------+--------------+---------+----------+
| name          | million rows | data GB | index GB |
+---------------+--------------+---------+----------+
| test          |       41.917 |   0.857 |    1.075 |
| homologs_9606 |       41.917 |   0.887 |    1.075 |
+---------------+--------------+---------+----------+
2 rows in set (0.01 sec)

mysql> select * from homologs_9606 limit 10;
+---------+---------+-------+--------------------------------+
| protid1 | protid2 | CS    | sources                        |
+---------+---------+-------+--------------------------------+
| 5635338 | 1028608 | 0.000 | 10:,1                          |
| 5644385 | 1028611 | 0.947 | 5:1,1;8:0.943,35;10:1,1;11:1,1 |
| 5652325 | 1028611 | 0.947 | 5:1,1;8:0.943,35;10:1,1;11:1,1 |
| 5641128 | 1028612 | 1.000 | 8:1,10                         |
| 5636414 | 1028616 | 0.038 | 8:0.038,104;10:,1              |
| 5636557 | 1028616 | 0.000 | 8:,4                           |
| 5637419 | 1028616 | 0.011 | 5:,1;8:0.011,91;10:,1          |
| 5641196 | 1028616 | 0.080 | 5:1,1;8:0.074,94;10:,1;11:,4   |
| 5642914 | 1028616 | 0.000 | 8:,3                           |
| 5643778 | 1028616 | 0.056 | 8:0.057,70;10:,1               |
+---------+---------+-------+--------------------------------+
10 rows in set (4.55 sec)

mysql> select * from test limit 10;
+---------+---------+-------+-------------------------+
| protid1 | protid2 | CS    | sources                 |
+---------+---------+-------+-------------------------+
| 5635338 | 1028608 | 0.000 | 10,1                    |
| 5644385 | 1028611 | 0.947 | 51,180.943,35101,1111,1 |
| 5652325 | 1028611 | 0.947 | 51,180.943,35101,1111,1 |
| 5641128 | 1028612 | 1.000 | 81,10                   |
| 5636414 | 1028616 | 0.038 | 80.038,10410,1          |
| 5636557 | 1028616 | 0.000 | 8,4                     |
| 5637419 | 1028616 | 0.011 | 5,180.011,9110,1        |
| 5641196 | 1028616 | 0.080 | 51,180.074,9410,111,4   |
| 5642914 | 1028616 | 0.000 | 8,3                     |
| 5643778 | 1028616 | 0.056 | 80.057,7010,1           |
+---------+---------+-------+-------------------------+
10 rows in set (0.00 sec)

mysql> describe test;
+---------+------------------+------+-----+---------+-------+
| Field   | Type             | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| protid1 | int(10) unsigned | YES  | PRI | NULL    |       |
| protid2 | int(10) unsigned | YES  | PRI | NULL    |       |
| CS      | float(4,3)       | YES  |     | NULL    |       |
| sources | varchar(100)     | YES  |     | NULL    |       |
+---------+------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> describe homologs_9606;
+---------+------------------+------+-----+---------+-------+
| Field   | Type             | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| protid1 | int(10) unsigned | NO   | PRI | 0       |       |
| protid2 | int(10) unsigned | NO   | PRI | 0       |       |
| CS      | float(4,3)       | YES  |     | NULL    |       |
| sources | varchar(100)     | YES  |     | NULL    |       |
+---------+------------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

EDIT1: Added the average column lengths.

mysql> select AVG(LENGTH(sources)) from test; 
+----------------------+
| AVG(LENGTH(sources)) |
+----------------------+
|               5.2177 |
+----------------------+
1 row in set (10.04 sec)

mysql> select AVG(LENGTH(sources)) from homologs_9606; 
+----------------------+
| AVG(LENGTH(sources)) |
+----------------------+
|               6.8792 |
+----------------------+
1 row in set (9.95 sec)

EDIT2: I was able to shave off a few more MB by setting all columns to NOT NULL.

mysql> drop table test;
Query OK, 0 rows affected (0.42 sec)

mysql> CREATE table test (protid1 INT UNSIGNED NOT NULL DEFAULT '0', protid2 INT UNSIGNED NOT NULL DEFAULT '0', CS FLOAT(4,3) NOT NULL DEFAULT '0', sources VARCHAR(100) NOT NULL DEFAULT '0', PRIMARY KEY (protid1, protid2), KEY `idx_protid2` (protid2)) ENGINE=MyISAM CHARSET=ascii;
Query OK, 0 rows affected (0.06 sec)

mysql> INSERT INTO test SELECT protid1, protid2, CS, REPLACE(REPLACE(sources, ':', ''), ';', '') FROM homologs_9606; 
Query OK, 41917131 rows affected (2 min 7.84 sec)
Records: 41917131  Duplicates: 0  Warnings: 0

mysql> select TABLE_NAME name, ROUND(TABLE_ROWS/1e6, 3) 'million rows', ROUND(DATA_LENGTH/power(2,30), 3) 'data GB', ROUND(INDEX_LENGTH/power(2,30), 3) 'index GB' from information_schema.TABLES WHERE TABLE_NAME IN ('homologs_9606', 'test');
+---------------+--------------+---------+----------+
| name          | million rows | data GB | index GB |
+---------------+--------------+---------+----------+
| homologs_9606 |       41.917 |   0.887 |    1.075 |
| test          |       41.917 |   0.842 |    1.075 |
+---------------+--------------+---------+----------+
2 rows in set (0.02 sec)

Emi*_*röm 2

They are not exactly the same. Your query clearly shows that test is about 30 MB smaller than homologs_9606:

+---------------+--------------+---------+
| name          | million rows | data GB |
+---------------+--------------+---------+
| test          |       41.917 |   0.857 | <-- 0.857 < 0.887
| homologs_9606 |       41.917 |   0.887 |
+---------------+--------------+---------+

How much storage should we expect your table to need? Let's check the data type storage requirements:

INTEGER(10): 4 bytes
FLOAT(4): 4 bytes
VARCHAR(100): L+1

where L is the number of character bytes, usually one byte per character, but sometimes more if a Unicode character set is used.


Your average row will require:

INTEGER + INTEGER + FLOAT + VARCHAR =
4 + 4 + 4 + (L + 1) = L + 13 bytes

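We can infer your original average L as (0.887*1024^3 / 41917131) - 13 = 9.72. You said you stripped 10% from sources, which means your new L is 9.72*0.9 = 8.75. This gives an expected new total storage requirement of ((8.75 + 13) * 41917131) / 1024^3 = 0.849 GB.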
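
The arithmetic above can be re-run as a short script. This is only a sketch of the same back-of-the-envelope estimate: the constants are the figures from the question's output, the function names are made up for illustration, and any per-row overhead the storage engine adds is deliberately ignored, as it is in the estimate itself.

```python
# Figures copied from the question's information_schema output.
GIB = 1024 ** 3
ROWS = 41_917_131
# Two 4-byte INTs, one 4-byte FLOAT, one VARCHAR length byte.
FIXED_BYTES = 4 + 4 + 4 + 1


def implied_avg_varchar_bytes(data_gib: float, rows: int = ROWS) -> float:
    """Average L implied by a table's data size (hypothetical helper)."""
    return data_gib * GIB / rows - FIXED_BYTES


def expected_data_gib(avg_l: float, rows: int = ROWS) -> float:
    """Expected data size in GB for a given average L (hypothetical helper)."""
    return (avg_l + FIXED_BYTES) * rows / GIB


old_l = implied_avg_varchar_bytes(0.887)  # original table
new_l = old_l * 0.9                       # assumes the quoted 10% reduction
print(round(old_l, 2), round(new_l, 2), round(expected_data_gib(new_l), 3))
```

Changing the 0.9 factor shows how sensitive the expected size is to the "about 10%" figure, which is itself only an estimate.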


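I suspect the difference (between 0.849 and 0.857) may be because test has two more NULLable columns (protid1 and protid2) than homologs_9606, but I don't know enough about the MyISAM engine to calculate this exactly. I can guess, though! At least 1 bit per column per row is needed to store the NULL status, which in your case means two bits per row, or 2*41917131 = 83834262 bits = 10 479 283 bytes = 0.010 GB. The total of 0.849+0.010 = 0.859 overshoots the target slightly (by about 2 MB). But I did some rounding, and your 10% figure is also an estimate, so I am confident the remainder is lost in those approximations.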
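
The NULL-flag guess can be checked the same way. The sketch below assumes, as the guess does, exactly one bit per NULLable column per row packed across the whole table; the real engine may round the flags to whole bytes per row, in which case the true overhead would differ. The function name is hypothetical.

```python
import math

GIB = 1024 ** 3
ROWS = 41_917_131
EXTRA_NULLABLE_COLUMNS = 2  # protid1 and protid2 are NULLable only in test


def null_flag_overhead_bytes(rows: int, nullable_columns: int) -> int:
    """Bytes needed at one bit per NULLable column per row (assumed model)."""
    return math.ceil(rows * nullable_columns / 8)


extra_bytes = null_flag_overhead_bytes(ROWS, EXTRA_NULLABLE_COLUMNS)
extra_gib = round(extra_bytes / GIB, 3)
print(extra_bytes, extra_gib, round(0.849 + extra_gib, 3))
```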


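Another cause could be a Unicode character set on sources in test, in which case some characters may take more than one byte each. But since the NULLable columns seem to explain everything, I don't think that is the case for your table.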


Summary

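  • Your two tables are not the same size; they differ by 30 MB.
  • The new table is about the size you would expect.
  • You can save some more space in the new table by making the protid1 and protid2 columns NOT NULL.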