RMa*_*his 6 mysql index datatypes range-types
What is the fastest way to determine whether an IP is contained in a CIDR block?
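Currently, whenever I store a CIDR address, I also create two columns for the starting and ending IP addresses. The start and end IP addresses are indexed. If I want to see which network contains an address, I look it up with where ip between start_ip and end_ip, which seems less than ideal.
It occurs to me that I could store the right-shifted number and match it against a similarly shifted IP address (660510 in the case of @cidr)...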
select @cidr, inet_aton(substring_index(@cidr,'/',1))>>(32-substring_index(@cidr,'/',-1));
+---------------+-----------------------------------------------------------------------------+
| @cidr | inet_aton(substring_index(@cidr,'/',1))>>(32-substring_index(@cidr,'/',-1)) |
+---------------+-----------------------------------------------------------------------------+
| 10.20.30.0/24 | 660510 |
+---------------+-----------------------------------------------------------------------------+
1 row in set (0.00 sec)
set @ip:='10.20.30.40';
Query OK, 0 rows affected (0.00 sec)
select @ip, inet_aton(@ip)>>(32-substring_index(@cidr,'/',-1));
+-------------+----------------------------------------------------+
| @ip | inet_aton(@ip)>>(32-substring_index(@cidr,'/',-1)) |
+-------------+----------------------------------------------------+
| 10.20.30.40 | 660510 |
+-------------+----------------------------------------------------+
1 row in set (0.00 sec)
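To benefit from this in an indexed way, I would need to know the netmask (the number of bits to shift). Otherwise I would have to compare bit shifts systematically, that is, blindly shift by every possible netmask (from 0 to 24 bits).
I have other resources to optimize, but optimizing the IP2Location™ LITE IP-ASN database at http://lite.ip2location.com/database/ip-asn would be a proof of concept.
The table...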
CREATE TABLE `ip2loc_asn` (
`asn` bigint(20) DEFAULT NULL,
`cidr` varchar(50) DEFAULT NULL,
`start_ip` bigint(20) DEFAULT NULL,
`end_ip` bigint(20) DEFAULT NULL,
`name` varchar(250) DEFAULT NULL,
KEY `ip2locasn_startip_endip` (`start_ip`,`end_ip`),
KEY `asn` (`asn`),
KEY `cidr` (`cidr`)
) ENGINE=MyISAM; -- table is recreated monthly, MyISAM is the perfect engine
Sample data...
select * from ip2loc_asn limit 10;
+-------+--------------+----------+----------+-------------------------------+
| asn | cidr | start_ip | end_ip | name |
+-------+--------------+----------+----------+-------------------------------+
| 56203 | 1.0.4.0/24 | 16778240 | 16778495 | Big Red Group |
| 56203 | 1.0.5.0/24 | 16778496 | 16778751 | Big Red Group |
| 56203 | 1.0.6.0/24 | 16778752 | 16779007 | Big Red Group |
| 38803 | 1.0.7.0/24 | 16779008 | 16779263 | Goldenit Pty ltd Australia, A |
| 18144 | 1.0.64.0/18 | 16793600 | 16809983 | Energia Communications,Inc. |
| 9737 | 1.0.128.0/17 | 16809984 | 16842751 | TOT Public Company Limited |
| 9737 | 1.0.128.0/18 | 16809984 | 16826367 | TOT Public Company Limited |
| 9737 | 1.0.128.0/19 | 16809984 | 16818175 | TOT Public Company Limited |
| 23969 | 1.0.128.0/24 | 16809984 | 16810239 | TOT Public Company Limited |
| 23969 | 1.0.129.0/24 | 16810240 | 16810495 | TOT Public Company Limited |
+-------+--------------+----------+----------+-------------------------------+
10 rows in set (0.00 sec)
The netmasks range from 8 to 32 bits...
select min(substring_index(cidr,'/',-1)+0), max(substring_index(cidr,'/',-1)+0) from ip2loc_asn;
+-------------------------------------+-------------------------------------+
| min(substring_index(cidr,'/',-1)+0) | max(substring_index(cidr,'/',-1)+0) |
+-------------------------------------+-------------------------------------+
| 8 | 32 |
+-------------------------------------+-------------------------------------+
1 row in set (0.33 sec)
select * from ip2loc_asn where cidr like '%/8' limit 1;
+------+-----------+----------+----------+------------------------------+
| asn | cidr | start_ip | end_ip | name |
+------+-----------+----------+----------+------------------------------+
| 3356 | 4.0.0.0/8 | 67108864 | 83886079 | Level 3 Communications, Inc. |
+------+-----------+----------+----------+------------------------------+
1 row in set (0.00 sec)
select * from ip2loc_asn where cidr like '%/32' limit 1;
+-------+---------------+-----------+-----------+------+
| asn | cidr | start_ip | end_ip | name |
+-------+---------------+-----------+-----------+------+
| 51964 | 57.72.27.1/32 | 961026817 | 961026817 | |
+-------+---------------+-----------+-----------+------+
1 row in set (0.02 sec)
Current execution plan...
explain select * from ip2loc_asn where inet_aton('10.20.30.40') between start_ip and end_ip;
+----+-------------+------------+-------+--------------------------+--------------------------+---------+------+-------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+--------------------------+--------------------------+---------+------+-------+-----------------------+
| 1 | SIMPLE | ip2loc_asn | range | ip2loc_asn_startip_endip | ip2loc_asn_startip_endip | 9 | NULL | 10006 | Using index condition |
+----+-------------+------------+-------+--------------------------+--------------------------+---------+------+-------+-----------------------+
1 row in set (0.00 sec)
My clumsy attempt...
mysql to3_reference> alter table ip2loc_asn add column shifted_netmask int(10) unsigned;
Query OK, 626695 rows affected (4.06 sec)
Records: 626695 Duplicates: 0 Warnings: 0
mysql to3_reference> update ip2loc_asn set shifted_netmask = start_ip>>(32-substring_index(cidr,'/',-1));
Query OK, 626695 rows affected (5.98 sec)
Rows matched: 626695 Changed: 626695 Warnings: 0
mysql to3_reference> alter table ip2loc_asn add key ip2loc_asn_shiftednetmask (shifted_netmask);
Query OK, 626695 rows affected (5.83 sec)
Records: 626695 Duplicates: 0 Warnings: 0
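The result sets in this question also show a netmask_bits column, but the statements that created it do not appear in the post. A minimal sketch of how it might have been added and populated, assuming the prefix length is taken from the text after the slash in cidr; the TINYINT UNSIGNED type is a guess, not something the post states:

```sql
-- Assumed definition: the post never shows how netmask_bits was created.
ALTER TABLE ip2loc_asn ADD COLUMN netmask_bits TINYINT UNSIGNED;

-- Prefix length is the part of cidr after the slash; +0 forces a numeric value,
-- the same idiom used in the min()/max() query earlier in the question.
UPDATE ip2loc_asn
   SET netmask_bits = SUBSTRING_INDEX(cidr, '/', -1) + 0;
```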
The old way:
select * from ip2loc_asn where inet_aton('8.8.8.0') between start_ip and end_ip;
+-------+------------+-----------------+--------------+-----------+-----------+------------------------------+
| asn | cidr | shifted_netmask | netmask_bits | start_ip | end_ip | name |
+-------+------------+-----------------+--------------+-----------+-----------+------------------------------+
| 3356 | 8.0.0.0/9 | 16 | 9 | 134217728 | 142606335 | Level 3 Communications, Inc. |
| 3356 | 8.0.0.0/8 | 8 | 8 | 134217728 | 150994943 | Level 3 Communications, Inc. |
| 15169 | 8.8.8.0/24 | 526344 | 24 | 134744064 | 134744319 | Google Inc. |
+-------+------------+-----------------+--------------+-----------+-----------+------------------------------+
3 rows in set (0.00 sec)
One way of using shifted_netmask (undesirable: I am doing a full table scan to discover the number of bits in the netmask)...
select * from ip2loc_asn where shifted_netmask = inet_aton('8.8.8.0')>>32-netmask_bits;
+-------+------------+-----------------+--------------+-----------+-----------+------------------------------+
| asn | cidr | shifted_netmask | netmask_bits | start_ip | end_ip | name |
+-------+------------+-----------------+--------------+-----------+-----------+------------------------------+
| 3356 | 8.0.0.0/8 | 8 | 8 | 134217728 | 150994943 | Level 3 Communications, Inc. |
| 3356 | 8.0.0.0/9 | 16 | 9 | 134217728 | 142606335 | Level 3 Communications, Inc. |
| 15169 | 8.8.8.0/24 | 526344 | 24 | 134744064 | 134744319 | Google Inc. |
+-------+------------+-----------------+--------------+-----------+-----------+------------------------------+
3 rows in set (0.64 sec)
The desired approach would be similar to the last query, minus the scan over the netmask bits.
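One possible way to get there, offered only as a sketch: because the prefix lengths in this data run from 8 to 32, the 25 candidate lengths can be enumerated and each one probed with an equality lookup instead of scanning the table. The index name ip2loc_asn_bits_shifted and the derived table m are hypothetical, and netmask_bits is assumed to hold the prefix length as in the result sets above. Whether the optimizer actually chooses index lookups should be confirmed with EXPLAIN.

```sql
-- Hypothetical composite index: prefix length first, shifted network second,
-- so each candidate prefix length becomes a single equality lookup.
ALTER TABLE ip2loc_asn
  ADD KEY ip2loc_asn_bits_shifted (netmask_bits, shifted_netmask);

SET @ip := INET_ATON('8.8.8.0');

-- Derived table m lists every prefix length present in the data (8 to 32).
-- STRAIGHT_JOIN asks MySQL to read m first and probe ip2loc_asn once per row of m.
SELECT a.*
FROM (
        SELECT  8 AS bits UNION ALL SELECT  9 UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL SELECT 12 UNION ALL
        SELECT 13 UNION ALL SELECT 14 UNION ALL SELECT 15 UNION ALL SELECT 16 UNION ALL SELECT 17 UNION ALL
        SELECT 18 UNION ALL SELECT 19 UNION ALL SELECT 20 UNION ALL SELECT 21 UNION ALL SELECT 22 UNION ALL
        SELECT 23 UNION ALL SELECT 24 UNION ALL SELECT 25 UNION ALL SELECT 26 UNION ALL SELECT 27 UNION ALL
        SELECT 28 UNION ALL SELECT 29 UNION ALL SELECT 30 UNION ALL SELECT 31 UNION ALL SELECT 32
     ) AS m
STRAIGHT_JOIN ip2loc_asn AS a
   ON a.netmask_bits    = m.bits
  AND a.shifted_netmask = @ip >> (32 - m.bits);
```

For 8.8.8.0 this should return the same three rows as the two queries above (the /8, /9 and /24 networks), at the cost of at most 25 index probes.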
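As a side note, PostgreSQL does this with the cidr and inet types. If you really want first-class support for this, look at ip4r.
"It occurs to me that I could store the right-shifted number and match it against a similarly shifted IP address (660510 in the case of @cidr)..."
Good idea; that is essentially the idea behind PostgreSQL's network types, which store the address together with its prefix length. It is easy to set up: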
CREATE TABLE ip2loc_asn (
asn bigint,
cidr cidr,
name text
);
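-- inet_ops has to be named explicitly: it is not the default GiST operator class for inet/cidr
CREATE INDEX ON ip2loc_asn USING gist(cidr inet_ops);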
INSERT INTO ip2loc_asn(asn,cidr,name)
VALUES
( 56203, '1.0.4.0/24' , 'Big Red Group' ),
( 56203, '1.0.5.0/24' , 'Big Red Group' ),
( 56203, '1.0.6.0/24' , 'Big Red Group' ),
( 38803, '1.0.7.0/24' , 'Goldenit Pty ltd Australia, A' ),
( 18144, '1.0.64.0/18' , 'Energia Communications,Inc.' ),
( 9737, '1.0.128.0/17' , 'TOT Public Company Limited' ),
( 9737, '1.0.128.0/18' , 'TOT Public Company Limited' ),
( 9737, '1.0.128.0/19' , 'TOT Public Company Limited' ),
( 23969, '1.0.128.0/24' , 'TOT Public Company Limited' ),
( 23969, '1.0.129.0/24' , 'TOT Public Company Limited' );
Now we can query it using the network-type operators:
test=# SELECT * FROM ip2loc_asn WHERE cidr >> '1.0.129.0';
asn | cidr | name
-------+--------------+----------------------------
9737 | 1.0.128.0/17 | TOT Public Company Limited
9737 | 1.0.128.0/18 | TOT Public Company Limited
9737 | 1.0.128.0/19 | TOT Public Company Limited
23969 | 1.0.129.0/24 | TOT Public Company Limited
And this uses the index, too.
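To check the index usage yourself, something along these lines should work in PostgreSQL, assuming the GiST index on cidr was created with the inet_ops operator class. With only ten rows the planner would normally prefer a sequential scan, so the sketch disables that for the current session; the setting is a diagnostic aid, not something to leave on.

```sql
-- Diagnostic only: discourage sequential scans so the plan shows whether the index is usable.
SET enable_seqscan = off;

-- >>= means "contains or equals"; >> (used above) means "strictly contains".
EXPLAIN
SELECT * FROM ip2loc_asn WHERE cidr >>= '1.0.129.0';

RESET enable_seqscan;
```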
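The main problem is that the optimizer does not know whether there is a single matching start/end pair or a whole set of them. So any attempt at optimization gets stuck with a table scan, or at least a large range scan.
Which do you have to start with? An IP address? Or a CIDR block? I ask because we may need to rearrange the data you start with in order to look up the other efficiently.
In this article, I explain how to build and maintain a table covering all 2^32 (or the IPv6 equivalent) IP addresses. It uses only a start_ip column, and end_ip is inferred from the next row. This means every unassigned IP range must have a row in the table. (That is not a big burden; at most it doubles the number of rows.) With this layout, virtually all operations are essentially O(1); that is, something like WHERE ip >= start_ip ORDER BY start_ip DESC LIMIT 1 gets the answer "immediately". No table scan, no range scan; nothing worse than (effectively) a "point query". Note that it does not even need to test end_ip. Caveat: overlapping ranges are not handled. Some applications (perhaps not yours) can be adjusted so that overlaps are not needed.
How can this be adapted to CIDR? One way is to convert your CIDR table into my variant. You are familiar with how to do that; the main differences are the absence of end_ip and the addition of "unowned" ranges. So if you are starting "from" CIDR and need to look up an IP, this is a possible answer.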
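A minimal sketch of that layout, using the non-overlapping /24 rows from the question's sample data. The table name ip_ranges and the two filler rows are assumptions made for illustration; a NULL asn marks a range treated here as "unowned".

```sql
-- Hypothetical table: one row per range, keyed only by its first address.
CREATE TABLE ip_ranges (
  start_ip INT UNSIGNED NOT NULL,
  asn      BIGINT NULL,            -- NULL marks an "unowned" filler range
  PRIMARY KEY (start_ip)
);

INSERT INTO ip_ranges (start_ip, asn) VALUES
  (0,        NULL),   -- filler so every address has a preceding row (illustrative)
  (16778240, 56203),  -- 1.0.4.0/24
  (16778496, 56203),  -- 1.0.5.0/24
  (16778752, 56203),  -- 1.0.6.0/24
  (16779008, 38803),  -- 1.0.7.0/24
  (16779264, NULL);   -- 1.0.8.0 onward, treated as unowned purely for illustration

-- Lookup: the closest start_ip at or below the address; end_ip is never tested.
SELECT start_ip, asn
FROM ip_ranges
WHERE start_ip <= INET_ATON('1.0.5.77')
ORDER BY start_ip DESC
LIMIT 1;
```

With these rows the lookup lands on the 16778496 row (1.0.5.0/24, ASN 56203). Overlapping blocks such as the /17, /18 and /19 networks in the sample data cannot be represented this way, which is the caveat noted above.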