I'm having trouble finding a fast way to join two tables that look like this:
mysql> explain geo_ip;
+--------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+-------+
| ip_start | varchar(32) | NO | | "" | |
| ip_end | varchar(32) | NO | | "" | |
| ip_num_start | int(64) unsigned | NO | PRI | 0 | |
| ip_num_end | int(64) unsigned | NO | | 0 | |
| country_code | varchar(3) | NO | | "" | |
| country_name | varchar(64) | NO | | "" | |
| ip_poly | geometry | NO | MUL | NULL | |
+--------------+------------------+------+-----+---------+-------+
mysql> explain entity_ip;
+------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------------+------+-----+---------+-------+
| entity_id | int(64) unsigned | NO | PRI | NULL | |
| ip_1 | tinyint(3) unsigned | NO | | NULL | |
| ip_2 | tinyint(3) unsigned | NO | | NULL | |
| ip_3 | tinyint(3) unsigned | NO | | NULL | |
| ip_4 | tinyint(3) unsigned | NO | | NULL | |
| ip_num | int(64) unsigned | NO | | 0 | |
| ip_poly | geometry | NO | MUL | NULL | |
+------------+---------------------+------+-----+---------+-------+
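Note that I don't want to look up the matching geo_ip row for one IP address at a time; I need an entity_ip LEFT JOIN geo_ip (or something similar or equivalent).
This is what I have now (using the polygon approach suggested at http://jcole.us/blog/archives/2007/11/24/on-efficiently-geo-referencing-ips-with-maxmind-geoip-and-mysql-gis/):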
mysql> EXPLAIN SELECT li.*, gi.country_code FROM entity_ip AS li
-> LEFT JOIN geo_ip AS gi ON
-> MBRCONTAINS(gi.`ip_poly`, li.`ip_poly`);
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
| 1 | SIMPLE | li | ALL | NULL | NULL | NULL | NULL | 2470 | |
| 1 | SIMPLE | gi | ALL | ip_poly_index | NULL | NULL | NULL | 155183 | |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
mysql> SELECT li.*, gi.country_code FROM entity AS li LEFT JOIN geo_ip AS gi ON MBRCONTAINS(gi.`ip_poly`, li.`ip_poly`) limit 0, 20;
20 rows in set (2.22 sec)
Without polygons:
mysql> explain SELECT li.*, gi.country_code FROM entity_ip AS li LEFT JOIN geo_ip AS gi ON li.`ip_num` >= gi.`ip_num_start` AND li.`ip_num` <= gi.`ip_num_end` LIMIT 0,20;
+----+-------------+-------+------+---------------------------+------+---------+------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------------------+------+---------+------+--------+-------+
| 1 | SIMPLE | li | ALL | NULL | NULL | NULL | NULL | 2470 | |
| 1 | SIMPLE | gi | ALL | PRIMARY,geo_ip,geo_ip_end | NULL | NULL | NULL | 155183 | |
+----+-------------+-------+------+---------------------------+------+---------+------+--------+-------+
mysql> SELECT li.*, gi.country_code FROM entity_ip AS li LEFT JOIN geo_ip AS gi ON li.ip_num BETWEEN gi.ip_num_start AND gi.ip_num_end limit 0, 20;
20 rows in set (2.00 sec)
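(With a larger number of rows in the search, there is no difference.)
Currently I can't get any better performance out of these queries, and 0.1 seconds per IP is far too slow for me.
Is there any way to make this faster?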
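This approach has some scalability issues (for example, if you move to city-specific geo data), but for the given size of data it should provide a considerable optimization.
The problem you are facing is that MySQL does not optimize range-based queries very well. Ideally you want an exact ("=") lookup on an index rather than a "greater than" comparison, so we need to build an index like that from the data you have available. That way MySQL has far fewer rows to evaluate while looking for a match.
To do this, I suggest that you create a lookup table that indexes the geolocation table by the first octet of the IP address (the 1 in 1.2.3.4). The idea is that for each lookup you have to do, you can ignore every geolocation range that does not begin with the same octet as the IP you are looking for.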
CREATE TABLE `ip_geolocation_lookup` (
`first_octet` int(10) unsigned NOT NULL DEFAULT '0',
`ip_numeric_start` int(10) unsigned NOT NULL DEFAULT '0',
`ip_numeric_end` int(10) unsigned NOT NULL DEFAULT '0',
KEY `first_octet` (`first_octet`,`ip_numeric_start`,`ip_numeric_end`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
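To make the first-octet idea concrete, here is a minimal Python sketch of the arithmetic. The helper names `ip_to_int` and `first_octet` are hypothetical and exist only for this illustration; the mask and shift are the same ones used in the SQL in this answer.

```python
import ipaddress

def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 string to its 32-bit unsigned integer form."""
    return int(ipaddress.IPv4Address(ip))

def first_octet(ip_numeric: int) -> int:
    """Return the leading octet of a numeric IPv4 address (the 1 in 1.2.3.4)."""
    return (ip_numeric & 0xFF000000) >> 24

n = ip_to_int("72.255.119.248")
print(n, first_octet(n))  # 1224701944 72
```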
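Next, we need to take the data available in your geolocation table and produce rows covering every (first) octet that each geolocation row spans: if you have an entry with ip_start = '5.3.0.0' and ip_end = '8.16.0.0', the lookup table will need rows for octets 5, 6, 7 and 8. So...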
ip_geolocation
|ip_start |ip_end |ip_numeric_start|ip_numeric_end|
|72.255.119.248 |74.3.127.255 |1224701944 |1241743359 |
should convert to:
ip_geolocation_lookup
|first_octet|ip_numeric_start|ip_numeric_end|
|72 |1224701944 |1241743359 |
|73 |1224701944 |1241743359 |
|74 |1224701944 |1241743359 |
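The expansion shown above can be sketched in a few lines of Python. This is only an illustration of the rule "one lookup row per first octet the range touches"; the function name `lookup_rows` is hypothetical and not part of the MySQL solution.

```python
import ipaddress

def lookup_rows(ip_start: str, ip_end: str):
    """Yield one (first_octet, ip_numeric_start, ip_numeric_end) row for
    every first octet that the range [ip_start, ip_end] touches."""
    start = int(ipaddress.IPv4Address(ip_start))
    end = int(ipaddress.IPv4Address(ip_end))
    # start >> 24 and end >> 24 are the first octets of the two endpoints.
    for octet in range(start >> 24, (end >> 24) + 1):
        yield (octet, start, end)

for row in lookup_rows("72.255.119.248", "74.3.127.255"):
    print(row)
# (72, 1224701944, 1241743359)
# (73, 1224701944, 1241743359)
# (74, 1224701944, 1241743359)
```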
Since someone here asked for a native MySQL solution, here is a stored procedure that will generate that data for you:
DROP PROCEDURE IF EXISTS recalculate_ip_geolocation_lookup;
# DELIMITER is a mysql client command; it lets the procedure body contain ";".
DELIMITER $$
CREATE PROCEDURE recalculate_ip_geolocation_lookup()
BEGIN
    DECLARE i INT DEFAULT 0;
    # Start from an empty lookup table.
    DELETE FROM ip_geolocation_lookup;
    # For each possible first octet, copy every range that covers it.
    WHILE i < 256 DO
        INSERT INTO ip_geolocation_lookup (first_octet, ip_numeric_start, ip_numeric_end)
        SELECT i, ip_numeric_start, ip_numeric_end FROM ip_geolocation WHERE
            ( ip_numeric_start & 0xFF000000 ) >> 24 <= i AND
            ( ip_numeric_end & 0xFF000000 ) >> 24 >= i;
        SET i = i + 1;
    END WHILE;
END$$
DELIMITER ;
Then you need to populate the table by calling that stored procedure:
CALL recalculate_ip_geolocation_lookup();
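At this point you may delete the procedure you just created; it is no longer needed unless you want to recalculate the lookup table.
With the lookup table in place, all you have to do is integrate it into your queries and make sure you are querying by the first octet. Your query against the lookup table will satisfy two conditions:
1. Find all rows that match the first octet of your IP address.
2. Of that subset, find the row whose range contains your IP address.
Because step two is carried out on a subset of the data, it is considerably faster than running the range test against the entire data set. This is the key to this optimization strategy.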
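The two-step lookup can be illustrated outside the database with a small Python sketch. Everything here is an assumption made for illustration: the sample ranges, the placeholder country codes "AA" and "BB", and the names `buckets` and `country_for`. A dict keyed by first octet plays the role of the indexed `first_octet` column.

```python
from collections import defaultdict

# Hypothetical sample ranges: (ip_numeric_start, ip_numeric_end, country_code).
RANGES = [
    (16777216, 33554431, "AA"),      # 1.0.0.0 - 1.255.255.255
    (1224701944, 1241743359, "BB"),  # 72.255.119.248 - 74.3.127.255
]

# Build the lookup: first octet -> every range that touches that octet.
buckets = defaultdict(list)
for start, end, code in RANGES:
    for octet in range(start >> 24, (end >> 24) + 1):
        buckets[octet].append((start, end, code))

def country_for(ip_numeric: int):
    """Step 1: exact match on the first octet.
    Step 2: range test, but only on that octet's subset of ranges."""
    for start, end, code in buckets.get(ip_numeric >> 24, []):
        if start <= ip_numeric <= end:
            return code
    return None  # no range covers this address

print(country_for(1224736768))  # 73.0.0.0 falls inside the second range
```

The range comparison still happens, but only against the handful of ranges sharing the address's first octet rather than against every range.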
There are various ways to work out the first octet of an IP address; I used ( r.ip_numeric & 0xFF000000 ) >> 24 because my source IPs are in numeric form:
SELECT
r.*,
g.country_code
FROM
ip_geolocation g,
ip_geolocation_lookup l,
ip_random r
WHERE
l.first_octet = ( r.ip_numeric & 0xFF000000 ) >> 24 AND
l.ip_numeric_start <= r.ip_numeric AND
l.ip_numeric_end >= r.ip_numeric AND
g.ip_numeric_start = l.ip_numeric_start;
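Now, admittedly, I did get a little lazy at the end: you could easily get rid of the ip_geolocation table altogether if you made the ip_geolocation_lookup table contain the country data as well. I'm guessing that dropping one table from this query would make it a bit faster.
Finally, here are the two other tables I used in this response, for reference, since they differ from your tables. I'm certain they are compatible, though.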
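As a rough sketch of that variant, the following Python script builds a lookup table that carries the country code itself and joins against it directly. Note the assumptions: it uses SQLite (via Python's built-in `sqlite3` module) as a stand-in for MySQL so that it runs without a server, the `country_code` column on the lookup table is my addition, the index name `idx_first_octet` and the code 'XX' are placeholders, and it uses a LEFT JOIN so that unmatched IPs are kept, as the question asked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ip_geolocation_lookup (
    first_octet      INTEGER NOT NULL,
    ip_numeric_start INTEGER NOT NULL,
    ip_numeric_end   INTEGER NOT NULL,
    country_code     TEXT    NOT NULL  -- assumed extra column
);
CREATE INDEX idx_first_octet ON ip_geolocation_lookup
    (first_octet, ip_numeric_start, ip_numeric_end);
CREATE TABLE ip_random (
    ip         TEXT PRIMARY KEY,
    ip_numeric INTEGER NOT NULL
);
""")

# One range (72.255.119.248 - 74.3.127.255) expanded to octets 72, 73, 74.
conn.executemany(
    "INSERT INTO ip_geolocation_lookup VALUES (?, ?, ?, ?)",
    [(octet, 1224701944, 1241743359, "XX") for octet in (72, 73, 74)],
)
conn.executemany(
    "INSERT INTO ip_random VALUES (?, ?)",
    [("73.0.0.0", 1224736768), ("10.0.0.1", 167772161)],
)

# Equality on first_octet narrows the candidates; the range test runs on those.
rows = conn.execute("""
    SELECT r.ip, l.country_code
    FROM ip_random r
    LEFT JOIN ip_geolocation_lookup l
      ON l.first_octet = ((r.ip_numeric & 0xFF000000) >> 24)
     AND l.ip_numeric_start <= r.ip_numeric
     AND l.ip_numeric_end   >= r.ip_numeric
    ORDER BY r.ip
""").fetchall()
print(rows)  # [('10.0.0.1', None), ('73.0.0.0', 'XX')]
```

In MySQL the same join shape should apply, but the timing would need to be measured on the real data; this sketch only checks that the query returns the expected rows.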
# This table contains the original geolocation data
CREATE TABLE `ip_geolocation` (
`ip_start` varchar(16) NOT NULL DEFAULT '',
`ip_end` varchar(16) NOT NULL DEFAULT '',
`ip_numeric_start` int(10) unsigned NOT NULL DEFAULT '0',
`ip_numeric_end` int(10) unsigned NOT NULL DEFAULT '0',
`country_code` varchar(3) NOT NULL DEFAULT '',
`country_name` varchar(64) NOT NULL DEFAULT '',
PRIMARY KEY (`ip_numeric_start`),
KEY `country_code` (`country_code`),
KEY `ip_start` (`ip_start`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
# This table simply holds random IP data that can be used for testing
CREATE TABLE `ip_random` (
`ip` varchar(16) NOT NULL DEFAULT '',
`ip_numeric` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`ip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;