Searching for the 5 nearest locations by zip code - how should I approach this?

flo*_*len 6 mysql sql innodb stored-procedures

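What I want:

  1. The user passes a zip code or city name
  2. I search my database for the 5 nearest locations
  3. The 5 locations nearest to that position are displayed to the user

What I have so far:

Let's say I have a table of places containing the following:

(about 16,000 rows)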

CREATE TABLE `locations` (
 `locationID` int(11) NOT NULL AUTO_INCREMENT,
 `name` varchar(150) NOT NULL,
 `firstname` varchar(100) DEFAULT NULL,
 `lastname` varchar(100) DEFAULT NULL,
 `street` varchar(100) NOT NULL,
 `city` varchar(100) NOT NULL,
 `state` varchar(100) NOT NULL,
 `zipcode` varchar(10) NOT NULL,
 `phone` varchar(20) NOT NULL,
 `web` varchar(255) DEFAULT NULL,
 `machine` enum('Unbekannt','Foo','Bar') DEFAULT 'Unbekannt',
 `surface` enum('Unbekannt','Foo','Bar','') DEFAULT 'Unbekannt',
 PRIMARY KEY (`locationID`)
) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=utf8
  1. ID
  2. Name
  3. Zip code

Now I have a second table containing all the towns in the world:

(about 3.4 million rows)

CREATE TABLE `geoData` (
 `geoID` int(11) NOT NULL AUTO_INCREMENT,
 `countryCode` char(2) NOT NULL,
 `zipCode` varchar(20) NOT NULL,
 `name` varchar(180) NOT NULL,
 `state` varchar(100) NOT NULL,
 `stateCode` varchar(20) NOT NULL,
 `county` varchar(100) NOT NULL,
 `countyCode` varchar(20) NOT NULL,
 `community` varchar(100) NOT NULL,
 `communityCode` varchar(20) NOT NULL,
 `lat` mediumint(6) NOT NULL,
 `lon` mediumint(6) NOT NULL,
 PRIMARY KEY (`lon`,`lat`,`geoID`) USING BTREE,
 KEY `geoID` (`geoID`)
) ENGINE=InnoDB AUTO_INCREMENT=16482 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (lat)
(PARTITION p0 VALUES LESS THAN (-880000) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (-860000) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN (-840000) ENGINE = InnoDB,
PARTITION p3 VALUES LESS THAN (-820000) ENGINE = InnoDB,
PARTITION p4 VALUES LESS THAN (-800000) ENGINE = InnoDB,
PARTITION p5 VALUES LESS THAN (-780000) ENGINE = InnoDB,
PARTITION p6 VALUES LESS THAN (-760000) ENGINE = InnoDB,
PARTITION p7 VALUES LESS THAN (-740000) ENGINE = InnoDB,
PARTITION p8 VALUES LESS THAN (-720000) ENGINE = InnoDB,
PARTITION p9 VALUES LESS THAN (-700000) ENGINE = InnoDB,
PARTITION p10 VALUES LESS THAN (-680000) ENGINE = InnoDB,
PARTITION p11 VALUES LESS THAN (-660000) ENGINE = InnoDB,
PARTITION p12 VALUES LESS THAN (-640000) ENGINE = InnoDB,
PARTITION p13 VALUES LESS THAN (-620000) ENGINE = InnoDB,
PARTITION p14 VALUES LESS THAN (-600000) ENGINE = InnoDB,
PARTITION p15 VALUES LESS THAN (-580000) ENGINE = InnoDB,
PARTITION p16 VALUES LESS THAN (-560000) ENGINE = InnoDB,
PARTITION p17 VALUES LESS THAN (-540000) ENGINE = InnoDB,
PARTITION p18 VALUES LESS THAN (-520000) ENGINE = InnoDB,
PARTITION p19 VALUES LESS THAN (-500000) ENGINE = InnoDB,
PARTITION p20 VALUES LESS THAN (-480000) ENGINE = InnoDB,
PARTITION p21 VALUES LESS THAN (-460000) ENGINE = InnoDB,
PARTITION p22 VALUES LESS THAN (-440000) ENGINE = InnoDB,
PARTITION p23 VALUES LESS THAN (-420000) ENGINE = InnoDB,
PARTITION p24 VALUES LESS THAN (-400000) ENGINE = InnoDB,
PARTITION p25 VALUES LESS THAN (-380000) ENGINE = InnoDB,
PARTITION p26 VALUES LESS THAN (-360000) ENGINE = InnoDB,
PARTITION p27 VALUES LESS THAN (-340000) ENGINE = InnoDB,
PARTITION p28 VALUES LESS THAN (-320000) ENGINE = InnoDB,
PARTITION p29 VALUES LESS THAN (-300000) ENGINE = InnoDB,
PARTITION p30 VALUES LESS THAN (-280000) ENGINE = InnoDB,
PARTITION p31 VALUES LESS THAN (-260000) ENGINE = InnoDB,
PARTITION p32 VALUES LESS THAN (-240000) ENGINE = InnoDB,
PARTITION p33 VALUES LESS THAN (-220000) ENGINE = InnoDB,
PARTITION p34 VALUES LESS THAN (-200000) ENGINE = InnoDB,
PARTITION p35 VALUES LESS THAN (-180000) ENGINE = InnoDB,
PARTITION p36 VALUES LESS THAN (-160000) ENGINE = InnoDB,
PARTITION p37 VALUES LESS THAN (-140000) ENGINE = InnoDB,
PARTITION p38 VALUES LESS THAN (-120000) ENGINE = InnoDB,
PARTITION p39 VALUES LESS THAN (-100000) ENGINE = InnoDB,
PARTITION p40 VALUES LESS THAN (-80000) ENGINE = InnoDB,
PARTITION p41 VALUES LESS THAN (-60000) ENGINE = InnoDB,
PARTITION p42 VALUES LESS THAN (-40000) ENGINE = InnoDB,
PARTITION p43 VALUES LESS THAN (-20000) ENGINE = InnoDB,
PARTITION p44 VALUES LESS THAN (0) ENGINE = InnoDB,
PARTITION p45 VALUES LESS THAN (20000) ENGINE = InnoDB,
PARTITION p46 VALUES LESS THAN (40000) ENGINE = InnoDB,
PARTITION p47 VALUES LESS THAN (60000) ENGINE = InnoDB,
PARTITION p48 VALUES LESS THAN (80000) ENGINE = InnoDB,
PARTITION p49 VALUES LESS THAN (100000) ENGINE = InnoDB,
PARTITION p50 VALUES LESS THAN (120000) ENGINE = InnoDB,
PARTITION p51 VALUES LESS THAN (140000) ENGINE = InnoDB,
PARTITION p52 VALUES LESS THAN (160000) ENGINE = InnoDB,
PARTITION p53 VALUES LESS THAN (180000) ENGINE = InnoDB,
PARTITION p54 VALUES LESS THAN (200000) ENGINE = InnoDB,
PARTITION p55 VALUES LESS THAN (220000) ENGINE = InnoDB,
PARTITION p56 VALUES LESS THAN (240000) ENGINE = InnoDB,
PARTITION p57 VALUES LESS THAN (260000) ENGINE = InnoDB,
PARTITION p58 VALUES LESS THAN (280000) ENGINE = InnoDB,
PARTITION p59 VALUES LESS THAN (300000) ENGINE = InnoDB,
PARTITION p60 VALUES LESS THAN (320000) ENGINE = InnoDB,
PARTITION p61 VALUES LESS THAN (340000) ENGINE = InnoDB,
PARTITION p62 VALUES LESS THAN (360000) ENGINE = InnoDB,
PARTITION p63 VALUES LESS THAN (380000) ENGINE = InnoDB,
PARTITION p64 VALUES LESS THAN (400000) ENGINE = InnoDB,
PARTITION p65 VALUES LESS THAN (420000) ENGINE = InnoDB,
PARTITION p66 VALUES LESS THAN (440000) ENGINE = InnoDB,
PARTITION p67 VALUES LESS THAN (460000) ENGINE = InnoDB,
PARTITION p68 VALUES LESS THAN (480000) ENGINE = InnoDB,
PARTITION p69 VALUES LESS THAN (500000) ENGINE = InnoDB,
PARTITION p70 VALUES LESS THAN (520000) ENGINE = InnoDB,
PARTITION p71 VALUES LESS THAN (540000) ENGINE = InnoDB,
PARTITION p72 VALUES LESS THAN (560000) ENGINE = InnoDB,
PARTITION p73 VALUES LESS THAN (580000) ENGINE = InnoDB,
PARTITION p74 VALUES LESS THAN (600000) ENGINE = InnoDB,
PARTITION p75 VALUES LESS THAN (620000) ENGINE = InnoDB,
PARTITION p76 VALUES LESS THAN (640000) ENGINE = InnoDB,
PARTITION p77 VALUES LESS THAN (660000) ENGINE = InnoDB,
PARTITION p78 VALUES LESS THAN (680000) ENGINE = InnoDB,
PARTITION p79 VALUES LESS THAN (700000) ENGINE = InnoDB,
PARTITION p80 VALUES LESS THAN (720000) ENGINE = InnoDB,
PARTITION p81 VALUES LESS THAN (740000) ENGINE = InnoDB,
PARTITION p82 VALUES LESS THAN (760000) ENGINE = InnoDB,
PARTITION p83 VALUES LESS THAN (780000) ENGINE = InnoDB,
PARTITION p84 VALUES LESS THAN (800000) ENGINE = InnoDB,
PARTITION p85 VALUES LESS THAN (820000) ENGINE = InnoDB,
PARTITION p86 VALUES LESS THAN (840000) ENGINE = InnoDB,
PARTITION p87 VALUES LESS THAN (860000) ENGINE = InnoDB,
PARTITION p88 VALUES LESS THAN (880000) ENGINE = InnoDB,
PARTITION p89 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
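  1. ID
  2. Zip code
  3. Latitude
  4. Longitude

Based on this article and a few other questions on this topic, I now have a stored procedure that returns the locations/zip codes of the N nearest towns around a point (lat/lon).

My stored procedure: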

    BEGIN
    DECLARE _deg2rad DOUBLE DEFAULT PI()/1800000;

    SET @my_lat := _my_lat,
        @my_lon := _my_lon,
        @deg2dist := 0.0111325,  
        @start_deg := _start_dist / @deg2dist,  
        @max_deg := _max_dist / @deg2dist,
        @cutoff := @max_deg / SQRT(2),  
        @dlat := @start_deg,  
        @lon2lat := COS(_deg2rad * @my_lat),
        @iterations := 0;        

    SET @sql = CONCAT(
        "SELECT COUNT(*) INTO @near_ct
            FROM geoData
            WHERE lat    BETWEEN @my_lat - @dlat
                             AND @my_lat + @dlat   
              AND lon    BETWEEN @my_lon - @dlon
                             AND @my_lon + @dlon");
    PREPARE _sql FROM @sql;
    MainLoop: LOOP
        SET @iterations := @iterations + 1;
        SET @dlon := ABS(@dlat / @lon2lat);  
        SET @dlon := IF(ABS(@my_lat) + @dlat >= 900000, 3600001, @dlon);  
        EXECUTE _sql;
        IF ( @near_ct >= _limit OR         
             @dlat >= @cutoff ) THEN       
            LEAVE MainLoop;
        END IF;
        SET @dlat := LEAST(2 * @dlat, @cutoff);   
    END LOOP MainLoop;
    DEALLOCATE PREPARE _sql;

    SET @dlat := IF( @dlat >= @max_deg OR @dlon >= 1800000,
                @max_deg,
                GCDist(ABS(@my_lat), @my_lon,
                       ABS(@my_lat) - @dlat, @my_lon - @dlon) );
    SET @dlon := IFNULL(ASIN(SIN(_deg2rad * @dlat) /
                             COS(_deg2rad * @my_lat))
                            / _deg2rad 
                        , 3600001);    


    IF (ABS(@my_lon) + @dlon < 1800000 OR    
        ABS(@my_lat) + @dlat <  900000) THEN 
        SET @sql = CONCAT(
            "SELECT *,
                    @deg2dist * GCDist(@my_lat, @my_lon, lat, lon) AS dist
                FROM geoData
                WHERE lat BETWEEN @my_lat - @dlat
                              AND @my_lat + @dlat   
                  AND lon BETWEEN @my_lon - @dlon
                              AND @my_lon + @dlon   
                HAVING dist <= ", _max_dist, "
                ORDER BY dist
                LIMIT ", _limit
                        );
    ELSE
        SET @west_lon := IF(@my_lon < 0, @my_lon, @my_lon - 3600000);
        SET @east_lon := @west_lon + 3600000;
        SET @sql = CONCAT(
            "( SELECT *,
                    @deg2dist * GCDist(@my_lat, @west_lon, lat, lon) AS dist
                FROM geoData
                WHERE lat BETWEEN @my_lat - @dlat
                              AND @my_lat + @dlat 
                  AND lon BETWEEN @west_lon - @dlon
                              AND @west_lon + @dlon   
                HAVING dist <= ", _max_dist, " )
            UNION ALL
            ( SELECT *,
                    @deg2dist * GCDist(@my_lat, @east_lon, lat, lon) AS dist
                FROM geoData
                WHERE lat BETWEEN @my_lat - @dlat
                              AND @my_lat + @dlat   
                  AND lon BETWEEN @east_lon - @dlon
                              AND @east_lon + @dlon   
                HAVING dist <= ", _max_dist, " )
            ORDER BY dist
            LIMIT ", _limit
                        );
    END IF;

    PREPARE _sql FROM @sql;
    EXECUTE _sql;
    DEALLOCATE PREPARE _sql;
END
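
The procedure above calls a GCDist function that is not shown in the question. As a minimal sketch, assuming it takes coordinates scaled by 10000 (as the `lat` and `lon` columns are) and returns the great-circle distance in the same scaled degrees (which is what multiplying by `@deg2dist` suggests), it could be written with the haversine formula. The body below is an illustration, not the function actually used; `DELIMITER` is a mysql command-line client directive.

```sql
DELIMITER //

CREATE FUNCTION GCDist(
    _lat1 DOUBLE,   -- first point, degrees * 10000
    _lon1 DOUBLE,
    _lat2 DOUBLE,   -- second point, degrees * 10000
    _lon2 DOUBLE
) RETURNS DOUBLE
DETERMINISTIC
NO SQL
BEGIN
    -- Converts scaled degrees (degrees * 10000) to radians.
    DECLARE _deg2rad DOUBLE DEFAULT PI() / 1800000;
    DECLARE _rlat1   DOUBLE DEFAULT _deg2rad * _lat1;
    DECLARE _rlat2   DOUBLE DEFAULT _deg2rad * _lat2;
    DECLARE _dlat    DOUBLE DEFAULT _rlat2 - _rlat1;
    DECLARE _dlon    DOUBLE DEFAULT _deg2rad * (_lon2 - _lon1);
    -- Haversine term; LEAST() guards against rounding slightly above 1.
    DECLARE _a       DOUBLE DEFAULT LEAST(1,
        POW(SIN(_dlat / 2), 2)
        + COS(_rlat1) * COS(_rlat2) * POW(SIN(_dlon / 2), 2));
    -- Central angle in radians, converted back to scaled degrees.
    RETURN 2 * ASIN(SQRT(_a)) / _deg2rad;
END//

DELIMITER ;
```

With this convention, `0.0111325 * GCDist(...)` gives kilometres, since one degree of arc is roughly 111.325 km and the coordinates are scaled by 10000.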

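My problem:

I want to pass in a zip code or a town name and start the search from there. My idea is to take that input and look it up in my table of all the towns/zip codes in the world. If exactly one result is found, I have the lat/lon; if there are several, I ask the user to pick the right one.

After that, I look for the towns nearest to that position. Say I want a list of 50 towns. Then I would check the locations table to see whether 5 of its rows match those towns.

Thinking about it again, this sounds like a bad idea...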

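Approach 1:

I read up on stored procedures, SQL and monster queries, and tried to arrive at the following:

Given a zip code/city name, I look it up in the huge table (possibly via a MySQL function) to get my lat/lon, then search for the nearest towns, join the locations table against them, and fetch my 5 nearest locations.

Problems:

  • How do I avoid multiple matches for the same city name/zip code?
  • Is it possible to get the 5 nearest locations with a simple join?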

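Approach 2:

Get the lat/lon values for all of my locations, then run the procedure on that table. The huge table would then only be used to resolve the user's current position?

For this I would need to collect the latitude/longitude of every one of my locations. But this may well be the best way.

Still, keeping a huge database of every city/zip code just to resolve a position seems like overkill. I hope there is an alternative... somehow...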

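Approach 3:

To be honest, the feature I want seems to have been written a million times. So why should I reinvent the wheel? But I don't know how to find the right article or book to reach my goal.

Does anyone have best practices for this kind of thing?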

Ric*_*mes 6

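First, some comments...

I have seen dozens (not millions) of implementations here and on other forums; yours is better than most.

According to one data source (which I happen to have downloaded), there are about 3.2 million cities in the world.

For performance, you need to avoid checking all 3M rows. You have made a start with the growing bounding box. Note that you should have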

INDEX(lat, lon),
INDEX(lon, lat)

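The optimizer will pick between those, and the first query (the one with COUNT(*)) will treat the chosen index as "covering". The rows scanned will form a stripe or wedge circling the globe, a distinct improvement over all 3M rows. The worst latitude (+34 degrees) has 96K cities. (1 degree = 69 miles / 111 km.) At a resolution of tenths of a degree, 34.4 is the worst, with 10K cities.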
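
As a rough sketch of that advice applied to the `geoData` table from the question: the index names are made up for illustration, and the search point and box size in the `EXPLAIN` are arbitrary values in the table's scaled units (degrees * 10000). Because the existing PRIMARY KEY already starts with `(lon, lat)`, the second index may turn out to be unnecessary there; `EXPLAIN` is the way to check which one the optimizer actually picks.

```sql
-- Hypothetical index names; both column orders, as suggested above.
ALTER TABLE geoData
    ADD INDEX idx_lat_lon (lat, lon),
    ADD INDEX idx_lon_lat (lon, lat);

-- Inspect the plan for the counting query of the growing bounding box.
-- "Using index" in the Extra column indicates a covering index.
EXPLAIN
SELECT COUNT(*)
FROM geoData
WHERE lat BETWEEN 340000 - 10000 AND 340000 + 10000      -- 34.0 +/- 1.0 degrees
  AND lon BETWEEN -1182500 - 12000 AND -1182500 + 12000; -- -118.25 +/- 1.2 degrees
```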

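(Yes, I love this kind of data puzzle.)

And I see that you handle the dateline and the poles. I don't think you can improve on treating them as a special case.

(I took only a cursory look at the formulas and constants.)

Geohash and Z-order indexing help. But they have a hiccup: you need to check 4 areas around the target, because the index does not realize that the integers 199999 and 200000 are very close to each other even though their first digits differ.

"The user passes a zip code or city name": that is a point query on one of two simple tables. (Except that there may be duplicates: there are over 320 each of "San Jose" and "San Antonio". Quite far down the list is the first non-Spanish name, "Victoria", with only 144 cities.)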

Second, my implementation... (It has some similarities to yours.)

http://mysql.rjweb.org/doc.php/latlng

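This improves performance by using PARTITIONing to keep the bounding box roughly square instead of a stripe or wedge. If you are looking for the nearest 5, my algorithm will rarely touch more than a couple dozen rows, and those rows will be "clustered" in a small number of blocks, which keeps the number of disk hits very low.

A key to my design is having all the necessary columns in a single table. Once you have found the nearest 5, you can go to other tables to get the ancillary stuff (phone number, etc.).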
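
A minimal sketch of that two-step shape, not the implementation from the linked page: it assumes a hypothetical slim table `locationGeo (locationID, lat, lon)` holding scaled coordinates for each row of `locations`, reuses the `GCDist` function that the question's procedure calls, and uses a fixed bounding box for brevity where the real algorithm would grow it.

```sql
-- Hypothetical search point (about 48.1372 N, 11.5755 E) in degrees * 10000.
SET @my_lat := 481372,
    @my_lon := 115755,
    @dlat   := 5000,   -- half-height of the box: 0.5 degree
    @dlon   := 7500;   -- half-width: about 0.5 degree / COS(48 degrees)

SELECT l.locationID, l.name, l.street, l.city, l.phone, l.web, n.dist
FROM (
    -- Step 1: find the nearest 5 using only the slim search table.
    SELECT locationID,
           0.0111325 * GCDist(@my_lat, @my_lon, lat, lon) AS dist  -- km
    FROM locationGeo
    WHERE lat BETWEEN @my_lat - @dlat AND @my_lat + @dlat
      AND lon BETWEEN @my_lon - @dlon AND @my_lon + @dlon
    ORDER BY dist
    LIMIT 5
) AS n
-- Step 2: only now fetch the ancillary columns, for at most 5 rows.
JOIN locations AS l ON l.locationID = n.locationID
ORDER BY n.dist;
```

The join touches at most five rows of `locations`, so it stays outside the expensive part of the search.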

As for zip codes, turn them into lat/lon before starting the search for the nearest 5.
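
Resolving the user's input could look roughly like the point queries below, run against the `geoData` table from the question. The index names, the country code 'DE' and the sample values are assumptions for illustration. A city name can match many rows, so the second query returns candidates for the user to choose from rather than a single coordinate.

```sql
-- Hypothetical indexes that turn both lookups into point queries.
ALTER TABLE geoData
    ADD INDEX idx_country_zip (countryCode, zipCode),
    ADD INDEX idx_name (name);

-- Zip code lookup; the country code narrows it down,
-- since the same zip code can exist in several countries.
SELECT geoID, name, state, lat, lon
FROM geoData
WHERE countryCode = 'DE'
  AND zipCode = '80331';

-- City name lookup; list the candidates so the user can pick one.
SELECT geoID, countryCode, state, zipCode, name, lat, lon
FROM geoData
WHERE name = 'San Jose'
ORDER BY countryCode, state, zipCode
LIMIT 50;
```

The `lat` and `lon` of the chosen row are then the starting point for the nearest-5 search.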

JOINs inside the algorithm are likely to wreck performance.
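
One way to keep joins out of the search itself is to do the zip-code-to-coordinate join once, up front. This is a sketch under stated assumptions: `locationGeo` is a hypothetical table, all locations are assumed to be in one country ('DE' here), an index on `geoData (countryCode, zipCode)` is assumed to exist so the join does not scan 3.4M rows per location, and a zip code that spans several `geoData` rows is reduced to the average of their coordinates.

```sql
-- Hypothetical slim search table, one row per location.
CREATE TABLE locationGeo (
    locationID INT       NOT NULL,
    lat        MEDIUMINT NOT NULL,   -- degrees * 10000
    lon        MEDIUMINT NOT NULL,   -- degrees * 10000
    PRIMARY KEY (locationID),
    INDEX idx_lat_lon (lat, lon),
    INDEX idx_lon_lat (lon, lat)
) ENGINE=InnoDB;

-- One-time (or per-change) geocoding of the locations by zip code.
-- Averaging longitudes is only safe away from the dateline.
INSERT INTO locationGeo (locationID, lat, lon)
SELECT l.locationID,
       ROUND(AVG(g.lat)),
       ROUND(AVG(g.lon))
FROM locations AS l
JOIN geoData AS g
  ON g.zipCode = l.zipcode
 AND g.countryCode = 'DE'
GROUP BY l.locationID;
```

Locations whose zip code has no match in `geoData` are simply skipped by the inner join and would need to be geocoded some other way.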