地图聚类算法

Chr*_*s B 20 php algorithm performance google-maps cluster-analysis

我目前的代码很快,但我需要更快,所以我们可以容纳更多的标记.有什么建议?

笔记:

  • 当SQL语句按标记名称排序时,代码运行得最快 - 它本身在聚类标记方面做了非常局部的工作(同一位置的标记名称通常但不总是相似).
  • 我无法对标记进行预聚类,因为它们可以动态搜索和过滤.
  • 我已经尝试过基于网格的聚类 - 但结果往往不是很好.
  • 我知道墨卡托投影上的星团略微倾斜.
  • 我对商业集群服务不感兴趣.

代码:

$singleMarkers = array();
$clusterMarkers = array();

while (count($markers)) {
    $marker  = array_pop($markers);
    $cluster = array();

    // Compare marker against all remaining markers.
    foreach ($markers as $key => $compareMarker) {
        // This function returns the distance between two markers, at a defined zoom level.
        $pixels = pixelDistance($marker['lat'], $marker['lng'], $compareMarker['lat'], $compareMarker['lng'], $zoomLevel);
        // If two markers are closer than defined distance, remove compareMarker from array and add to cluster.
        if ($pixels < $distance) {
            unset($markers[$key]);
            $cluster[] = $compareMarker;
        }
    }

    // If a marker was added to cluster, also add the marker we were comparing to.
    if (count($cluster) > 0) {
        $cluster[] = $marker;
        $clusterMarkers[] = $cluster;
    } else {
        $singleMarkers[] = $marker;
    }
}

function pixelDistance($lat1, $lon1, $lat2, $lon2, $zoom) {
    $x1 = $lon1*10000000; //This is what I did to compensate for using lat/lon values instead of pixels.
    $y1 = $lat1*10000000;
    $x2 = $lon2*10000000;
    $y2 = $lat2*10000000;

    return sqrt(pow(($x1-$x2),2) + pow(($y1-$y2),2)) >> (21 - $zoom); //21 is the max zoom level
}
Run Code Online (Sandbox Code Playgroud)

UPDATE

这是当前的代码:

$singleMarkers = array();
$clusterMarkers = array();

// Minimum distance between markers to be included in a cluster, at diff. zoom levels
$DISTANCE = (10000000 >> $ZOOM) / 100000;

// Loop until all markers have been compared.
while (count($markers)) {
    $marker  = array_pop($markers);
    $cluster = array();

    // Compare against all markers which are left.
    foreach ($markers as $key => $target) {
        $pixels = abs($marker['lat']-$target['lat']) + abs($marker['lng']-$target['lng']);

        // If the two markers are closer than given distance remove target marker from array and add it to cluster.
        if ($pixels < $DISTANCE) {
            unset($markers[$key]);
            $cluster[] = $target;
        }
    }

    // If a marker has been added to cluster, add also the one we were comparing to.
    if (count($cluster) > 0) {
        $cluster[] = $marker;
        $clusterMarkers[] = $cluster;
    } else {
        $singleMarkers[] = $marker;
    }
}
Run Code Online (Sandbox Code Playgroud)

Nic*_*itz 7

你真的需要计算欧氏距离吗?如果你只是比较距离的相对大小,你可以使用曼哈顿距离,这将节省你两个电话pow(),一个到sqrt():

function pixelDistance($lat1, $lon1, $lat2, $lon2, $zoom) {
    $x1 = $lon1*10000000; //This is what I did to compensate for using lat/lon values instead of pixels.
    $y1 = $lat1*10000000;
    $x2 = $lon2*10000000;
    $y2 = $lat2*10000000;

    return ($x1-$x2) + ($y1-$y2) >> (21 - $zoom);
}
Run Code Online (Sandbox Code Playgroud)

不知道>> (21 - $zoom)你的计算是否需要这个位,所以我把它留在了.但除非你真的需要在其他地方使用计算的距离值,否则你可以直接使用纬度/经度(不需要乘以任何东西)并且考虑到曼哈顿的距离,假设你预先计算出$distance适合这个尺度,这在计算方面要比用强迫所有距离以便适应单位和幅度要便宜得多$distance.

编辑:当我去年研究这个问题时,我在维基百科上发现了一些有用的东西 - 是的,它可以发生;-)

我还强烈推荐编程集体智能:构建智能Web 2.0应用程序这本书,它深入地进行聚类,不仅适用于地理数据,还适用于其他数据分析领域.