fra*_*a66 6 algorithm haskell functional-programming hugs closest-points
给定二维空间中的点列表,您希望在Haskell中执行一个函数来找到两个最近点之间的距离.示例:输入:项目[(1,5),(3,4),(2,8),( - 1,2),( - 8.6),(7.0),(1.5),(5.5),(4.8 ),(7.4)]输出:2.0
假设列表中两个最远点之间的距离最多为10000.
这是我的代码:
import Data.List
import System.Random
sort_ :: Ord a => [a] -> [a]
sort_ [] = []
sort_ [x] = [x]
sort_ xs = merge (sort_ left) (sort_ right)
where
(left, right) = splitAt (length xs `div` 2) xs
merge [] xs = xs
merge xs [] = xs
merge (x:xs) (y:ys)=
if x <= y then
x : merge xs (y:ys)
else y : merge (x:xs) ys
project :: [(Float,Float)] -> Float
project [] = 0
project (x:xs)=
if null (xs) then
error "The list have only 1 point"
else head(sort_(dstList(x:xs)))
distance :: (Float,Float)->(Float,Float) -> Float
distance (x1,y1) (x2,y2) = sqrt((x1 - x2)^2 + (y1 - y2)^2)
dstList :: [(Float,Float)] -> [Float]
dstList (x:xs)=
if length xs == 1 then
(dstBetween x xs):[]
else (dstBetween x xs):(dstList xs)
dstBetween :: (Float,Float) -> [(Float,Float)] -> Float
dstBetween pnt (x:xs)=
if null (xs) then
distance pnt x
else minimum ((distance pnt ):((dstBetween pnt xs)):[])
{-
Calling generator to create a file created at random points
-}
generator = do
putStrLn "Enter File Name"
file <- getLine
g <- newStdGen
let pts = take 1000 . unfoldr (Just . (\([a,b],c)->((a,b),c)) . splitAt 2)
$ randomRs(-1,1) g :: [(Float,Float)]
writeFile file . show $ pts
{-
Call the main to read a file and pass it to the function of project
The function of the project should keep the name 'project' as described
in the statement
-}
main= do
putStrLn "Enter filename to read"
name <- getLine
file <- readFile name
putStrLn . show . project $ readA file
readA::String->[(Float,Float)]
readA = read
Run Code Online (Sandbox Code Playgroud)
我可以像示例中那样执行程序运行,也可以使用生成器执行如下运行:
在haskell解释器中必须输入"generator",程序将在这里要求包含一千个点的文件名.在Haskell生成文件后,解释器必须编写main,并请求一个文件名,这是用"generator"创建的文件的名称.
问题是,对于随机生成的1000个点,我的程序需要很长时间,在具有双核处理器的计算机上大约需要3分钟.我究竟做错了什么?如何优化我的代码以更快地工作?
Wil*_*ess 13
您正在使用二次算法:
project [] = error "Empty list of points"
project [_] = error "Single point is given"
project ps = go 10000 ps
where
go a [_] = a
go a (p:ps) = let a2 = min a $ minimum [distance p q | q<-ps]
in a2 `seq` go a2 ps
Run Code Online (Sandbox Code Playgroud)
你应该使用更好的算法.在SO上搜索计算几何标记以获得更好的算法.
另见http://en.wikipedia.org/wiki/Closest_pair_of_points_problem.
@maxtaldykin提出了一个很好,简单而有效的算法更改,它应该对随机数据产生真正的影响 - 通过X坐标对点进行预排序,并且永远不要尝试比X坐标更d远离当前点的单位的点(d目前已知的最小距离):
import Data.Ord (comparing)
import Data.List (sortBy)
project2 ps@(_:_:_) = go 10000 p1 t
where
(p1:t) = sortBy (comparing fst) ps
go d _ [] = d
go d p1@(x1,_) t = g2 d t
where
g2 d [] = go d (head t) (tail t)
g2 d (p2@(x2,_):r)
| x2-x1 >= d = go d (head t) (tail t)
| d2 >= d = g2 d r
| otherwise = g2 d2 r -- change it "mid-flight"
where
d2 = distance p1 p2
Run Code Online (Sandbox Code Playgroud)
在随机数据,g2将在工作O(1)时间,以便go将O(n)整个事情将被排序为界,~ n log n.
~ n^2.1第一个代码(在1k/2k范围内)的经验增长顺序显示,在第二个代码上~n^1.1,在10k/20k范围内,快速测试它编译加载到GHCi中(第二个代码运行速度比第一个快50倍) 2,000点,3,000点快80倍).
可以稍微修改您的强力搜索,以便在随机数据上获得更好的性能.
主要思想是按x坐标对点进行排序,并且在比较循环中的距离时,仅考虑水平距离不大于当前最小距离的点.
这可能快一个数量级,但在最坏的情况下,它仍然是O(n ^ 2).
实际上,在2000点上它在我的机器上快了50倍.
project points = loop1 10000 byX
where
-- sort points by x coordinate
-- (you need import Data.Ord to use `comparing`)
byX = sortBy (comparing fst) points
-- loop through all points from left to right
-- threading `d` through iterations as a minimum distance so far
loop1 d = foldl' loop2 d . tails
-- `tail` drops leftmost points one by one so `x` is moving from left to right
-- and `xs` contains all points to the right of `x`
loop2 d [] = d
loop2 d (x:xs) = let
-- we take only those points of `xs` whose horizontal distance
-- is not greater than current minimum distance
xs' = takeWhile ((<=d) . distanceX x) xs
distanceX (a,_) (b,_) = b - a
-- then just get minimum distance from `x` to those `xs'`
in minimum $ d : map (distance x) xs'
Run Code Online (Sandbox Code Playgroud)
顺便说一句,请不要使用这么多括号.Haskell不需要包含函数参数.