Pet*_*gey 12 algorithm haskell functional-programming list
我有兴趣编写一个高效的 Haskell 函数triangularize :: [a] -> [[a]],它接受一个(可能是无限的)列表并将其“三角化”为一个列表列表。例如,triangularize [1..19]应该返回
[[1, 3, 6, 10, 15]
,[2, 5, 9, 14]
,[4, 8, 13, 19]
,[7, 12, 18]
,[11, 17]
,[16]]
Run Code Online (Sandbox Code Playgroud)
高效,我的意思是我希望它及时运行,O(n)其中n列表的长度是多少。
请注意,这在 Python 这样的语言中很容易做到,因为附加到列表(数组)的末尾是一个常数时间操作。完成此操作的一个非常命令式的 Python 函数是:
def triangularize(elements):
row_index = 0
column_index = 0
diagonal_array = []
for a in elements:
if row_index == len(diagonal_array):
diagonal_array.append([a])
else:
diagonal_array[row_index].append(a)
if row_index == 0:
(row_index, column_index) = (column_index + 1, 0)
else:
row_index -= 1
column_index += 1
return diagonal_array
Run Code Online (Sandbox Code Playgroud)
这是因为我一直在使用 Haskell 在整数序列在线百科全书(OEIS) 中编写一些“表格”序列,并且我希望能够将普通(一维)序列转换为(2-维)序列的序列正是以这种方式。
也许有一些聪明的(或不那么聪明的)方法来foldr覆盖输入列表,但我无法解决它。
Dan*_*ner 13
制作越来越大的块:
chunks :: [a] -> [[a]]
chunks = go 0 where
go n [] = []
go n as = b : go (n+1) e where (b,e) = splitAt n as
Run Code Online (Sandbox Code Playgroud)
然后只需转置两次:
diagonalize :: [a] -> [[a]]
diagonalize = transpose . transpose . chunks
Run Code Online (Sandbox Code Playgroud)
在 ghci 中试试:
> diagonalize [1..19]
[[1,3,6,10,15],[2,5,9,14],[4,8,13,19],[7,12,18],[11,17],[16]]
Run Code Online (Sandbox Code Playgroud)
This appears to be directly related to the set theory argument proving that the set of integer pairs are in one-to-one correspondence with the set of integers (denumerable). The argument involves a so-called Cantor pairing function.
So, out of curiosity, let's see if we can get a diagonalize function that way.
Define the infinite list of Cantor pairs recursively in Haskell:
auxCantorPairList :: (Integer, Integer) -> [(Integer, Integer)]
auxCantorPairList (x,y) =
let nextPair = if (x > 0) then (x-1,y+1) else (x+y+1, 0)
in (x,y) : auxCantorPairList nextPair
cantorPairList :: [(Integer, Integer)]
cantorPairList = auxCantorPairList (0,0)
Run Code Online (Sandbox Code Playgroud)
And try that inside ghci:
?> take 15 cantorPairList
[(0,0),(1,0),(0,1),(2,0),(1,1),(0,2),(3,0),(2,1),(1,2),(0,3),(4,0),(3,1),(2,2),(1,3),(0,4)]
?>
Run Code Online (Sandbox Code Playgroud)
We can number the pairs, and for example extract the numbers for those pairs which have a zero x coordinate:
?>
?> xs = [1..]
?> take 5 $ map fst $ filter (\(n,(x,y)) -> (x==0)) $ zip xs cantorPairList
[1,3,6,10,15]
?>
Run Code Online (Sandbox Code Playgroud)
We recognize this is the top row from the OP's result in the text of the question. Similarly for the next two rows:
?>
?> makeRow xs row = map fst $ filter (\(n,(x,y)) -> (x==row)) $ zip xs cantorPairList
?> take 5 $ makeRow xs 1
[2,5,9,14,20]
?>
?> take 5 $ makeRow xs 2
[4,8,13,19,26]
?>
Run Code Online (Sandbox Code Playgroud)
From there, we can write our first draft of a diagonalize function:
?>
?> printAsLines xs = mapM_ (putStrLn . show) xs
?> diagonalize xs = takeWhile (not . null) $ map (makeRow xs) [0..]
?>
?> printAsLines $ diagonalize [1..19]
[1,3,6,10,15]
[2,5,9,14]
[4,8,13,19]
[7,12,18]
[11,17]
[16]
?>
Run Code Online (Sandbox Code Playgroud)
For a list of 1 million items, the runtime is 18 sec, and 145 seconds for 4 millions items. As mentioned by Redu, this seems like O(n?n) complexity.
Distributing the pairs among the various target sublists is inefficient, as most filter operations fail.
To improve performance, we can use a Data.Map structure for the target sublists.
{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE ScopedTypeVariables #-}
import qualified Data.List as L
import qualified Data.Map as M
type MIL a = M.Map Integer [a]
buildCantorMap :: forall a. [a] -> MIL a
buildCantorMap xs =
let ts = zip xs cantorPairList -- triplets (a,(x,y))
m0 = (M.fromList [])::MIL a
redOp m (n,(x,y)) = let afn as = case as of
Nothing -> Just [n]
Just jas -> Just (n:jas)
in M.alter afn x m
m1r = L.foldl' redOp m0 ts
in
fmap reverse m1r
diagonalize :: [a] -> [[a]]
diagonalize xs = let cm = buildCantorMap xs
in map snd $ M.toAscList cm
Run Code Online (Sandbox Code Playgroud)
With that second version, performance appears to be much better: 568 msec for the 1 million items list, 2669 msec for the 4 millions item list. So it is close to the O(n*Log(n)) complexity we could have hoped for.
| 归档时间: |
|
| 查看次数: |
191 次 |
| 最近记录: |