在 Haskell 中对列表进行三角化

Pet*_*gey 12 algorithm haskell functional-programming list

我有兴趣编写一个高效的 Haskell 函数triangularize :: [a] -> [[a]],它接受一个(可能是无限的)列表并将其“三角化”为一个列表列表。例如,triangularize [1..19]应该返回

[[1,  3,  6,  10, 15]
,[2,  5,  9,  14]
,[4,  8,  13, 19]
,[7,  12, 18]
,[11, 17]
,[16]]
Run Code Online (Sandbox Code Playgroud)

高效,我的意思是我希望它及时运行,O(n)其中n列表的长度是多少。


请注意,这在 Python 这样的语言中很容易做到,因为附加到列表(数组)的末尾是一个常数时间操作。完成此操作的一个非常命令式的 Python 函数是:

def triangularize(elements):
    row_index = 0
    column_index = 0
    diagonal_array = []
    for a in elements:
        if row_index == len(diagonal_array):
            diagonal_array.append([a])
        else:
            diagonal_array[row_index].append(a)
        if row_index == 0:
            (row_index, column_index) = (column_index + 1, 0)
        else:
            row_index -= 1
            column_index += 1
    return diagonal_array
Run Code Online (Sandbox Code Playgroud)

这是因为我一直在使用 Haskell 在整数序列在线百科全书(OEIS) 中编写一些“表格”序列,并且我希望能够将普通(一维)序列转换为(2-维)序列的序列正是以这种方式。

也许有一些聪明的(或不那么聪明的)方法来foldr覆盖输入列表,但我无法解决它。

Dan*_*ner 13

制作越来越大的块:

chunks :: [a] -> [[a]]
chunks = go 0 where
    go n [] = []
    go n as = b : go (n+1) e where (b,e) = splitAt n as
Run Code Online (Sandbox Code Playgroud)

然后只需转置两次:

diagonalize :: [a] -> [[a]]
diagonalize = transpose . transpose . chunks
Run Code Online (Sandbox Code Playgroud)

在 ghci 中试试:

> diagonalize [1..19]
[[1,3,6,10,15],[2,5,9,14],[4,8,13,19],[7,12,18],[11,17],[16]]
Run Code Online (Sandbox Code Playgroud)

  • 这个解决方案非常棒。我使用整数的惰性字典树构建了一个解决方案,但与此相比,它在性能方面显得苍白无力。经验测量表明这也非常接近线性时间。我不明白怎么... (4认同)
  • 嗯。好吧,我突然意识到我并不是非常有信心“转置”是 O(n)。我也不是非常有信心它不是——它的实现有点复杂! (2认同)

jpm*_*ier 6

This appears to be directly related to the set theory argument proving that the set of integer pairs are in one-to-one correspondence with the set of integers (denumerable). The argument involves a so-called Cantor pairing function.

So, out of curiosity, let's see if we can get a diagonalize function that way. Define the infinite list of Cantor pairs recursively in Haskell:

auxCantorPairList :: (Integer, Integer) -> [(Integer, Integer)]
auxCantorPairList (x,y) =
    let nextPair = if (x > 0) then (x-1,y+1) else (x+y+1, 0)
    in (x,y) : auxCantorPairList nextPair

cantorPairList :: [(Integer, Integer)]
cantorPairList = auxCantorPairList (0,0)
Run Code Online (Sandbox Code Playgroud)

And try that inside ghci:

 ?> take 15 cantorPairList
[(0,0),(1,0),(0,1),(2,0),(1,1),(0,2),(3,0),(2,1),(1,2),(0,3),(4,0),(3,1),(2,2),(1,3),(0,4)]
 ?> 
Run Code Online (Sandbox Code Playgroud)

We can number the pairs, and for example extract the numbers for those pairs which have a zero x coordinate:

 ?> 
 ?> xs = [1..]
 ?> take 5 $ map fst $ filter (\(n,(x,y)) -> (x==0)) $ zip xs cantorPairList
[1,3,6,10,15]
 ?> 
Run Code Online (Sandbox Code Playgroud)

We recognize this is the top row from the OP's result in the text of the question. Similarly for the next two rows:

 ?> 
 ?> makeRow xs row = map fst $ filter (\(n,(x,y)) -> (x==row)) $ zip xs cantorPairList
 ?> take 5 $ makeRow xs 1
[2,5,9,14,20]
 ?> 
 ?> take 5 $ makeRow xs 2
[4,8,13,19,26]
 ?> 
Run Code Online (Sandbox Code Playgroud)

From there, we can write our first draft of a diagonalize function:

 ?> 
 ?> printAsLines xs = mapM_ (putStrLn . show) xs
 ?> diagonalize xs = takeWhile (not . null) $ map (makeRow xs) [0..]
 ?> 
 ?> printAsLines $ diagonalize [1..19]
[1,3,6,10,15]
[2,5,9,14]
[4,8,13,19]
[7,12,18]
[11,17]
[16]
 ?> 

Run Code Online (Sandbox Code Playgroud)

EDIT: performance update

For a list of 1 million items, the runtime is 18 sec, and 145 seconds for 4 millions items. As mentioned by Redu, this seems like O(n?n) complexity.

Distributing the pairs among the various target sublists is inefficient, as most filter operations fail.

To improve performance, we can use a Data.Map structure for the target sublists.


{-#  LANGUAGE  ExplicitForAll       #-}
{-#  LANGUAGE  ScopedTypeVariables  #-}

import qualified  Data.List  as  L
import qualified  Data.Map   as  M

type MIL a = M.Map Integer [a]

buildCantorMap :: forall a.  [a] -> MIL a
buildCantorMap xs = 
    let   ts     =  zip xs cantorPairList -- triplets (a,(x,y))
          m0     = (M.fromList [])::MIL a
          redOp m (n,(x,y)) = let  afn as = case as of
                                              Nothing  -> Just [n]
                                              Just jas -> Just (n:jas)
                              in   M.alter afn x m
          m1r = L.foldl' redOp m0 ts
    in
          fmap reverse m1r

diagonalize :: [a] -> [[a]]
diagonalize xs = let  cm = buildCantorMap xs
                 in   map snd $ M.toAscList cm


Run Code Online (Sandbox Code Playgroud)

With that second version, performance appears to be much better: 568 msec for the 1 million items list, 2669 msec for the 4 millions item list. So it is close to the O(n*Log(n)) complexity we could have hoped for.