无限懒惰的位图

ael*_*ndy 12 haskell data-structures

我正在尝试构建一个包含无限位图的惰性数据结构.我想支持以下操作:

  1. true :: InfBitMap

    返回True的无限位图,即所有位置的值应为True.

  2. falsify :: InfBitMap -> [Int] -> InfBitMap

    将列表中的所有位置设置为False.该列表可能是无限的.例如,伪造真[0,2 ..]将返回一个列表,其中所有(且仅)奇数位置为True.

  3. check :: InfBitMap -> Int -> Bool

    检查索引的值.

这是我到目前为止所能做的.

-- InfBitMap will look like [(@), (@, @), (@, @, @, @)..]
type InfBitMap = [Seq Bool]

true :: InfBitMap
true = iterate (\x -> x >< x) $ singleton True

-- O(L * log N) where N is the biggest index in the list checked for later
-- and L is the length of the index list. It is assumed that the list is
-- sorted and unique.
falsify :: InfBitMap -> [Int] -> InfBitMap
falsify ls is = map (falsify' is) ls
     where
         -- Update each sequence with all indices within its length
         -- Basically composes a list of (update pos False) for all positions
         -- within the length of the sequence and then applies it.
         falsify' is l = foldl' (.) id
                                (map ((flip update) False)
                                     (takeWhile (< length l) is))
                         $ l
-- O(log N) where N is the index.
check :: InfBitMap -> Int -> Bool
check ls i = index (fromJust $ find ((> i) . length) ls) i
Run Code Online (Sandbox Code Playgroud)

我想知道是否有一些我缺少的Haskellish概念/数据结构会使我的代码更优雅/更有效(常量对我来说无关紧要,只是顺序).我试着看拉链和镜片,但它们似乎没有帮助.我想保持更新和检查的复杂性对数(可能只是摊销对数).

注意:在有人怀疑它之前,这不是一个功课问题!

更新:

我突然想到检查可以改进到:

-- O(log N) where N is the index.
-- Returns "collapsed" bitmap for later more efficient checks.
check :: InfBitMap -> Int -> (Bool, InfBitMap)
check ls i = (index l i, ls')
    where
        ls'@(l:_) = dropWhile ((<= i) . length) ls
Run Code Online (Sandbox Code Playgroud)

可以将其转换为Monad以实现代码清洁.

And*_*ács 8

众所周知的整数特里略有变化似乎适用于此.

{-# LANGUAGE DeriveFunctor #-}

data Trie a = Trie a (Trie a) (Trie a) deriving (Functor)

true :: Trie Bool
true = Trie True true true

-- O(log(index))
check :: Trie a -> Int -> a
check t i | i < 0 = error "negative index"
check t i = go t (i + 1) where
    go (Trie a _ _) 1 = a
    go (Trie _ l r) i = go (if even i then l else r) (div i 2)

--O(log(index))
modify :: Trie a -> Int -> (a -> a) -> Trie a
modify t i f | i < 0 = error "negative index"
modify t i f = go t (i + 1) where
    go (Trie a l r) 1 = Trie (f a) l r
    go (Trie a l r) i | even i = Trie a (go l (div i 2)) r
    go (Trie a l r) i = Trie a l (go r (div i 2))
Run Code Online (Sandbox Code Playgroud)

遗憾的是,我们无法使用modify实现,falsify因为我们无法处理无限的索引列表(所有修改都必须在可以检查trie的元素之前执行).相反,我们应该做更像合并的事情:

ascIndexModify :: Trie a -> [(Int, a -> a)] -> Trie a
ascIndexModify t is = go 1 t is where
    go _ t [] = t
    go i t@(Trie a l r) ((i', f):is) = case compare i (i' + 1) of
        LT -> Trie a (go (2*i) l ((i', f):is)) (go (2*i+1) r ((i', f):is))
        GT -> go i t is
        EQ -> Trie (f a) (go (2*i) l is) (go (2*i+1) r is)

falsify :: Trie Bool -> [Int] -> Trie Bool
falsify t is = ascIndexModify t [(i, const False) | i <- is]
Run Code Online (Sandbox Code Playgroud)

我们假设严格上升指数is,否则我们会跳过trie中的位置甚至得到非终止,例如在check (falsify t (repeat 0)) 1.

时间的复杂性因懒惰而有点复杂.在check (falsify t is) index,我们支付额外log 2 index数量的比较额外成本,以及进一步length (filter (<index) is)的比较(即踩过所有指数的成本小于我们正在查找的成本).你可以说是的O(max(log(index), length(filter (<index) is)).无论如何,它肯定比O(length is * log (index))我们falsify为有限的is-es使用实现的更好modify.

我们必须记住,树节点被评估一次,并且check在第一个之后的相同索引的后续-s check不支付任何额外成本falsify.再一次,懒惰使这有点复杂.

falsify当我们想要遍历trie的前缀时,这也非常好.采取这个toList功能:

trieToList :: Trie a -> [a]
trieToList t = go [t] where
    go ts = [a | Trie a _ _ <- ts] 
            ++ go (do {Trie _ l r <- ts; [l, r]})
Run Code Online (Sandbox Code Playgroud)

它是线性时间内的标准广度优先遍历.当我们计算时take n $ trieToList (falsify t is),遍历时间保持线性,因为falsify最多会产生n + length (filter (<n) is)额外的比较,这最多是2 * n假设严格增加is.

(边注:广度优先遍历的空间要求是相当痛苦的,但我不能看到一个简单的方法来帮助它,因为反复深入还差这里,因为整个树必须在内存中举行,而BFS只需要记住树的底层).