tuc*_*r19 4 search haskell functional-programming list
我有一个 100K+ 元素的列表,这是一个有限列表。目前我正在使用 Data.List 函数 elem。在查看 Data.List 信息页面时,还有查找和过滤器。其中之一会比 elem 函数更快吗?
以防万一我们还没有完全击败死马......
不同的集合表示存在巨大的性能差异。作为一个示例(可能与您的用例匹配也可能不匹配),请考虑采用 200K 个随机元素的列表和计算来确定 200 个随机元素的成员资格。
我已经实现了三种明显的方法来做到这一点 - 使用 elem列表,转换为 aHashSet并检查成员资格,以及执行布隆过滤器和哈希集的混合。基准测试显示列表解决方案比哈希集慢 3 个数量级,比混合慢约 2 倍。
benchmarking list
mean: 460.7106 ms, lb 459.2952 ms, ub 462.8491 ms, ci 0.950
std dev: 8.741096 ms, lb 6.293703 ms, ub 12.23082 ms, ci 0.950
benchmarking hashset
mean: 175.2730 us, lb 173.9140 us, ub 177.0802 us, ci 0.950
std dev: 7.966790 us, lb 6.391454 us, ub 10.25774 us, ci 0.950
benchmarking bloom+hashset
mean: 88.22402 us, lb 87.35856 us, ub 89.66884 us, ci 0.950
std dev: 5.642663 us, lb 3.793715 us, ub 8.264222 us, ci 0.950
和代码:
import qualified Data.HashSet as Set
import           Data.HashSet (Set)
import qualified Data.BloomFilter as BF
import qualified Data.BloomFilter.Easy as BF
import           Data.BloomFilter (Bloom)
import           Data.BloomFilter.Hash as H2
import           Data.Hashable as H1
import Criterion.Main
import System.Random
data MySet a = MS (Set a) (Bloom a)
fromList :: (H2.Hashable a, H1.Hashable a, Ord a) => [a] -> MySet a
fromList as =
    let hs = Set.fromList as
        bf = BF.easyList 0.2 as
    in hs `seq` bf `seq` MS hs bf
member :: (H2.Hashable a, H1.Hashable a, Ord a) => a -> MySet a -> Bool
member e (MS hs bf)
    | BF.elemB e bf = Set.member e hs
    | otherwise      = False
main = do
  list   <- take 200000 `fmap` randomsIO :: IO [Int]
  xs     <- take 200    `fmap` randomsIO
  let hs  = Set.fromList list
      bhs = fromList list
  defaultMain
        [ bench "list" $ nf (map (`elem` list)) xs
        , bench "hashset" $ nf (map (`Set.member` hs)) xs
        , bench "bloom+hashset" $ nf (map (`member` bhs)) xs
        ]
randomsIO = randoms `fmap` newStdGen
| 归档时间: | 
 | 
| 查看次数: | 991 次 | 
| 最近记录: |