How do I group similar items in a list using Haskell?

td1*_*123 22 haskell

Given a list of tuples like this:

dic = [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]

how can I group the items of dic into a list grp, where

grp  = [(1,["aa","bb","cc"]), (2, ["aa"]), (3, ["ff","gg"])]

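I'm actually new to Haskell... and seem to be falling in love with it. Using group or groupBy from Data.List only groups similar items that are adjacent in the list. I wrote an inefficient function for this, but it runs out of memory because I need to process a very large list of encoded strings. I hope you can help me find a more efficient way.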

Dan*_*ner 57

Reuse library code wherever possible.

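import Data.Map (fromListWith)  -- import only what is used; a bare Data.Map import clashes with Prelude names such as map and filter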
sortAndGroup assocs = fromListWith (++) [(k, [v]) | (k, v) <- assocs]

Try it out in ghci:

*Main> sortAndGroup [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
fromList [(1,["bb","cc","aa"]),(2,["aa"]),(3,["gg","ff"])]
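If the result should be a plain list with sorted values, as in the question's grp, the Map can be post-processed. The following is only a sketch: the name groupSorted is hypothetical, and it assumes the values are orderable (Ord v), which the answer above does not require.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (sort)

-- Hypothetical helper: group values by key, sort each group,
-- and return the groups as an association list in ascending key order.
groupSorted :: (Ord k, Ord v) => [(k, v)] -> [(k, [v])]
groupSorted assocs =
  Map.toList                                   -- back to [(k, [v])]
    (Map.map sort                              -- sort the values of each key
      (Map.fromListWith (++) [(k, [v]) | (k, v) <- assocs]))
```

Because fromListWith applies the combining function as (new ++ old) and each new value is a singleton list, every insertion only prepends one element, so building the groups stays cheap even for long inputs. With the question's dic, groupSorted dic produces the grp shown in the question.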


Mik*_*kov 15

Here's my solution:

import Data.Function (on)
import Data.List (sortBy, groupBy)
import Data.Ord (comparing)

myGroup :: Ord a => [(a, b)] -> [(a, [b])]  -- Ord a already implies Eq a
myGroup = map (\l -> (fst . head $ l, map snd l)) . groupBy ((==) `on` fst)
          . sortBy (comparing fst)

This first sorts the list by key with sortBy. The sort is stable, so entries with the same key keep their original relative order:

[(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
=> [(1,"aa"),(1,"cc"),(1,"bb"),(2,"aa"),(3,"ff"),(3,"gg")]

Then groupBy groups adjacent elements that share a key:

[(1,"aa"),(1,"cc"),(1,"bb"),(2,"aa"),(3,"ff"),(3,"gg")]
=> [[(1,"aa"),(1,"cc"),(1,"bb")],[(2,"aa")],[(3,"ff"),(3,"gg")]]

Then map converts each group into a (key, values) tuple:

[[(1,"aa"),(1,"cc"),(1,"bb")],[(2,"aa")],[(3,"ff"),(3,"gg")]]
=> [(1,["aa","cc","bb"]),(2,["aa"]),(3,["ff","gg"])]

Testing:

> myGroup dic
[(1,["aa","cc","bb"]),(2,["aa"]),(3,["ff","gg"])]

If the values in each group should also come out sorted, as in grp, sort the whole pairs rather than only the keys (this additionally needs Ord b).
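A variant of the same sort-then-group pipeline, offered only as a sketch. It makes two changes that are not part of the answer above: it sorts the full pairs (so it assumes the values are orderable too), and it uses groupBy from Data.List.NonEmpty so that taking the head of a group is total. The name groupPairs is hypothetical.

```haskell
import Data.Function (on)
import Data.List (sort)
import qualified Data.List.NonEmpty as NE

-- Hypothetical variant: sort whole (key, value) pairs, group runs of
-- equal keys into non-empty lists, then split each run into (key, values).
groupPairs :: (Ord a, Ord b) => [(a, b)] -> [(a, [b])]
groupPairs =
    map (\g -> (fst (NE.head g), map snd (NE.toList g)))  -- NE.head cannot fail
  . NE.groupBy ((==) `on` fst)                             -- runs of equal keys
  . sort                                                   -- by key, then by value
```

Sorting the pairs rather than only the keys means the values inside each group come out in ascending order, which is the shape the question asks for in grp.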


Fed*_*lev 6

You can also use the TransformListComp extension, for example:

Prelude> :set -XTransformListComp 
Prelude> import GHC.Exts (groupWith, the)
Prelude GHC.Exts> let dic = [ (1, "aa"), (1, "bb"), (1, "cc") , (2, "aa"), (3, "ff"), (3, "gg")]
Prelude GHC.Exts> [(the key, value) | (key, value) <- dic, then group by key using groupWith]
[(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]
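Outside ghci, the same comprehension can live in a source file by enabling the extension with a LANGUAGE pragma. This is a minimal sketch; the function name groupTLC is hypothetical.

```haskell
{-# LANGUAGE TransformListComp #-}
import GHC.Exts (groupWith, the)

-- Hypothetical wrapper around the comprehension from the ghci session.
-- After the "then group" clause, key and value are rebound to lists:
-- key holds the (identical) keys of one group, value holds its values.
groupTLC :: Ord k => [(k, v)] -> [(k, [v])]
groupTLC assocs =
  [ (the key, value)                      -- the collapses the equal keys to one
  | (key, value) <- assocs
  , then group by key using groupWith     -- groupWith sorts by key, then groups
  ]
```

Only the keys need an Ord instance here; the values are never compared.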