在Haskell中创建唯一标签

Vik*_*ahl 6 compiler-construction haskell code-generation

我正在为Haskell中的简单命令式语言编写编译器,输出Java字节码.我已经到了我发出字节码的抽象表示的地步.

在编写用于编译if语句的代码时遇到了一些麻烦.要实现if语句,我需要跳转到标签.因此,我需要为该标签生成一个名称,该名称必须是唯一的.

我的第一个想法是通过一些状态compileStatement,即

compileStatement :: Statement -> UniqueIDState -> [AbstractInstruction]
Run Code Online (Sandbox Code Playgroud)

当然,compilerStatement是递归的,所以使用这种方法需要我从递归调用中将唯一ID生成器的状态传递回upp:

compileStatement :: Statement -> UniqueIDState -> (UniqueIdState, [AbstractInstruction])
Run Code Online (Sandbox Code Playgroud)

这看起来有点笨拙,特别是如果我意识到我需要在将来携带更多状态; 有更优雅的方式吗?

Don*_*art 5

你需要一个"独特的供应".在Haskell中执行此操作的常用方法是通过状态monad线程计数器,这会自动化您描述的管道问题.

  • 在haskell.org wiki上的Theres [例子](http://www.haskell.org/haskellwiki/New_monads/MonadSupply). (2认同)

atr*_*ers 5

我想,如果您拥有的唯一工具是锤子,那么将所有东西都当作钉子来对待是很诱人的。

亚伯拉罕·马斯洛。

来点不同的东西吧——一个不是Monad类成员的独特供应。碰巧的是,您几乎可以使用原始类型签名:

compileStatement :: Statement -> UniqueIDState -> [AbstractInstruction]
Run Code Online (Sandbox Code Playgroud)

如果唯一的要求是每个标签都是唯一的——不需要计算使用了多少,在相同情况下提供相同的标识符等等——那么你可以使用一种侵入性较小的技术。

从约翰·朗伯里 (John Launchbury) 和西蒙·佩顿·琼斯 (Simon Peyton Jones)在 HaskellState 中第 39-40 页:

newUniqueSupply :: IO UniqueSupply
splitUniqueSupply :: UniqueSupply -> (UniqueSupply, UniqueSupply)
getUnique  :: UniqueSupply -> Unique

instance Eq Unique
instance Ord Unique
instance Text Unique

--    interface     --
-- ================ --
--  implementation  --

data UniqueSupply = US Unique UniqueSupply UniqueSupply

type Unique = Int

 -- where all the action happens!
newUniqueSupply :: IO UniqueSupply
newUniqueSupply
  = newVar 0             `thenST` \ uvar ->
    let
      next :: IO Unique
      next = interleaveST (
                readVar uvar         `thenST` \ u ->
                writeVar uvar (u+1)  `thenST_`
                returnStrictlyST u
              )

      supply :: IO UniqueSupply
      supply = interleaveST (
                 next         `thenST` \ u ->
                 supply       `thenST` \ s1 ->
                 supply       `thenST` \ s2 ->
                 returnST (US u s1 s2)
                )

    in
    supply

 -- bits so boring they're not even in the paper...
splitUniqueSupply (US _ s1 s2) = (s1, s2)
getUnique (US u _ _) = u
Run Code Online (Sandbox Code Playgroud)

是的……那是 1996 年的 Haskell 样本——让我们更新一下:

module UniqueSupply(
    Unique, UniqueSupply,
    newUniqueSupply, splitUniqueSupply, getUnique
) where

import Control.Monad    (liftM3)
import Data.IORef       (IORef, newIORef, atomicModifyIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

newtype Unique    =  U Int deriving (Eq, Ord, Read, Show)

data UniqueSupply =  US Unique UniqueSupply UniqueSupply

newUniqueSupply   :: IO UniqueSupply
newUniqueSupply
  = do uvar <- newIORef 0
       let next   :: IO Unique
           next   =  unsafeInterleaveIO (atomicModifyIORef uvar (\u -> (u+1, U u)))

           supply :: IO UniqueSupply
           supply =  unsafeInterleaveIO (liftM3 US next supply supply)
       supply

splitUniqueSupply :: UniqueSupply -> (UniqueSupply, UniqueSupply)
splitUniqueSupply (US _ s1 s2) =  (s1, s2)

getUnique :: UniqueSupply -> Unique
getUnique (US u _ _) =  u
Run Code Online (Sandbox Code Playgroud)

现在它又开始工作了,有一些烦恼需要处理:

  • 两种类型的使用;

  • 缺乏多态性;

  • 固定的发电方式;

  • 错误重用的可能性。

最后一点特别有趣。假设:

data Statement =
    ... | If Statement Statement Statement | ...
Run Code Online (Sandbox Code Playgroud)

那么如果:

compileStatement (If c t e) s =
    case splitUniqueSupply s of
      (s1, s2) -> case splitUniqueSupply s2 of
                    (s3, s4) -> buildCondJump (compileStatement c s1)
                                              (compileStatement t s3)
                                              (compileStatement e s4)
Run Code Online (Sandbox Code Playgroud)

被错误地改为:

compileStatement (If c t e) s =
    case splitUniqueSupply s of
      (s1, s2) -> case splitUniqueSupply s2 of
                    (s3, s4) -> buildCondJump (compileStatement c s)
                                              (compileStatement t s)
                                              (compileStatement e s)
Run Code Online (Sandbox Code Playgroud)

不仅UniqueSupplyUnique值被错误地重用,如果任何递归调用compileStatement密集地使用供应,就有可能发生空间泄漏。

我们现在考虑第二点:缺乏多态性。让我们假设存在合适的类型:

data Fresh a =  Fresh a (Fresh a) (Fresh a)

freshNew     :: ... -> IO (Fresh a)

splitFresh   :: Fresh a -> (Fresh a, Fresh a)

pluckFresh   :: Fresh a -> a
Run Code Online (Sandbox Code Playgroud)

这意味着:

instance Functor Fresh where
    fmap h (Fresh u s1 s2) =  Fresh (h u) (fmap h s1) (fmap h s2)
Run Code Online (Sandbox Code Playgroud)

然后启发:

freshNew     :: (Int -> a) -> IO (Fresh a)
freshNew g   =  fmap (fmap g) freshInts

splitFresh   :: Fresh a -> (Fresh a, Fresh a)
splitFresh (Fresh _ s1 s2) =  (s1, s2)

pluckFresh   :: Fresh a -> a
pluckFresh (Fresh u _ _) =  u
Run Code Online (Sandbox Code Playgroud)

然后我们可以保持freshInts私有:

freshInts    :: IO (Fresh Int)
freshInts    =  do uvar <- newIORef 0 
                   let incr u =  (u+1, u)
                       next   =  unsafeInterleaveIO $
                                 atomicModifyIORef uvar incr
                       supply =  unsafeInterleaveIO $
                                 liftM3 Fresh next supply supply
                   supply
Run Code Online (Sandbox Code Playgroud)

如果用户只需要Int值:

do            .
              .
              .
   int_supply <- freshNew id  {- id x = x -}
              .
              .
              .
Run Code Online (Sandbox Code Playgroud)

作为奖励,这也修复了使用两种类型和固定生成模式(第一点和第三点)。Fresh新模块的时间:

module Fresh(
    Fresh,
    freshNew, splitFresh, pluckFresh
) where

import Control.Monad    (liftM3)
import Data.IORef       (IORef, newIORef, atomicModifyIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

data Fresh a =  Fresh a (Fresh a) (Fresh a)

instance Functor Fresh where
    fmap h (Fresh u s1 s2) =  Fresh (h u) (fmap h s1) (fmap h s2)

freshNew     :: (Int -> a) -> IO (Fresh a)
freshNew g   =  fmap (fmap g) freshInts

splitFresh   :: Fresh a -> (Fresh a, Fresh a)
splitFresh (Fresh _ s1 s2) =  (s1, s2)

pluckFresh   :: Fresh a -> a
pluckFresh (Fresh u _ _) =  u

-- local definition
freshInts    :: IO (Fresh Int)
freshInts    =  do uvar <- newIORef 0 
                   let incr u =  (u+1, u)
                       next   =  unsafeInterleaveIO $
                                 atomicModifyIORef uvar incr
                       supply =  unsafeInterleaveIO $
                                 liftM3 Fresh next supply supply
                   supply
Run Code Online (Sandbox Code Playgroud)

现在是重用之谜,以及对答案的初步尝试:

freshNew     :: (Int -> a) -> IO (Fresh a)
freshNew g   =  do z <- newIORef ()
                   fmap (fmap g) (freshInts z)

freshInts    :: IORef () -> IO (Fresh Int)
freshInts z  =  do let using   :: () -> (a, ())
                       using x =  (error "already used!", x)
                   () <- atomicModifyIORef z using
                   z1 <- newIORef ()
                   z2 <- newIORef ()
                   let incr u =  (u+1, u)
                       next   =  unsafeInterleaveIO $
                                 atomicModifyIORef uvar incr
                       supply =  unsafeInterleaveIO $
                                 liftM3 Fresh next (freshInts z1) (freshInts z2)
                   supply
Run Code Online (Sandbox Code Playgroud)

...是的,这是一个原型 -我们可以做得更好吗?

到目前为止,splitFresh并且pluckFresh一直是微不足道的:

splitFresh   :: Fresh a -> (Fresh a, Fresh a)
splitFresh (Fresh _ s1 s2) =  (s1, s2)

pluckFresh   :: Fresh a -> a
pluckFresh (Fresh u _ _) =  u
Run Code Online (Sandbox Code Playgroud)

可一些工作freshIntsfreshNew现在都被转移到他们:

  • 如果splitFresh可以直接生成这对子树,Fresh值会更简单:

     data Fresh a =  Fresh ... {- no subtrees -}
    
     splitFresh :: Fresh a -> (Fresh a, Fresh a)
     splitFresh (Fresh g ...) =  (Fresh ..., Fresh ...)
    
    Run Code Online (Sandbox Code Playgroud)
  • 如果pluckFresh可以访问生成器函数 -gfreshNew- 它可以直接提供所需的唯一值:

     data Fresh a =  Fresh (... -> a) ...
    
     pluckFresh :: Fresh a -> a
     pluckFresh (Fresh g ...) =  (g ...)
    
    Run Code Online (Sandbox Code Playgroud)

怎么样:

data Fresh a =  Fresh (Int -> a) U

splitFresh   :: Fresh a -> (Fresh a, Fresh a)
splitFresh (Fresh g n) =  (Fresh g n1, Fresh g n2) where 
                          (n1, n2) =  splitU n
   
pluckFresh   :: Fresh a -> a
pluckFresh (Fresh g n) =  g (intOfU n)
Run Code Online (Sandbox Code Playgroud)

在哪里:

splitU :: U -> (U, U)
intOfU :: U -> Int
Run Code Online (Sandbox Code Playgroud)

可以使freshInts简单:

freshInts    :: IO (Fresh Int)
freshInts    =  do n <- initialU
                   return (Fresh (\x -> x) n)
Run Code Online (Sandbox Code Playgroud)

假设:

initialU :: IO U
Run Code Online (Sandbox Code Playgroud)

嗯 - 关于freshInts. 然后是intOfU- 它与其他地方看到的东西有一种奇怪的相似之处......

[...] 在命令式程序中,您可能会简单地调用GenSym()每个标识符,从全局供应中分配一个唯一名称,并对供应产生副作用,以便后续调用GenSym()将提供新值。

(来自 Launchbury 和 Peyton-Jones 论文的第 39 页。)

让我们再考虑一下:

...抽象IO类型通过其在两个类型签名中的出现来指示那些外部交互的存在。

如果我们(谨慎地!)假设:

  • intOfUgenSym伪装的;

  • 这种U类型还可以作为外部互动的指标;

IE:

type U =  OI
genSym :: OI -> Int

intOfU :: U -> Int
intOfU =  ... $ genSym
Run Code Online (Sandbox Code Playgroud)

……这意味着:

data Fresh a =  Fresh (Int -> a) OI

splitU       :: OI -> (OI, OI)
   
Run Code Online (Sandbox Code Playgroud)

这看起来很有希望 - 我们现在可以重新定位genSymfreshInts

data Fresh a =  Fresh (OI -> a) OI

pluckFresh   :: Fresh a -> a
pluckFresh (Fresh g n) =  g n

freshInts    :: IO (Fresh Int)
freshInts    =  do n <- initialU
                   uvar <- newIORef 0
                   let incr n =  (n + 1, n)

                       genSym :: IO Int
                       genSym =  atomicModifyIORef uvar incr
                            
                       intOfU :: OI -> Int
                       intOfU =  ... $ genSym

                   return (Fresh intOfU n)
Run Code Online (Sandbox Code Playgroud)

这看起来更明智 - 其他一切呢?

instance Functor Fresh where
    fmap f (Fresh g n) =  Fresh (f . g) n

freshNew     :: (Int -> a) -> IO (Fresh a)
freshNew g   =  do n <- initialU
                   uvar <- newIORef 0
                   let incr n =  (n + 1, n)

                       genSym :: IO Int
                       genSym =  atomicModifyIORef uvar incr
                            
                       intOfU :: OI -> Int
                       intOfU =  ... $ genSym

                   return (Fresh (g . intOfU) n)
Run Code Online (Sandbox Code Playgroud)

这看起来很有希望——我们不再需要本地定义freshInts!我们只需要定义,并且- 在这样做时,需要考虑一些事项:U OIinitialUsplitU

  • 还记得Fresh值被错误地重用的问题compileStatement吗?好吧,这些OI值也存在同样的问题:

    pourFresh  :: Fresh a -> [a]
    pourFresh (Fresh g n) = map g (pourU n)
    
    pourU      :: OI -> [OI]
    pourU n    =  n1 : pourU n1 where (n1, n2) = splitU n
    
    Run Code Online (Sandbox Code Playgroud)

    OI类型的构造函数的现成可用性会加剧这个问题。

  • 我们仍然假设这种OI类型表示外部交互的存在 - 就像简单地称为IO...的深奥类型一样。

这表明OI类型应该是抽象的。由于我们正在处理原型并且您可能已经在使用它,也许最简单的选择就是在必要时使用Glasgol GHC 的扩展。

深呼吸的时间,并更改了一些名称:

  -- for GHC 8.6.5
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import Data.Char  (isSpace)
import Data.IORef (IORef, newIORef, atomicModifyIORef)
import Prelude    (Int, String, Eq(..), Functor(..), Num(..))
import Prelude    ((.), ($), (++), error, all)
import GHC.Base   (IO(..), State#, MutVar#, RealWorld)
import GHC.Base   (seq#, newMutVar#, atomicModifyMutVar#, noDuplicate#)

data OI                 =  OI OI#

type OI#                =  String -> State# RealWorld

type IO# a              =  State# RealWorld -> (# State# RealWorld, a #)


part#                   :: OI# -> (# OI#, OI# #)
part# h                 =  case h "partOI" of
                             s -> case dispense# s of
                                    (# s', h1 #) ->
                                      case dispense# s' of
                                        (# _, h2 #) -> (# h1, h2 #)

dispense#               :: IO# OI#
dispense# s             =  case newMutVar# () s of
                             (# s', r #) -> (# s', expire# s' r #)

expire#                 :: State# s -> MutVar# s () -> String -> State# s
expire# s r name        =  case atomicModifyMutVar# r use s of
                             (# s', () #) -> s'
                           where
                               use x   =  (error nowUsed, x)
                               nowUsed =  name' ++ ": already expired"
                               name'   =  if all isSpace name then "(unknown)"
                                          else name

invokes#                :: Monomo a => String -> IO# a -> OI# -> a
(name `invokes#` act) h =  case act (noDuplicate# (h name)) of (# _, t #) -> t

class Monomo a
Run Code Online (Sandbox Code Playgroud)

你现在可以重新开始呼吸了;这里也有一些名称更改:

partFresh               :: Fresh a -> (Fresh a, Fresh a)
partFresh (Fresh g u)   =  case partOI u of
                             (u1, u2) -> (Fresh g u1, Fresh g u2)
   
pluckFresh              :: Fresh a -> a
pluckFresh (Fresh g u)  =  g u

freshNew                :: (Int -> a) -> IO (Fresh a)
freshNew g              =  do uvar <- newIORef 0
                              let incr n =  (n + 1, n)

                                  genSym :: IO Int
                                  genSym =  atomicModifyIORef uvar incr
                            
                                  gensym :: OI -> Int
                                  gensym =  "gensym" `invokes` genSym

                              runOI (Fresh (g . gensym))

instance Monomo Int
Run Code Online (Sandbox Code Playgroud)

所以你有它:一个简单的唯一供应(除了一个定义)是 monad-free:

但是,如果您确实 do需要,您可以将其Fresh用作 monadic 类型的基础,例如:

type Supply i a = Fresh i -> a

unit     :: a -> Supply i a
unit x   =  \u -> partFresh u `seq` x

bind     :: Supply i a -> (a -> Supply i b) -> Supply i b
bind m k =  \u -> case partFresh u of (u1, u2) -> (\x -> x `seq` k x u2) (m u1)
Run Code Online (Sandbox Code Playgroud)

在哪里:

 -- for GHC 8.6.5
{-# LANGUAGE CPP #-}
#define during seq
import qualified Prelude(during)

{-# NOINLINE seq #-}
infixr 0 `seq`
seq     :: a -> b -> b
seq x y = Prelude.during x (case x of _ -> y)
Run Code Online (Sandbox Code Playgroud)

或者:

 -- for GHC 8.6.5
{-# LANGUAGE CPP #-}
#define during seq
import qualified Prelude(during)
import GHC.Base(lazy)

infixr 0 `seq`
seq     :: a -> b -> b
seq x y = Prelude.during x (lazy y)
Run Code Online (Sandbox Code Playgroud)

...因为Prelude.seq 实际上不是连续的

(是的:这些定义是特定于 GHC 的;对于其他 Haskell 实现,最简单的选择很可能是添加一个新原语。至于扩展本身,它们与每个定义保持一致。)

嗯……这很有趣:

  -- for GHC 8.6.5
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Fresh(
    Fresh,
    freshNew, partFresh, pluckFresh
) where
import Data.Char  (isSpace)
import Data.IORef (IORef, newIORef, atomicModifyIORef)
import Prelude    (Int, String, Eq(..), Functor(..), Num(..))
import Prelude    ((.), ($), (++), error, all)
import GHC.Base   (IO(..), State#, MutVar#, RealWorld)
import GHC.Base   (seq#, newMutVar#, atomicModifyMutVar#, noDuplicate#)

partFresh               :: Fresh a -> (Fresh a, Fresh a)
partFresh (Fresh g u)   =  case partOI u of
                             (u1, u2) -> (Fresh g u1, Fresh g u2)
   
pluckFresh              :: Fresh a -> a
pluckFresh (Fresh g u)  =  g u

freshNew                :: (Int -> a) -> IO (Fresh a)
freshNew g              =  do uvar <- newIORef 0
                              let incr n =  (n + 1, n)

                                  genSym :: IO Int
                                  genSym =  atomicModifyIORef uvar incr
                            
                                  gensym :: OI -> Int
                                  gensym =  "gensym" `invokes` genSym

                              runOI (Fresh (g . gensym))

instance Functor Fresh where
    fmap f (Fresh g n) =  Fresh (f . g) n

-- local definitions --
data Fresh a            =  Fresh (OI -> a) OI

partOI                  :: OI -> (OI, OI)
partOI (OI h)           =  case part# h of (# h1, h2 #) -> (OI h1, OI h2)

runOI                   :: (OI -> a) -> IO a
runOI g                 =  IO $ \s -> case dispense# s of
                                        (# s', h #) -> seq# (g (OI h)) s'

invokes                 :: Monomo a => String -> IO a -> OI -> a
(name `invokes` IO act) (OI h)
                        =  (name `invokes#` act) h

class Monomo a

 -- extended definitions --
data OI                 =  OI OI#

type OI#                =  String -> State# RealWorld

type IO# a              =  State# RealWorld -> (# State# RealWorld, a #)


part#                   :: OI# -> (# OI#, OI# #)
part# h                 =  case h "partOI" of
                             s -> case dispense# s of
                                    (# s', h1 #) ->
                                      case dispense# s' of
                                        (# _, h2 #) -> (# h1, h2 #)

dispense#               :: IO# OI#
dispense# s             =  case newMutVar# () s of
                             (# s', r #) -> (# s', expire# s' r #)

expire#                 :: State# s -> MutVar# s () -> String -> State# s
expire# s r name        =  case atomicModifyMutVar# r use s of
                             (# s', () #) -> s'
                           where
                               use x   =  (error nowUsed, x)
                               nowUsed =  name' ++ ": already expired"
                               name'   =  if all isSpace name then "(unknown)"
                                          else name

invokes#                :: Monomo a => String -> IO# a -> OI# -> a
(name `invokes#` act) h =  case act (noDuplicate# (h name)) of (# _, t #) -> t

 -- supplemental instances --
instance Monomo Int
Run Code Online (Sandbox Code Playgroud)

...我们甚至设法摆脱了这个unsafe...定义——太好了!

PS:如果你对那个奇特的Monomo类感到疑惑,你可以在Standard ML的历史中找到线索......


...不错 - 在他的博客文章IO-free splittable supply 中,Luke Palmer 很好地利用了参数化/更高级别的类型来封装值供应的使用:

runSupply :: (forall a. Eq a => Supply a -> b) -> b
Run Code Online (Sandbox Code Playgroud)

……没有IO!我们可以为Fresh,做同样的事情unsafe...

  1. Fresh模块已经足够大,所以让我们OI进入它自己的模块,称为OutputInput(嗯,它与 I/O 相关 :-)

  2. 使用Haskell中的State作为指导,从可以封装OI的新类型(例如UO)中抽象出来:

    runUO :: (forall s . UO s -> a) -> a
    
    Run Code Online (Sandbox Code Playgroud)

    ...并且还被赋予了自己的模块。

  3. 这是 finnicky 部分:Fresh从切换IO<