Vik*_*ahl 6 compiler-construction haskell code-generation
我正在为Haskell中的简单命令式语言编写编译器,输出Java字节码.我已经到了我发出字节码的抽象表示的地步.
在编写用于编译if语句的代码时遇到了一些麻烦.要实现if语句,我需要跳转到标签.因此,我需要为该标签生成一个名称,该名称必须是唯一的.
我的第一个想法是通过一些状态compileStatement
,即
compileStatement :: Statement -> UniqueIDState -> [AbstractInstruction]
Run Code Online (Sandbox Code Playgroud)
当然,compilerStatement
是递归的,所以使用这种方法需要我从递归调用中将唯一ID生成器的状态传递回upp:
compileStatement :: Statement -> UniqueIDState -> (UniqueIdState, [AbstractInstruction])
Run Code Online (Sandbox Code Playgroud)
这看起来有点笨拙,特别是如果我意识到我需要在将来携带更多状态; 有更优雅的方式吗?
你需要一个"独特的供应".在Haskell中执行此操作的常用方法是通过状态monad线程计数器,这会自动化您描述的管道问题.
我想,如果您拥有的唯一工具是锤子,那么将所有东西都当作钉子来对待是很诱人的。
亚伯拉罕·马斯洛。
来点不同的东西吧——一个不是Monad
类成员的独特供应。碰巧的是,您几乎可以使用原始类型签名:
compileStatement :: Statement -> UniqueIDState -> [AbstractInstruction]
Run Code Online (Sandbox Code Playgroud)
如果唯一的要求是每个标签都是唯一的——不需要计算使用了多少,在相同情况下提供相同的标识符等等——那么你可以使用一种侵入性较小的技术。
从约翰·朗伯里 (John Launchbury) 和西蒙·佩顿·琼斯 (Simon Peyton Jones)在 Haskell的State 中第 39-40 页:
newUniqueSupply :: IO UniqueSupply
splitUniqueSupply :: UniqueSupply -> (UniqueSupply, UniqueSupply)
getUnique :: UniqueSupply -> Unique
instance Eq Unique
instance Ord Unique
instance Text Unique
-- interface --
-- ================ --
-- implementation --
data UniqueSupply = US Unique UniqueSupply UniqueSupply
type Unique = Int
-- where all the action happens!
newUniqueSupply :: IO UniqueSupply
newUniqueSupply
= newVar 0 `thenST` \ uvar ->
let
next :: IO Unique
next = interleaveST (
readVar uvar `thenST` \ u ->
writeVar uvar (u+1) `thenST_`
returnStrictlyST u
)
supply :: IO UniqueSupply
supply = interleaveST (
next `thenST` \ u ->
supply `thenST` \ s1 ->
supply `thenST` \ s2 ->
returnST (US u s1 s2)
)
in
supply
-- bits so boring they're not even in the paper...
splitUniqueSupply (US _ s1 s2) = (s1, s2)
getUnique (US u _ _) = u
Run Code Online (Sandbox Code Playgroud)
是的……那是 1996 年的 Haskell 样本——让我们更新一下:
module UniqueSupply(
Unique, UniqueSupply,
newUniqueSupply, splitUniqueSupply, getUnique
) where
import Control.Monad (liftM3)
import Data.IORef (IORef, newIORef, atomicModifyIORef)
import System.IO.Unsafe (unsafeInterleaveIO)
newtype Unique = U Int deriving (Eq, Ord, Read, Show)
data UniqueSupply = US Unique UniqueSupply UniqueSupply
newUniqueSupply :: IO UniqueSupply
newUniqueSupply
= do uvar <- newIORef 0
let next :: IO Unique
next = unsafeInterleaveIO (atomicModifyIORef uvar (\u -> (u+1, U u)))
supply :: IO UniqueSupply
supply = unsafeInterleaveIO (liftM3 US next supply supply)
supply
splitUniqueSupply :: UniqueSupply -> (UniqueSupply, UniqueSupply)
splitUniqueSupply (US _ s1 s2) = (s1, s2)
getUnique :: UniqueSupply -> Unique
getUnique (US u _ _) = u
Run Code Online (Sandbox Code Playgroud)
现在它又开始工作了,有一些烦恼需要处理:
两种类型的使用;
缺乏多态性;
固定的发电方式;
错误重用的可能性。
最后一点特别有趣。假设:
data Statement =
... | If Statement Statement Statement | ...
Run Code Online (Sandbox Code Playgroud)
那么如果:
compileStatement (If c t e) s =
case splitUniqueSupply s of
(s1, s2) -> case splitUniqueSupply s2 of
(s3, s4) -> buildCondJump (compileStatement c s1)
(compileStatement t s3)
(compileStatement e s4)
Run Code Online (Sandbox Code Playgroud)
被错误地改为:
compileStatement (If c t e) s =
case splitUniqueSupply s of
(s1, s2) -> case splitUniqueSupply s2 of
(s3, s4) -> buildCondJump (compileStatement c s)
(compileStatement t s)
(compileStatement e s)
Run Code Online (Sandbox Code Playgroud)
不仅UniqueSupply
和Unique
值被错误地重用,如果任何递归调用compileStatement
密集地使用供应,就有可能发生空间泄漏。
我们现在考虑第二点:缺乏多态性。让我们假设存在合适的类型:
data Fresh a = Fresh a (Fresh a) (Fresh a)
freshNew :: ... -> IO (Fresh a)
splitFresh :: Fresh a -> (Fresh a, Fresh a)
pluckFresh :: Fresh a -> a
Run Code Online (Sandbox Code Playgroud)
这意味着:
instance Functor Fresh where
fmap h (Fresh u s1 s2) = Fresh (h u) (fmap h s1) (fmap h s2)
Run Code Online (Sandbox Code Playgroud)
然后启发:
freshNew :: (Int -> a) -> IO (Fresh a)
freshNew g = fmap (fmap g) freshInts
splitFresh :: Fresh a -> (Fresh a, Fresh a)
splitFresh (Fresh _ s1 s2) = (s1, s2)
pluckFresh :: Fresh a -> a
pluckFresh (Fresh u _ _) = u
Run Code Online (Sandbox Code Playgroud)
然后我们可以保持freshInts
私有:
freshInts :: IO (Fresh Int)
freshInts = do uvar <- newIORef 0
let incr u = (u+1, u)
next = unsafeInterleaveIO $
atomicModifyIORef uvar incr
supply = unsafeInterleaveIO $
liftM3 Fresh next supply supply
supply
Run Code Online (Sandbox Code Playgroud)
如果用户只需要Int
值:
do .
.
.
int_supply <- freshNew id {- id x = x -}
.
.
.
Run Code Online (Sandbox Code Playgroud)
作为奖励,这也修复了使用两种类型和固定生成模式(第一点和第三点)。Fresh
新模块的时间:
module Fresh(
Fresh,
freshNew, splitFresh, pluckFresh
) where
import Control.Monad (liftM3)
import Data.IORef (IORef, newIORef, atomicModifyIORef)
import System.IO.Unsafe (unsafeInterleaveIO)
data Fresh a = Fresh a (Fresh a) (Fresh a)
instance Functor Fresh where
fmap h (Fresh u s1 s2) = Fresh (h u) (fmap h s1) (fmap h s2)
freshNew :: (Int -> a) -> IO (Fresh a)
freshNew g = fmap (fmap g) freshInts
splitFresh :: Fresh a -> (Fresh a, Fresh a)
splitFresh (Fresh _ s1 s2) = (s1, s2)
pluckFresh :: Fresh a -> a
pluckFresh (Fresh u _ _) = u
-- local definition
freshInts :: IO (Fresh Int)
freshInts = do uvar <- newIORef 0
let incr u = (u+1, u)
next = unsafeInterleaveIO $
atomicModifyIORef uvar incr
supply = unsafeInterleaveIO $
liftM3 Fresh next supply supply
supply
Run Code Online (Sandbox Code Playgroud)
现在是重用之谜,以及对答案的初步尝试:
freshNew :: (Int -> a) -> IO (Fresh a)
freshNew g = do z <- newIORef ()
fmap (fmap g) (freshInts z)
freshInts :: IORef () -> IO (Fresh Int)
freshInts z = do let using :: () -> (a, ())
using x = (error "already used!", x)
() <- atomicModifyIORef z using
z1 <- newIORef ()
z2 <- newIORef ()
let incr u = (u+1, u)
next = unsafeInterleaveIO $
atomicModifyIORef uvar incr
supply = unsafeInterleaveIO $
liftM3 Fresh next (freshInts z1) (freshInts z2)
supply
Run Code Online (Sandbox Code Playgroud)
...是的,这是一个原型 -我们可以做得更好吗?
到目前为止,splitFresh
并且pluckFresh
一直是微不足道的:
splitFresh :: Fresh a -> (Fresh a, Fresh a)
splitFresh (Fresh _ s1 s2) = (s1, s2)
pluckFresh :: Fresh a -> a
pluckFresh (Fresh u _ _) = u
Run Code Online (Sandbox Code Playgroud)
可一些工作freshInts
和freshNew
现在都被转移到他们:
如果splitFresh
可以直接生成这对子树,Fresh
值会更简单:
data Fresh a = Fresh ... {- no subtrees -}
splitFresh :: Fresh a -> (Fresh a, Fresh a)
splitFresh (Fresh g ...) = (Fresh ..., Fresh ...)
Run Code Online (Sandbox Code Playgroud)
如果pluckFresh
可以访问生成器函数 -g
在freshNew
- 它可以直接提供所需的唯一值:
data Fresh a = Fresh (... -> a) ...
pluckFresh :: Fresh a -> a
pluckFresh (Fresh g ...) = (g ...)
Run Code Online (Sandbox Code Playgroud)
怎么样:
data Fresh a = Fresh (Int -> a) U
splitFresh :: Fresh a -> (Fresh a, Fresh a)
splitFresh (Fresh g n) = (Fresh g n1, Fresh g n2) where
(n1, n2) = splitU n
pluckFresh :: Fresh a -> a
pluckFresh (Fresh g n) = g (intOfU n)
Run Code Online (Sandbox Code Playgroud)
在哪里:
splitU :: U -> (U, U)
intOfU :: U -> Int
Run Code Online (Sandbox Code Playgroud)
这可以使freshInts
简单:
freshInts :: IO (Fresh Int)
freshInts = do n <- initialU
return (Fresh (\x -> x) n)
Run Code Online (Sandbox Code Playgroud)
假设:
initialU :: IO U
Run Code Online (Sandbox Code Playgroud)
嗯 - 关于freshInts
. 然后是intOfU
- 它与其他地方看到的东西有一种奇怪的相似之处......
[...] 在命令式程序中,您可能会简单地调用
GenSym()
每个标识符,从全局供应中分配一个唯一名称,并对供应产生副作用,以便后续调用GenSym()
将提供新值。
(来自 Launchbury 和 Peyton-Jones 论文的第 39 页。)
让我们再考虑一下:
由于其外部(Haskell)与全局(和可变)供应的交互,genSym
将具有以下类型:
genSym :: IO Int
Run Code Online (Sandbox Code Playgroud)
以防止在纯上下文中使用它。
早期的原型有它自己的外部交互——它使用可变引用:
freshInts :: IORef () -> IO (Fresh Int)
Run Code Online (Sandbox Code Playgroud)
防止重用Fresh
值。
...抽象IO
类型通过其在两个类型签名中的出现来指示那些外部交互的存在。
如果我们(谨慎地!)假设:
intOfU
是genSym
伪装的;
这种U
类型还可以作为外部互动的指标;
IE:
type U = OI
genSym :: OI -> Int
intOfU :: U -> Int
intOfU = ... $ genSym
Run Code Online (Sandbox Code Playgroud)
……这意味着:
data Fresh a = Fresh (Int -> a) OI
splitU :: OI -> (OI, OI)
Run Code Online (Sandbox Code Playgroud)
这看起来很有希望 - 我们现在可以重新定位genSym
到freshInts
:
data Fresh a = Fresh (OI -> a) OI
pluckFresh :: Fresh a -> a
pluckFresh (Fresh g n) = g n
freshInts :: IO (Fresh Int)
freshInts = do n <- initialU
uvar <- newIORef 0
let incr n = (n + 1, n)
genSym :: IO Int
genSym = atomicModifyIORef uvar incr
intOfU :: OI -> Int
intOfU = ... $ genSym
return (Fresh intOfU n)
Run Code Online (Sandbox Code Playgroud)
这看起来更明智 - 其他一切呢?
instance Functor Fresh where
fmap f (Fresh g n) = Fresh (f . g) n
freshNew :: (Int -> a) -> IO (Fresh a)
freshNew g = do n <- initialU
uvar <- newIORef 0
let incr n = (n + 1, n)
genSym :: IO Int
genSym = atomicModifyIORef uvar incr
intOfU :: OI -> Int
intOfU = ... $ genSym
return (Fresh (g . intOfU) n)
Run Code Online (Sandbox Code Playgroud)
这看起来很有希望——我们不再需要本地定义freshInts
!我们只需要定义,并且- 在这样做时,需要考虑一些事项:U
OI
initialU
splitU
还记得Fresh
值被错误地重用的问题compileStatement
吗?好吧,这些OI
值也存在同样的问题:
pourFresh :: Fresh a -> [a]
pourFresh (Fresh g n) = map g (pourU n)
pourU :: OI -> [OI]
pourU n = n1 : pourU n1 where (n1, n2) = splitU n
Run Code Online (Sandbox Code Playgroud)
该OI
类型的构造函数的现成可用性会加剧这个问题。
我们仍然假设这种OI
类型表示外部交互的存在 - 就像简单地称为IO
...的深奥类型一样。
这表明OI
类型应该是抽象的。由于我们正在处理原型并且您可能已经在使用它,也许最简单的选择就是在必要时使用Glasgol GHC 的扩展。
深呼吸的时间,并更改了一些名称:
-- for GHC 8.6.5
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import Data.Char (isSpace)
import Data.IORef (IORef, newIORef, atomicModifyIORef)
import Prelude (Int, String, Eq(..), Functor(..), Num(..))
import Prelude ((.), ($), (++), error, all)
import GHC.Base (IO(..), State#, MutVar#, RealWorld)
import GHC.Base (seq#, newMutVar#, atomicModifyMutVar#, noDuplicate#)
data OI = OI OI#
type OI# = String -> State# RealWorld
type IO# a = State# RealWorld -> (# State# RealWorld, a #)
part# :: OI# -> (# OI#, OI# #)
part# h = case h "partOI" of
s -> case dispense# s of
(# s', h1 #) ->
case dispense# s' of
(# _, h2 #) -> (# h1, h2 #)
dispense# :: IO# OI#
dispense# s = case newMutVar# () s of
(# s', r #) -> (# s', expire# s' r #)
expire# :: State# s -> MutVar# s () -> String -> State# s
expire# s r name = case atomicModifyMutVar# r use s of
(# s', () #) -> s'
where
use x = (error nowUsed, x)
nowUsed = name' ++ ": already expired"
name' = if all isSpace name then "(unknown)"
else name
invokes# :: Monomo a => String -> IO# a -> OI# -> a
(name `invokes#` act) h = case act (noDuplicate# (h name)) of (# _, t #) -> t
class Monomo a
Run Code Online (Sandbox Code Playgroud)
你现在可以重新开始呼吸了;这里也有一些名称更改:
partFresh :: Fresh a -> (Fresh a, Fresh a)
partFresh (Fresh g u) = case partOI u of
(u1, u2) -> (Fresh g u1, Fresh g u2)
pluckFresh :: Fresh a -> a
pluckFresh (Fresh g u) = g u
freshNew :: (Int -> a) -> IO (Fresh a)
freshNew g = do uvar <- newIORef 0
let incr n = (n + 1, n)
genSym :: IO Int
genSym = atomicModifyIORef uvar incr
gensym :: OI -> Int
gensym = "gensym" `invokes` genSym
runOI (Fresh (g . gensym))
instance Monomo Int
Run Code Online (Sandbox Code Playgroud)
所以你有它:一个简单的唯一供应(除了一个定义)是 monad-free:
你只需要定义例如:
nextID :: Int -> ID
Run Code Online (Sandbox Code Playgroud)
由此产生的对您的类型签名的更改是适度的:
compileStatement :: Statement -> Fresh ID -> [AbstractInstruction]
Run Code Online (Sandbox Code Playgroud)
但是,如果您确实 do
需要,您可以将其Fresh
用作 monadic 类型的基础,例如:
type Supply i a = Fresh i -> a
unit :: a -> Supply i a
unit x = \u -> partFresh u `seq` x
bind :: Supply i a -> (a -> Supply i b) -> Supply i b
bind m k = \u -> case partFresh u of (u1, u2) -> (\x -> x `seq` k x u2) (m u1)
Run Code Online (Sandbox Code Playgroud)
在哪里:
-- for GHC 8.6.5
{-# LANGUAGE CPP #-}
#define during seq
import qualified Prelude(during)
{-# NOINLINE seq #-}
infixr 0 `seq`
seq :: a -> b -> b
seq x y = Prelude.during x (case x of _ -> y)
Run Code Online (Sandbox Code Playgroud)
或者:
-- for GHC 8.6.5
{-# LANGUAGE CPP #-}
#define during seq
import qualified Prelude(during)
import GHC.Base(lazy)
infixr 0 `seq`
seq :: a -> b -> b
seq x y = Prelude.during x (lazy y)
Run Code Online (Sandbox Code Playgroud)
...因为Prelude.seq
实际上不是连续的。
(是的:这些定义是特定于 GHC 的;对于其他 Haskell 实现,最简单的选择很可能是添加一个新原语。至于扩展本身,它们与每个定义保持一致。)
嗯……这很有趣:
-- for GHC 8.6.5
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Fresh(
Fresh,
freshNew, partFresh, pluckFresh
) where
import Data.Char (isSpace)
import Data.IORef (IORef, newIORef, atomicModifyIORef)
import Prelude (Int, String, Eq(..), Functor(..), Num(..))
import Prelude ((.), ($), (++), error, all)
import GHC.Base (IO(..), State#, MutVar#, RealWorld)
import GHC.Base (seq#, newMutVar#, atomicModifyMutVar#, noDuplicate#)
partFresh :: Fresh a -> (Fresh a, Fresh a)
partFresh (Fresh g u) = case partOI u of
(u1, u2) -> (Fresh g u1, Fresh g u2)
pluckFresh :: Fresh a -> a
pluckFresh (Fresh g u) = g u
freshNew :: (Int -> a) -> IO (Fresh a)
freshNew g = do uvar <- newIORef 0
let incr n = (n + 1, n)
genSym :: IO Int
genSym = atomicModifyIORef uvar incr
gensym :: OI -> Int
gensym = "gensym" `invokes` genSym
runOI (Fresh (g . gensym))
instance Functor Fresh where
fmap f (Fresh g n) = Fresh (f . g) n
-- local definitions --
data Fresh a = Fresh (OI -> a) OI
partOI :: OI -> (OI, OI)
partOI (OI h) = case part# h of (# h1, h2 #) -> (OI h1, OI h2)
runOI :: (OI -> a) -> IO a
runOI g = IO $ \s -> case dispense# s of
(# s', h #) -> seq# (g (OI h)) s'
invokes :: Monomo a => String -> IO a -> OI -> a
(name `invokes` IO act) (OI h)
= (name `invokes#` act) h
class Monomo a
-- extended definitions --
data OI = OI OI#
type OI# = String -> State# RealWorld
type IO# a = State# RealWorld -> (# State# RealWorld, a #)
part# :: OI# -> (# OI#, OI# #)
part# h = case h "partOI" of
s -> case dispense# s of
(# s', h1 #) ->
case dispense# s' of
(# _, h2 #) -> (# h1, h2 #)
dispense# :: IO# OI#
dispense# s = case newMutVar# () s of
(# s', r #) -> (# s', expire# s' r #)
expire# :: State# s -> MutVar# s () -> String -> State# s
expire# s r name = case atomicModifyMutVar# r use s of
(# s', () #) -> s'
where
use x = (error nowUsed, x)
nowUsed = name' ++ ": already expired"
name' = if all isSpace name then "(unknown)"
else name
invokes# :: Monomo a => String -> IO# a -> OI# -> a
(name `invokes#` act) h = case act (noDuplicate# (h name)) of (# _, t #) -> t
-- supplemental instances --
instance Monomo Int
Run Code Online (Sandbox Code Playgroud)
...我们甚至设法摆脱了这个unsafe...
定义——太好了!
PS:如果你对那个奇特的Monomo
类感到疑惑,你可以在Standard ML的历史中找到线索......
...不错 - 在他的博客文章IO-free splittable supply 中,Luke Palmer 很好地利用了参数化/更高级别的类型来封装值供应的使用:
runSupply :: (forall a. Eq a => Supply a -> b) -> b
Run Code Online (Sandbox Code Playgroud)
……没有IO
!我们可以为Fresh
,做同样的事情unsafe...
:
该Fresh
模块已经足够大,所以让我们OI
进入它自己的模块,称为OutputInput(嗯,它与 I/O 相关 :-)
使用Haskell中的State作为指导,从可以封装OI
的新类型(例如UO
)中抽象出来:
runUO :: (forall s . UO s -> a) -> a
Run Code Online (Sandbox Code Playgroud)
...并且还被赋予了自己的模块。
这是 finnicky 部分:Fresh
从切换IO<