How do I return early from a do block?

Joe*_*and 17 monads haskell web-scraping

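I'm trying to scrape a web page using Haskell and compile the results into an object.

If, for whatever reason, I can't get all the items from the page, I want to stop trying to process the page and return early.

For example: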

scrapePage :: String -> IO ()
scrapePage url = do
  doc <- fromUrl url
  title <- liftM headMay $ runX $ doc >>> css "head.title" >>> getText
  when (isNothing title) (return ())
  date <- liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
  when (isNothing date) (return ())
  -- etc
  -- make page object and send it to db
  return ()

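The problem is that when does not stop the do block from executing, nor does it keep the remaining parts from being run.

What is the right way to do this?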

Phi*_* JF 18

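return in Haskell does not do the same thing as return in other languages. Instead, what return does is inject a value into a monad (in this case IO). You have a couple of options.

The simplest is to use if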
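To see why the when ... (return ()) lines in the question have no effect, here is a minimal sketch using only base. The function describe is a hypothetical stand-in for the scraping code, not something from the question.

```haskell
import Control.Monad (when)
import Data.Maybe (isNothing)

-- Hypothetical stand-in for the question's scrapePage.
describe :: Maybe String -> IO String
describe title = do
  -- 'return ()' is just an IO action that yields (). 'when' runs it,
  -- and the do block then carries on with the next line.
  when (isNothing title) (return ())
  return ("processed: " ++ show title)

main :: IO ()
main = describe Nothing >>= putStrLn  -- prints "processed: Nothing"
```

The last line of describe still runs even though the title is missing, which is exactly the behaviour the question describes.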

scrapePage :: String -> IO ()
scrapePage url = do
  doc <- fromUrl url
  title <- liftM headMay $ runX $ doc >>> css "head.title" >>> getText
  if (isNothing title) then return () else do
   date <- liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
   if (isNothing date) then return () else do
     -- etc
     -- make page object and send it to db
     return ()

Another option is to use unless

scrapePage url = do
  doc <- fromUrl url
  title <- liftM headMay $ runX $ doc >>> css "head.title" >>> getText
  unless (isNothing title) $ do
    date <- liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
    unless (isNothing date) $ do
      -- etc
      -- make page object and send it to db
      return ()

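The general problem here is that the IO monad has no control effects (apart from exceptions). On the other hand, you can use the MaybeT monad transformer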

scrapePage url = liftM (maybe () id) . runMaybeT $ do
  doc <- liftIO $ fromUrl url
  title <- liftIO $ liftM headMay $ runX $ doc >>> css "head.title" >>> getText
  guard (isJust title)
  date <- liftIO $ liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
  guard (isJust date)
  -- etc
  -- make page object and send it to db
  return ()
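As a self-contained sketch of the same MaybeT idea, the example below replaces the HXT calls with a hypothetical Page type and a hypothetical field lookup (both are assumptions made so the code runs with only the transformers package). Wrapping each IO (Maybe a) step in the MaybeT constructor makes a missing field abort the block without an explicit guard.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Maybe (MaybeT (..), runMaybeT)

-- Hypothetical stand-in for a fetched page: field name -> value.
type Page = [(String, String)]

-- Hypothetical stand-in for one scraping step; it lives in IO like runX.
field :: String -> Page -> IO (Maybe String)
field name page = return (lookup name page)

-- Yields Nothing as soon as a field is missing; later steps are skipped.
scrape :: Page -> IO (Maybe (String, String))
scrape page = runMaybeT $ do
  title <- MaybeT (field "title" page)  -- Nothing here aborts the block
  lift (putStrLn ("got title: " ++ title))
  date <- MaybeT (field "date" page)
  return (title, date)

main :: IO ()
main = do
  scrape [("title", "Hello"), ("date", "2013-01-01")] >>= print
  scrape [("title", "Hello")] >>= print
```

The second call in main prints the title line and then Nothing, because the block stops at the missing date.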

If you really want full-blown control effects, you need to use ContT

scrapePage :: String -> IO ()
-- callCC wraps the whole block once; calling exit () skips everything after it
scrapePage url = flip runContT return $ callCC $ \exit -> do
  doc <- lift $ fromUrl url
  title <- lift $ liftM headMay $ runX $ doc >>> css "head.title" >>> getText
  when (isNothing title) $ exit ()
  date <- lift $ liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
  when (isNothing date) $ exit ()
  -- etc
  -- make page object and send it to db
  return ()
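For an early exit with ContT, the continuation has to be captured around the whole block, so that calling it jumps past everything that follows. A minimal runnable sketch under that assumption is below. The scraped fields are passed in directly as hypothetical arguments instead of coming from HXT, and only the transformers package is used.

```haskell
import Control.Monad (when)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Cont (callCC, runContT)
import Data.Maybe (isNothing)

-- Hypothetical stand-in for scrapePage: returns a status string.
scrape :: Maybe String -> Maybe String -> IO String
scrape title date = flip runContT return $ callCC $ \exit -> do
  -- 'exit' is the continuation of the whole callCC block, so calling it
  -- abandons the remaining lines and makes the block yield its argument.
  when (isNothing title) $ exit "no title"
  lift (putStrLn "title ok")
  when (isNothing date) $ exit "no date"
  return "saved"

main :: IO ()
main = do
  scrape Nothing (Just "2013-01-01") >>= putStrLn
  scrape (Just "Hello") Nothing >>= putStrLn
  scrape (Just "Hello") (Just "2013-01-01") >>= putStrLn
```

Unlike the MaybeT version, exit can carry any value of the block's result type, at the cost of threading the exit continuation through the code.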

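Warning: none of the above code has been tested, or even type checked!

  • The second approach worked well for me. I think you need "unless (condition) $ do" for it to compile (note the "$") (2 upvotes)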

dav*_*420 13

Use a monad transformer!

import Control.Monad.Trans.Class -- from transformers package
import Control.Error.Util        -- from errors package

scrapePage :: String -> IO ()
scrapePage url = maybeT (return ()) return $ do
  doc <- lift $ fromUrl url
  title <- liftM headMay $ lift . runX $ doc >>> css "head.title" >>> getText
  guard . not $ isNothing title
  date <- liftM headMay $ lift . runX $ doc >>> css "span.dateTime" ! "data-utc"
  guard . not $ isNothing date
  -- etc
  -- make page object and send it to db
  return ()

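For more flexibility in the value returned on early exit, use throwError / eitherT / EitherT instead of mzero / maybeT / MaybeT. (Although then you can't use guard.)

(You could also probably use headZ instead of headMay and drop the explicit guard.)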
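As a rough sketch of returning a value on early exit, the example below uses ExceptT and throwE from the transformers package. This is a swapped-in substitute for the EitherT / throwError combination named above, not the errors package API itself. Page and need are hypothetical helpers introduced only for illustration.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)

-- Hypothetical stand-in for a fetched page: field name -> value.
type Page = [(String, String)]

-- Hypothetical helper: look a field up, or exit early with a reason.
need :: String -> Page -> ExceptT String IO String
need name page = maybe (throwE ("missing " ++ name)) return (lookup name page)

-- Left carries the reason for the early exit, Right the scraped fields.
scrape :: Page -> IO (Either String (String, String))
scrape page = runExceptT $ do
  title <- need "title" page
  date <- need "date" page
  lift (putStrLn "saving page")  -- only reached when both fields exist
  return (title, date)

main :: IO ()
main = do
  scrape [("title", "Hello")] >>= print
  scrape [("title", "Hello"), ("date", "2013-01-01")] >>= print
```

Because need fails inside the transformer itself, no separate guard is required, which is the same effect the headZ suggestion aims for in the MaybeT version.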