How would you express this in Haskell?

Sea*_*ess 5 algorithm haskell if-statement

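Would you write this algorithm in Haskell using if/else? Is there a way to express it without them? It is hard to pull meaningful functions out of the middle of it, because this is just the output of a machine learning system.

I am implementing the algorithm described here, which classifies fragments of HTML content as either Content or Boilerplate. The weights are already hard-coded.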

curr_linkDensity <= 0.333333
| prev_linkDensity <= 0.555556
| | curr_numWords <= 16
| | | next_numWords <= 15
| | | | prev_numWords <= 4: BOILERPLATE
| | | | prev_numWords > 4: CONTENT
| | | next_numWords > 15: CONTENT
| | curr_numWords > 16: CONTENT
| prev_linkDensity > 0.555556
| | curr_numWords <= 40
| | | next_numWords <= 17: BOILERPLATE
| | | next_numWords > 17: CONTENT
| | curr_numWords > 40: CONTENT
curr_linkDensity > 0.333333: BOILERPLATE

luq*_*qui 11

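Short of simplifying the logic by hand (I assume you may be generating this code automatically), I think using MultiWayIf is pretty clean and direct.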

{-# LANGUAGE MultiWayIf #-}

data Stats = Stats {
    curr_linkDensity :: Double,
    prev_linkDensity :: Double,
    ...
}

data Classification = Content | Boilerplate

classify :: Stats -> Classification
classify s = if
    | curr_linkDensity s <= 0.333333 -> if
      | prev_linkDensity s <= 0.555556 -> if
        | curr_numWords s <= 16 -> if
          | next_numWords s <= 15 -> if
            | prev_numWords s <= 4 -> Boilerplate
            | prev_numWords s > 4 -> Content
          | next_numWords s > 15 -> Content
      ...
and so on.
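For reference, here is a sketch of what the whole tree from the question might look like once every branch is spelled out. The answer leaves the remaining record fields elided, so the three `*_numWords` fields and their `Double` type are assumptions made only for this sketch. It also closes each level with `otherwise` instead of repeating the negated comparison, which is a stylistic swap rather than what the snippet above does.

```haskell
{-# LANGUAGE MultiWayIf #-}

-- Assumption: all five features are Doubles; the original record elides
-- everything after the two link-density fields.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Double
  , prev_numWords    :: Double
  , next_numWords    :: Double
  }

data Classification = Content | Boilerplate
  deriving (Eq, Show)

-- Direct transcription of the decision tree in the question.
-- Each nested `if` opens its own layout block, so indentation matters.
classify :: Stats -> Classification
classify s = if
  | curr_linkDensity s <= 0.333333 -> if
      | prev_linkDensity s <= 0.555556 -> if
          | curr_numWords s <= 16 -> if
              | next_numWords s <= 15 -> if
                  | prev_numWords s <= 4 -> Boilerplate
                  | otherwise            -> Content
              | otherwise -> Content
          | otherwise -> Content
      | otherwise -> if
          | curr_numWords s <= 40 -> if
              | next_numWords s <= 17 -> Boilerplate
              | otherwise             -> Content
          | otherwise -> Content
  | otherwise -> Boilerplate
```

With this in scope, `classify (Stats 0.5 0 0 0 0)` follows the last branch, since the current link density is above 0.333333.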

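However, since it is so structured (just a tree of if/else with comparisons), also consider creating a decision tree data structure and writing an interpreter for it. That would let you transform, manipulate, and inspect the tree. Maybe it will buy you something; defining a miniature language for your specification can be surprisingly beneficial.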

data DecisionTree i o 
    = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
    | Leaf o

runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLess ifGreater) i
    | f i <= v  = runDecisionTree ifLess i
    | otherwise = runDecisionTree ifGreater i
runDecisionTree (Leaf o) _ = o

-- DecisionTree is an encoding of a function, and you can write
-- Functor, Applicative, and Monad instances!
Then

 classifier :: DecisionTree Stats Classification
 classifier =
     Comparison curr_linkDensity 0.333333
       (Comparison prev_linkDensity 0.555556
         (Comparison curr_numWords 16
           (Comparison next_numWords 15
             (Comparison prev_numWords 4
               (Leaf Boilerplate)
               (Leaf Content))
             (Leaf Content)
           ...
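To make the "transform, manipulate, inspect" point concrete, here is a self-contained sketch of the interpreter approach with the full tree from the question filled in. The `Functor` instance and the `leaves` helper are illustrative additions, not part of the answer's code, and the `*_numWords` fields are again assumed to be `Double` so they fit the `i -> Double` accessor slot.

```haskell
-- Assumption: all five features are Doubles (the original record is elided).
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Double
  , prev_numWords    :: Double
  , next_numWords    :: Double
  }

data Classification = Content | Boilerplate
  deriving (Eq, Show)

data DecisionTree i o
  = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
  | Leaf o

-- Walk the tree: take the first subtree when the feature is <= the threshold.
runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLess ifGreater) i
  | f i <= v  = runDecisionTree ifLess i
  | otherwise = runDecisionTree ifGreater i
runDecisionTree (Leaf o) _ = o

-- Transformation: relabel every leaf without touching the comparisons.
instance Functor (DecisionTree i) where
  fmap g (Comparison f v l r) = Comparison f v (fmap g l) (fmap g r)
  fmap g (Leaf o)             = Leaf (g o)

-- Inspection (hypothetical helper): list the outcomes, left to right.
leaves :: DecisionTree i o -> [o]
leaves (Comparison _ _ l r) = leaves l ++ leaves r
leaves (Leaf o)             = [o]

classifier :: DecisionTree Stats Classification
classifier =
  Comparison curr_linkDensity 0.333333
    (Comparison prev_linkDensity 0.555556
      (Comparison curr_numWords 16
        (Comparison next_numWords 15
          (Comparison prev_numWords 4
            (Leaf Boilerplate)
            (Leaf Content))
          (Leaf Content))
        (Leaf Content))
      (Comparison curr_numWords 40
        (Comparison next_numWords 17
          (Leaf Boilerplate)
          (Leaf Content))
        (Leaf Content)))
    (Leaf Boilerplate)
```

For example, `fmap (== Boilerplate) classifier` turns the classifier into a `DecisionTree Stats Bool`, and `leaves classifier` lets you check how many paths end in each outcome without running the tree on any input.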

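  • Minor comment: what I personally dislike with guards is spelling out every case (i.e. being exhaustive) without using `otherwise` as the last guard. I would use `otherwise` because I find it expresses the intent better, I don't have to repeat the condition twice, and it is more efficient than anything else. Also, when an inner `if` has non-exhaustive branches, it is easy (at least for me) to assume that Haskell will backtrack to the next branch of the outer `if`. That is not the case, however, and you get a non-exhaustive match error at runtime. For these reasons I much prefer the last option. (3 upvotes)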
  • For the first approach, the `MultiWayIf` extension lets you write this more prettily, replacing the `case () of _` with `if`. (2 upvotes)
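The backtracking pitfall from the first comment can be sketched with a small made-up example. The function names and strings below are hypothetical and unrelated to the classifier. In `describe`, the inner `if` has no branch for `y <= 0`, and evaluation does not fall back to the outer `otherwise`.

```haskell
{-# LANGUAGE MultiWayIf #-}

-- Hypothetical example: the inner `if` is not exhaustive.
-- `describe 1 0` enters the `x > 0` branch, finds no matching inner guard,
-- and fails at runtime; it never reaches the outer `otherwise`.
describe :: Int -> Int -> String
describe x y = if
  | x > 0 -> if
      | y > 0 -> "both positive"
  | otherwise -> "x is not positive"

-- Closing every level with `otherwise` makes the function total.
describeTotal :: Int -> Int -> String
describeTotal x y = if
  | x > 0 -> if
      | y > 0     -> "both positive"
      | otherwise -> "only x is positive"
  | otherwise -> "x is not positive"
```

Both functions agree whenever the inner guard matches or the outer one fails; they differ only on inputs such as `1 0`, where `describe` throws and `describeTotal` returns a value.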

tha*_*guy 6

Since only three paths in this decision tree lead to the BOILERPLATE state, I would just enumerate them and simplify:

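-- `s` is the record of features; the first path also needs next_numWords <= 15
isBoilerplate s =
  prev_linkDensity s <= 0.555556 && curr_numWords s <= 16 && next_numWords s <= 15 && prev_numWords s <= 4
  || prev_linkDensity s > 0.555556 && curr_numWords s <= 40 && next_numWords s <= 17
  || curr_linkDensity s > 0.333333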

  • I think you want `or`, not `any`. You could also use `||`, or use `and` instead of the `&&`s. (4 upvotes)
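The `or`/`and` spelling suggested in the comment might look like the sketch below. The `Stats` record and its all-`Double` fields are assumptions for illustration. The three alternatives are read straight off the tree in the question, so the first path includes the `next_numWords <= 15` test.

```haskell
-- Assumption: the features live in a record with Double fields.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Double
  , prev_numWords    :: Double
  , next_numWords    :: Double
  }

-- A block is boilerplate when any one of the three paths applies.
-- The shared test curr_linkDensity <= 0.333333 on the last two paths is
-- dropped: if it fails, the first alternative already returns True.
isBoilerplate :: Stats -> Bool
isBoilerplate s = or
  [ curr_linkDensity s > 0.333333
  , and [ prev_linkDensity s <= 0.555556
        , curr_numWords s <= 16
        , next_numWords s <= 15
        , prev_numWords s <= 4
        ]
  , and [ prev_linkDensity s > 0.555556
        , curr_numWords s <= 40
        , next_numWords s <= 17
        ]
  ]
```

Listing the conditions this way keeps one comparison per line, which is easier to diff against the generated tree than a long chain of `&&` and `||`.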