Sea*_*ess 5 algorithm haskell if-statement
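Would you write this algorithm in Haskell using if/else? Is there a way to express it without them? It is hard to factor meaningful functions out of the middle of it. This is just the output of a machine learning system.
I am implementing the algorithm described here for classifying HTML content fragments as Content or Boilerplate. It has the weights already hard-coded.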
curr_linkDensity <= 0.333333
| prev_linkDensity <= 0.555556
| | curr_numWords <= 16
| | | next_numWords <= 15
| | | | prev_numWords <= 4: BOILERPLATE
| | | | prev_numWords > 4: CONTENT
| | | next_numWords > 15: CONTENT
| | curr_numWords > 16: CONTENT
| prev_linkDensity > 0.555556
| | curr_numWords <= 40
| | | next_numWords <= 17: BOILERPLATE
| | | next_numWords > 17: CONTENT
| | curr_numWords > 40: CONTENT
curr_linkDensity > 0.333333: BOILERPLATE
luq*_*qui 11
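Short of manually simplifying the logic (presumably you are generating this code automatically), I think using MultiWayIf is pretty clean and direct.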
{-# LANGUAGE MultiWayIf #-}

data Stats = Stats {
    curr_linkDensity :: Double,
    prev_linkDensity :: Double,
    ...
  }

data Classification = Content | Boilerplate

classify :: Stats -> Classification
classify s = if
  | curr_linkDensity s <= 0.333333 -> if
    | prev_linkDensity s <= 0.555556 -> if
      | curr_numWords s <= 16 -> if
        | next_numWords s <= 15 -> if
          | prev_numWords s <= 4 -> Boilerplate
          | prev_numWords s > 4  -> Content
        | next_numWords s > 15 -> Content
      ...
and so on.
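For reference, here is a sketch of how the fully nested MultiWayIf version could look once every branch of the tree in the question is written out. The record layout is an assumption: the answer elides the remaining fields, so the three word-count fields and their `Double` type are guesses made for illustration, and `otherwise` stands in for the explicit `>` guards.

```haskell
{-# LANGUAGE MultiWayIf #-}

-- Assumed record layout: the original elides the remaining fields.
-- Every feature is a Double here so all comparisons look the same.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Double
  , next_numWords    :: Double
  , prev_numWords    :: Double
  }

data Classification = Content | Boilerplate
  deriving (Eq, Show)

-- Direct transcription of the decision tree from the question.
-- Each nested `if` opens its own guard block via the layout rule.
classify :: Stats -> Classification
classify s = if
  | curr_linkDensity s <= 0.333333 -> if
      | prev_linkDensity s <= 0.555556 -> if
          | curr_numWords s <= 16 -> if
              | next_numWords s <= 15 -> if
                  | prev_numWords s <= 4 -> Boilerplate
                  | otherwise            -> Content
              | otherwise -> Content
          | otherwise -> Content
      | otherwise -> if
          | curr_numWords s <= 40 -> if
              | next_numWords s <= 17 -> Boilerplate
              | otherwise             -> Content
          | otherwise -> Content
  | otherwise -> Boilerplate
```

Nested multi-way ifs rely on indentation to decide which guard belongs to which `if`, so each inner block has to be indented further than the block that contains it.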
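However, since this is so structured, just a tree of if/else with comparisons, also consider creating a decision tree data structure and writing an interpreter for it. This would let you do transformations, manipulations, and inspections. Maybe it will buy you something; defining a miniature language for your specifications can be surprisingly beneficial.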
data DecisionTree i o
  = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
  | Leaf o

runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLess ifGreater) i
  | f i <= v  = runDecisionTree ifLess i
  | otherwise = runDecisionTree ifGreater i
-- Both clauses must take the same number of arguments, so the input is
-- matched with a wildcard here.
runDecisionTree (Leaf o) _ = o

-- DecisionTree is an encoding of a function, and you can write
-- Functor, Applicative, and Monad instances!
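The instances mentioned in the comment could be sketched as follows. This is one possible reading, not necessarily what the answer had in mind: `fmap` relabels the leaves, and `>>=` grafts a further tree onto every leaf, so running the combined tree gives the same result as running the first tree and then the tree chosen by its result on the same input. The `leaves` helper is a hypothetical example of the kind of inspection the data structure allows.

```haskell
import Control.Monad (ap)

data DecisionTree i o
  = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
  | Leaf o

runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLess ifGreater) i
  | f i <= v  = runDecisionTree ifLess i
  | otherwise = runDecisionTree ifGreater i
runDecisionTree (Leaf o) _ = o

-- Relabel every leaf; the comparisons are left untouched.
instance Functor (DecisionTree i) where
  fmap g (Leaf o)             = Leaf (g o)
  fmap g (Comparison f v l r) = Comparison f v (fmap g l) (fmap g r)

instance Applicative (DecisionTree i) where
  pure  = Leaf
  (<*>) = ap

-- Replace every leaf with the tree that the continuation builds from it.
instance Monad (DecisionTree i) where
  Leaf o             >>= k = k o
  Comparison f v l r >>= k = Comparison f v (l >>= k) (r >>= k)

-- Hypothetical inspection helper: all outcomes, left to right.
leaves :: DecisionTree i o -> [o]
leaves (Leaf o)             = [o]
leaves (Comparison _ _ l r) = leaves l ++ leaves r
```

For example, `fmap show classifier` would turn a tree of classifications into a tree of strings without touching any threshold.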
Then:
classifier :: DecisionTree Stats Classification
classifier =
  Comparison curr_linkDensity 0.333333
    (Comparison prev_linkDensity 0.555556
      (Comparison curr_numWords 16
        (Comparison next_numWords 15
          (Comparison prev_numWords 4
            (Leaf Boilerplate)
            (Leaf Content))
          (Leaf Content)
  ...
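Written out in full, the tree from the question might be encoded as in the sketch below. The `Stats` fields beyond the two link densities are assumptions, since the answer elides them; they are typed as `Double` so that each accessor fits the `i -> Double` slot of `Comparison` directly.

```haskell
-- Assumed record layout; only the two link-density fields are given
-- in the original, the word-count fields are filled in for illustration.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Double
  , next_numWords    :: Double
  , prev_numWords    :: Double
  }

data Classification = Content | Boilerplate
  deriving (Eq, Show)

data DecisionTree i o
  = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
  | Leaf o

runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLess ifGreater) i
  | f i <= v  = runDecisionTree ifLess i
  | otherwise = runDecisionTree ifGreater i
runDecisionTree (Leaf o) _ = o

-- First subtree is the "<=" branch, second subtree is the ">" branch.
classifier :: DecisionTree Stats Classification
classifier =
  Comparison curr_linkDensity 0.333333
    (Comparison prev_linkDensity 0.555556
      (Comparison curr_numWords 16
        (Comparison next_numWords 15
          (Comparison prev_numWords 4
            (Leaf Boilerplate)
            (Leaf Content))
          (Leaf Content))
        (Leaf Content))
      (Comparison curr_numWords 40
        (Comparison next_numWords 17
          (Leaf Boilerplate)
          (Leaf Content))
        (Leaf Content)))
    (Leaf Boilerplate)

-- Interpreting the tree gives the classification function.
classify :: Stats -> Classification
classify = runDecisionTree classifier
```

If the word counts are integers in your real data, the accessors would need a `fromIntegral` wrapper, for instance `Comparison (fromIntegral . curr_numWords) 16`.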
Since only three paths in this decision tree lead to the BOILERPLATE state, I would just enumerate and simplify them:
isBoilerplate =
     prev_linkDensity <= 0.555556 && curr_numWords <= 16 && next_numWords <= 15 && prev_numWords <= 4
  || prev_linkDensity > 0.555556 && curr_numWords <= 40 && next_numWords <= 17
  || curr_linkDensity > 0.333333
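A runnable form of this simplification might look like the sketch below. It assumes a `Stats` record with five `Double` fields (the field types are not given in the thread), and the first disjunct includes the `next_numWords <= 15` test that the tree applies on that path. The `treeSaysBoilerplate` and `samples` names are hypothetical helpers, added only to check the flattened predicate against the nested tree on points either side of each threshold.

```haskell
-- Assumed record layout, for illustration only.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Double
  , next_numWords    :: Double
  , prev_numWords    :: Double
  }

-- The three BOILERPLATE paths, flattened. The shared condition
-- curr_linkDensity <= 0.333333 is dropped from the first two disjuncts
-- because the third disjunct covers its complement.
isBoilerplate :: Stats -> Bool
isBoilerplate s =
     prev_linkDensity s <= 0.555556 && curr_numWords s <= 16
       && next_numWords s <= 15 && prev_numWords s <= 4
  || prev_linkDensity s > 0.555556 && curr_numWords s <= 40
       && next_numWords s <= 17
  || curr_linkDensity s > 0.333333

-- The original tree, written with guards, as a reference.
treeSaysBoilerplate :: Stats -> Bool
treeSaysBoilerplate s
  | curr_linkDensity s > 0.333333  = True
  | prev_linkDensity s <= 0.555556 =
      curr_numWords s <= 16 && next_numWords s <= 15 && prev_numWords s <= 4
  | otherwise = curr_numWords s <= 40 && next_numWords s <= 17

-- Sample points on both sides of every threshold in the tree.
samples :: [Stats]
samples =
  [ Stats cl pl cw nw pw
  | cl <- [0.1, 0.5], pl <- [0.2, 0.9]
  , cw <- [10, 30, 50], nw <- [10, 16, 20], pw <- [3, 5] ]

-- True when both formulations agree on every sample.
agree :: Bool
agree = all (\s -> isBoilerplate s == treeSaysBoilerplate s) samples
```

The flat form is shorter, but it has to be re-derived by hand whenever the learned tree changes, which is the trade-off against the generated or data-driven versions.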