HXT: surprising behavior when reading and writing HTML to String in pure code

jgr*_*gre 5 haskell hxt

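I want to read HTML from a String, process it, and return the changed document as a String using HXT. Since this operation doesn't require IO, I'd rather run the arrow with runLA than with runX.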

The code looks like this (processing omitted for brevity):

runLA (hread >>> writeDocumentToString [withOutputHTML, withIndent yes]) html

However, the surrounding html tag is missing from the result:

["\n  <head>\n    <title>Bogus</title>\n  </head>\n  <body>\n        Some trivial bogus text.\n    </body>\n",""]

When I use runX instead, like this:

runX (readString [] html >>> writeDocumentToString [withOutputHTML, withIndent yes])

I get the expected result:

["<html>\n  <head>\n    <title>Bogus</title>\n  </head>\n  <body>\n        Some trivial bogus text.\n    </body>\n</html>\n"]

Why does this happen, and how can I fix it?

Tra*_*own 6

If you look at the XmlTrees for both, you'll see that readString adds a top-level "/" element. For the non-IO runLA version:

> putStr . formatTree show . head $ runLA xread html
---XTag "html" []
   |
   +---XText "\n  "
   |
   +---XTag "head" []
   ...

And with runX:

> putStr . formatTree show . head =<< runX (readString [] html)
---XTag "/" [NTree (XAttr "transfer-Status") [NTree (XText "200")...
   |
   +---XTag "html" []
       |
       +---XText "\n  "
       |
       +---XTag "head" []
       ...

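writeDocumentToString uses getChildren to strip off this root element. In the runLA version there is no "/" root, so it is the html element itself that gets stripped.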

One easy way around this is to use something like selem to wrap the output of xread in a similar root element, so that it looks like the kind of input writeDocumentToString expects:

> runLA (selem "/" [xread] >>> writeDocumentToString [withOutputHTML, withIndent yes]) html
["<html>\n  <head>\n    <title>Bogus</title>\n  </head>\n  <body>\n        Some trivial bogus text.\n    </body>\n</html>\n"]

This produces the desired output.
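For anyone who wants to try this outside GHCi, here is a minimal sketch of the same workaround as a standalone module. The names render, renderUnwrapped, renderWrapped and sample are hypothetical helpers made up for this illustration and are not part of HXT. The sample input is a compact version of the document from the question, and the sketch assumes the hxt package is installed.

```haskell
import Data.List (isInfixOf)
import Text.XML.HXT.Core

-- Hypothetical helper (not part of HXT): run a pure parsing arrow and
-- serialize its result with the options used in the question.
render :: LA String XmlTree -> String -> [String]
render parse =
  runLA (parse >>> writeDocumentToString [withOutputHTML, withIndent yes])

-- Bare xread: the tree has no "/" root, so, as described above, the
-- writer is expected to strip the outermost element instead.
renderUnwrapped :: String -> [String]
renderUnwrapped = render xread

-- xread wrapped in a "/" root element via selem, as in the answer.
renderWrapped :: String -> [String]
renderWrapped = render (selem "/" [xread])

-- Assumed sample input, modelled on the document in the question.
sample :: String
sample =
  "<html><head><title>Bogus</title></head>"
    ++ "<body>Some trivial bogus text.</body></html>"

main :: IO ()
main = do
  -- Check whether the serialized output still contains the html tag.
  let hasHtmlTag out = "<html" `isInfixOf` concat out
  putStrLn ("unwrapped keeps <html>: " ++ show (hasHtmlTag (renderUnwrapped sample)))
  putStrLn ("wrapped keeps <html>:   " ++ show (hasHtmlTag (renderWrapped sample)))
  mapM_ putStr (renderWrapped sample)
```

Keeping the parser as a parameter of render makes it easy to swap in hread, the parser used in the question, and compare the results.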