How do I parse quasi-HTML text in Java?

Question

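I have quasi-HTML text that looks like: Simple text simple text simple text simple text, and I want to parse it and create a DOM document. The problem is that it contains unclosed tags such as <br>. When I try this: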

DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource source = new InputSource(new StringReader(html)); // html: the quasi-HTML string (variable name assumed)
Document doc = builder.parse(source);

An error occurs: org.xml.sax.SAXParseException; The element type "br" must be terminated by the matching end-tag "</br>".

I don't want to replace every <br> with <br/>. Are there any solutions or suggestions?

Answer 1

Mic*_*l-O 3

Use jsoup and enjoy its ease of use.
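For illustration, here is a minimal sketch of how that might look, assuming jsoup is on the classpath. The class name QuasiHtmlToDom, the helper toDom, and the sample input are hypothetical and not from the original post. jsoup parses lenient HTML, so an unclosed <br> does not raise an error, and its W3CDom helper converts the result into an org.w3c.dom.Document.

```java
import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;

public class QuasiHtmlToDom {

    // Hypothetical helper: parse lenient HTML with jsoup, then convert
    // the jsoup tree into a standard W3C DOM Document.
    static org.w3c.dom.Document toDom(String html) {
        // jsoup tolerates unclosed tags such as <br> and wraps the
        // fragment in html/head/body elements.
        org.jsoup.nodes.Document jsoupDoc = Jsoup.parse(html);
        return new W3CDom().fromJsoup(jsoupDoc);
    }

    public static void main(String[] args) {
        // Sample input is an assumption modeled on the question.
        String html = "Simple text<br>simple text<br>simple text";
        org.w3c.dom.Document doc = toDom(html);
        // Count the <br> elements that made it into the DOM.
        System.out.println(doc.getElementsByTagName("br").getLength());
    }
}
```

If a W3C DOM is not strictly required, working directly with the jsoup Document (for example via its select method) is usually simpler.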
