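Hi all, I have a Java string and I want to remove all HTML tags from it except the line-break tags <br> and </br>, while keeping the text inside the tags (if there is any). 2- After parsing, the text fragments end up joined together, like text1andtext2, with no space between them; I would like them to be separated by a space as well.
This is what I am doing: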
String html = "<div dir=\"ltr\">hello my friend<span>ECHO</span><br>how are you ?<br><br><div class=\"gmail_quote\">On Mon, Feb 14, 2011 at 10:45 AM, My Friend <span dir=\"ltr\"><<a href=\"mailto:notifications@mydomain.com\">notifications@mydomain.com</a>></span> wrote:<br> "
+ "<blockquote class=\"gmail_quote\" style=\"margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;\"> ";
String parsedText = html.replaceAll("\\<.*?\\>", "");
System.out.println(parsedText);
Current output:
hello my friendECHOhow are you ?On Mon, Feb 14, 2011 at 10:45 AM, My Friend <notifications@mydomain.com> wrote:
Desired output:
hello my friend ECHO <br> how are you ? <br> <br> On Mon, Feb 14, 2011 at 10:45 AM, My Friend <notifications@mydomain.com> wrote:
You can do it like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String html =
"<div dir=\"ltr\">hello my friend<span>ECHO</span><br>how are you ?" +
"<br><br><div class=\"gmail_quote\">On Mon, Feb 14, 2011 at 10:45 AM," +
" My Friend <span dir=\"ltr\"><<a href=\"mailto:notifications@mydo" +
"main.com\">notifications@mydomain.com</a>></span> wrote:<br><bloc" +
"kquote class=\"gmail_quote\" style=\"margin: 0pt 0pt 0pt 0.8ex; bord" +
"er-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;\"> ";
// Match opening and closing tags; group 1 is the tag name. '<' is excluded
// from the name so the literal '<' in front of <a href=...> is not swallowed.
final Pattern tagPattern = Pattern.compile("</?([^\\s<>/]+)[^>]*>");
final Matcher matcher = tagPattern.matcher(html);
final StringBuffer sb = new StringBuffer(html.length());
while (matcher.find()) {
    // Keep br tags verbatim (quoted so '$' and '\' stay literal); drop every other tag.
    matcher.appendReplacement(sb, matcher.group(1).equalsIgnoreCase("br")
            ? Matcher.quoteReplacement(matcher.group())
            : "");
}
// Copy whatever text follows the last tag.
matcher.appendTail(sb);
final String parsedText = sb.toString();
System.out.println(parsedText);
Output:
hello my friendECHO<br>how are you ?<br><br>On Mon, Feb 14, 2011 at 10:45 AM,
My Friend <notifications@mydomain.com> wrote:<br>
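The question also asks for the text fragments to be separated by spaces. The following is only a sketch of one way that might be done with the same regex approach: every removed tag becomes a space, kept br tags are padded with spaces, and runs of whitespace are collapsed afterwards. The class name TagStripper and the method name stripKeepingBr are made up for this illustration and do not come from the original posts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
final class TagStripper {

    // Group 1 captures the tag name. '<' is excluded from the name so that a
    // literal '<' standing directly in front of a real tag is left in the text.
    private static final Pattern TAG = Pattern.compile("</?([^\\s<>/]+)[^>]*>");

    static String stripKeepingBr(String html) {
        Matcher m = TAG.matcher(html);
        StringBuffer sb = new StringBuffer(html.length());
        while (m.find()) {
            // Keep <br>, </br> and <br/> (padded with spaces); any other tag
            // is replaced by a single space so neighbouring words do not fuse.
            String replacement = m.group(1).equalsIgnoreCase("br")
                    ? " " + m.group() + " "
                    : " ";
            // quoteReplacement keeps '$' and '\' in the matched tag literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        // Collapse the whitespace runs created above and trim both ends.
        return sb.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // prints: text1 text2 <br> text3
        System.out.println(stripKeepingBr("<div>text1<span>text2</span><br>text3</div>"));
    }
}
```

Because a space is inserted for every removed tag, a literal < or > that sits directly next to a tag (as around the e-mail address in the question) will end up with a space beside it.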
But I hope you know that Cthulhu is calling you. Don't parse HTML / XML with regular expressions!
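For comparison, here is a rough sketch of what a parser-based approach could look like, using the HTML parser that ships with the JDK (javax.swing.text.html). It is not from the original answer; the class name ParserBasedStripper and the method name textWithBreaks are invented for this example, and how the parser treats malformed input such as a stray < or a </br> tag is not something this sketch makes any promises about. Dedicated third-party HTML parsers are often used for this kind of job as well.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, named for illustration only.
final class ParserBasedStripper {

    static String textWithBreaks(String html) {
        final StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Each text chunk is followed by a space so chunks do not fuse.
                out.append(data).append(' ');
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Re-emit line breaks; all other tags are simply not written out.
                if (tag == HTML.Tag.BR) {
                    out.append("<br> ");
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(textWithBreaks("<div>hello<b>ECHO</b><br>bye</div>"));
    }
}
```

The callback only sees text and tags that the parser has already recognised, so the code never has to decide where a tag starts or ends, which is the part a regular expression tends to get wrong.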