Removing HTML tags

Jes*_*ura 1 java arrays string

My professor has asked us to remove HTML tags (anything between < and >) without using the removeAll method.

This is what I currently have:

public static void main(String[] args)
        throws FileNotFoundException {
    Scanner input = new Scanner(new File("src/HTML_1.txt"));
    while (input.hasNext())
    {
        String html = input.next();
        System.out.println(stripHtmlTags(html));
    }

}

static String stripHtmlTags(String html)
{
    int i;
    String[] str = html.split("");
    String s = "";
    boolean tag = false;

    for (i = html.indexOf("<"); i < html.indexOf(">"); i++) 
    {
        tag = true;
    }

    if (!tag) 
    {
        for (i = 0; i < str.length; i++) 
        {
            s += str[i];
        }
    }
    return s;   
}

This is what the file contains:

<html>
<head>
<title>My web page</title>
</head>
<body>
<p>There are many pictures of my cat here,
as well as my <b>very cool</b> blog page,
which contains <font color="red">awesome
stuff about my trip to Vegas.</p>


Here's my cat now:<img src="cat.jpg">
</body>
</html>

This is what the output should look like:

My web page


There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.


Here's my cat now:

Ell*_*sch 7

String is immutable in Java, and you never display anything.
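
To illustrate the immutability point, here is a minimal sketch. The class name `ImmutabilityDemo` and both helper methods are made up for illustration. It shows that a `String` method such as `substring` returns a new object and leaves the original untouched unless the result is captured.

```java
class ImmutabilityDemo {
    // Hypothetical helper: the result of substring() is discarded,
    // so the caller gets the original text back unchanged.
    static String discardResult(String s) {
        s.substring(3);
        return s;
    }

    // Hypothetical helper: the new String returned by substring() is kept.
    static String keepResult(String s) {
        return s.substring(3);
    }

    public static void main(String[] args) {
        String html = "<b>very cool";
        System.out.println(discardResult(html)); // prints "<b>very cool"
        System.out.println(keepResult(html));    // prints "very cool"
    }
}
```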

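I suggest you close your Scanner when you are done with it (as a best practice), and read the HTML_1.txt file from the user's home directory. The easiest way to close it is a try-with-resources, like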

import java.io.File;
import java.util.Scanner;

public static void main(String[] args) {
    try (Scanner input = new Scanner(new File(
            System.getProperty("user.home"), "HTML_1.txt"))) {
        while (input.hasNextLine()) {
            String html = stripHtmlTags(input.nextLine().trim());
            if (!html.isEmpty()) { // <-- removes empty lines.
                System.out.println(html);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Because String is immutable, I would suggest a StringBuilder to remove the HTML tags, like

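static String stripHtmlTags(String html) {
    StringBuilder sb = new StringBuilder(html);
    int open;
    while ((open = sb.indexOf("<")) != -1) {
        int close = sb.indexOf(">", open + 1);
        if (close == -1) {
            // No closing '>' in this string: stop, otherwise delete() throws or the loop never ends.
            break;
        }
        sb.delete(open, close + 1); // remove the tag, including both brackets
    }
    return sb.toString();
}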

When I run the above, I get

My web page
There are many pictures of my cat here,
as well as my very cool blog page,
which contains awesome
stuff about my trip to Vegas.
Here's my cat now:
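
As an alternative sketch, closer to the boolean-flag idea in the question, the tags can also be dropped in a single pass over the characters. The class name `TagStripper` and the method `stripTags` are made up for illustration. Like the line-by-line approach, this assumes every tag opens and closes within the string passed in.

```java
class TagStripper {
    // Hypothetical helper: copies every character that lies outside a <...> pair.
    static String stripTags(String html) {
        StringBuilder out = new StringBuilder();
        boolean inTag = false; // true while we are between '<' and '>'
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;           // a tag starts: stop copying
            } else if (c == '>' && inTag) {
                inTag = false;          // the tag ends: resume copying
            } else if (!inTag) {
                out.append(c);          // ordinary text outside any tag
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "as well as my very cool blog page,"
        System.out.println(stripTags("as well as my <b>very cool</b> blog page,"));
    }
}
```

With this sketch, a `<` that is never closed causes the rest of the string to be dropped, while a stray `>` outside a tag is kept as text.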