Posts by use*_*504

Is this Java code only checking the links of a URL?

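I have a method that takes a URL and finds all the links on that page. However, I am not sure it is picking up only links, because when I test whether each link works, some of them look strange. For example, when I check the links on www.google.com I get 6 broken links that do not return an HTTP status code; instead, the error for each one says "no protocol". I would not expect Google to have any broken links on its homepage. One example of a broken link is: /preferences?hl=zh_ and I cannot see where this link appears on the Google homepage. Am I really only checking links, or is it possible that I am extracting markup that is not supposed to be a link?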
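The "no protocol" message is what `java.net.URL` reports, as a `MalformedURLException`, when it is given a string without a scheme, which is the case for a relative href such as `/preferences`. The following is a minimal sketch, assuming each href is resolved against the URL of the page it was found on before it is checked. The class name `RelativeLinkDemo`, the helpers `resolve` and `parseError`, and the sample values are hypothetical and not part of the original code.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class RelativeLinkDemo {

    // Hypothetical helper: resolve an href against the URL of the page it came from.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    // Hypothetical helper: returns the exception message if href cannot be
    // parsed as a URL on its own, or null if it parses fine.
    static String parseError(String href) {
        try {
            new URL(href);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String page = "http://www.google.com/"; // illustrative page URL
        String href = "/preferences";           // illustrative relative href

        // A relative href has no scheme, so URL parsing fails before any request is made.
        System.out.println(parseError(href));

        // Resolved against the page URL, the same href becomes an absolute URL.
        System.out.println(resolve(page, href));
    }
}
```

Under these assumptions, a link rejected with "no protocol" is not necessarily broken; it may simply be relative, and it can only be requested after it has been resolved to an absolute URL.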

Here is the method that collects the links from a URL:

public static List<String> getLinks(String uriStr) {

    List<String> result = new ArrayList<>();
    // Create a reader on the HTML content
    try {
        System.out.println("in the getlinks try");
        URL url = new URI(uriStr).toURL();
        URLConnection conn = url.openConnection();
        Reader rd = new InputStreamReader(conn.getInputStream());

        // Parse the HTML
        EditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(rd, doc, 0);

        // Find all the A elements in the HTML document
        HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
        while (it.isValid()) {
            SimpleAttributeSet s = (SimpleAttributeSet) it.getAttributes();

            String …
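Since the listing above is cut off, here is a separate, self-contained sketch of the same idea: iterate over the `A` elements, read each `href` attribute, and resolve it against the page URL. It is not the author's remaining code. The class name `HrefExtractor`, the method `extractLinks`, and the sample HTML are hypothetical, and the HTML is parsed from an in-memory string rather than fetched over the network.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class HrefExtractor {

    // Hypothetical helper: parse HTML text and return every <a href>,
    // resolved against baseUrl so relative links become absolute.
    static List<String> extractLinks(String html, String baseUrl) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Keep the parser from aborting when a page declares a charset in a <meta> tag.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("could not parse HTML", e);
        }

        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A); it.isValid(); it.next()) {
            AttributeSet attrs = it.getAttributes();
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) { // anchors without an href are not links
                links.add(base.resolve(href.toString()).toString());
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Illustrative input: one relative and one absolute link.
        String html = "<html><body><a href=\"/preferences\">Settings</a>"
                + "<a href=\"http://example.com/about\">About</a></body></html>";
        for (String link : extractLinks(html, "http://www.google.com/")) {
            System.out.println(link);
        }
    }
}
```

One caveat with this sketch: `URI.resolve` throws an `IllegalArgumentException` for hrefs that are not valid URIs (for example, ones containing spaces), so a real checker would need to handle or report those separately.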


java url hyperlink

use*_*504

2013 05-09

Score: 5 | Answers: 1 | Views: 379
