Getting absolute paths for images on an external web page

Roh*_*hit 0 html php dom

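I am building a bookmarklet and use an HTML DOM parser to fetch all the photos from any external page (as suggested in an earlier answer). The photos are extracted correctly and displayed in my bookmarklet's popup, but I have a problem with photos that use relative paths.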

For example, say the external page containing the photos is at http://www.example.com/dir/index.php

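  1. Photo source 1: img source='hostname/photos/photo.jpg' - the photo is fetched, because the path is absolute

  2. Photo source 2: img source='/photos/photo.jpg' - the photo is not fetched, because the path is not absolute.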

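I am working from the current URL, that is, using dirname or pathinfo to get the directory of the current URL. But that behaves inconsistently between host/dir/ (host is returned as the parent directory) and host/dir/index.php (host/dir is returned as the parent directory, which is correct).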
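
To make the mismatch concrete, here is a minimal sketch of what dirname() reports for the two URL shapes. The helper name naive_parent_dir is hypothetical and exists only for this illustration; the outputs shown assume the backslash normalisation in the helper.

```php
<?php
// Hypothetical helper, for illustration only: the "parent directory" of a URL
// as reported by dirname() applied to its path component.
function naive_parent_dir(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    // dirname() may return backslashes on Windows, so normalise them
    return str_replace('\\', '/', dirname($path));
}

echo naive_parent_dir('http://www.example.com/dir/index.php'), "\n"; // /dir
echo naive_parent_dir('http://www.example.com/dir/'), "\n";          // /
```

The trailing slash in the second URL is stripped before the parent is taken, so /dir/ is treated like a file named dir sitting in the root.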

Please help: how can I fetch these relative photos?

Dav*_*dom 5

FIXED (added support for query-string-only image paths)

function make_absolute_path ($baseUrl, $relativePath) {

    // Parse URLs, return FALSE on failure
    if ((!$baseParts = parse_url($baseUrl)) || (!$pathParts = parse_url($relativePath))) {
        return FALSE;
    }

    // Work-around for pre- 5.4.7 bug in parse_url() for relative protocols
    if (empty($baseParts['host']) && !empty($baseParts['path']) && substr($baseParts['path'], 0, 2) === '//') {
        $parts = explode('/', ltrim($baseParts['path'], '/'));
        $baseParts['host'] = array_shift($parts);
        $baseParts['path'] = '/'.implode('/', $parts);
    }
    if (empty($pathParts['host']) && !empty($pathParts['path']) && substr($pathParts['path'], 0, 2) === '//') {
        $parts = explode('/', ltrim($pathParts['path'], '/'));
        $pathParts['host'] = array_shift($parts);
        $pathParts['path'] = '/'.implode('/', $parts);
    }

    // Relative path has a host component, just return it
    if (!empty($pathParts['host'])) {
        return $relativePath;
    }

    // Normalise base URL (fill in missing info)
    // If base URL doesn't have a host component return error
    if (empty($baseParts['host'])) {
        return FALSE;
    }
    if (empty($baseParts['path'])) {
        $baseParts['path'] = '/';
    }
    if (empty($baseParts['scheme'])) {
        $baseParts['scheme'] = 'http';
    }

    // Start constructing return value
    $result = $baseParts['scheme'].'://';

    // Add username/password if any
    if (!empty($baseParts['user'])) {
        $result .= $baseParts['user'];
        if (!empty($baseParts['pass'])) {
            $result .= ":{$baseParts['pass']}";
        }
        $result .= '@';
    }

    // Add host/port
    $result .= !empty($baseParts['port']) ? "{$baseParts['host']}:{$baseParts['port']}" : $baseParts['host'];

    // Inspect the relative path
    if (substr($relativePath, 0, 1) === '/') {

        // Leading / means from root
        $result .= $relativePath;

    } elseif (substr($relativePath, 0, 1) === '?') {

        // Leading ? means query the existing URL
        $result .= $baseParts['path'].$relativePath;

    } else {

        // Get the current working directory
        $resultPath = rtrim(substr($baseParts['path'], -1) === '/' ? trim($baseParts['path']) : str_replace('\\', '/', dirname(trim($baseParts['path']))), '/');

        // Split the image path into components and loop them
        foreach (explode('/', $relativePath) as $pathComponent) {
            switch ($pathComponent) {
                case '': case '.':
                    // a single dot means "this directory" and can be skipped
                    // an empty component (e.g. from a doubled slash) is a mistake on somebody's part, and can also be skipped
                    break;
                case '..':
                    // a double dot means "up a directory"
                    $resultPath = rtrim(str_replace('\\', '/', dirname($resultPath)), '/');
                    break;
                default:
                    // anything else can be added to the path
                    $resultPath .= "/$pathComponent";
                    break;
            }
        }

        // Add path to result
        $result .= $resultPath;

    }

    return $result;

}

Tests:

echo make_absolute_path('http://www.example.com/dir/index.php','/photos/photo.jpg')."\n";
// Outputs: http://www.example.com/photos/photo.jpg
echo make_absolute_path('http://www.example.com/dir/index.php','photos/photo.jpg')."\n";
// Outputs: http://www.example.com/dir/photos/photo.jpg
echo make_absolute_path('http://www.example.com/dir/index.php','./photos/photo.jpg')."\n";
// Outputs: http://www.example.com/dir/photos/photo.jpg
echo make_absolute_path('http://www.example.com/dir/index.php','../photos/photo.jpg')."\n";
// Outputs: http://www.example.com/photos/photo.jpg
echo make_absolute_path('http://www.example.com/dir/index.php','http://www.yyy.com/photos/photo.jpg')."\n";
// Outputs: http://www.yyy.com/photos/photo.jpg
echo make_absolute_path('http://www.example.com/dir/index.php','?query=something')."\n";
// Outputs: http://www.example.com/dir/index.php?query=something

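I think this should correctly handle everything you are likely to come across, and it should be roughly equivalent to the logic a browser uses. It should also correct any oddities you might get on Windows from the stray backslashes that dirname() can return.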
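
For comparison, the "." and ".." handling can also be done with a stack of path segments instead of repeated dirname() calls, which sidesteps the Windows backslash issue entirely. This is a simplified sketch of that alternative, not the function above; the name collapse_dot_segments is hypothetical, and the sketch assumes it is given an absolute path with no query string.

```php
<?php
// Hypothetical helper: collapse "." and ".." in an absolute path using a stack.
// Assumes $path holds only the path component (no query string or fragment).
function collapse_dot_segments(string $path): string
{
    $stack = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;          // empty or "this directory": nothing to do
        }
        if ($segment === '..') {
            array_pop($stack); // "up a directory"; a no-op at the root
            continue;
        }
        $stack[] = $segment;   // anything else is a real path component
    }
    return '/' . implode('/', $stack);
}

echo collapse_dot_segments('/dir/../photos/./photo.jpg'), "\n"; // /photos/photo.jpg
```

Like the function above, this sketch drops a trailing slash and ignores a ".." that would climb above the root.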

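The first argument is the full URL of the page where you found the <img> (or <a>, or whatever), and the second argument is the contents of its src/href etc. attribute.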
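
As a rough sketch of how the two arguments fit together, the helper below walks every <img> in a page and passes its src to a resolver. It assumes PHP's built-in DOMDocument rather than whichever HTML DOM parser the question uses; the name collect_image_urls and its $resolve parameter are hypothetical.

```php
<?php
// Hypothetical helper: collect the src of every <img> in $html, resolved against $pageUrl.
// $resolve is any callable taking (base URL, relative path) and returning a URL or FALSE,
// for example 'make_absolute_path'.
function collect_image_urls(string $html, string $pageUrl, callable $resolve): array
{
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src === '') {
            continue; // no src attribute, nothing to resolve
        }
        $absolute = $resolve($pageUrl, $src);
        if ($absolute !== false) {
            $urls[] = $absolute;
        }
    }
    return $urls;
}
```

With the function from this answer loaded, a call would look like collect_image_urls($html, 'http://www.example.com/dir/index.php', 'make_absolute_path').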

If anyone finds something that doesn't work (because I know you will all try to break it :-D), let me know and I'll try to fix it.