Get all URLs in a string using PHP

Bil*_*ill 4 php regex url

I'm trying to figure out a way to get an array of URLs out of a string of text. The text will be formatted roughly like this:

Here is some random text

http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphones-bezel-a-massive-notification-light/?grcc=88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2=835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033fdeed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~

http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tickets-for-disrupt-sf/

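Obviously, these links could be anything (and there could be many of them; these are just the ones I'm testing with right now). With simple URLs, my regex works fine.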

I'm using:

preg_match_all('((https?|ftp|gopher|telnet|file|notes|ms-help):'.
    '((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)',
    $bodyMessage, $matches, PREG_PATTERN_ORDER);

When I run print_r($matches); the result I get is:

Array ( [0] => Array (
    [0] => http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon=
    [1] => http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick= 
    [2] => http://techcrunch.co=
    [3] => http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-ip= 
    [4] => http://techcrunch.com/2012/07/20/last-day-to-purc=
    [5] => http://tec=
)
...

None of the items in that array is a complete version of the links above.

Does anyone know a good way to get what I need? I've found a bunch of regex snippets for extracting links in PHP, but none of them work.

Thanks!

Edit:

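OK, so I'm pulling these links from an email. The script parses the email, grabs the message body, and then tries to get the links out of it. After investigating the email, it seems that for some reason a space is being added in the middle of the URLs. Here is the message body as my PHP script sees it.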

 --00248c711bb99ca36d04c54ba5c6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon= es-bezel-a-massive-notification-light/?grcc=3D88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2= =3D835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033f= deed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~ http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick= ets-for-disrupt-sf/ --00248c711bb99ca36d04c54ba5c6 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable 

Any suggestions on how to keep it from breaking the URLs?

Edit 2

Following Laurnet's suggestion, I ran this code:

 $bodyMessage = str_replace("= ", "",$bodyMessage);

However, when I echo the result, it doesn't seem to have replaced the "= ":

 --00248c711bb99ca36d04c54ba5c6 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon= es-bezel-a-massive-notification-light/?grcc=3D88888Z0ZwdgtZ0Z0Z0Z0Z0&grcc2= =3D835637c33f965e6cdd34c87219233711~1342828462249~fca4fa8af1286d8a77f26033f= deed202~510f37324b14c50a5e9121f955fac3fa~1342747216490~0~0~0~0~0~0~0~0~7~3~ http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick= ets-for-disrupt-sf/ --00248c711bb99ca36d04c54ba5c6 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable 

Esw*_*ala 9

    /**
     * Get all URLs from a string (the string itself may be a URL).
     *
     * @param string $string
     * @return array
     */
    function getUrls($string) {
        // Match http:// or https:// followed by anything up to a space or double quote.
        $regex = '/https?\:\/\/[^\" ]+/i';
        preg_match_all($regex, $string, $matches);
        //return (array_reverse($matches[0]));
        return ($matches[0]);
    }
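Note that the body in the question is labeled Content-Transfer-Encoding: quoted-printable, so a regex that stops at whitespace will still cut each URL at the trailing "=". The sketch below decodes the body with PHP's built-in quoted_printable_decode() before matching. It assumes the raw body has a line ending after each trailing "=" (shown as "= " in the question's output), which may also be why str_replace("= ", ...) finds nothing to replace. The function name extractUrlsFromQuotedPrintable and the shortened sample body are made up for illustration.

```php
<?php
// Hypothetical helper (not from the original post): decode a
// quoted-printable body first, then collect the URLs it contains.
function extractUrlsFromQuotedPrintable(string $rawBody): array
{
    // quoted_printable_decode() drops soft line breaks ("=" followed by a
    // line ending) and turns escapes such as "=3D" back into "=".
    $decoded = quoted_printable_decode($rawBody);

    // Match http/https URLs up to the next whitespace, quote, or angle bracket.
    preg_match_all('~https?://[^\s"<>]+~i', $decoded, $matches);

    return $matches[0];
}

// Assumed raw form of part of the body from the question, with CRLF after
// each soft line break.
$raw = "http://techcrunch.com/2012/07/20/kickstarter-flashr-wants-to-make-the-iphon=\r\n"
     . "es-bezel-a-massive-notification-light/?grcc=3D88888Z0ZwdgtZ0Z0Z0Z0Z0\r\n"
     . "http://techcrunch.com/2012/07/20/last-day-to-purchase-extra-early-bird-tick=\r\n"
     . "ets-for-disrupt-sf/\r\n";

print_r(extractUrlsFromQuotedPrintable($raw));
```

If the mail library in use can already return the decoded body, that is probably the simpler route; decoding by hand is only needed when the script receives the raw MIME part.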