How to find relative URLs and convert them to absolute URLs in Python

luc*_*cas 2 python regex

I extracted some code from a web page (http://www.opensolaris.org/os/community/on/flag-days/all/), shown below:

<tr class="build">
  <th colspan="0">Build 110</th>
</tr>
<tr class="arccase project flagday">
  <td>Feb-25</td>
  <td></td>
  <td></td>
  <td></td>
  <td>
    <a href="../pages/2009022501/">Flag Day and Heads Up: Power Aware Dispatcher and Deep C-States</a><br />
    cpupm keyword mode extensions - <a href="/os/community/arc/caselog/2008/777/">PSARC/2008/777</a><br />
    CPU Deep Idle Keyword - <a href="/os/community/arc/caselog/2008/663/">PSARC/2008/663</a><br />
  </td>
</tr>

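It contains some relative URL paths, and I want to find them with a regular expression and replace them with absolute URL paths. I know that urljoin can do that kind of conversion: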

>>> from urllib.parse import urljoin  # Python 3; in Python 2 use: from urlparse import urljoin
>>> urljoin("http://www.opensolaris.org/os/community/on/flag-days/all/",
...         "/os/community/arc/caselog/2008/777/")
'http://www.opensolaris.org/os/community/arc/caselog/2008/777/'

Now I want to know how to find them with a regular expression, so that the code ends up as:

<tr class="build">
  <th colspan="0">Build 110</th>
</tr>
<tr class="arccase project flagday">
  <td>Feb-25</td>
  <td></td>
  <td></td>
  <td></td>
  <td>
    <a href="http://www.opensolaris.org/os/community/on/flag-days/pages/2009022501/">Flag Day and Heads Up: Power Aware Dispatcher and Deep C-States</a><br />
    cpupm keyword mode extensions - <a href="http://www.opensolaris.org/os/community/arc/caselog/2008/777/">PSARC/2008/777</a><br />
    CPU Deep Idle Keyword - <a href="http://www.opensolaris.org/os/community/arc/caselog/2008/663/">PSARC/2008/663</a><br />
  </td>
</tr>

I know very little about regular expressions and would like to know how to do this. Thanks.
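
For reference, the regex-plus-urljoin approach asked about here could be sketched roughly as follows. This is a minimal sketch, not a robust HTML rewriter: it assumes Python 3, assumes every href value is wrapped in single or double quotes, and the names `BASE_URL`, `HREF_RE`, and `absolutize_hrefs` are made up for illustration. Regular expressions can misfire on unusual markup (unquoted attributes, hrefs inside comments or scripts), so an HTML parser is usually the safer choice.

```python
import re
from urllib.parse import urljoin

# The page the snippet was taken from; relative links resolve against it.
BASE_URL = "http://www.opensolaris.org/os/community/on/flag-days/all/"

# group 1: the attribute name and '=' as written in the source
# group 2: the opening quote character (must match the closing one)
# group 3: the URL itself
HREF_RE = re.compile(r'''(href\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def absolutize_hrefs(html, base_url=BASE_URL):
    """Return html with every quoted href value resolved against base_url."""
    def replace(match):
        prefix, quote, url = match.group(1), match.group(2), match.group(3)
        # urljoin leaves URLs that are already absolute unchanged.
        return prefix + quote + urljoin(base_url, url) + quote

    return HREF_RE.sub(replace, html)


snippet = (
    '<a href="../pages/2009022501/">Flag Day</a> '
    '<a href="/os/community/arc/caselog/2008/777/">PSARC/2008/777</a>'
)
print(absolutize_hrefs(snippet))
```

With this base URL, urljoin resolves `../pages/2009022501/` one directory up from `all/`, giving `http://www.opensolaris.org/os/community/on/flag-days/pages/2009022501/`, while the root-relative `/os/...` links are joined directly to the host.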

I have since gotten this working with Beautiful Soup, haha. Thanks, everyone!

Ion*_*tan 6

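I'm not sure what you are trying to achieve, but using the BASE tag in HTML may do the trick for you, without having to resort to regular expressions during processing.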
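
As a rough illustration of the BASE tag idea, the extracted rows could be wrapped in a minimal page whose `<base href>` points at the original location, so that a browser resolves the relative links itself. This is only a sketch under assumptions: the helper name `wrap_with_base`, the `PAGE_TEMPLATE` layout, and the choice to wrap the rows in a `<table>` are all hypothetical. Note that this does not rewrite the href values in the text; it only changes how a browser resolves them.

```python
from html import escape

# The page the rows were extracted from (taken from the question).
BASE_URL = "http://www.opensolaris.org/os/community/on/flag-days/all/"

# Hypothetical minimal page layout; the <base> element must sit in <head>.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <base href="{base}">
</head>
<body>
<table>
{fragment}
</table>
</body>
</html>
"""


def wrap_with_base(fragment, base_url=BASE_URL):
    """Wrap table rows in a page whose <base> tag sets the base URL."""
    return PAGE_TEMPLATE.format(
        base=escape(base_url, quote=True),  # keep the attribute value well-formed
        fragment=fragment,                  # inserted unchanged, links stay relative
    )


rows = '<tr><td><a href="../pages/2009022501/">Flag Day</a></td></tr>'
print(wrap_with_base(rows))
```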