I extracted some markup from a web page (http://www.opensolaris.org/os/community/on/flag-days/all/), shown below:
<tr class="build">
<th colspan="0">Build 110</th>
</tr>
<tr class="arccase project flagday">
<td>Feb-25</td>
<td></td>
<td></td>
<td></td>
<td>
<a href="../pages/2009022501/">Flag Day and Heads Up: Power Aware Dispatcher and Deep C-States</a><br />
cpupm keyword mode extensions - <a href="/os/community/arc/caselog/2008/777/">PSARC/2008/777</a><br />
CPU Deep Idle Keyword - <a href="/os/community/arc/caselog/2008/663/">PSARC/2008/663</a><br />
</td>
</tr>
It contains some relative URL paths. I want to find them with a regular expression and replace them with absolute URLs. I know that urljoin can do the joining:
>>> from urllib.parse import urljoin  # on Python 2: from urlparse import urljoin
>>> urljoin("http://www.opensolaris.org/os/community/on/flag-days/all/",
... "/os/community/arc/caselog/2008/777/")
'http://www.opensolaris.org/os/community/arc/caselog/2008/777/'
Now I would like to know how to find these paths with a regular expression, so that the markup ends up as:
<tr class="build">
<th colspan="0">Build 110</th>
</tr>
<tr class="arccase project flagday">
<td>Feb-25</td>
<td></td>
<td></td>
<td></td>
<td>
<a href="http://www.opensolaris.org/os/community/on/flag-days/pages/2009022501/">Flag Day and Heads Up: Power Aware Dispatcher and Deep C-States</a><br />
cpupm keyword mode extensions - <a href="http://www.opensolaris.org/os/community/arc/caselog/2008/777/">PSARC/2008/777</a><br />
CPU Deep Idle Keyword - <a href="http://www.opensolaris.org/os/community/arc/caselog/2008/663/">PSARC/2008/663</a><br />
</td>
</tr>
I know very little about regular expressions and would like to know how to do this. Thanks.
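One possible regex-based approach, as a minimal sketch: match each `href` attribute with `re.sub` and pass its value through `urljoin`. The names `BASE_URL`, `HREF_RE`, and `absolutize_hrefs` are made up for illustration, and the pattern assumes every `href` value is quoted. A regex like this can also match `href=` text outside real tags, so it is only suitable for simple, well-behaved markup such as the snippet above.

```python
import re
from urllib.parse import urljoin  # on Python 2: from urlparse import urljoin

# Base URL of the page the markup was taken from.
BASE_URL = "http://www.opensolaris.org/os/community/on/flag-days/all/"

# Captures: (1) the 'href=' prefix, (2) the quote character, (3) the URL.
# The backreference \2 requires the closing quote to match the opening one.
HREF_RE = re.compile(r'''(href\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def absolutize_hrefs(html, base_url=BASE_URL):
    """Return html with every quoted href value resolved against base_url."""
    def repl(match):
        prefix, quote, url = match.groups()
        # urljoin leaves URLs that are already absolute unchanged.
        return prefix + quote + urljoin(base_url, url) + quote

    return HREF_RE.sub(repl, html)
```

Called on the markup above, `absolutize_hrefs(html)` should rewrite all three links. Note that `urljoin` resolves `../pages/2009022501/` against the base by going up one directory, so that link ends in `flag-days/pages/2009022501/`.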
I got it done using Beautiful Soup, haha. Thanks, everyone!
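The post does not show the Beautiful Soup code that was used, so the following is only a sketch of what such a solution might look like. It assumes the modern `bs4` package (Beautiful Soup 4) with the built-in `html.parser` backend. `BASE_URL` and `absolutize_links` are illustrative names, not taken from the original solution.

```python
from urllib.parse import urljoin  # on Python 2: from urlparse import urljoin

from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed

# Base URL of the page the markup was taken from.
BASE_URL = "http://www.opensolaris.org/os/community/on/flag-days/all/"


def absolutize_links(html, base_url=BASE_URL):
    """Parse html and resolve the href of every <a> tag against base_url."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True selects only the <a> tags that actually have an href attribute.
    for link in soup.find_all("a", href=True):
        link["href"] = urljoin(base_url, link["href"])
    return str(soup)
```

Compared with a regular expression, a parser only touches real `href` attributes on real `<a>` tags, which is probably why it ended up being the easier route here.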