Python and re.compile return inconsistent results

cyr*_*rus 1 python regex

I am trying to replace all instances of href="../directory" with href="../directory/index.html".

In Python, this

reg = re.compile(r'<a href="../(.*?)">')
for match in re.findall(reg, input_html):
    output_html = input_html.replace(match, match+'index.html')

produces the following output:

href="../personal-autonomy/index.htmlindex.htmlindex.htmlindex.html"  
href="../paternalism/index.html"  
href="../principle-beneficence/index.htmlindex.htmlindex.html"  
href="../decision-capacity/index.htmlindex.htmlindex.html" 

Any idea why it works for the second link but not for the others?

Relevant part of the source:

<p> 

 <a href="../personal-autonomy/">autonomy: personal</a> |
 <a href="../principle-beneficence/">beneficence, principle of</a> |
 <a href="../decision-capacity/">decision-making capacity</a> |
 <a href="../legal-obligation/">legal obligation and authority</a> |
 <a href="../paternalism/">paternalism</a> |
 <a href="../identity-personal/">personal identity</a> |
 <a href="../identity-ethics/">personal identity: and ethics</a> |
 <a href="../respect/">respect</a> |
 <a href="../well-being/">well-being</a> 

</p> 

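Edit: The repeated "index.html" is actually the result of multiple matches. (For example, href="../personal-autonomy/index.htmlindex.htmlindex.htmlindex.html" occurs because ../personal-autonomy is found four times in the original source.)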
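The compounding described in the edit can be reproduced on a small input. The sketch below assumes the loop reassigns the same string on every iteration; the sample directory names `alpha` and `beta` and the variables `html_text`, `matches`, and `result` are made up for illustration. Because `findall` returns the captured group once per link and `str.replace` rewrites every occurrence, a directory that is linked twice receives the suffix twice.

```python
import re

# Hypothetical sample: "alpha/" is linked twice, "beta/" once.
html_text = (
    '<a href="../alpha/">first</a> '
    '<a href="../alpha/">second</a> '
    '<a href="../beta/">third</a>'
)

pattern = re.compile(r'<a href="../(.*?)">')

# findall returns only the captured group, once per matching link.
matches = pattern.findall(html_text)

result = html_text
for match in matches:
    # str.replace rewrites EVERY occurrence, including ones already
    # extended by an earlier iteration, so repeated matches compound.
    result = result.replace(match, match + 'index.html')

print(matches)
print(result)
```

With this input, both `alpha` links end in `index.htmlindex.html`, while the `beta` link, which is matched only once, comes out correct.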

As a general regex question, how can I replace all instances without adding an extra "index.html" to every match?

jfs*_*jfs 5

Don't parse HTML with regular expressions:

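import re
from lxml import html

def replace_link(link):
    # Append index.html only to links of the form "../<directory>/".
    if re.match(r"\.\./[^/]+/$", link):
        link += "index.html"
    return link

# your_html_text holds the HTML source as a string.
print(html.rewrite_links(your_html_text, replace_link))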
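The callback can also be checked on its own, without lxml. The sketch below repeats `replace_link` and applies it to a few hypothetical links (the `samples` list is an assumption, not taken from the question) to show which ones are rewritten, and that applying it a second time changes nothing.

```python
import re

def replace_link(link):
    # Same rule as in the answer: only "../<name>/" links get the suffix.
    if re.match(r"\.\./[^/]+/$", link):
        link += "index.html"
    return link

# Hypothetical sample links for illustration.
samples = [
    "../paternalism/",            # rewritten
    "../paternalism/index.html",  # already complete, left alone
    "http://example.com/",        # not a "../" link, left alone
    "../a/b/",                    # two levels deep, left alone
]

once = [replace_link(s) for s in samples]
twice = [replace_link(s) for s in once]

print(once)
print(once == twice)
```

Since the pattern is anchored with `$` right after the trailing slash, a link that already ends in `index.html` no longer matches, which is what prevents the repeated suffix seen in the question.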

Output of the rewrite_links call:

<p> 

 <a href="../personal-autonomy/index.html">autonomy: personal</a> |
 <a href="../principle-beneficence/index.html">beneficence, principle of</a> |
 <a href="../decision-capacity/index.html">decision-making capacity</a> |
 <a href="../legal-obligation/index.html">legal obligation and authority</a> |
 <a href="../paternalism/index.html">paternalism</a> |
 <a href="../identity-personal/index.html">personal identity</a> |
 <a href="../identity-ethics/index.html">personal identity: and ethics</a> |
 <a href="../respect/index.html">respect</a> |
 <a href="../well-being/index.html">well-being</a> 

</p>
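For the general regex question, one possible approach is a single `re.sub` pass. This is not part of the answer above and is still subject to its caveat about parsing HTML with regular expressions. Each `href` is visited exactly once, so nothing compounds even when the same directory is linked several times. The pattern and the names `LINK`, `add_index`, and `sample` are assumptions chosen for this sketch.

```python
import re

# Matches href="../<name>/" where <name> contains no slash or quote.
LINK = re.compile(r'href="(\.\./[^/"]+/)"')

def add_index(html_text):
    # \1 is the captured "../<name>/" path; each match is rewritten once.
    return LINK.sub(r'href="\1index.html"', html_text)

# Hypothetical sample with one directory linked twice.
sample = (
    '<a href="../personal-autonomy/">autonomy: personal</a> |\n'
    '<a href="../personal-autonomy/">again</a> |\n'
    '<a href="../paternalism/">paternalism</a>'
)

fixed = add_index(sample)
print(fixed)
```

Because the pattern requires the closing quote immediately after the trailing slash, running `add_index` on its own output leaves it unchanged.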