Python: remove a list of strings from a list of strings

Len*_*ood 2 python list

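I am trying to remove several substrings from a list of URLs. I have over 300,000 URLs and I want to find out which ones are variations of the same original URL. Here is a toy example I have been working with.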

URLs = ['example.com/page.html',
        'www.example.com/in/page.html',
        'example.com/ca/fr/page.html',
        'm.example.com/de/page.html',
        'example.com/fr/page.html']

locs = ['/in', '/ca', '/de', '/fr', 'm.', 'www.']

What I ultimately want is a list of pages with the language and location parts removed:

desired_output = ['example.com/page.html',
                  'example.com/page.html',
                  'example.com/page.html',
                  'example.com/page.html',
                  'example.com/page.html']

I have tried list comprehensions and nested for loops, but nothing has worked so far. Can anyone help?

import re

# doesn't remove anything
for item in URLs:
    for string in locs:
        re.sub(string, '', item)

# doesn't remove anything
for item in URLs:
    for string in locs:
        item.strip(string)

# only removes the last string in locs
clean = []
for item in URLs:
    for string in locs:
        new = item.replace(string, '')
    clean.append(new)

Dan*_*iel 5

You have to assign the result of replace back to item:

clean = []
for item in URLs:
    for loc in locs:
        item = item.replace(loc, '')
    clean.append(item)

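The reason the reassignment matters is that Python strings are immutable: replace (like re.sub) returns a new string and leaves the original alone, and strip removes a set of characters from both ends rather than a substring. A small sketch of both behaviours, using sample values taken from the question (the variable names unchanged and stripped are only for illustration):

```python
item = 'www.example.com/in/page.html'

# str.replace returns a new string; without assignment the result is lost.
item.replace('www.', '')
unchanged = item  # still the original URL

# Rebinding the name keeps each result, so the replacements accumulate.
for loc in ['/in', 'www.']:
    item = item.replace(loc, '')

# str.strip treats its argument as a set of characters to trim from both
# ends, so it also eats the trailing 'm' of '.com'.
stripped = 'm.example.com'.strip('m.')

print(unchanged)  # www.example.com/in/page.html
print(item)       # example.com/page.html
print(stripped)   # example.co
```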
Or, more compactly:

from functools import reduce  # reduce is no longer a builtin in Python 3

clean = [
    reduce(lambda item, loc: item.replace(loc, ''), [item] + locs)
    for item in URLs
]
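One caveat with plain substring replacement: '/in' would also match inside '/index.html', and 'm.' would match inside 'forum.example.com'. If that matters for the real data, a regex variant can be sketched as below. This is not part of the answer above, only a possible alternative under stated assumptions: it assumes host prefixes should be stripped only at the start of the URL and locale codes only when they form a whole path segment, and the names host_prefixes, path_segments, and normalize are hypothetical. re.escape is used so that the '.' in 'm.' is matched literally rather than as a wildcard.

```python
import re

URLs = ['example.com/page.html',
        'www.example.com/in/page.html',
        'example.com/ca/fr/page.html',
        'm.example.com/de/page.html',
        'example.com/fr/page.html']

# Assumption: locs from the question split into two groups by position.
host_prefixes = ['m.', 'www.']
path_segments = ['/in', '/ca', '/de', '/fr']

# One or more host prefixes, anchored at the start of the string.
prefix_re = re.compile(
    r'^(?:' + '|'.join(map(re.escape, host_prefixes)) + r')+')

# A locale segment only counts when it is immediately followed by '/'.
segment_re = re.compile(
    r'(?:' + '|'.join(map(re.escape, path_segments)) + r')(?=/)')

def normalize(url):
    """Strip leading host prefixes, then whole locale path segments."""
    url = prefix_re.sub('', url)
    return segment_re.sub('', url)

clean = [normalize(url) for url in URLs]
print(clean)
```

With the toy list this yields the same five 'example.com/page.html' entries as the loop version, while leaving inputs such as 'example.com/index.html' untouched. Compiling the patterns once also avoids rebuilding them for each of the 300,000 URLs.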