How to decode [email protected] when web scraping with Python

Question

When I try to extract the email address from the tag below using Python's lxml.html, it shows [email protected]. Can anyone help me decode it?

<a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="4420366a373021283e2136042921202d27212a30262520212a6a272b29">[email&#160;protected]</a>

Answer 1

Sri*_*ela 10

I finally found the answer:

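fp = '4420366a373021283e2136042921202d27212a30262520212a6a272b29'  # taken from the data-cfemail HTML attribute, which holds the obfuscated email

def deCFEmail(fp):
    try:
        r = int(fp[:2], 16)  # the first byte is the XOR key
        # XOR every following byte with the key to recover one character each
        email = ''.join(chr(int(fp[i:i+2], 16) ^ r) for i in range(2, len(fp), 2))
        return email
    except ValueError:
        return None  # fp is not a valid hex string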

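Using the code above, the hex-encoded value that Cloudflare stores in data-cfemail can be decoded back into text: the first byte (two hex digits) is the XOR key, and each following byte XORed with that key gives one character of the email address.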
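
A minimal round-trip sketch of that scheme, assuming the layout described above (one key byte followed by the XOR-ed bytes of the address, all hex-encoded). The function names cf_encode and cf_decode are hypothetical; cf_encode is not a Cloudflare API and exists only to check the decoder. Decoding the bytes as UTF-8 is also an assumption; for plain ASCII addresses it gives the same result as the chr() approach in the answer.

```python
def cf_encode(email: str, key: int) -> str:
    """Hypothetical inverse of the decoder: key byte, then each byte XOR key, as hex."""
    data = bytes([key]) + bytes(b ^ key for b in email.encode("utf-8"))
    return data.hex()


def cf_decode(encoded: str) -> str:
    """Decode a data-cfemail value: byte 0 is the key, the rest are XOR-ed with it."""
    data = bytes.fromhex(encoded)  # raises ValueError on malformed hex
    key, payload = data[0], data[1:]
    return bytes(b ^ key for b in payload).decode("utf-8")


encoded = cf_encode("user@example.com", 0x5a)  # made-up address and key
print(encoded)
print(cf_decode(encoded))  # user@example.com
```

Unlike the answer's version, this cf_decode lets the ValueError from malformed input propagate instead of silently returning nothing.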

Example:

s = '4420366a373021283e2136042921202d27212a30262520212a6a272b29'

print(deCFEmail(s))
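
An end-to-end sketch that pulls every data-cfemail value out of a page and decodes it. To keep it dependency-free it swaps the question's lxml.html for the standard library's html.parser; with lxml, the same values can be collected with an XPath such as //a[@class="__cf_email__"]/@data-cfemail. The names CFEmailCollector, de_cf_email and decode_cf_emails are hypothetical, and the sketch assumes the encoded value always sits in the data-cfemail attribute of an element with the __cf_email__ class, as in the question.

```python
from html.parser import HTMLParser


def de_cf_email(encoded):
    # Same logic as deCFEmail above: first byte is the key, XOR the rest with it
    key = int(encoded[:2], 16)
    return ''.join(chr(int(encoded[i:i + 2], 16) ^ key) for i in range(2, len(encoded), 2))


class CFEmailCollector(HTMLParser):
    """Collects data-cfemail values from tags whose class list includes __cf_email__."""

    def __init__(self):
        super().__init__()
        self.encoded = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        classes = (attrs.get("class") or "").split()
        if "__cf_email__" in classes and attrs.get("data-cfemail"):
            self.encoded.append(attrs["data-cfemail"])


def decode_cf_emails(page):
    parser = CFEmailCollector()
    parser.feed(page)
    parser.close()
    return [de_cf_email(value) for value in parser.encoded]


page = ('<a href="/cdn-cgi/l/email-protection" class="__cf_email__" '
        'data-cfemail="4420366a373021283e2136042921202d27212a30262520212a6a272b29">'
        '[email&#160;protected]</a>')
for address in decode_cf_emails(page):
    print(address)
```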

Archived:	6 years, 1 month ago
Views:	2474
Last activity:	4 years ago