Get the mailto from <a href="javascript:linkTo_UnCryptMailto(%27ocknvq%2Cjgkmg0qdgtnkpBwpk%5C%2Fvwgdkpigp0fg%27)

Giu*_*lia 2 r web-scraping rvest

I want to extract the e-mail addresses from this link with rvest, but the mailto href is obfuscated by JavaScript.

How can I improve the following code?

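library(rvest)    # read_html(), html_nodes(), html_attrs(), %>%
library(stringr)  # str_subset(), str_remove()

uni <- "https://uni-tuebingen.de/fakultaeten/philosophische-fakultaet/fachbereiche/asien-orient-wissenschaften/indologie/mitarbeiter/"
r <- read_html(uni)
# Keep only the <a> attributes that contain a plain "mailto:" link
a <- r %>%
  html_nodes("a") %>%
  html_attrs() %>%
  as.character() %>%
  str_subset("mailto:") %>%
  str_remove("mailto:")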

Thanks in advance.

Ina*_*Haq 5

def decryptCharcode(n, start, end, offset):
    # Shift one character by offset and wrap around inside [start, end]
    n = ord(n) + offset
    if offset > 0 and n > end:
        n = start + (n - end - 1)
    elif offset < 0 and n < start:
        n = end - (start - n - 1)

    return chr(n)



def decryptString(enc, offset):
    dec = ""
    length = len(enc)

    # The input still ends with the URL-encoded closing quote "%27",
    # so the last three characters are skipped.
    for i in range(length - 3):
        n = enc[i]
        if 0x2B <= ord(n) <= 0x3A:      # + , - . / 0-9 :
            dec += decryptCharcode(n, 0x2B, 0x3A, offset)
        elif 0x40 <= ord(n) <= 0x5A:    # @ A-Z
            dec += decryptCharcode(n, 0x40, 0x5A, offset)
        elif 0x61 <= ord(n) <= 0x7A:    # a-z
            dec += decryptCharcode(n, 0x61, 0x7A, offset)
        else:
            dec += enc[i]               # anything else is kept as-is

    return dec

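# Argument of linkTo_UnCryptMailto(...) exactly as it appears in the href
email = "%27ocknvq%2Cuvqemgt0ygtpgtBdnwgykp0ej%27"
# "%27ocknvq%2C" is the URL-encoded opening quote plus "mailto:" shifted by +2
if "%27ocknvq%2C" in email:
    email = email.replace("%27ocknvq%2C", "")
email = decryptString(email, -2)

# An encoded "-" shows up in the href as "%5C%2F" (an escaped "/");
# shifting those characters by -2 turns it into "%3A%0D"
if "%3A%0D" in email:
    email = email.replace("%3A%0D", "-")

print(email)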

I converted the JS code to Python. Reference: https://gist.github.com/InsanityMeetsHH/c38f513f28d6f9b778912f110c565348
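
If you would rather not rely on the `length-3` and `"%3A%0D"` special cases, a variant that URL-decodes the href first may be easier to reuse. The following is only a sketch: `shift_char`, `decode_href`, `HrefCollector`, and the sample HTML with its `example.com` address are names and data I made up for illustration, and it assumes every obfuscated link on the page follows the same `linkTo_UnCryptMailto('...')` pattern with an offset of 2.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

# Ranges the obfuscation rotates within: "+" to ":", "@" to "Z", "a" to "z"
RANGES = [(0x2B, 0x3A), (0x40, 0x5A), (0x61, 0x7A)]


def shift_char(ch, offset):
    """Rotate ch by offset inside its range; other characters are unchanged."""
    code = ord(ch)
    for start, end in RANGES:
        if start <= code <= end:
            size = end - start + 1
            return chr(start + (code - start + offset) % size)
    return ch


def decode_href(href, offset=-2):
    """Return the address hidden in a linkTo_UnCryptMailto(...) href, or None."""
    match = re.search(r"linkTo_UnCryptMailto\((.*)\)", unquote(href))
    if match is None:
        return None
    # Drop the surrounding quotes and the backslash of the JavaScript "\/" escape
    arg = match.group(1).strip("'\"").replace("\\", "")
    plain = "".join(shift_char(c, offset) for c in arg)
    # The decoded text normally starts with "mailto:"
    if plain.startswith("mailto:"):
        plain = plain[len("mailto:"):]
    return plain


class HrefCollector(HTMLParser):
    """Collect the href of every <a> that calls linkTo_UnCryptMailto."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and "linkTo_UnCryptMailto" in value:
                self.hrefs.append(value)


# Made-up sample markup; in practice feed the downloaded page source instead
sample_html = (
    '<a href="javascript:linkTo_UnCryptMailto'
    '(%27ocknvq%2CwugtBgzcorng0eqo%27);">mail</a>'
)
collector = HrefCollector()
collector.feed(sample_html)
print([decode_href(h) for h in collector.hrefs])  # ['user@example.com']
```

The modulo arithmetic in `shift_char` is meant to give the same wrap-around as `decryptCharcode` above. Because the href is URL-decoded before shifting, the escaped `/` decodes straight to `-` and no post-processing is needed. The same steps should carry over to R (for example `utils::URLdecode` plus `utf8ToInt` / `intToUtf8`), though I have not tested that.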