How to extract the "domain" from an email address

Question


I have a column with values in the following pattern:

xyz@gmail.com
abc@hotmail.com


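Now I want to extract the text after the @ and before the ., i.e. gmail and hotmail. I can already extract the text after the @ with the following code.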

sub(".*@", "", email)


How can I modify the above to fit my use case?

Answer 1

hrb*_*str 6

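You:

really need to read section 3 of RFC 3696 (TL;DR: the @ can appear in more than one place)
seem not to have considered that an email address can be "someone@department.example.com" or "someone.else@yet.another.department.example.com" (i.e. naively assuming there is only one domain level may come back to bite you at some point in this analysis)
should be aware that if you are really after the email "domain name", you also have to consider what actually constitutes a domain name and a proper suffix.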
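As a rough illustration of the first point, here is a minimal base R sketch. The sample addresses are made up for this example, and the second one uses a quoted local part that itself contains an @. It compares splitting on every @ with anchoring on the last @:

```r
# Made-up sample data; the second address has a quoted local part containing "@"
emails <- c("xyz@gmail.com", "\"odd@local\"@example.com")

# Splitting on every "@" and taking the second piece breaks on the quoted address
naive_host <- vapply(strsplit(emails, "@", fixed = TRUE),
                     function(parts) parts[2], character(1))

# The greedy ".*" runs up to the LAST "@", so only the host part remains
host <- sub("^.*@", "", emails)

print(naive_host)
print(host)
```

Only the version anchored on the last @ returns the host for both addresses, which is presumably why it is safer to locate the last @ rather than the first.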

So, unless you are certain that you have, and always will have, simple email addresses, I would suggest:

library(stringi)
library(urltools)
library(dplyr)
library(purrr)

emails <- c("yz@gmail.com", "abc@hotmail.com",
            "someone@department.example.com",
            "someone.else@yet.another.department.com",
            "some.brit@froodyorg.co.uk")

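# Find the position of the last "@" in each address, then split everything
# after it into subdomain, domain and public suffix.
stri_locate_last_fixed(emails, "@")[,"end"] %>%
  map2_df(emails, function(x, y) {
    substr(y, x+1, nchar(y)) %>%
      suffix_extract()
  })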
##                         host    subdomain      domain suffix
## 1                  gmail.com         <NA>       gmail    com
## 2                hotmail.com         <NA>     hotmail    com
## 3     department.example.com   department     example    com
## 4 yet.another.department.com  yet.another  department    com
## 5            froodyorg.co.uk         <NA>   froodyorg  co.uk


Note the proper splitting of subdomain, domain and suffix, especially for the last one.

Knowing this, we can then change the code to:

# Same split as before, but return one string per address:
# "subdomain.domain" when a subdomain exists, otherwise just "domain".
stri_locate_last_fixed(emails, "@")[,"end"] %>%
  map2_chr(emails, function(x, y) {
    substr(y, x+1, nchar(y)) %>%
      suffix_extract() %>%
      mutate(full_domain=ifelse(is.na(subdomain), domain, sprintf("%s.%s", subdomain, domain))) %>%
      select(full_domain) %>%
      flatten_chr()
  })
## [1] "gmail"                   "hotmail"               
## [3] "department.example"      "yet.another.department"
## [5] "froodyorg"
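To see why the suffix handling matters, here is a small base R sketch that does not use urltools. The `known_suffixes` vector and the `strip_suffix` helper are hypothetical, a tiny stand-in for a real public suffix list, so treat this as an illustration of the idea rather than a replacement for `suffix_extract()`:

```r
hosts <- c("gmail.com", "department.example.com", "froodyorg.co.uk")

# Naive approach: drop only the last dot-separated label
naive <- sub("\\.[^.]+$", "", hosts)

# Hypothetical, deliberately tiny stand-in for a public suffix list
known_suffixes <- c("co.uk", "com")

# Hypothetical helper: remove the longest known suffix from one host name
strip_suffix <- function(host, suffixes) {
  # Try longer suffixes first so multi-label suffixes win over shorter ones
  suffixes <- suffixes[order(nchar(suffixes), decreasing = TRUE)]
  for (s in suffixes) {
    tail_part <- paste0(".", s)
    if (endsWith(host, tail_part)) {
      return(substr(host, 1, nchar(host) - nchar(tail_part)))
    }
  }
  host  # no known suffix found: return the host unchanged
}

aware <- vapply(hosts, strip_suffix, character(1),
                suffixes = known_suffixes, USE.NAMES = FALSE)

print(naive)
print(aware)
```

The naive version leaves a stray "co" on the .co.uk host, while the suffix-aware version returns "froodyorg", in line with the last element of the output above.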


Answer 2

akr*_*run 5

We can use gsub:

gsub(".*@|\\..*", "", email)
#[1] "gmail"   "hotmail"
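The pattern has two alternatives: `.*@` removes everything up to the last @, and `\\..*` removes everything from the first remaining dot onward. The sketch below runs it on a few extra made-up addresses (the `email` vector here is an assumption, not the asker's data) to show what it returns for hosts with more than two labels, alongside a roughly equivalent `sub()` call with a capture group:

```r
# Made-up sample data, including hosts with more than two labels
email <- c("xyz@gmail.com", "abc@hotmail.com",
           "someone@department.example.com", "some.brit@froodyorg.co.uk")

# Drop everything up to the last "@", then everything from the first dot after it
first_label <- gsub(".*@|\\..*", "", email)

# Roughly equivalent: capture the first label after the last "@".
# This form assumes the host contains at least one dot.
first_label2 <- sub("^.*@([^.]+)\\..*$", "\\1", email)

print(first_label)
print(first_label2)
```

Both forms keep only the first label after the @, so "someone@department.example.com" yields "department". That is fine for simple addresses like the ones in the question, but it is not the registered domain for deeper hosts.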


Archived:	9 years, 3 months ago
Views:	2521
Last activity:	9 years, 3 months ago