How can I download WHOIS data for all domains?

rde*_*ges 5 dns

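I am writing some software that analyzes registered domain names and looks for trends. I am experimenting with machine learning to help predict which domain names will be purchased in the future, based on the kinds of domains currently being registered.

I have been looking for a way to download "all" registered domains in existence, but I have not been able to find one.

It is easy for me to query a single domain name with the whois command-line tool, for example: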

$ whois google.com
   Domain Name: GOOGLE.COM
   Registry Domain ID: 2138514_DOMAIN_COM-VRSN
   Registrar WHOIS Server: whois.markmonitor.com
   Registrar URL: http://www.markmonitor.com
   Updated Date: 2018-02-21T18:36:40Z
   Creation Date: 1997-09-15T04:00:00Z
   Registry Expiry Date: 2020-09-14T04:00:00Z
   Registrar: MarkMonitor Inc.
   Registrar IANA ID: 292
   Registrar Abuse Contact Email: abusecomplaints@markmonitor.com
   Registrar Abuse Contact Phone: +1.2083895740
   Domain Status: clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited
   Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
   Domain Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
   Domain Status: serverDeleteProhibited https://icann.org/epp#serverDeleteProhibited
   Domain Status: serverTransferProhibited https://icann.org/epp#serverTransferProhibited
   Domain Status: serverUpdateProhibited https://icann.org/epp#serverUpdateProhibited
   Name Server: NS1.GOOGLE.COM
   Name Server: NS2.GOOGLE.COM
   Name Server: NS3.GOOGLE.COM
   Name Server: NS4.GOOGLE.COM
   DNSSEC: unsigned
   URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/
>>> Last update of whois database: 2018-03-20T03:16:59Z <<<

For more information on Whois status codes, please visit https://icann.org/epp

NOTICE: The expiration date displayed in this record is the date the
registrar's sponsorship of the domain name registration in the registry is
currently set to expire. This date does not necessarily reflect the expiration
date of the domain name registrant's agreement with the sponsoring
registrar.  Users may consult the sponsoring registrar's Whois database to
view the registrar's reported date of expiration for this registration.

TERMS OF USE: You are not authorized to access or query our Whois
database through the use of electronic processes that are high-volume and
automated except as reasonably necessary to register domain names or
modify existing registrations; the Data in VeriSign Global Registry
Services' ("VeriSign") Whois database is provided by VeriSign for
information purposes only, and to assist persons in obtaining information
about or related to a domain name registration record. VeriSign does not
guarantee its accuracy. By submitting a Whois query, you agree to abide
by the following terms of use: You agree that you may use this Data only
for lawful purposes and that under no circumstances will you use this Data
to: (1) allow, enable, or otherwise support the transmission of mass
unsolicited, commercial advertising or solicitations via e-mail, telephone,
or facsimile; or (2) enable high volume, automated, electronic processes
that apply to VeriSign (or its computer systems). The compilation,
repackaging, dissemination or other use of this Data is expressly
prohibited without the prior written consent of VeriSign. You agree not to
use electronic processes that are automated and high-volume to access or
query the Whois database except as reasonably necessary to register
domain names or modify existing registrations. VeriSign reserves the right
to restrict your access to the Whois database in its sole discretion to ensure
operational stability.  VeriSign may restrict or terminate your access to the
Whois database for failure to abide by these terms of use. VeriSign
reserves the right to modify these terms at any time.

The Registry database contains ONLY .COM, .NET, .EDU domains and
Registrars.
Domain Name: google.com
Registry Domain ID: 2138514_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.markmonitor.com
Registrar URL: http://www.markmonitor.com
Updated Date: 2018-02-21T10:45:07-0800
Creation Date: 1997-09-15T00:00:00-0700
Registrar Registration Expiration Date: 2020-09-13T21:00:00-0700
Registrar: MarkMonitor, Inc.
Registrar IANA ID: 292
Registrar Abuse Contact Email: abusecomplaints@markmonitor.com
Registrar Abuse Contact Phone: +1.2083895740
Domain Status: clientUpdateProhibited (https://www.icann.org/epp#clientUpdateProhibited)
Domain Status: clientTransferProhibited (https://www.icann.org/epp#clientTransferProhibited)
Domain Status: clientDeleteProhibited (https://www.icann.org/epp#clientDeleteProhibited)
Domain Status: serverUpdateProhibited (https://www.icann.org/epp#serverUpdateProhibited)
Domain Status: serverTransferProhibited (https://www.icann.org/epp#serverTransferProhibited)
Domain Status: serverDeleteProhibited (https://www.icann.org/epp#serverDeleteProhibited)
Registry Registrant ID: 
Registrant Name: Domain Administrator
Registrant Organization: Google LLC
Registrant Street: 1600 Amphitheatre Parkway, 
Registrant City: Mountain View
Registrant State/Province: CA
Registrant Postal Code: 94043
Registrant Country: US
Registrant Phone: +1.6502530000
Registrant Phone Ext: 
Registrant Fax: +1.6502530001
Registrant Fax Ext: 
Registrant Email: dns-admin@google.com
Registry Admin ID: 
Admin Name: Domain Administrator
Admin Organization: Google LLC
Admin Street: 1600 Amphitheatre Parkway, 
Admin City: Mountain View
Admin State/Province: CA
Admin Postal Code: 94043
Admin Country: US
Admin Phone: +1.6502530000
Admin Phone Ext: 
Admin Fax: +1.6502530001
Admin Fax Ext: 
Admin Email: dns-admin@google.com
Registry Tech ID: 
Tech Name: Domain Administrator
Tech Organization: Google LLC
Tech Street: 1600 Amphitheatre Parkway, 
Tech City: Mountain View
Tech State/Province: CA
Tech Postal Code: 94043
Tech Country: US
Tech Phone: +1.6502530000
Tech Phone Ext: 
Tech Fax: +1.6502530001
Tech Fax Ext: 
Tech Email: dns-admin@google.com
Name Server: ns1.google.com
Name Server: ns4.google.com
Name Server: ns2.google.com
Name Server: ns3.google.com
DNSSEC: unsigned
URL of the ICANN WHOIS Data Problem Reporting System: http://wdprs.internic.net/
>>> Last update of WHOIS database: 2018-03-19T20:13:36-0700 <<<

The Data in MarkMonitor.com's WHOIS database is provided by MarkMonitor.com for
information purposes, and to assist persons in obtaining information about or
related to a domain name registration record.  MarkMonitor.com does not guarantee
its accuracy.  By submitting a WHOIS query, you agree that you will use this Data
only for lawful purposes and that, under no circumstances will you use this Data to:
 (1) allow, enable, or otherwise support the transmission of mass unsolicited,
     commercial advertising or solicitations via e-mail (spam); or
 (2) enable high volume, automated, electronic processes that apply to
     MarkMonitor.com (or its systems).
MarkMonitor.com reserves the right to modify these terms at any time.
By submitting this query, you agree to abide by this policy.

MarkMonitor is the Global Leader in Online Brand Protection.

MarkMonitor Domain Management(TM)
MarkMonitor Brand Protection(TM)
MarkMonitor AntiPiracy(TM)
MarkMonitor AntiFraud(TM)
Professional and Managed Services

Visit MarkMonitor at http://www.markmonitor.com
Contact us at +1.8007459229
In Europe, at +44.02032062220

For more information on Whois status codes, please visit
 https://www.icann.org/resources/pages/epp-status-codes-2014-06-16-en
--

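The WHOIS data contains everything I need, but I cannot find a way to download WHOIS data for all currently registered domains.

Is there any way for me to get this data? I feel like it must be publicly available somewhere, since the whois CLI tool can query the information so easily.

What am I missing here?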

Pat*_*zek 6

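TL;DR: you cannot (download all "whois" data).

(Side note: "whois data", although frequently used, is somewhat of a misnomer. You use the whois protocol and a whois client to query the whois server of a registry, more specifically here a domain name registry, which stores contact data about the domain names it sponsors. For the same reason, there is no "whois database".)

Now for the long, sad story:

It is not possible, for many obvious technical and non-technical reasons. And you are deeply mistaken if you think the whois CLI command is simple (see my other answer at https://unix.stackexchange.com/a/407030/211833 for details on that).

First, your question does not make sense across all TLDs. At a minimum you have to separate ccTLDs from gTLDs.

1) ccTLDs

ccTLDs usually have stricter rules about the privacy of personal data, and this will only get stricter with current European regulations such as the GDPR. Basically, some of them already refuse to give access to the full list of domain names without any personal data (often called the "zone file"), so you will certainly not get access to everything plus the personal data. You could try approaching some of them and asking whether anything is possible for research purposes, but I doubt you would succeed, and you would need to deal with each ccTLD registry separately, since each one handles its own content (all the data on the domain names in the TLD it manages).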

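2) gTLDs

For them, the situation is completely different.

First, since things are more liberal by default (no protection of personal data), you see many registrars and companies offering proxy/privacy services, which means that even in the whois query output you will not see much useful data.

But things are still changing because of the GDPR and similar regulations. Do a whois on godaddy.com, for example, and look at all the asterisks in place of contact names and emails, which mean you have to go to a website instead.

However, registrars and registries are under contract with ICANN. This means they all have certain requirements, and those requirements are uniform.

First, all registries are mandated to give access to their zone files. This is usually done through CZDA; you can find details on the ICANN website. Note that this is really the list of all published domain names, not the list of all registered domain names, since you can register a domain name without making it visible in the DNS.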
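To make the zone file point concrete, here is a minimal sketch of pulling the published names out of a zone file. It assumes the usual one-record-per-line master file layout (owner, TTL, class, type, rdata); the function name `delegated_names` and the sample records under the fictional `.example` TLD are made up for illustration, and a real zone file may need more careful parsing.

```python
def delegated_names(zone_lines, tld):
    """Collect owner names of NS records below the TLD apex.

    Assumes one record per line: owner ttl class type rdata.
    """
    apex = tld.lower().strip(".") + "."
    names = set()
    for line in zone_lines:
        line = line.split(";", 1)[0].strip()   # drop comments
        if not line or line.startswith("$"):    # skip $ORIGIN / $TTL directives
            continue
        fields = line.split()
        if len(fields) < 5:
            continue
        owner, rtype = fields[0].lower(), fields[3].upper()
        # Keep delegations only; the apex's own NS records are not domains.
        if rtype != "NS" or owner == apex or not owner.endswith("." + apex):
            continue
        names.add(owner.rstrip("."))
    return names


# Made-up sample records, not real zone data.
SAMPLE_ZONE = """\
; illustrative records only
$ORIGIN example.
example. 172800 in ns a.nic.example.
site.example. 172800 in ns ns1.site.example.
site.example. 172800 in ns ns2.site.example.
ns1.site.example. 172800 in a 192.0.2.1
shop.example. 172800 in ns ns1.hosting.test.
"""

if __name__ == "__main__":
    for name in sorted(delegated_names(SAMPLE_ZONE.splitlines(), "example")):
        print(name)
```

As noted above, the result is only the set of names visible in the DNS, so registered but unpublished names will be missing.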

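As for contact data, that is, the rest of the information visible in whois, there are other, lesser-known points. See the registrar agreement at https://www.icann.org/resources/pages/approved-with-specs-2013-09-17-en, specifically section 3.3.6, which provides for bulk access to a registrar's "whois" data. Note how it is tied to some money (USD 10,000) and comes with various restrictions on what you can do with it. Keep in mind that you would need to do this with every registrar, and in the gTLD world there are more than 1,000 of them.

There is no equivalent clause for public bulk access in the registry agreement (see https://newgtlds.icann.org/sites/default/files/agreements/agreement-approved-31jul17-en.html).

Things are further complicated by the fact that, as of today and for some months still, .COM/.NET remains a thin registry: no contact data is stored at the registry level, only at the registrars.

Also, all of the above will change because of new regulations and because a new protocol, RDAP, will replace whois in the coming months or years. RDAP will allow much finer granularity over who is given access and how much data is returned.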
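For a feel of what RDAP looks like from a client's side, here is a rough sketch. RDAP answers are JSON, and a domain lookup is an HTTP GET on a `domain/<name>` path below the server's base URL. The base URL `https://rdap.example/` is a placeholder, the helper names are hypothetical, and the sample response is hand-written with a few of the standard members (`ldhName`, `status`, `events`, `nameservers`); what a given server actually returns depends on its policy and on who is asking.

```python
import json


def rdap_domain_url(base_url, domain):
    """Build the lookup URL for a domain under an RDAP base URL."""
    return base_url.rstrip("/") + "/domain/" + domain.lower()


def summarize_rdap(doc):
    """Pick a few commonly present fields out of an RDAP domain object."""
    events = {e.get("eventAction"): e.get("eventDate") for e in doc.get("events", [])}
    return {
        "name": doc.get("ldhName"),
        "status": doc.get("status", []),
        "registered": events.get("registration"),
        "expires": events.get("expiration"),
        "nameservers": [ns.get("ldhName") for ns in doc.get("nameservers", [])],
    }


# Hand-written sample response, not real registry output.
SAMPLE_RESPONSE = """
{
  "objectClassName": "domain",
  "ldhName": "site.example",
  "status": ["client transfer prohibited"],
  "events": [
    {"eventAction": "registration", "eventDate": "1997-09-15T04:00:00Z"},
    {"eventAction": "expiration", "eventDate": "2020-09-14T04:00:00Z"}
  ],
  "nameservers": [
    {"objectClassName": "nameserver", "ldhName": "ns1.site.example"},
    {"objectClassName": "nameserver", "ldhName": "ns2.site.example"}
  ]
}
"""

if __name__ == "__main__":
    print(rdap_domain_url("https://rdap.example/", "Site.Example"))
    print(summarize_rdap(json.loads(SAMPLE_RESPONSE)))
```

Unlike whois output, nothing here has to be scraped out of free-form text, which is also what makes it practical for a server to return different subsets of fields to different clients.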

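Of course, in all of the above cases, nothing technically prevents anyone from doing regular whois queries and storing the results locally. As you can see in the whois output, your use of the data is subject to various restrictions, and bulk querying whois servers always exposes you to the risk of being blacklisted or at least heavily rate limited. Note that for the input (the names to query the whois servers for), it is easy to start from the zone files, even across TLDs (if site.example exists, you can try site.test even if you do not have the .test zone file), or from search engine queries, dictionaries, and so on.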
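A minimal sketch of that approach, assuming you have checked that the server's terms of use allow what you are doing. The whois protocol itself is just the query name followed by CRLF over TCP port 43, with the reply read until the server closes the connection. The function names, the server hostname passed in by the caller, and the five-second delay are all illustrative assumptions; a polite delay for a real server may be very different.

```python
import socket
import time


def whois_query(name, server, port=43, timeout=10.0):
    """Send one whois query (name + CRLF over TCP) and return the raw reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # ASCII only: internationalized names would need their A-label form.
        sock.sendall(name.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def cross_tld_candidates(known_names, other_tlds):
    """Reuse the leftmost label of known names under other TLDs."""
    labels = sorted({n.lower().rstrip(".").split(".", 1)[0] for n in known_names})
    return [f"{label}.{tld}" for label in labels for tld in other_tlds]


def crawl(names, server, delay_seconds=5.0, query=whois_query, sleep=time.sleep):
    """Query names one at a time, pausing between queries to limit the rate."""
    results = {}
    for index, name in enumerate(names):
        if index:
            sleep(delay_seconds)
        results[name] = query(name, server)
    return results


if __name__ == "__main__":
    # Names seen in one zone file, tried under a TLD whose zone file we lack.
    print(cross_tld_candidates(["site.example", "shop.example"], ["test"]))
```

The `query` and `sleep` parameters exist only so the loop can be exercised without touching the network. Each reply is free-form text whose layout differs per server, so storing the raw output and parsing it later is usually the safer choice.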

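Multiple companies do exactly this and provide tools to search their data, such as reverse queries and the like. Some may be able to give you bulk results, but certainly not for free.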