Unable to web scrape an HTML table with Beautiful Soup

Cry*_*wan 1 python beautifulsoup web-scraping

I'm trying to scrape the IPO table data from here: https://www.iposcoop.com/last-12-months/

Here is my code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.iposcoop.com/last-12-months/'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
table1 = soup.find("table",id='DataTables_Table_0')
table1_data = table1.tbody.find_all("tr")
table1

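However, table1 is None (of type NoneType). Why is that, and is there a way around it? I've read related questions, and an iframe doesn't seem to be the answer.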
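One way to narrow this down is to look for the id in the HTML that requests actually receives, rather than in the browser's inspector. The sketch below assumes (hypothetically) that the raw page contains the table with the classes standard-table and ipolist but without the DataTables_Table_0 id; the inline RAW_HTML string is a made-up stand-in for page.text, not the site's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page.text: the table as a server might send it,
# before any JavaScript runs. The real page's markup may differ.
RAW_HTML = """
<table class="standard-table ipolist">
  <thead><tr><th>Company</th><th>Symbol</th></tr></thead>
  <tbody>
    <tr><td>Akanda Corp.</td><td>AKAN</td></tr>
    <tr><td>Vivakor, Inc.</td><td>VIVK</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")

# An id that only appears after client-side JavaScript runs is absent here,
# so find() returns None, just like table1 in the question.
by_id = soup.find("table", id="DataTables_Table_0")

# Selecting by classes present in the raw HTML does locate the table.
by_class = soup.select_one("table.standard-table.ipolist")

print(by_id)
print(len(by_class.tbody.find_all("tr")))
```

If the id lookup is None but the class lookup succeeds on the real page.text as well, the id is being added in the browser and cannot be used with requests alone.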

Md.*_*que 5

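The id DataTables_Table_0 is most likely assigned in the browser by the DataTables JavaScript plugin, so it is not in the HTML that requests downloads and find() returns None. Select the table by the classes it has in the static HTML instead, then let pandas parse it into a DataFrame: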

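from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.iposcoop.com/last-12-months'
html = requests.get(url).text
soup = BeautifulSoup(html, 'lxml')

# Select the table by the classes present in the raw HTML; the
# DataTables_Table_0 id only exists after JavaScript runs in a browser.
table = soup.select_one('.standard-table.ipolist')

# Wrapping the markup in StringIO avoids the pandas >= 2.1 deprecation
# warning for passing literal HTML strings to read_html.
table_data = pd.read_html(StringIO(str(table)))[0]
print(table_data)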

Output:

                                               Company  Symbol  ...   Return SCOOP Rating
0                                         Akanda Corp.    AKAN  ...   85.00%          S/O     
1    The Marygold Companies, Inc. (aka Concierge Te...    MGLD  ...    9.50%          S/O     
2                            Blue Water Vaccines, Inc.     BWV  ...  343.33%          S/O     
3            Meihua International Medical Technologies    MHUA  ...  -33.00%          S/O     
4                                        Vivakor, Inc.    VIVK  ...  -49.40%          S/O     
..                                                 ...     ...  ...      ...          ...     
355                Khosla Ventures Acquisition Co. III    KVSC  ...   -2.80%          S/O     
356           Dragoneer Growth Opportunities Corp. III    DGNU  ...   -2.40%          S/O     
357                                        Movano Inc.    MOVE  ...  -43.60%          S/O     
358         Supernova Partners Acquisition Company III  STRE.U  ...    0.10%          S/O     
359                           Universe Pharmaceuticals     UPC  ...  -74.00%          S/O     

[360 rows x 10 columns]
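pd.read_html needs lxml, or Beautiful Soup plus html5lib, to be installed. If neither parser is available, a rough alternative is to walk the selected table with Beautiful Soup and build the DataFrame by hand. This is a sketch, not the answer's method: table_to_dataframe is a hypothetical helper, and the sample HTML below is invented for illustration (real rows have more columns).

```python
import pandas as pd
from bs4 import BeautifulSoup


def table_to_dataframe(table):
    """Build a DataFrame from a bs4 <table> Tag using its <thead> headers."""
    headers = [th.get_text(strip=True) for th in table.select("thead th")]
    rows = []
    for tr in table.select("tbody tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Skip rows whose cell count doesn't match the header (e.g. spacers).
        if len(cells) == len(headers):
            rows.append(cells)
    return pd.DataFrame(rows, columns=headers)


# Hypothetical sample markup; on the real site this would be the table
# returned by soup.select_one('.standard-table.ipolist').
SAMPLE = """
<table class="standard-table ipolist">
  <thead><tr><th>Company</th><th>Symbol</th><th>Return</th></tr></thead>
  <tbody>
    <tr><td>Akanda Corp.</td><td>AKAN</td><td>85.00%</td></tr>
    <tr><td>Vivakor, Inc.</td><td>VIVK</td><td>-49.40%</td></tr>
  </tbody>
</table>
"""

table = BeautifulSoup(SAMPLE, "html.parser").select_one(".standard-table.ipolist")
df = table_to_dataframe(table)

# Turn the percentage strings into floats for numeric work.
df["Return"] = df["Return"].str.rstrip("%").astype(float)
print(df)
```

Using get_text(strip=True) also drops the padding that shows up after values such as "S/O" in the output above.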