Scraping a table with Python BeautifulSoup

Ste*_*Kim 0 python beautifulsoup web-scraping google-colaboratory

I'm trying to scrape the table from this website: https://stockrow.com/VRTX/financials/income/quarterly

I'm using Python in Google Colab, and I'd like the dates as columns (e.g. 2020-06-30, etc.). I used code like this:

import urllib.request
import bs4 as bs

source = urllib.request.urlopen('https://stockrow.com/VRTX/financials/income/quarterly').read()
soup = bs.BeautifulSoup(source, 'lxml')
table = soup.find_all('table')

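However, I can't get the table. I'm fairly new to scraping, so I looked at other Stack Overflow pages, but I couldn't solve the problem. Could you help me? It would be greatly appreciated.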
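BeautifulSoup only parses the HTML the server returns; it does not execute JavaScript. If a page builds its table in the browser (which seems likely for this site, given that the answer reaches for a JSON API), `find_all('table')` on the raw response comes back empty. A minimal offline sketch of that effect; `raw_html`, `rendered_html` and the `renderFinancials()` call are made-up stand-ins, not the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical raw HTML: an empty container plus a script that, in a browser,
# would fetch the data and insert the table. BeautifulSoup never runs it.
raw_html = """
<html><body>
  <div id="financials"></div>
  <script>renderFinancials();</script>
</body></html>
"""

# Hypothetical HTML as it might look after a browser has run the script.
rendered_html = (
    '<html><body><div id="financials">'
    "<table><tr><td>1</td></tr></table>"
    "</div></body></html>"
)

tables_raw = BeautifulSoup(raw_html, "html.parser").find_all("table")
tables_rendered = BeautifulSoup(rendered_html, "html.parser").find_all("table")

print(len(tables_raw))       # 0
print(len(tables_rendered))  # 1
```

In that situation the usual options are to call the JSON endpoint the page itself uses, or to render the page with a real browser engine before parsing.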

And*_*ely 5

You can use their API to load the data:

import requests
import pandas as pd


indicators_url = 'https://stockrow.com/api/indicators.json'
data_url = 'https://stockrow.com/api/companies/VRTX/financials.json?ticker=VRTX&dimension=Q&section=Income+Statement'

# Map each indicator id to its metadata (which includes its display name)
indicators = {i['id']: i for i in requests.get(indicators_url).json()}
all_data = []
for d in requests.get(data_url).json():
    # Replace the indicator id with its human-readable name
    d['id'] = indicators[d['id']]['name']
    all_data.append(d)

# One row per indicator, one column per quarter-end date
df = pd.DataFrame(all_data)
df.to_csv('data.csv')
print(df)

Prints:

                                     id    2020-06-30    2020-03-31    2019-12-31   2019-09-30   2019-06-30  ...   2011-12-31   2011-09-30    2011-06-30    2011-03-31    2010-12-31    2010-09-30
0          Consolidated Net Income/Loss   837270000.0   602753000.0   583234100.0   57518000.0  267427000.0  ...  188141000.0  228452000.0  -199318000.0  -176096000.0  -180392000.0  -208957000.0
1      EPS (Basic, from Continuous Ops)        3.2248        2.3199        2.2654       0.2239        1.044  ...       0.9374        1.109       -0.9751       -0.8703       -0.8966       -1.0402
2                     Net Profit Margin        0.5492        0.3978        0.4127       0.0606       0.2841  ...       0.2816       0.3354       -1.5213       -2.3906       -2.7531       -8.7816
3                          Gross Profit  1339965000.0  1352610000.0  1228253000.0  817914000.0  805553000.0  ...  533213000.0  620794000.0   105118000.0    70996000.0    62475000.0    20567000.0
4                  Income Tax Provision   -12500000.0    54781000.0    93716000.0   13148000.0   59711000.0  ...   22660000.0  -27842000.0    24448000.0           0.0           NaN           0.0
5                      Operating Income   718033000.0   720224100.0   551464400.0   99333000.0  269960000.0  ...  223901900.0  215707000.0  -165890000.0  -159899000.0  -166634000.0  -199588000.0
6                                  EBIT   718033000.0   720224100.0   551464700.0   99333000.0  269960000.0  ...  223901900.0  215707000.0  -165890000.0  -159899000.0  -166634000.0  -199588000.0
7         EPS (Diluted, from Cont. Ops)        3.1787        2.2874        2.2319       0.2208       1.0293  ...       1.0011       1.0415       -0.9751       -0.8703       -0.8966       -1.0402
8                                EBITDA   744730000.0   747045000.0   577720400.0  125180000.0  297658000.0  ...  233625900.0  223457000.0  -157181000.0  -151041000.0  -158429000.0  -192830000.0
9             EPS (Basic, Consolidated)        3.2248        2.3199        2.2654       0.2239        1.044  ...       0.9374        1.109       -0.9751       -0.8703       -0.8966       -1.0402
10                                  EBT   824770000.0   657534000.0   676950000.0   70666000.0  327138000.0  ...  210801000.0  200610000.0  -174870000.0  -176096000.0  -180392000.0  -208957000.0
11           Operating Cash Flow Margin        0.6812        0.5384        0.3156       0.3525       0.4927  ...       0.8941       0.0651       -1.8894       -2.5336        -2.535       -6.8918
12                           EBT margin         0.541         0.434         0.479       0.0744       0.3475  ...       0.3742       0.3043       -1.5283       -2.3906       -2.7531       -8.7816
13                          EBIT Margin         0.471        0.4754        0.3902       0.1046       0.2868  ...       0.3975       0.3272       -1.4498       -2.1707       -2.5431       -8.3878
14    Income from Continuous Operations   837270000.0   602753000.0   583234000.0   57518000.0  267427000.0  ...  188141000.0  228452000.0  -199318000.0  -176096000.0  -180392000.0  -208957000.0
15                         R&D Expenses   420928000.0   448528000.0   480011000.0  555948000.0  379091000.0  ...  186438000.0  189052000.0   173604000.0   158612000.0   168888000.0   170434000.0
16      Non-operating Interest Expenses    13871000.0    14136000.0    14249000.0   14548000.0   14837000.0  ...   11659000.0    7059000.0     6962000.0    12001000.0     7686000.0     3951000.0
17                        EBITDA Margin        0.4885        0.4931        0.4088       0.1318       0.3162  ...       0.4147        0.339       -1.3737       -2.0505       -2.4179       -8.1038
18         Non-operating Income/Expense   106737000.0   -62690000.0   125485000.0  -28667000.0   57178000.0  ...  -13101000.0  -15097000.0    -8980000.0   -16197000.0   -13758000.0    -9369000.0
19                          EPS (Basic)          3.22          2.32          2.26         0.22         1.04  ...         0.76         1.06         -0.85         -0.87          -0.9         -1.04
20                         Gross Margin         0.879        0.8927        0.8691       0.8611       0.8558  ...       0.9465       0.9417        0.9187        0.9638        0.9535        0.8643
21                              Revenue  1524485000.0  1515107000.0  1413265000.0  949828000.0  941293000.0  ...  563340000.0  659200000.0   114424000.0    73662000.0    65524000.0    23795000.0
22            Shares (Diluted, Average)   263403000.0   263515000.0   262108000.0  260473000.0  259822000.0  ...  217602000.0  219349000.0   204413000.0   202329000.0   201355000.0   200887000.0
23                      Cost of Revenue   184520000.0   162497000.0   185012000.0  131914000.0  135740000.0  ...   30127000.0   38406000.0     9306000.0     2666000.0     3049000.0     3228000.0
24                        SG&A Expenses   191804000.0   182258000.0   195277000.0  159674000.0  156502000.0  ...  121881000.0  110654000.0    96663000.0    71523000.0    62478000.0    48855000.0
25          EPS (Diluted, Consolidated)        3.1787        2.2874        2.2319       0.2208       1.0293  ...       1.0011       1.0415       -0.9751       -0.8703       -0.8966       -1.0402
26                       Revenue Growth        0.6196         0.765        0.6242       0.2107       0.2515  ...       7.5975      26.7033        2.6185        2.2842        0.9335       -0.0466
27             Shares (Basic, Weighted)   259637000.0   259815000.0   256728000.0  256946000.0  256154000.0  ...  204891000.0  206002000.0   204413000.0   202329000.0   200402000.0   200887000.0
28                     Income after Tax   837270000.0   602753000.0   583234000.0   57518000.0  267427000.0  ...  188141000.0  228452000.0  -199318000.0  -176096000.0  -180392000.0  -208957000.0
29                        EPS (Diluted)          3.18          2.29          2.23         0.22         1.03  ...         0.74         1.02         -0.85         -0.87          -0.9         -1.04
30                    Net Income Common   837270000.0   602753000.0   583234100.0   57518000.0  267427000.0  ...  158629000.0  221110000.0  -174069000.0  -176096000.0  -180392000.0  -208957000.0
31           Shares (Diluted, Weighted)   263403000.0   263515000.0   260673000.0  260473000.0  259822000.0  ...  208807000.0  219349000.0   204413000.0   202329000.0   200402000.0   200887000.0
32             Non-Controlling Interest           NaN           NaN           NaN          NaN          NaN  ...   29512000.0    7342000.0   -25249000.0           0.0           NaN           0.0
33                Dividends (Preferred)           NaN           NaN           NaN          NaN          NaN  ...          NaN          NaN           NaN           NaN           NaN           NaN
34   EPS (Basic, from Discontinued Ops)           NaN           NaN           NaN          NaN          NaN  ...          NaN          NaN           NaN           NaN           NaN           NaN
35        EPS (Diluted, from Disc. Ops)           NaN           NaN           NaN          NaN          NaN  ...          NaN          NaN           NaN           NaN           NaN           NaN
36  Income from Discontinued Operations           NaN           NaN           NaN          NaN          NaN  ...          NaN          NaN           NaN           NaN           NaN           NaN

[37 rows x 41 columns]

and saves data.csv.
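The loop in the answer replaces each indicator `id` with its name and raises a `KeyError` if an id is missing from the indicator list. A slightly more defensive variant, sketched offline: it assumes records shaped like the printed output (an `id` key plus one key per quarter-end date), which is inferred from that output rather than from any API documentation. The helper `records_to_frame` and the sample data are hypothetical. It keeps the raw id when no name is found and makes the indicator name the index, so the dates stay as the columns:

```python
import pandas as pd


def records_to_frame(records, indicators):
    """Hypothetical helper: indicator names as index, dates as columns.

    records:    list of dicts like {"id": ..., "2020-06-30": value, ...}
    indicators: list of dicts like {"id": ..., "name": ...}
    Both shapes are assumptions inferred from the printed output.
    """
    names = {i["id"]: i["name"] for i in indicators}
    rows = []
    for rec in records:
        row = dict(rec)  # copy so the caller's records are not mutated
        row["id"] = names.get(row["id"], row["id"])  # keep raw id if unknown
        rows.append(row)
    df = pd.DataFrame(rows).set_index("id")
    # ISO date strings sort correctly as text; put the newest quarter first.
    return df[sorted(df.columns, reverse=True)]


# Made-up sample records for illustration only.
indicators = [{"id": "abc", "name": "Revenue"}, {"id": "def", "name": "Gross Profit"}]
records = [
    {"id": "abc", "2020-03-31": 1515107000.0, "2020-06-30": 1524485000.0},
    {"id": "def", "2020-03-31": 1352610000.0, "2020-06-30": 1339965000.0},
    {"id": "zzz", "2020-06-30": 1.0},  # id with no matching indicator
]
df = records_to_frame(records, indicators)
print(df)
```

Quarters missing for an indicator show up as `NaN`, as in the real output; `df.T` gives the transposed layout (dates as rows) if that is ever more convenient.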



Or download their XLSX from that page:

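import pandas as pd

url = 'https://stockrow.com/api/companies/VRTX/financials.xlsx?dimension=Q&section=Income%20Statement&sort=desc'

# Reading .xlsx requires an Excel engine such as openpyxl to be installed
df = pd.read_excel(url)
# Display floats with three decimals instead of scientific notation
pd.set_option('display.float_format', lambda x: '%.3f' % x)
print(df)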
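`pd.set_option` changes the float format for the rest of the session. If the three-decimal display is only wanted for one print, `pd.option_context` applies it temporarily and restores the previous setting afterwards. A small sketch, with a made-up frame standing in for the downloaded sheet:

```python
import pandas as pd

# Made-up frame standing in for the downloaded sheet.
df = pd.DataFrame(
    {"2020-06-30": [1524485000.0, 0.5492]},
    index=["Revenue", "Net Profit Margin"],
)

# The format applies only inside the with-block.
with pd.option_context("display.float_format", "{:.3f}".format):
    formatted = df.to_string()
    print(formatted)

# Outside the block the previous setting (None by default) is back in effect.
print(pd.get_option("display.float_format"))
```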