Ste*_*Kim 0 python beautifulsoup web-scraping google-colaboratory
I am trying to scrape a table from this website: https://stockrow.com/VRTX/financials/income/quarterly
I am using Python in Google Colab, and I would like the dates (e.g. 2020-06-30) as columns. This is the code I have so far:
import urllib.request
import bs4 as bs

source = urllib.request.urlopen('https://stockrow.com/VRTX/financials/income/quarterly').read()
soup = bs.BeautifulSoup(source, 'lxml')
table = soup.find_all('table')
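However, I cannot get the table. I am fairly new to scraping; I looked at other Stack Overflow pages but could not solve the problem. Could you help me? It would be greatly appreciated.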
You can use their API to load the data:
import requests
import pandas as pd
indicators_url = 'https://stockrow.com/api/indicators.json'
data_url = 'https://stockrow.com/api/companies/VRTX/financials.json?ticker=VRTX&dimension=Q&section=Income+Statement'
indicators = {i['id']: i for i in requests.get(indicators_url).json()}
all_data = []
for d in requests.get(data_url).json():
    d['id'] = indicators[d['id']]['name']
    all_data.append(d)
df = pd.DataFrame(all_data)
df.to_csv('data.csv')
print(df)
Prints:
id 2020-06-30 2020-03-31 2019-12-31 2019-09-30 2019-06-30 ... 2011-12-31 2011-09-30 2011-06-30 2011-03-31 2010-12-31 2010-09-30
0 Consolidated Net Income/Loss 837270000.0 602753000.0 583234100.0 57518000.0 267427000.0 ... 188141000.0 228452000.0 -199318000.0 -176096000.0 -180392000.0 -208957000.0
1 EPS (Basic, from Continuous Ops) 3.2248 2.3199 2.2654 0.2239 1.044 ... 0.9374 1.109 -0.9751 -0.8703 -0.8966 -1.0402
2 Net Profit Margin 0.5492 0.3978 0.4127 0.0606 0.2841 ... 0.2816 0.3354 -1.5213 -2.3906 -2.7531 -8.7816
3 Gross Profit 1339965000.0 1352610000.0 1228253000.0 817914000.0 805553000.0 ... 533213000.0 620794000.0 105118000.0 70996000.0 62475000.0 20567000.0
4 Income Tax Provision -12500000.0 54781000.0 93716000.0 13148000.0 59711000.0 ... 22660000.0 -27842000.0 24448000.0 0.0 NaN 0.0
5 Operating Income 718033000.0 720224100.0 551464400.0 99333000.0 269960000.0 ... 223901900.0 215707000.0 -165890000.0 -159899000.0 -166634000.0 -199588000.0
6 EBIT 718033000.0 720224100.0 551464700.0 99333000.0 269960000.0 ... 223901900.0 215707000.0 -165890000.0 -159899000.0 -166634000.0 -199588000.0
7 EPS (Diluted, from Cont. Ops) 3.1787 2.2874 2.2319 0.2208 1.0293 ... 1.0011 1.0415 -0.9751 -0.8703 -0.8966 -1.0402
8 EBITDA 744730000.0 747045000.0 577720400.0 125180000.0 297658000.0 ... 233625900.0 223457000.0 -157181000.0 -151041000.0 -158429000.0 -192830000.0
9 EPS (Basic, Consolidated) 3.2248 2.3199 2.2654 0.2239 1.044 ... 0.9374 1.109 -0.9751 -0.8703 -0.8966 -1.0402
10 EBT 824770000.0 657534000.0 676950000.0 70666000.0 327138000.0 ... 210801000.0 200610000.0 -174870000.0 -176096000.0 -180392000.0 -208957000.0
11 Operating Cash Flow Margin 0.6812 0.5384 0.3156 0.3525 0.4927 ... 0.8941 0.0651 -1.8894 -2.5336 -2.535 -6.8918
12 EBT margin 0.541 0.434 0.479 0.0744 0.3475 ... 0.3742 0.3043 -1.5283 -2.3906 -2.7531 -8.7816
13 EBIT Margin 0.471 0.4754 0.3902 0.1046 0.2868 ... 0.3975 0.3272 -1.4498 -2.1707 -2.5431 -8.3878
14 Income from Continuous Operations 837270000.0 602753000.0 583234000.0 57518000.0 267427000.0 ... 188141000.0 228452000.0 -199318000.0 -176096000.0 -180392000.0 -208957000.0
15 R&D Expenses 420928000.0 448528000.0 480011000.0 555948000.0 379091000.0 ... 186438000.0 189052000.0 173604000.0 158612000.0 168888000.0 170434000.0
16 Non-operating Interest Expenses 13871000.0 14136000.0 14249000.0 14548000.0 14837000.0 ... 11659000.0 7059000.0 6962000.0 12001000.0 7686000.0 3951000.0
17 EBITDA Margin 0.4885 0.4931 0.4088 0.1318 0.3162 ... 0.4147 0.339 -1.3737 -2.0505 -2.4179 -8.1038
18 Non-operating Income/Expense 106737000.0 -62690000.0 125485000.0 -28667000.0 57178000.0 ... -13101000.0 -15097000.0 -8980000.0 -16197000.0 -13758000.0 -9369000.0
19 EPS (Basic) 3.22 2.32 2.26 0.22 1.04 ... 0.76 1.06 -0.85 -0.87 -0.9 -1.04
20 Gross Margin 0.879 0.8927 0.8691 0.8611 0.8558 ... 0.9465 0.9417 0.9187 0.9638 0.9535 0.8643
21 Revenue 1524485000.0 1515107000.0 1413265000.0 949828000.0 941293000.0 ... 563340000.0 659200000.0 114424000.0 73662000.0 65524000.0 23795000.0
22 Shares (Diluted, Average) 263403000.0 263515000.0 262108000.0 260473000.0 259822000.0 ... 217602000.0 219349000.0 204413000.0 202329000.0 201355000.0 200887000.0
23 Cost of Revenue 184520000.0 162497000.0 185012000.0 131914000.0 135740000.0 ... 30127000.0 38406000.0 9306000.0 2666000.0 3049000.0 3228000.0
24 SG&A Expenses 191804000.0 182258000.0 195277000.0 159674000.0 156502000.0 ... 121881000.0 110654000.0 96663000.0 71523000.0 62478000.0 48855000.0
25 EPS (Diluted, Consolidated) 3.1787 2.2874 2.2319 0.2208 1.0293 ... 1.0011 1.0415 -0.9751 -0.8703 -0.8966 -1.0402
26 Revenue Growth 0.6196 0.765 0.6242 0.2107 0.2515 ... 7.5975 26.7033 2.6185 2.2842 0.9335 -0.0466
27 Shares (Basic, Weighted) 259637000.0 259815000.0 256728000.0 256946000.0 256154000.0 ... 204891000.0 206002000.0 204413000.0 202329000.0 200402000.0 200887000.0
28 Income after Tax 837270000.0 602753000.0 583234000.0 57518000.0 267427000.0 ... 188141000.0 228452000.0 -199318000.0 -176096000.0 -180392000.0 -208957000.0
29 EPS (Diluted) 3.18 2.29 2.23 0.22 1.03 ... 0.74 1.02 -0.85 -0.87 -0.9 -1.04
30 Net Income Common 837270000.0 602753000.0 583234100.0 57518000.0 267427000.0 ... 158629000.0 221110000.0 -174069000.0 -176096000.0 -180392000.0 -208957000.0
31 Shares (Diluted, Weighted) 263403000.0 263515000.0 260673000.0 260473000.0 259822000.0 ... 208807000.0 219349000.0 204413000.0 202329000.0 200402000.0 200887000.0
32 Non-Controlling Interest NaN NaN NaN NaN NaN ... 29512000.0 7342000.0 -25249000.0 0.0 NaN 0.0
33 Dividends (Preferred) NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
34 EPS (Basic, from Discontinued Ops) NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
35 EPS (Diluted, from Disc. Ops) NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
36 Income from Discontinued Operations NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN
[37 rows x 41 columns]
It also saves the result to data.csv.
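If you want the indicator names as the row index, so that every remaining column is a date, something like the following may help. This is only a sketch on a small hand-made sample: the ids `ind-1`, `ind-2` and `ind-3` are hypothetical placeholders, and the record shapes are assumed from the code above. The numbers for Revenue and Gross Profit are copied from the printed output. It also falls back to the raw id when an indicator is missing from the lookup, instead of raising a `KeyError`.

```python
import pandas as pd

# Hypothetical sample shaped like the two API responses used above.
# The ids "ind-1", "ind-2", "ind-3" are placeholders, not real stockrow ids.
indicators = [
    {"id": "ind-1", "name": "Revenue"},
    {"id": "ind-2", "name": "Gross Profit"},
]
rows = [
    {"id": "ind-1", "2020-06-30": 1524485000.0, "2020-03-31": 1515107000.0},
    {"id": "ind-2", "2020-06-30": 1339965000.0, "2020-03-31": 1352610000.0},
    {"id": "ind-3", "2020-06-30": 1.0},  # id with no matching indicator
]

# Map id -> readable name; fall back to the raw id if it is unknown.
names = {i["id"]: i["name"] for i in indicators}
records = [{**r, "id": names.get(r["id"], r["id"])} for r in rows]

# Index by indicator name so every remaining column is a date.
df = pd.DataFrame(records).set_index("id")

revenue_q2 = df.loc["Revenue", "2020-06-30"]  # single cell lookup
by_date = df.T  # transpose: one row per date, one column per indicator
print(df)
```

With the index set, `df.loc["Revenue"]` returns the whole quarterly series for that indicator, and `df.T` gives the opposite orientation if you prefer dates as rows.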
Alternatively, download their XLSX export from that page:
url = 'https://stockrow.com/api/companies/VRTX/financials.xlsx?dimension=Q&section=Income%20Statement&sort=desc'
df = pd.read_excel(url)
pd.set_option('display.float_format', lambda x: '%.3f' % x)
print(df)
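The query string in that URL is easy to mistype by hand, since it mixes `&` separators with an escaped space. A minimal sketch of building it with the standard library instead is shown here. The helper name `build_financials_url` and its parameters are hypothetical, and whether the endpoint accepts tickers or sections other than the ones shown in this answer is an assumption.

```python
from urllib.parse import urlencode, quote


def build_financials_url(ticker, section="Income Statement", dimension="Q",
                         sort="desc", ext="xlsx"):
    """Hypothetical helper: assemble a URL in the pattern used above."""
    base = f"https://stockrow.com/api/companies/{ticker}/financials.{ext}"
    params = {"dimension": dimension, "section": section, "sort": sort}
    # quote_via=quote encodes the space as %20 rather than +
    return f"{base}?{urlencode(params, quote_via=quote)}"


url = build_financials_url("VRTX")
print(url)
```

The resulting string can be passed to `pd.read_excel(url)` in the same way as the hard-coded one.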