Mer*_*lin 0 python yahoo lxml beautifulsoup web-scraping
I have watched a few webcasts and need help getting this done. I have been using lxml.html, but Yahoo recently changed its page structure.
Target page:
http://finance.yahoo.com/quote/IBM/options?date=1469750400&straddle=true
Using the inspector in Chrome, I see the data at:
//*[@id="main-0-Quote-Proxy"]/section/section/div[2]/section/section/table
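Then some code.
How do I output this data to a list? I also want to switch to other stocks, from "LLY" to "Msft".
How do I switch between dates, and get all the months?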
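I know you said you can't use lxml.html, but here is how to do it with that library, because it is a very good library. I am providing code that uses it for completeness, since I no longer use BeautifulSoup: it is unmaintained, slow, and has a difficult API.
The code below parses the page and writes the results to a csv file.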
import lxml.html
import csv

doc = lxml.html.parse('http://finance.yahoo.com/q/os?s=lly&m=2011-04-15')
# find the first table containing any tr with a td with class yfnc_tabledata1
table = doc.xpath("//table[tr/td[@class='yfnc_tabledata1']]")[0]
# on Python 3 the csv module expects a text-mode file opened with newline=''
with open('results.csv', 'w', newline='') as f:
    cf = csv.writer(f)
    # find all trs inside that table:
    for tr in table.xpath('./tr'):
        # add the text of all tds inside each tr to a list
        row = [td.text_content().strip() for td in tr.xpath('./td')]
        # write the list to the csv file:
        cf.writerow(row)
That's it! lxml.html is really that simple. Too bad you can't use it.
Here are some lines from the generated results.csv file:
LLY110416C00017500,N/A,0.00,17.05,18.45,0,0,17.50,LLY110416P00017500,0.01,0.00,N/A,0.03,0,182
LLY110416C00020000,15.70,0.00,14.55,15.85,0,0,20.00,LLY110416P00020000,0.06,0.00,N/A,0.03,0,439
LLY110416C00022500,N/A,0.00,12.15,12.80,0,0,22.50,LLY110416P00022500,0.01,0.00,N/A,0.03,2,50
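Since the question asks how to get the data into a list, here is a minimal sketch of reading rows like the ones above back into a list of lists with the standard csv module. The helper name read_rows is hypothetical, and the sample text simply repeats the three rows shown above so the snippet runs without the file on disk.

```python
import csv
import io

# The three sample rows from results.csv shown above, inlined so the
# sketch runs without the file being present.
sample = """\
LLY110416C00017500,N/A,0.00,17.05,18.45,0,0,17.50,LLY110416P00017500,0.01,0.00,N/A,0.03,0,182
LLY110416C00020000,15.70,0.00,14.55,15.85,0,0,20.00,LLY110416P00020000,0.06,0.00,N/A,0.03,0,439
LLY110416C00022500,N/A,0.00,12.15,12.80,0,0,22.50,LLY110416P00022500,0.01,0.00,N/A,0.03,2,50
"""


def read_rows(f):
    """Return every non-empty CSV row from the file-like object f as a list of strings."""
    return [row for row in csv.reader(f) if row]


rows = read_rows(io.StringIO(sample))
print(len(rows), "rows,", len(rows[0]), "columns each")
print(rows[0][0])
```

With the real file, the same helper could be called as read_rows(open('results.csv', newline='')), ideally inside a with block. All values stay strings, including "N/A", so any numeric conversion is left to the caller.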
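On switching symbols and dates: the target URL in the question carries the ticker in its path and a date parameter, 1469750400, which is the Unix timestamp for 2016-07-29 00:00 UTC. The sketch below rebuilds URLs of that shape for other tickers and dates. The function name options_url is hypothetical, and it is an assumption that the date parameter is always the expiry date at midnight UTC and that the site still serves this pattern. The list of available expiry dates is not shown in the source, so the caller has to supply it.

```python
from datetime import date, datetime, timezone
from urllib.parse import urlencode


def options_url(symbol, expiry):
    """Build a URL shaped like the question's target page (hypothetical helper).

    Assumption: the `date` query parameter is the expiry date at
    midnight UTC, expressed as a Unix timestamp in seconds.
    """
    midnight = datetime(expiry.year, expiry.month, expiry.day, tzinfo=timezone.utc)
    query = urlencode({"date": int(midnight.timestamp()), "straddle": "true"})
    return "http://finance.yahoo.com/quote/{}/options?{}".format(symbol.upper(), query)


# Loop over the tickers and expiry dates of interest; the dates here are
# only the one taken from the question's URL.
for symbol in ("LLY", "Msft"):
    for expiry in (date(2016, 7, 29),):
        print(options_url(symbol, expiry))
```

Each generated URL can then be fetched and parsed in a loop, one page per symbol and expiry date.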