Dir*_*tel 62
In the simplest case, just do
X <- read.csv(url("http://some.where.net/data/foo.csv"))
plus whatever options read.csv() may need.
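As a rough illustration of what "options" can mean here, the sketch below passes a few common read.csv() arguments through a url() connection. To stay runnable offline it writes a tiny made-up CSV to a temporary file and points a file:// URL at it; the file contents and column names are assumptions for illustration, and the file:// construction assumes a Unix-like path.

```r
# Write a small, made-up CSV to a temporary file so the sketch runs offline.
tmp <- tempfile(fileext = ".csv")
writeLines(c("city,rate,region",
             "Rochester,19.0,E",
             "Syracuse,NA,E"), tmp)

# url() accepts file:// URLs as well as http:// ones
# (the path form below assumes a Unix-like system).
con <- url(paste0("file://", tmp))

# The same options would apply to an http:// URL.
X <- read.csv(con,
              header = TRUE,             # first line holds column names
              stringsAsFactors = FALSE,  # keep text columns as character
              na.strings = "NA")         # strings to treat as missing
str(X)
```

With a real remote file, only the URL string passed to url() would change.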
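Long answer: yes, this can be done, and many packages have used this feature for years. For example, the tseries package has used exactly this feature to download stock prices from Yahoo! for almost a decade: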
R> library(tseries)
Loading required package: quadprog
Loading required package: zoo
‘tseries’ version: 0.10-24
‘tseries’ is a package for time series analysis and computational finance.
See ‘library(help="tseries")’ for details.
R> get.hist.quote("IBM")
trying URL 'http://chart.yahoo.com/table.csv? ## manual linebreak here
s=IBM&a=0&b=02&c=1991&d=5&e=08&f=2011&g=d&q=q&y=0&z=IBM&x=.csv'
Content type 'text/csv' length unknown
opened URL
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
.......... .......... .......... .......... ..........
........
downloaded 258 Kb
Open High Low Close
1991-01-02 112.87 113.75 112.12 112.12
1991-01-03 112.37 113.87 112.25 112.50
1991-01-04 112.75 113.00 111.87 112.12
1991-01-07 111.37 111.87 110.00 110.25
1991-01-08 110.37 110.37 108.75 109.00
1991-01-09 109.75 110.75 106.75 106.87
[...]
This is all exceedingly well documented in the manual pages help(connection) and help(url). Also see the "R Data Import/Export" manual that ships with R.
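For readers who have not used connections directly, here is a minimal sketch of the open / read / close cycle that help(connection) describes. It uses a file:// URL to a temporary file with made-up contents so it runs without network access (Unix-like path assumed); an http:// URL would be handled the same way.

```r
# Made-up data in a temporary file, standing in for a remote resource.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2", "3,4"), tmp)

# Open the connection explicitly in text-read mode.
con <- url(paste0("file://", tmp), open = "rt")

# Peek at the first two lines without parsing them.
first_lines <- readLines(con, n = 2)

# A connection that was opened explicitly should be closed explicitly.
close(con)

print(first_lines)
```

read.csv() does this opening and closing itself when it is handed an unopened connection, which is why the one-liner at the top works.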
Data on web pages often comes in the form of an XML table. You can read such a table into R with the XML package.
In this package, the function
readHTMLTable(<url>)
will search the page for XML tables and return a list of data frames (one for each table found).
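To show the "list of data frames" return value, the sketch below parses a small inline HTML string instead of a live page. The HTML content is made up for illustration, and the XML package is assumed to be installed; passing a URL to readHTMLTable() in place of the parsed document should follow the same pattern.

```r
library(XML)

# A made-up page containing two tables.
html <- paste0(
  "<html><body>",
  "<table><tr><th>city</th><th>rate</th></tr>",
  "<tr><td>Rochester</td><td>19.0</td></tr>",
  "<tr><td>Syracuse</td><td>17.0</td></tr></table>",
  "<table><tr><th>id</th><th>value</th></tr>",
  "<tr><td>1</td><td>x</td></tr></table>",
  "</body></html>")

# Parse the string as HTML, then extract every table found in it.
doc <- htmlParse(html, asText = TRUE)
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

length(tables)   # one list element per table found
tables[[1]]      # the first table as a data frame
```

Columns come back as text by default, so numeric columns usually need an explicit conversion such as as.numeric().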
Besides read.csv(url("...")), you can also use read.table("http://...").
Example:
> sample <- read.table("http://www.ats.ucla.edu/stat/examples/ara/angell.txt")
> sample
V1 V2 V3 V4 V5
1 Rochester 19.0 20.6 15.0 E
2 Syracuse 17.0 15.6 20.2 E
...
43 Atlanta 4.2 70.6 32.6 S
>
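The columns above come back as V1 to V5 because read.table() defaults to header = FALSE. A minimal sketch of supplying names through col.names follows. The names used are hypothetical and are not the real variable names of the angell data; the three rows are copied from the output above into a temporary file so the sketch runs offline.

```r
# Three rows taken from the output shown above.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Rochester 19.0 20.6 15.0 E",
             "Syracuse 17.0 15.6 20.2 E",
             "Atlanta 4.2 70.6 32.6 S"), tmp)

# col.names replaces the default V1..V5; these names are placeholders.
sample_named <- read.table(
  tmp,
  header = FALSE,
  col.names = c("city", "v2", "v3", "v4", "region"),
  stringsAsFactors = FALSE)

str(sample_named)
```

The same col.names argument can be used when the first argument is an "http://..." string.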
read.csv works fine without the url function. I am probably missing something, given that Dirk Eddelbuettel included it in his answer:
ad <- read.csv("http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv")
head(ad)
X TV radio newspaper sales
1 1 230.1 37.8 69.2 22.1
2 2 44.5 39.3 45.1 10.4
3 3 17.2 45.9 69.3 9.3
4 4 151.5 41.3 58.5 18.5
5 5 180.8 10.8 58.4 12.9
6 6 8.7 48.9 75.0 7.2
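One way to check that the two spellings agree is to read the same URL both ways and compare the results. The sketch below does that with a file:// URL to a temporary file (Unix-like path assumed) so it needs no network; the two data rows are taken from the output above, restricted to three columns for brevity.

```r
# A small CSV built from two of the rows shown above.
tmp <- tempfile(fileext = ".csv")
writeLines(c("TV,radio,sales",
             "230.1,37.8,22.1",
             "44.5,39.3,10.4"), tmp)
u <- paste0("file://", tmp)

a <- read.csv(u)        # URL given as a plain character string
b <- read.csv(url(u))   # URL wrapped in an explicit connection

identical(a, b)
```

As far as the documentation for file() goes, a character string that is a complete URL is handed to the same machinery as url(), so the explicit wrapper mainly matters when you want to control the connection yourself.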
Another option, using two popular packages:
library(data.table)
ad <- fread("http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv")
head(ad)
V1 TV radio newspaper sales
1: 1 230.1 37.8 69.2 22.1
2: 2 44.5 39.3 45.1 10.4
3: 3 17.2 45.9 69.3 9.3
4: 4 151.5 41.3 58.5 18.5
5: 5 180.8 10.8 58.4 12.9
6: 6 8.7 48.9 75.0 7.2
library(readr)
ad <- read_csv("http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv")
head(ad)
# A tibble: 6 x 5
X1 TV radio newspaper sales
<int> <dbl> <dbl> <dbl> <dbl>
1 1 230.1 37.8 69.2 22.1
2 2 44.5 39.3 45.1 10.4
3 3 17.2 45.9 69.3 9.3
4 4 151.5 41.3 58.5 18.5
5 5 180.8 10.8 58.4 12.9
6 6 8.7 48.9 75.0 7.2