Posts by CKr*_*Kre

Web scraping with R

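I'm running into some problems scraping data from a website. First of all, I don't have much experience with web scraping. My plan is to use R to get some data from the following site: http://spiderbook.com/company/17495/details?rel=300795

In particular, I want to extract the links to the articles on this page.

My idea so far: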

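library(XML)  # htmlParse(), xpathApply(), saveXML()
xmltext <- htmlParse("http://spiderbook.com/company/17495/details?rel=300795")
sources <- xpathApply(xmltext, "//body//div")
# Assumption: sourcesChar holds each div node serialized to a string
sourcesChar <- lapply(sources, saveXML)
sourcesCharSep <- lapply(sourcesChar, function(x) unlist(strsplit(x, " ")))
sourcesInd <- lapply(sourcesCharSep, function(x) grep('"(http://[^"]*)"', x))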


But this does not return the information I expected. Any help would be really appreciated. Thanks!

Best, Christoph
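One thing worth noting about the attempt above: `grep()` returns the indices of matching elements by default, not the matched text, so `sourcesInd` holds positions rather than URLs. A possibly simpler route is to ask XPath for the `href` attributes directly instead of splitting serialized HTML on spaces. The following is only a minimal sketch under stated assumptions: the inline `html` string is a hypothetical stand-in for the real page, whose actual markup is not shown here, and the names `links` and `article_links` are illustrative.

```r
library(XML)  # provides htmlParse() and xpathSApply()

# Hypothetical stand-in for the downloaded page; the real markup may differ.
html <- '<html><body>
  <div class="article"><a href="http://example.com/news/1">First</a></div>
  <div class="article"><a href="/company/17495/about">Internal</a></div>
  <div class="article"><a href="https://example.org/post/2">Second</a></div>
</body></html>'

# asText = TRUE tells htmlParse() to treat the argument as HTML content,
# not as a file name or URL.
doc <- htmlParse(html, asText = TRUE)

# Select the href attribute of every <a> element, wherever it sits in the tree.
links <- as.character(xpathSApply(doc, "//a/@href"))

# Keep only absolute http(s) links, dropping site-relative paths.
article_links <- links[grepl("^https?://", links)]
print(article_links)
```

For the live site, the URL would be passed to `htmlParse()` in place of the inline string. If the article links are filled in by JavaScript after the page loads, a static parse like this will likely not see them, and the XPath expression would probably need narrowing to whatever container actually holds the articles.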

r web-crawler web-scraping

CKr*_*Kre

2014-11-02

3
votes

2
answers

3589
views