List and description of all packages in CRAN from within R

ada*_*888 8 r

I can get a list of all the available packages with the function:

ap <- available.packages()

But how can I also get a description of these packages from within R, so that I can have a data.frame with two columns: package and description?

Dir*_*tel 16

I actually think you want "Package" and "Title", as "Description" can run to several lines. So here is the former; just put "Description" in the final subset if you really want "Description":

R> ## from http://developer.r-project.org/CRAN/Scripts/depends.R and adapted
R>
R> require("tools")
R>
R> getPackagesWithTitle <- function() {
+     contrib.url(getOption("repos")["CRAN"], "source") 
+     description <- sprintf("%s/web/packages/packages.rds", 
+                            getOption("repos")["CRAN"])
+     con <- if(substring(description, 1L, 7L) == "file://") {
+         file(description, "rb")
+     } else {
+         url(description, "rb")
+     }
+     on.exit(close(con))
+     db <- readRDS(gzcon(con))
+     rownames(db) <- NULL
+
+     db[, c("Package", "Title")]
+ }
R>
R>
R> head(getPackagesWithTitle())               # I shortened one Title here...
     Package              Title
[1,] "abc"                "Tools for Approximate Bayesian Computation (ABC)"
[2,] "abcdeFBA"           "ABCDE_FBA: A-Biologist-Can-Do-Everything of Flux ..."
[3,] "abd"                "The Analysis of Biological Data"
[4,] "abind"              "Combine multi-dimensional arrays"
[5,] "abn"                "Data Modelling with Additive Bayesian Networks"
[6,] "AcceptanceSampling" "Creation and evaluation of Acceptance Sampling Plans"
R>
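Since the question asked for a data.frame, the character matrix that getPackagesWithTitle() returns can be converted with as.data.frame(). A minimal sketch, using a toy two-row matrix as a stand-in for the live CRAN database so it runs offline:

```r
# toy stand-in for the matrix returned by getPackagesWithTitle()
db <- rbind(c("abc",   "Tools for Approximate Bayesian Computation (ABC)"),
            c("abind", "Combine multi-dimensional arrays"))
colnames(db) <- c("Package", "Title")

# convert the character matrix into a two-column data.frame
pkgs <- as.data.frame(db, stringsAsFactors = FALSE)
```

The same conversion applies unchanged to the full matrix, with "Description" swapped in for "Title" if desired.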


Tyl*_*ker 7

Dirk has provided a great answer. After finishing my solution and then seeing his, I debated for some time whether to post mine, for fear of looking silly. But I decided to post it anyway, for two reasons:

  1. It is informative to beginning scrapers like myself.
  2. It took me a while to do, so why not :)

I approached this thinking I'd need to do some web scraping, and chose crantastic as the site to scrape from. First I'll provide the code, and then two scraping resources that have been very helpful to me:

library(RCurl)
library(XML)

URL <- "http://cran.r-project.org/web/checks/check_summary.html#summary_by_package"
packs <- na.omit(XML::readHTMLTable(doc = URL, which = 2, header = T, 
    strip.white = T, as.is = FALSE, sep = ",", na.strings = c("999", 
        "NA", " "))[, 1])
Trim <- function(x) {
    gsub("^\\s+|\\s+$", "", x)
}
packs <- unique(Trim(packs))
u1 <- "http://crantastic.org/packages/"
len.samps <- 10 #for demo purpose; use:
#len.samps <- length(packs) # for all of them
URL2 <- paste0(u1, packs[seq_len(len.samps)]) 
scraper <- function(urls){ #function to grab description
    doc   <- htmlTreeParse(urls, useInternalNodes=TRUE)
    nodes <- getNodeSet(doc, "//p")[[3]]
    return(nodes)
}
info <- sapply(seq_along(URL2), function(i) try(scraper(URL2[i]), TRUE))
info2 <- sapply(info, function(x) { #replace errors with NA
        if(class(x)[1] != "XMLInternalElementNode"){
            NA
        } else {
            Trim(gsub("\\s+", " ", xmlValue(x)))
        }
    }
)
pack_n_desc <- data.frame(package=packs[seq_len(len.samps)], 
    description=info2) #make a dataframe of it all

Resources:

  1. talkstats.com thread on web scraping (great beginner examples)
  2. w3schools.com site on html (very helpful)
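A side note on the error handling in the code above: wrapping each scraper call in try() and then replacing the failed results with NA is a handy general pattern whenever some elements of a batch may error out. A small self-contained sketch of the same idea (safe_sqrt is a made-up toy function, not part of the answer's code):

```r
# wrap a call that may fail in try(), so errors become objects instead of aborting
safe_sqrt <- function(x) {
    try(if (x < 0) stop("negative input") else sqrt(x), silent = TRUE)
}

res <- lapply(c(4, -1, 9), safe_sqrt)

# replace any try-error results with NA, as info2 does in the answer above
vals <- sapply(res, function(x) if (inherits(x, "try-error")) NA else x)
# vals is c(2, NA, 3)
```

This is exactly what the sapply() over info does: failed URLs yield try-error objects, which are filtered to NA before building the final data.frame.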