install.packages("data.table")
trying URL 'https://cran.rstudio.com/bin/macosx/el-capitan/contrib/3.5/data.table_1.12.8.tgz'
Content type 'application/x-gzip' length 2117137 bytes (2.0 MB)
downloaded 2.0 MB
The downloaded binary packages are in
/var/folders/r1/1rsn2y0j78v907qgv0btm_fm0000gn/T//RtmppBu3UK/downloaded_packages
Updating to the latest development version with the provided code:
data.table::update.dev.pkg()
Console output:
Warning: unable to access index for repository https://Rdatatable.gitlab.io/data.table/bin/macosx/el-capitan/contrib/3.5:
cannot open URL 'https://Rdatatable.gitlab.io/data.table/bin/macosx/el-capitan/contrib/3.5/PACKAGES'
Package which is only available in source form, and may need compilation of C/C++/Fortran: ‘data.table’
Do you want to attempt to install these from sources? (Yes/no/cancel) Yes
installing the source package ‘data.table’
trying URL 'https://Rdatatable.gitlab.io/data.table/src/contrib/data.table_1.12.9.tar.gz'
Content type 'application/gzip' length 5189945 bytes (4.9 MB) …
The STRATUM names in the OECD data are too long. For simplicity I have been using this name, and I would like to reduce it to shorter, more precise names, as in the code below.
pisaMas[,`:=`
(SchoolType = c(ifelse(STRATUM == "National Secondary School", "Public",
ifelse(STRATUM == "Religious School", "Religious",
ifelse(STRATUM == "MOE Technical School", "Technical",0)))))]
pisaMas[,table(SchoolType)]
I would like to know whether there is a simple way to solve this using the data.table package.
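One possible approach, sketched here under assumptions, is to recode through a named lookup vector instead of nested `ifelse()` calls. The toy `pisaMas` table and the name `lookup` are made up for illustration; only the three STRATUM values and their short labels come from the question. Unlike the original code, values with no match become `NA` rather than the string "0".

```r
library(data.table)

# Toy stand-in for pisaMas (hypothetical data); the last value has no short label.
pisaMas <- data.table(
  STRATUM = c("National Secondary School", "Religious School",
              "MOE Technical School", "Some Other Stratum")
)

# Named lookup vector (hypothetical name): full stratum name -> short label.
lookup <- c(
  "National Secondary School" = "Public",
  "Religious School"          = "Religious",
  "MOE Technical School"      = "Technical"
)

# as.character() guards against STRATUM being a factor, which would
# otherwise index the lookup by integer code instead of by name.
pisaMas[, SchoolType := unname(lookup[as.character(STRATUM)])]

# Tabulate the new column, showing unmatched rows as NA.
pisaMas[, table(SchoolType, useNA = "ifany")]
```

Adding a category then only means adding one entry to `lookup`, rather than another level of nesting.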
I am using rvest in R to scrape the keywords from this article page with the following code:
#install.packages("xml2") # required for rvest
library("rvest") # for web scraping
library("dplyr") # for data management
# start by reading the web page to be scraped
page <- read_html("https://www.sciencedirect.com/science/article/pii/S1877042810004568")
keyW <- page %>% html_nodes("div.Keywords.u-font-serif") %>% html_text() %>% paste(collapse = ",")
It gives me:
> keyW
[1] "KeywordsPhysics curriculumTurkish education systemfinnish education systemPISAphysics achievement"
After removing the word "Keywords" and everything before it from the string with the following line of code:
keyW <- gsub(".*Keywords","", keyW)
The new keyW is:
[1] "Physics curriculumTurkish education systemfinnish education systemPISAphysics achievement"
However, my desired output is this list:
[1] "Physics curriculum" "Turkish education system" "finnish education …
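Because the concatenated string has no delimiter between keywords (for example "PISAphysics"), splitting it afterwards is unreliable. A sketch of one alternative is to select each keyword's own node rather than the container, so that `html_text()` returns one element per keyword. The inline HTML and the `div.keyword` child selector below are assumptions made for illustration; the real page's markup may differ and should be checked in the browser's inspector.

```r
library(rvest)

# Hypothetical markup standing in for the article page (assumed structure).
html <- paste0(
  '<div class="Keywords u-font-serif"><h2>Keywords</h2>',
  '<div class="keyword"><span>Physics curriculum</span></div>',
  '<div class="keyword"><span>Turkish education system</span></div>',
  '<div class="keyword"><span>PISA</span></div>',
  '</div>'
)
page <- read_html(html)

# Select the child nodes (assumed class "keyword"), not the container,
# so each keyword arrives as a separate element of the character vector.
keyW <- page %>%
  html_nodes("div.Keywords.u-font-serif div.keyword") %>%
  html_text()

print(keyW)
```

With this approach the "Keywords" heading is never included, so the `gsub()` step is not needed.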