I am trying to convert '9¼"' to '9.25', but I can't seem to read the fraction correctly.
Here is the data I'm working with:
library(XML)
url <- paste("http://mockdraftable.com/players/2014/", sep = "")
combine <- readHTMLTable(url,which=1, header=FALSE, stringsAsFactors=F)
names(combine) <- c("Name", "Pos", "Hght", "Wght", "Arms", "Hands",
"Dash40yd", "Dash20yd", "Dash10yd", "Bench", "Vert", "Broad",
"Cone3", "ShortShuttle20")
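For example, the Hands value in the first row is '9¼"'. How would I turn combine$Hands into 9.25? The same goes for all the other fractions, 1/8 through 7/8.
Any help would be appreciated.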
While reading the XML, you can try converting the Unicode characters directly to ASCII by supplying a custom element function (elFun):
library(stringi)
readHTMLTable(url, which = 1, header = FALSE, stringsAsFactors = FALSE,
              elFun = function(node) {
                val <- xmlValue(node)
                stri_trans_general(val, "latin-ascii")
              })
You can then use @Metrics's suggestion to convert the result to numeric.
For example, you could use @G. Grothendieck's function from this post to clean up the Arms data:
library(XML)
library(stringi)
library(gsubfn)
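# calc() is by @G. Grothendieck: it turns the digit groups extracted from a
# string into a number, e.g. c("31", "1", "2") -> 31 + 1/2
calc <- function(s) {
  # prepend 0 when only a fraction is present; the trailing 0:1 makes a bare
  # whole number evaluate as whole + 0/1
  x <- c(if (length(s) == 2) 0, as.numeric(s), 0:1)
  x[1] + x[2] / x[3]
}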
url <- "http://mockdraftable.com/players/2014/"
combine <- readHTMLTable(url, which = 1, header = FALSE, stringsAsFactors = FALSE,
                         elFun = function(node) {
                           val <- xmlValue(node)
                           stri_trans_general(val, "latin-ascii")
                         })
names(combine) <- c("Name", "Pos", "Hght", "Wght", "Arms", "Hands",
"Dash40yd", "Dash20yd", "Dash10yd", "Bench", "Vert", "Broad",
"Cone3", "ShortShuttle20")
# strip the inch marks, extract the digit groups, and evaluate each with calc()
sapply(strapplyc(gsub('\"', "", combine$Arms), "\\d+"), calc)
#[1] 30.000 31.500 30.000 31.750 31.875 29.875 31.000 31.000 30.250 33.000 32.500 31.625 32.875
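If you would rather not depend on gsubfn, roughly the same conversion can be sketched in base R. This is only an illustration: `frac_to_num` is a hypothetical helper name, and it assumes the strings have already been transliterated to forms like `31 1/2"` and contain no decimal points.

```r
# Hypothetical helper (not from the original answers): converts strings such as
# '9 1/4"', "30" or "1/2" to numeric values.
# Assumption: every value is a whole number, a fraction, or both, with no decimals.
frac_to_num <- function(x) {
  x <- trimws(gsub('"', "", x, fixed = TRUE))   # drop the inch mark and padding
  vapply(strsplit(x, "[^0-9]+"), function(p) {
    p <- as.numeric(p[nzchar(p)])               # digit groups as numbers
    if (length(p) == 1) {
      p                                         # whole number only
    } else if (length(p) == 2) {
      p[1] / p[2]                               # bare fraction
    } else if (length(p) == 3) {
      p[1] + p[2] / p[3]                        # whole number plus fraction
    } else {
      NA_real_                                  # empty or unexpected format
    }
  }, numeric(1))
}

frac_to_num(c('9 1/4"', "30", '31 7/8"'))
```

Anything that does not split into one, two or three digit groups comes back as NA, which makes unexpected cell contents easy to spot.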
There may be some encoding issues depending on your machine (see the comments).
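If the transliteration step misbehaves on your machine, one alternative is to skip it and look the fraction characters up directly. The sketch below is an assumption-laden illustration, not part of the original answers: `unicode_frac_to_num`, `frac_chars` and `frac_vals` are hypothetical names, the input is assumed to be the untransliterated strings (for example `9¼"`), and the fraction characters are written as `\u` escapes so the source file itself stays ASCII.

```r
# Hypothetical lookup table: the Unicode vulgar fractions 1/4, 1/2, 3/4 and
# 1/8, 3/8, 5/8, 7/8, written as escapes, with their decimal values.
frac_chars <- c("\u00BC", "\u00BD", "\u00BE", "\u215B", "\u215C", "\u215D", "\u215E")
frac_vals  <- c(0.25, 0.5, 0.75, 0.125, 0.375, 0.625, 0.875)

# Assumption: the ASCII digits in each value form the whole-number part only.
# Values with no digits at all come back as NA.
unicode_frac_to_num <- function(x) {
  whole <- as.numeric(gsub("[^0-9]", "", x))    # keep ASCII digits only
  frac <- numeric(length(x))                    # defaults to 0 (no fraction)
  for (i in seq_along(frac_chars)) {
    frac[grepl(frac_chars[i], x, fixed = TRUE)] <- frac_vals[i]
  }
  whole + frac
}

unicode_frac_to_num(c("9\u00BC\"", "10", "31\u215E\""))
```

This still relies on the table being read as UTF-8 in the first place, so it may not help with every encoding problem.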