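Does ifelse really compute both the yes and no vectors, i.e., each vector in its entirety? Or does it compute only some of the values from each vector?
Also, is ifelse really that slow?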
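One way to see what ifelse evaluates is to pass branch expressions that record their own evaluation. The sketch below uses two hypothetical helpers, yes_branch() and no_branch(), that log the length of the vector they receive. Per the base R documentation, yes is evaluated if and only if some element of test is TRUE (and analogously for no). When a branch is evaluated, it is computed for the whole input vector, not just for the selected positions. The indexing alternative at the end is a common pattern that computes each branch only on the positions it fills; no timing claims are made here.

```r
# Hypothetical helpers that record the length of each vector they process
eval_log <- list(yes = integer(0), no = integer(0))
yes_branch <- function(x) {
  eval_log$yes <<- c(eval_log$yes, length(x))
  x * 2
}
no_branch <- function(x) {
  eval_log$no <<- c(eval_log$no, length(x))
  -x
}

x <- c(1, -2, 3)

# Mixed test: both branches are evaluated once, each over the full vector
res <- ifelse(x > 0, yes_branch(x), no_branch(x))
print(res)       # [1] 2 2 6
print(eval_log)  # $yes is 3, $no is 3

# All-TRUE test: the 'no' argument is never evaluated (lazy evaluation)
res_all <- ifelse(x > -10, yes_branch(x), no_branch(x))
print(eval_log$no)  # still just 3

# Indexing alternative: each branch is computed only where it is needed
pos <- x > 0
res2 <- numeric(length(x))
res2[pos] <- x[pos] * 2
res2[!pos] <- -x[!pos]
stopifnot(identical(res, res2))
```

Part of the extra cost of ifelse is this full-length evaluation plus the internal bookkeeping that selects positions; whether it matters depends on the size of the data and the cost of each branch.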
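Usually, when I run benchmarks, I wrap my statements in expression. Recently, someone suggested either (a) not doing so or (b) using quote instead of expression.
I find two benefits to wrapping statements:
However, while exploring the different approaches, I noticed differences among the three methods (wrapping in expression, wrapping in quote, or not wrapping at all).
The question is:
Why is there a difference?
(It seems that wrapping in quote does not actually evaluate the call.)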
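The base R distinction behind this question can be shown without any benchmarking package: expression() returns an expression vector, quote() returns a call, and neither is computed until eval() is applied to it. Merely evaluating the symbol quot yields the call object itself, which takes essentially no time. How a particular benchmarking function treats arguments like these is up to that package, so this sketch sticks to base R. The smaller matrix and the set.seed() call are assumptions made for a quick, reproducible run.

```r
set.seed(1)  # assumption: fixed seed so the run is reproducible
mat <- matrix(sample(seq(1e6), 4^2 * 1e2, TRUE), ncol = 40)

expr <- expression(apply(mat, 2, mean))
quot <- quote(apply(mat, 2, mean))

class(expr)  # "expression"
class(quot)  # "call"

# Referring to quot only returns the unevaluated call; nothing is computed
identical(quot, quote(apply(mat, 2, mean)))  # TRUE

# eval() performs the computation for both forms and gives the same result
r_expr <- eval(expr)
r_quot <- eval(quot)
r_raw  <- apply(mat, 2, mean)
stopifnot(identical(r_expr, r_raw), identical(r_quot, r_raw))
```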
library(rbenchmark)  # benchmark() below appears to be rbenchmark's (matching output columns)
# SAMPLE DATA
mat <- matrix(sample(seq(1e6), 4^2*1e4, TRUE), ncol=400)
# RAW EXPRESSION TO BENCHMARK IS:
# apply(mat, 2, mean)
# WRAPPED EXPRESSION:
expr <- expression(apply(mat, 2, mean))
quot <- quote(apply(mat, 2, mean))
# BENCHMARKS
benchmark(raw=apply(mat, 2, mean), expr, quot)[, -(7:8)]
# test replications elapsed relative user.self sys.self
# 2 expr 100 1.269 NA 1.256 0.019
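# 3 quot 100 0.000 NA …
I have a data frame with a categorical variable that contains lists of strings of variable length (this matters, because otherwise this question would be a duplicate of this or this), for example: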
df <- data.frame(x = 1:5)
df$y <- list("A", c("A", "B"), "C", c("B", "D", "C"), "E")
df
  x       y
1 1       A
2 2    A, B
3 3       C
4 4 B, D, C
5 5       E
The desired form is a dummy variable for each unique string seen anywhere in df$y, i.e.:
data.frame(x = 1:5, A = c(1,1,0,0,0), B = c(0,1,0,1,0), C = c(0,0,1,1,0), D = c(0,0,0,1,0), E = c(0,0,0,0,1))
  x A B C D E
1 1 1 0 0 0 0
2 2 1 …
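One base R way to build these indicators: collect every distinct label with unique(unlist(df$y)), then test whether each label occurs in each row's vector with %in%. This is a sketch rather than the only approach. The name lev is illustrative, and the labels are sorted alphabetically, which happens to match the desired column order here.

```r
df <- data.frame(x = 1:5)
df$y <- list("A", c("A", "B"), "C", c("B", "D", "C"), "E")

# Every distinct label that appears anywhere in the list column
lev <- sort(unique(unlist(df$y)))

# vapply() returns one column per row of df; t() turns that into one row per row of df
dummies <- t(vapply(df$y, function(v) as.numeric(lev %in% v), numeric(length(lev))))
colnames(dummies) <- lev

out <- cbind(df["x"], dummies)
out
```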
I have a huge dataset with a column that contains several values for each subject (row). Here is a simplified example data frame:
data <- data.frame(subject = c(1:8), sex = c(1, 2, 2, 1, 2, 1, 1, 2),
age = c(35, 29, 31, 46, 64, 57, 49, 58),
v1 = c("2", "0", "3,5", "2 1", "A,4", "B,1,C", "A and B,3", "5, 6 A or C"))
> data
subject sex age v1
1 1 1 35 2
2 2 2 29 0
3 3 2 31 3,5 # separated by a comma
4 4 1 46 2 1 # separated by a blank …
Here is my dummy dataset:
dataset<-data.frame(a=c(1,2,3,4),b=c('a','b','c','d'), c=c("HI","DD","gg","ff"))
g=list(c("a","b"),c(2,3,4), c(44,33,11,22),c("chr","ID","i","II"))
dataset$l<-g
dataset
a b c l
1 1 a HI a, b
2 2 b DD 2, 3, 4
3 3 c gg 44, 33, 11, 22
4 4 d ff chr, ID, i, II
> mode(dataset$l)
[1] "list"
When I try to write the dataset to a file:
> write.table(dataset, "dataset.txt", quote=F, sep="\t")
Error in write.table(x, file, nrow(x), p, rnames, sep, eol, na, dec, as.integer(quote), :
unimplemented type 'list' in 'EncodeElement'
How can I fix this?
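The error comes from the list column l, which write.table() cannot encode as a single field. A minimal sketch, assuming comma-joined values in one cell are acceptable output: collapse each list element with paste(..., collapse = ",") before writing. The file name from tempfile() is only there so the example does not touch the working directory.

```r
dataset <- data.frame(a = c(1, 2, 3, 4), b = c("a", "b", "c", "d"),
                      c = c("HI", "DD", "gg", "ff"))
g <- list(c("a", "b"), c(2, 3, 4), c(44, 33, 11, 22), c("chr", "ID", "i", "II"))
dataset$l <- g

# Collapse each list element into one comma-separated string,
# so every column becomes an atomic vector that write.table() can handle
dataset$l <- vapply(dataset$l, paste, character(1), collapse = ",")

out_file <- tempfile(fileext = ".txt")
write.table(dataset, out_file, quote = FALSE, sep = "\t")
readLines(out_file)
```

If a space after each comma is preferred, toString() can be used in place of paste(..., collapse = ",").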
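Most expert users advise me never to use loops in R and to use the apply functions instead. The problem is that, if you are not familiar with functional programming, writing an apply equivalent for every for/while loop is not very intuitive. Take the following example.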
F <- data.frame(name = c("a", "b", "c", "d"), var1 = c(1,0,0,1), var2 = c(0,0,1,1),
                var3 = c(1,1,1,1), clus = c("one", "two", "three", "four"))
F$ObjTrim <- ""
for (i in 1:nrow(F))
{
  for (j in 2:(ncol(F)-1))
  {
    if(F[i, j] == 1)
    {F$ObjTrim[i] <- paste(F$ObjTrim[i], colnames(F)[j], sep = " ") }
  }
  print(i)
}
The goal here is to create a variable "ObjTrim" that collects the names of all columns whose value == 1. Can someone suggest a good apply equivalent of this?
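One apply()-based sketch: compare the indicator columns with 1 to get a logical matrix, then, row by row, paste together the names of the columns that are TRUE. It assumes the indicator columns are exactly var1 to var3 (the columns between name and clus, which are the only ones the loop can ever match). Unlike the loop, it does not leave a leading space at the start of ObjTrim.

```r
F <- data.frame(name = c("a", "b", "c", "d"), var1 = c(1, 0, 0, 1), var2 = c(0, 0, 1, 1),
                var3 = c(1, 1, 1, 1), clus = c("one", "two", "three", "four"))

# Indicator columns: everything between 'name' and 'clus'
ind_cols <- names(F)[2:(ncol(F) - 1)]

# Logical matrix (rows x indicator columns); for each row, join the names that are TRUE
F$ObjTrim <- apply(F[ind_cols] == 1, 1,
                   function(hit) paste(ind_cols[hit], collapse = " "))
F
```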
For example, the loop above gives:
name var1 var2 var3 clus ObjTrim
1 a 1 0 1 one var1 var3
2 b 0 0 1 two var3
3 c 0 …
I have a set of strings containing space-separated elements. I want to build a matrix that tells me which elements are part of which strings. For example:
""
"A B C"
"D"
"B D"
should give something like:
  A B C D
1
2 1 1 1
3       1
4   1   1
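One vectorized base R sketch: split all strings once, record which input string each token came from, and fill an integer matrix in a single assignment with matrix indexing instead of looping over categories. The names parts, lev, row_id and m are illustrative. Empty strings contribute no tokens, and any empty tokens are dropped explicitly.

```r
strs <- c("", "A B C", "D", "B D")

# Split every string once (fixed = TRUE: plain-space split, no regex)
parts <- strsplit(strs, " ", fixed = TRUE)

# Sorted set of distinct, non-empty categories
lev <- sort(setdiff(unique(unlist(parts)), ""))

# Long format: one entry per token, remembering which string it came from
row_id <- rep(seq_along(parts), lengths(parts))
tok <- unlist(parts)
keep <- tok != ""

# Fill a 0/1 matrix in one vectorized assignment using (row, column) index pairs
m <- matrix(0L, nrow = length(strs), ncol = length(lev), dimnames = list(NULL, lev))
m[cbind(row_id[keep], match(tok[keep], lev))] <- 1L
m
```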
I already have a solution, but it runs as slow as molasses, and I have run out of ideas on how to speed it up:
reverseIn <- function(vector, value) {
  return(value %in% vector)
}
buildCategoryMatrix <- function(valueVector) {
  allClasses <- c()
  for(classVec in unique(valueVector)) {
    allClasses <- unique(c(allClasses,
                           strsplit(classVec, " ", fixed=TRUE)[[1]]))
  }
  resMatrix <- matrix(ncol=0, nrow=length(valueVector))
  splitValues <- strsplit(valueVector, " ", fixed=TRUE)
  for(cat in allClasses) {
    if(cat=="") {
      catIsPart <- (valueVector == "")
    } else {
      catIsPart <- sapply(splitValues, reverseIn, …
I need a fast and concise way to split string literals in a data frame into a set of columns. Suppose I have this data frame
data <- data.frame(id=c(1,2,3), tok1=c("a, b, c", "a, a, d", "b, d, e"), tok2=c("alpha|bravo", "alpha|charlie", "tango|tango|delta") )
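(Note the different separators in the two columns.)
The number of string columns is usually not known in advance (although I could try to discover the whole set of cases if I had no other option).
I need two data frames like: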
tok1.occurrences:
+----+---+---+---+---+---+
| id | a | b | c | d | e |
+----+---+---+---+---+---+
| 1 | 1 | 1 | 1 | 0 | 0 |
| 2 | 2 | 0 | 0 | 1 | 0 |
| 3 | 0 | 1 | 0 | 1 | 1 |
+----+---+---+---+---+---+
tok2.occurrences:
+----+-------+-------+---------+-------+-------+
| id …
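A base R sketch of one way to produce these tables: split each column with its own separator (given as a regular expression, since the delimiters differ), repeat each id once per token, and cross-tabulate with table(). count_tokens() is a hypothetical helper, not an existing function. Calling it once per string column, for example in a loop over column names, would cover an unknown number of columns.

```r
data <- data.frame(id = c(1, 2, 3),
                   tok1 = c("a, b, c", "a, a, d", "b, d, e"),
                   tok2 = c("alpha|bravo", "alpha|charlie", "tango|tango|delta"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: count occurrences of each token per id.
# 'sep' is a regular expression because the columns use different delimiters.
count_tokens <- function(ids, x, sep) {
  parts <- strsplit(as.character(x), sep)
  # One id per token; factor levels keep every id as a row, even with no tokens
  long_id <- factor(rep(ids, lengths(parts)), levels = ids)
  tab <- table(long_id, unlist(parts))
  wide <- as.data.frame.matrix(tab)
  rownames(wide) <- NULL
  data.frame(id = ids, wide, check.names = FALSE)
}

tok1.occurrences <- count_tokens(data$id, data$tok1, ",\\s*")
tok2.occurrences <- count_tokens(data$id, data$tok2, "\\|")
tok1.occurrences
tok2.occurrences
```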