jon*_*jon 4 variables functional-programming r dataframe
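Here is my small function and data. Please note that I want to design this function for general use by others, not just for my own use.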
dataf <- data.frame (A= 1:10, B= 21:30, C= 51:60, D = 71:80)
myfun <- function (dataframe, varA, varB) {
daf2 <- data.frame (A = dataframe$A*dataframe$B,
B= dataframe$C*dataframe$D)
anv1 <- lm(varA ~ varB, daf2)
print(anova(anv1))
}
myfun (dataframe = dataf, varA = A, varB = B)
Error in eval(expr, envir, enclos) : object 'A' not found
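It works when I specify data$variable name, but I do not want to require that, because it forces the user to write both the data and the variable name in the function call.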
myfun (dataframe = dataf, varA = dataf$A, varB = dataf$B)
Analysis of Variance Table
Response: varA
Df Sum Sq Mean Sq F value Pr(>F)
varB 1 82.5 82.5 1.3568e+33 < 2.2e-16 ***
Residuals 8 0.0 0.0
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Warning message:
In anova.lm(anv1) :
ANOVA F-tests on an essentially perfect fit are unreliable
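What is the best practice in this situation? Can I attach the data frame inside the function? What are the disadvantages or potential conflicts/dangers of doing so? See the "masked" statement in the output. I believe that once attached, the data frame stays attached for the rest of the session, right? The function shown here is only an example; I need more downstream analysis in which variable names from different data frames can be, or should be, the same. I am looking for a programmer's solution.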
myfun <- function (dataframe, varA, varB) {
attach(dataframe)
daf2 <- data.frame (A = A*B, B= C*D)
anv1 <- lm(varA ~ varB, daf2)
return(anova(anv1))
}
myfun (dataframe = dataf, varA = A, varB = B)
The following object(s) are masked from 'dataframe (position 3)':
A, B, C, D
Analysis of Variance Table
Response: varA
Df Sum Sq Mean Sq F value Pr(>F)
varB 1 82.5 82.5 1.3568e+33 < 2.2e-16 ***
Residuals 8 0.0 0.0
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Warning message:
In anova.lm(anv1) :
ANOVA F-tests on an essentially perfect fit are unreliable
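Let's investigate your original function and call (see the comments I added), assuming you mean to pass the names of the columns you are interested in to the function: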
myfun <- function (dataframe, varA, varB) {
#on this next line, you use A and B. But this should be what is
#passed in as varA and varB, no?
daf2 <- data.frame (A = dataframe$A*dataframe$B, B=dataframe$C*dataframe$D)
#so, as a correction, we need:
colnames(daf2)<-c(varA, varB)
#the first argument to lm is a formula. If you use it like this,
#it refers to columns literally _named_ varA and varB, not to the
#columns named by the _contents_ of varA and varB!!
anv1 <- lm(varA ~ varB, daf2)
#so, what we really want, is to build a formula with the contents
#of varA and varB: we have to do this by building up a character string:
frm<-paste(varA, varB, sep="~")
anv1 <- lm(formula(frm), daf2)
print(anova(anv1))
}
#here, you pass A and B, because you are used to being able to do that in a formula
#(like in lm). But in a formula, there is a great deal of work done to make that
#happen, that doesn't work for most of the rest of R, so you need to pass the names
#again as character strings:
myfun (dataframe = dataf, varA = A, varB = B)
#becomes:
myfun (dataframe = dataf, varA = "A", varB = "B")
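Note: in the above, I kept the original code, so you may have to delete some of it to avoid the error you originally got. The essence of your problem is that you should always pass column names as character strings, and use them as such. This is one of the places where the syntactic sugar of formulas in R leads people into bad habits and misunderstandings...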
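For what it's worth, here is a minimal sketch of the "pass names as strings" idea with the leftover original lines removed. The helper name `fit_by_name` and its argument names are made up for illustration, and `reformulate()` (from the stats package) is used instead of `paste()` to build the formula; treat it as one possible way to write it, not the only one.

```r
# Example data, same as in the question.
dataf <- data.frame(A = 1:10, B = 21:30, C = 51:60, D = 71:80)

# fit_by_name is a hypothetical helper, not part of the original answer.
fit_by_name <- function(dataframe, response, predictor) {
  # Fail early if a name is not a string or does not match a column.
  stopifnot(is.character(response), is.character(predictor),
            all(c(response, predictor) %in% names(dataframe)))
  # reformulate() assembles `response ~ predictor` from character strings.
  frm <- reformulate(predictor, response = response)
  lm(frm, data = dataframe)
}

fit <- fit_by_name(dataf, "A", "D")
formula(fit)  # shows the formula that was built from the two strings
```

The up-front check means a misspelled column name stops with an error inside the helper rather than surfacing later as "object not found" from `lm`.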
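Now, as an alternative: the only place where the variable names are actually used is in the formula. So, if you don't mind some slight cosmetic differences in the result, which you can clean up later, you can simplify the problem further: there is no need to pass the column names at all!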
myfun <- function (dataframe) {
daf2 <- data.frame (A = dataframe$A*dataframe$B, B=dataframe$C*dataframe$D)
#now we know that columns A and B simply exist in data.frame daf2!!
anv1 <- lm(A ~ B, daf2)
print(anova(anv1))
}
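As a rough illustration of the kind of cosmetic clean-up meant here: with fixed column names, the term row of the ANOVA table is always labelled `B`, and it can be relabelled afterwards. This sketch returns the table instead of printing it so that it can be modified, and the label `"C*D"` is just an arbitrary choice for the example.

```r
dataf <- data.frame(A = 1:10, B = 21:30, C = 51:60, D = 71:80)

# Same idea as the function above, but returning the table
# instead of printing it, so the caller can adjust it.
myfun <- function(dataframe) {
  daf2 <- data.frame(A = dataframe$A * dataframe$B,
                     B = dataframe$C * dataframe$D)
  anova(lm(A ~ B, daf2))
}

tab <- myfun(dataf)
# The term row is named after the fixed column "B"; give it a
# more descriptive label ("C*D" is only an illustrative choice).
rownames(tab)[rownames(tab) == "B"] <- "C*D"
tab
```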
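As a final piece of advice: I would avoid printing in the last statement of your function. If you leave the print out and call the function directly from the R command line, the result gets printed for you anyway. As an added advantage, you can do further work with the object returned from the function.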
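To illustrate the point about working with the returned object, here is a small sketch. It assumes the built-in `mtcars` data set rather than `dataf`, only because the columns of `dataf` are exact linear functions of each other and would trigger the "essentially perfect fit" warning; the function itself is the string-based version with the `print` dropped.

```r
myfun <- function(dataframe, varA, varB) {
  frm <- paste(varA, varB, sep = "~")
  anv1 <- lm(formula(frm), dataframe)
  anova(anv1)  # returned as the function's value, not printed
}

# Assigning the result prints nothing; the table is kept for later use.
res <- myfun(mtcars, "mpg", "wt")

# The result is an anova table (a data frame), so it can be indexed
# by term name and column name.
p_value <- res["wt", "Pr(>F)"]
resid_df <- res["Residuals", "Df"]

res  # typing the name at the prompt prints the table as usual
```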
Trying out the cleaned-up function:
dataf <- data.frame (A= 1:10, B= 21:30, C= 51:60, D = 71:80)
myfun <- function (dataframe, varA, varB) {
frm<-paste(varA, varB, sep="~")
anv1 <- lm(formula(frm), dataframe)
anova(anv1)
}
myfun (dataframe = dataf, varA = "A", varB = "B")
myfun (dataframe = dataf, varA = "A", varB = "D")
myfun (dataframe = dataf, varA = "B", varB = "C")