R: Specifying variable names in function arguments for a general-purpose function

jon*_*jon 4 variables functional-programming r dataframe

Here is my small function and data. Please note that I want to design the function for general use.

dataf <- data.frame (A= 1:10, B= 21:30, C= 51:60, D = 71:80)

myfun <- function (dataframe, varA, varB) {
              daf2 <- data.frame (A = dataframe$A*dataframe$B, 
              B= dataframe$C*dataframe$D)
              anv1 <- lm(varA ~ varB, daf2)
              print(anova(anv1)) 
             }             

myfun (dataframe = dataf, varA = A, varB = B)

Error in eval(expr, envir, enclos) : object 'A' not found

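It works when I specify data$variable, but I do not want to require that, because it forces the user to write both the data frame and the variable name in the call.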

 myfun (dataframe = dataf, varA = dataf$A, varB = dataf$B)
Analysis of Variance Table

Response: varA
          Df Sum Sq Mean Sq    F value    Pr(>F)    
varB       1   82.5    82.5 1.3568e+33 < 2.2e-16 ***
Residuals  8    0.0     0.0                         
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Warning message:
In anova.lm(anv1) :
  ANOVA F-tests on an essentially perfect fit are unreliable

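What is the best practice in this situation? Can I attach the data frame inside the function? What are the disadvantages or potential conflicts/dangers of doing so? See the "masked" message in the output. I believe that once attached, the data frame stays attached for the rest of the session, right? The function given here is only an example; I need more downstream analyses, where variable names from different data frames can be (or should be) the same. I am looking for a programmer's solution.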

myfun <- function (dataframe, varA, varB) {
              attach(dataframe)
                 daf2 <- data.frame (A = A*B, B= C*D)
              anv1 <- lm(varA ~ varB, daf2)
              return(anova(anv1))
             }             

myfun (dataframe = dataf, varA = A, varB = B)

The following object(s) are masked from 'dataframe (position 3)':

    A, B, C, D
Analysis of Variance Table

Response: varA
          Df Sum Sq Mean Sq    F value    Pr(>F)    
varB       1   82.5    82.5 1.3568e+33 < 2.2e-16 ***
Residuals  8    0.0     0.0                         
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Warning message:
In anova.lm(anv1) :
  ANOVA F-tests on an essentially perfect fit are unreliable

Nic*_*bbe 7

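Let's examine your original function and call (see the comments I added), assuming you meant to pass the names of the columns you are interested in to the function: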

myfun <- function (dataframe, varA, varB) {
              #on this next line, you use A and B. But this should be what is
              #passed in as varA and varB, no?
              daf2 <- data.frame (A = dataframe$A*dataframe$B, B=dataframe$C*dataframe$D)
              #so, as a correction, we need:
              colnames(daf2)<-c(varA, varB)
              #the first argument to lm is a formula. If you use it like this,
              #it refers to columns with _names_ varA and varB, not as names
              #the _contents_ of varA and varB!!
              anv1 <- lm(varA ~ varB, daf2)
              #so, what we really want, is to build a formula with the contents
              #of varA and varB: we have to do this by building up a character string:
              frm<-paste(varA, varB, sep="~")
              anv1 <- lm(formula(frm), daf2)
              print(anova(anv1)) 
             }             
#here, you pass A and B, because you are used to being able to do that in a formula
#(like in lm). But in a formula, there is a great deal of work done to make that
#happen, that doesn't work for most of the rest of R, so you need to pass the names
#again as character strings:
myfun (dataframe = dataf, varA = A, varB = B)
#becomes:
myfun (dataframe = dataf, varA = "A", varB = "B")

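Note: in the code above I kept your original lines alongside the corrections, so you will have to remove some of them to avoid the error you originally got. The essence of your problem is that you should always pass column names as character strings, and use them as such. This is one of the places where the syntactic sugar of formulas in R leads people into bad habits and misunderstandings...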
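
As a minimal sketch of the "pass names as character strings" idea, the formula can also be built with `reformulate()` from the stats package instead of `paste()`. The helper name `fit_by_name` and its argument names are hypothetical, chosen only for illustration; the check that the names exist in the data frame is an addition of this sketch, not part of the original function.

```r
dataf <- data.frame(A = 1:10, B = 21:30, C = 51:60, D = 71:80)

# fit_by_name is a hypothetical helper, not part of the original code.
# It takes the response and predictor column names as character strings.
fit_by_name <- function(dataframe, response, predictor) {
  # Fail early if a name is not a string or is not a column of the data frame
  stopifnot(is.character(response), is.character(predictor),
            all(c(response, predictor) %in% names(dataframe)))
  # Build the formula response ~ predictor from the two strings
  frm <- reformulate(termlabels = predictor, response = response)
  lm(frm, data = dataframe)
}

fit <- fit_by_name(dataf, "A", "B")
formula(fit)
```

Because the column names arrive as ordinary strings, they can be validated, stored in vectors, or generated programmatically before the model is fitted.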

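Now, as an alternative: the only place where the variable names are actually used is in the formula. So, if you don't mind some slight cosmetic differences in the result that you can clean up later, you can simplify the problem further: there is no need to pass the column names at all!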

myfun <- function (dataframe) {
              daf2 <- data.frame (A = dataframe$A*dataframe$B, B=dataframe$C*dataframe$D)
              #now we know that columns A and B simply exist in data.frame daf2!!
              anv1 <- lm(A ~ B, daf2)
              print(anova(anv1))
             }             

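As a final piece of advice: I would avoid calling print in the last statement of your function. If you leave it out and call the function directly from the R command line, the result is printed for you anyway. As an added bonus, you can do further work with the object the function returns.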
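
To illustrate the point about returning the object instead of printing it, here is a small sketch. It assumes the built-in `mtcars` dataset (not used in the original post) so that the fit is not a perfect one; the variable names `tab` and `p_value` are hypothetical.

```r
# Same shape as the function discussed here, but it returns the table
# instead of printing it.
myfun <- function(dataframe, varA, varB) {
  frm <- paste(varA, varB, sep = "~")
  anova(lm(as.formula(frm), dataframe))
}

# Assumption: mtcars stands in for the poster's data.
tab <- myfun(mtcars, "mpg", "wt")

# The returned ANOVA table is a data frame, so individual values can be
# extracted by row and column name for later use.
p_value <- tab["wt", "Pr(>F)"]
residual_df <- tab["Residuals", "Df"]
```

Typing `tab` at the prompt still prints the table, so nothing is lost by dropping the explicit `print()` call.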

Trying out the cleaned-up function:

dataf <- data.frame (A= 1:10, B= 21:30, C= 51:60, D = 71:80)
myfun <- function (dataframe, varA, varB) {
               frm<-paste(varA, varB, sep="~")
               anv1 <- lm(formula(frm), dataframe)
               anova(anv1)
             }
 myfun (dataframe = dataf, varA = "A", varB = "B")
  myfun (dataframe = dataf, varA = "A", varB = "D")
    myfun (dataframe = dataf, varA = "B", varB = "C")
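
Since the names are plain strings, the repeated calls above could also be driven from a table of column pairs. The following is only a sketch under assumptions: it uses the built-in `mtcars` dataset rather than `dataf` (whose columns are perfectly collinear and trigger the "essentially perfect fit" warning), and `pairs`, `tables` and `p_values` are hypothetical names.

```r
# Cleaned-up function as above, with as.formula() for the string conversion
myfun <- function(dataframe, varA, varB) {
  frm <- paste(varA, varB, sep = "~")
  anova(lm(as.formula(frm), dataframe))
}

# Hypothetical response/predictor pairs, all columns of mtcars
pairs <- data.frame(response  = c("mpg", "mpg", "hp"),
                    predictor = c("wt", "hp", "disp"),
                    stringsAsFactors = FALSE)

# One ANOVA table per pair
tables <- Map(function(y, x) myfun(mtcars, y, x),
              pairs$response, pairs$predictor)
names(tables) <- paste(pairs$response, pairs$predictor, sep = "~")

# Collect the p-value of the predictor (first row) from each table
p_values <- vapply(tables, function(tab) tab[1, "Pr(>F)"], numeric(1))
```

The same loop works for any data frame that has the listed columns, which is the situation described in the question where several data frames share variable names.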