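I am using dlply() with a custom function that averages the slopes of lm() fits on data containing some NA values, and I get the error "Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases".
The error only occurs when I call dlply with two key variables; splitting by a single variable works fine.
Annoyingly, I can't reproduce the error with a simple dataset, so I have posted the problem dataset itself.
Here is the code, minimized as far as possible while still producing the error: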
library(plyr)  # provides dlply() and .()

masterData <- read.csv("http://dl.dropbox.com/u/48901983/SOquestionData.csv", na.strings="#N/A")
workingData <- data.frame(sample = masterData$sample,
                          substrate = masterData$substrate,
                          el1 = masterData$elapsedHr1,
                          F1 = masterData$r1 - masterData$rK)

#This function is trivial as written; in reality it takes the average of many slopes
meanSlope <- function(df) {
  lm1 <- lm(df$F1 ~ df$el1, na.action=na.omit) #changing to na.exclude doesn't help
  slope1 <- lm1$coefficients[2]
  meanSlope <- mean(c(slope1))
}

lsGOOD <- dlply(workingData, .(sample), meanSlope) #works fine
lsBAD <- dlply(workingData, .(sample, substrate), meanSlope) #throws error
Thanks in advance for any insights.
For several of your cross-classifications, the covariate is missing:
with(masterData, table(sample, substrate, r1mis = is.na(r1)))
# snipped the nonmissing reports
, , r1mis = TRUE

      substrate
sample 1 2 3 4 5 6 7 8
    3  0 0 0 0 0 0 0 0
    4  0 0 0 0 0 0 0 0
    5  0 0 0 0 0 0 0 0
    6  0 0 0 0 0 0 0 0
    7  0 0 0 0 0 0 3 3
    8  0 0 0 0 0 0 0 3
    9  0 0 0 0 0 0 0 3
    10 0 0 0 0 0 0 0 3
    11 0 0 0 0 0 0 0 3
    12 0 0 0 0 0 0 0 3
    13 0 0 0 0 0 0 0 3
    14 0 0 0 0 0 0 0 3
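The same diagnosis can be phrased as a count of usable rows per cell, which is what lm() ultimately cares about. The sketch below is only an illustration in base R: the `toy` data frame and its values are made up (they are not the poster's data), and only the column names mirror `workingData`.

```r
# Hypothetical toy data: two samples x two substrates, three rows per cell.
# Substrate 8 has no non-NA response values in either sample.
toy <- data.frame(
  sample    = rep(c(7, 8), each = 6),
  substrate = rep(rep(c(7, 8), each = 3), times = 2),
  el1       = rep(1:3, times = 4),
  F1        = c(1, 2, 3, NA, NA, NA, 2, 4, 6, NA, NA, NA)
)

# TRUE for rows where both the predictor and the response are present
ok <- complete.cases(toy[, c("el1", "F1")])

# Number of complete rows in each sample x substrate cell
counts <- tapply(ok, list(sample = toy$sample, substrate = toy$substrate), sum)
print(counts)

# Cells with fewer than two complete rows cannot support a slope estimate
print(which(counts < 2, arr.ind = TRUE))
```

Any cell with a count of 0 would trigger the "0 (non-NA) cases" error when passed to lm().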
The following version of the function lets you skip the subsets with insufficient data in this particular dataset:
meanSlope <- function(df) {
  if ( sum(!is.na(df$el1)) < 2 ) {
    return(NA)
  } else {
    lm1 <- lm(df$F1 ~ df$el1, na.action=na.omit) #changing to na.exclude doesn't help
    slope1 <- lm1$coefficients[2]
    meanSlope <- mean(c(slope1))
  }
}
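A guard of this kind only looks at one column. As a hedged variation (not part of the original answer), the check can count rows where both variables are present, using `complete.cases()`. The function name `meanSlopeGuarded` and the two small data frames are hypothetical, chosen only to show the difference.

```r
# Variation on the guard: require at least two rows with both el1 and F1 present.
# meanSlopeGuarded is a hypothetical name used only for this illustration.
meanSlopeGuarded <- function(df) {
  ok <- complete.cases(df$el1, df$F1)
  if (sum(ok) < 2) {
    return(NA_real_)
  }
  fit <- lm(F1 ~ el1, data = df[ok, ])
  unname(coef(fit)[2])
}

# Made-up subsets: one usable, one with the response entirely missing
good <- data.frame(el1 = c(1, 2, 3), F1 = c(2, 4, 6))
bad  <- data.frame(el1 = c(1, 2, 3), F1 = c(NA, NA, NA))

print(meanSlopeGuarded(good))
print(meanSlopeGuarded(bad))
```

In the `bad` subset the predictor is fully present, so a check on `el1` alone would let it through to lm(); checking both columns returns NA instead.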
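That check, however, depends on missingness in one particular covariate. A more robust solution is to use try to catch the error and convert it to NA.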
?try
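A minimal sketch of that idea, using `tryCatch()` (the more general relative of `try()`) so that the error handler can return NA directly. The `toy` data and the name `safeSlope` are assumptions made for illustration, and base R's `split()` with `sapply()` stands in for `dlply()` so the example runs without plyr.

```r
# Hypothetical toy data: group "b" has no non-NA response values at all
toy <- data.frame(
  g   = rep(c("a", "b"), each = 3),
  el1 = c(1, 2, 3, 1, 2, 3),
  F1  = c(2, 4, 6, NA, NA, NA)
)

# Return the fitted slope, or NA if lm() raises an error for this subset
safeSlope <- function(df) {
  tryCatch(
    unname(coef(lm(F1 ~ el1, data = df, na.action = na.omit))[2]),
    error = function(e) NA_real_
  )
}

# Apply per group; with plyr, the same function could be passed to dlply()
slopes <- sapply(split(toy, toy$g), safeSlope)
print(slopes)
```

Note that this converts every lm() error to NA, not just the "0 (non-NA) cases" one, so unrelated problems would also be silenced. Inspecting the condition message inside the handler is one way to be more selective.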