library(reshape2)
df <- data.frame(x = c(1,1,1,2,2,3,3,3,4,5,5),
                 y = c("A","B","C","A","B","A","B","D","B","C","D"),
                 z = c(3,2,1,4,2,3,2,1,2,3,4))
df_new <- dcast(df, x ~ y, value.var = "z")
With the sample data above, dcast() keeps the NA values. On my own dataset it behaves differently: the function converts NA to zero. Why?
How can I keep the NA values?
# Variable names must match the ones passed to merge() below
ratings <- read.csv("ratings.csv")
movies <- read.csv("movies.csv")
rm <- merge(ratings, movies, by = "movieId")
umr <- dcast(rm, userId ~ title, value.var = "rating", fun.aggregate = sum)
Thanks in advance.
In the first example, fun.aggregate is not called; in the second it is, and that is what changes the result. According to ?dcast:
fill - value with which to fill in structural missings, defaults to value from applying fun.aggregate to 0 length vector
library(reshape2)
dcast(df, x ~ y, value.var = "z", fun.aggregate = NULL)
#  x  A  B  C  D
#1 1  3  2  1 NA
#2 2  4  2 NA NA
#3 3  3  2 NA  1
#4 4 NA  2 NA NA
#5 5 NA NA  3  4
dcast(df, x ~ y, value.var = "z", fun.aggregate = sum)
#  x A B C D
#1 1 3 2 1 0
#2 2 4 2 0 0
#3 3 3 2 0 1
#4 4 0 2 0 0
#5 5 0 0 3 4
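Note that here each combination has only one element, so sum returns that same value, except where a combination does not exist, in which case it returns 0. This follows from how sum behaves on empty input: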
length(integer(0))
#[1] 0
sum(integer(0))
#[1] 0
sum(NULL)
#[1] 0
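Because the default fill is whatever fun.aggregate returns for a zero-length vector, the value that ends up in empty cells depends on the aggregation function chosen. A small base R sketch of that idea (the variable name `empty` is only for illustration):

```r
# A zero-length numeric vector stands in for a missing x/y combination
empty <- numeric(0)

sum(empty)     # additive identity
#[1] 0
prod(empty)    # multiplicative identity
#[1] 1
length(empty)  # count of elements
#[1] 0
mean(empty)    # 0 / 0
#[1] NaN
```

So with fun.aggregate = sum or length the empty cells would be expected to show 0, while other functions can produce a different placeholder.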
The same happens when all elements are NA and we use na.rm: there is nothing left to sum, so it also ends up in the integer(0) case
sum(c(NA, NA), na.rm = TRUE)
#[1] 0
If we use sum_ from hablar, this behavior changes and NA is returned
library(hablar)
sum_(c(NA, NA))
#[1] NA
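If adding a package dependency is not desirable, similar behavior can be sketched in base R. The helper name `sum_or_na` below is hypothetical and not part of any package; it returns NA both for an empty vector and for a vector containing only NA values, and otherwise sums while ignoring NA:

```r
# Hypothetical helper: NA when there is nothing to sum, otherwise the sum
sum_or_na <- function(x) {
  # all(is.na(x)) is TRUE for a zero-length vector as well as for all-NA input
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

sum_or_na(numeric(0))
#[1] NA
sum_or_na(c(NA, NA))
#[1] NA
sum_or_na(c(1, NA, 2))
#[1] 3
```

A function like this could be passed as fun.aggregate so that both missing combinations and all-NA groups stay NA.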
One option is to add a condition inside fun.aggregate so that it returns NA for empty input
dcast(df, x ~ y, value.var = "z",
fun.aggregate = function(x) if(length(x) == 0) NA_real_ else sum(x, na.rm = TRUE))
#  x  A  B  C  D
#1 1  3  2  1 NA
#2 2  4  2 NA NA
#3 3  3  2 NA  1
#4 4 NA  2 NA NA
#5 5 NA NA  3  4
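Another route, sketched here under the assumption that only the missing combinations need to stay NA, is to keep sum as the aggregator and set the fill argument of dcast explicitly. NA_real_ is used rather than plain NA so the fill value has the same numeric type as the sums:

```r
library(reshape2)

df <- data.frame(x = c(1,1,1,2,2,3,3,3,4,5,5),
                 y = c("A","B","C","A","B","A","B","D","B","C","D"),
                 z = c(3,2,1,4,2,3,2,1,2,3,4))

# fill overrides the default taken from sum(numeric(0)), which is 0
res <- dcast(df, x ~ y, value.var = "z",
             fun.aggregate = sum, fill = NA_real_)
res
```

The result should match the fun.aggregate = NULL table, with NA in every cell whose x/y combination is absent from df.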
For more on how sum (a primitive function) is implemented, see the source code here