I would like to do this in the tidyverse using %>% chaining.
df <-
structure(list(id = c(2L, 2L, 4L, 5L, 5L, 5L, 5L), start_end = structure(c(2L,
1L, 2L, 2L, 1L, 2L, 1L), .Label = c("end", "start"), class = "factor"),
date = structure(c(6L, 7L, 3L, 8L, 9L, 10L, 11L), .Label = c("1979-01-03",
"1979-06-21", "1979-07-18", "1989-09-12", "1991-01-04", "1994-05-01",
"1996-11-04", "2005-02-01", "2009-09-17", "2010-10-01", "2012-10-06"
), class = "factor")), .Names = c("id", "start_end", "date"
), row.names = c(3L, 4L, 7L, 8L, 9L, 10L, 11L), class = "data.frame")
data.table::dcast( df, formula …
I want to convert a table like this (*):
set.seed(1)
mydata <- data.frame(ID=rep(1:4, each=3), R=rep(1:3, times=4), FIXED=rep(runif(4), each=3), AAA=rnorm(12), BBB=rbinom(12,12,0.5), CCC=runif(12))
ID R FIXED AAA BBB CCC
1 1 0.26 -0.83 8 0.82
1 2 0.26 1.59 5 0.64
1 3 0.26 0.32 6 0.78
2 1 0.37 -0.82 6 0.55
2 2 0.37 0.48 6 0.52
2 3 0.37 0.73 4 0.78
3 1 0.57 0.57 8 0.02
3 2 0.57 -0.30 7 0.47
3 3 0.57 1.51 7 0.73
4 1 0.90 0.38 4 0.69 …
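I want to wrap data.table's dcast function in a function that can handle a custom number/order of aggregate functions. That is why I need to pass the aggregate functions to dcast as an argument. The argument needs to be defined outside dcast. How can I do this?
This works fine, but I want to define the aggregate functions outside dcast.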
library(data.table)
dt = data.table(x = sample(5, 20, TRUE), y = sample(2, 20, TRUE),
                z = sample(letters[1:2], 20, TRUE), d1 = runif(20), d2 = 1L)
dcast(dt, x + y ~ z, fun = list(sum, min), value.var = "d1")
I tried this approach:
func <- list(sum, min)
dcast(dt, x + y ~ z, fun = func, value.var = "d1")
Then I get this error message:
Error in eval(expr, envir, enclos) : could not find function "func"
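One way to keep the aggregate functions outside the dcast call can be sketched as follows. It assumes that dcast inspects the unevaluated fun.aggregate expression, which would explain why a variable holding a list is not found as a function. The functions are stored as a quoted expression and spliced into the call with bquote(). The helper name cast_with and the toy table are hypothetical and exist only for illustration.

```r
library(data.table)

# Hypothetical toy data; every x/z combination is present.
dt <- data.table(x  = rep(1:2, each = 4),
                 y  = 1L,
                 z  = rep(c("a", "b"), times = 4),
                 d1 = 1:8)

# Aggregate functions defined outside dcast, kept as an unevaluated expression.
funs <- quote(list(sum, min))

# cast_with() is an illustrative helper, not part of data.table.
# bquote() splices the expression in, so dcast sees the same call
# as if list(sum, min) had been typed inline.
cast_with <- function(data, funs_expr) {
  eval(bquote(dcast(data, x + y ~ z, fun.aggregate = .(funs_expr),
                    value.var = "d1")))
}

res <- cast_with(dt, funs)
print(res)
```

Because the expression is only evaluated inside dcast, a different set or order of functions can be supplied by changing funs alone, for example quote(list(max, mean, sum)).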
I have a data frame df that looks like this...
"ID","ReleaseYear","CriticPlayerPrefer","n","CountCriticScores","CountUserScores"
"1",1994,"Both",1,5,283
"2",1994,"Critics",0,0,0
"3",1994,"Players",0,0,0
"4",1995,"Both",3,17,506
"5",1995,"Critics",0,0,0
"6",1995,"Players",0,0,0
"7",1996,"Both",18,163,3536
"8",1996,"Critics",2,18,97
"9",1996,"Players",3,20,79
I want to pivot the data frame so that the columns look like this:
"ReleaseYear","Both","Critics","Players"
Each of the columns Both, Critics and Players would hold the value of n.
When I try to run this...
require(dcast)
chartData.CriticPlayerPreferByYear <- dcast(
data = df,
formula = ReleaseYear ~ CriticPlayerPrefer,
fill = 0,
value.var = n
)
...I get this error:
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
What is wrong here? How do I fix it?
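A likely cause, assuming the dcast in question is the one from reshape2 or data.table: value.var expects the column name as a character string. A bare n is looked up as an R object instead, and if dplyr is attached it resolves to the function dplyr::n, which would be consistent with match() complaining about non-vector arguments. dcast is a function provided by those packages, so one of them is what needs to be loaded. The sketch below rebuilds only the three relevant columns of the sample data and quotes the column name.

```r
library(data.table)

# The relevant columns of the sample data shown above.
df <- data.table(
  ReleaseYear        = rep(c(1994, 1995, 1996), each = 3),
  CriticPlayerPrefer = rep(c("Both", "Critics", "Players"), times = 3),
  n                  = c(1, 0, 0, 3, 0, 0, 18, 2, 3)
)

# value.var is given as the string "n", not the bare symbol n.
chartData.CriticPlayerPreferByYear <- dcast(
  data      = df,
  formula   = ReleaseYear ~ CriticPlayerPrefer,
  fill      = 0,
  value.var = "n"
)
print(chartData.CriticPlayerPreferByYear)
```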
I have a df with data like this:
sub = c("X001","X002", "X001","X003","X002","X001","X001","X003","X002","X003","X003","X002")
month = c("201506", "201507", "201506","201507","201507","201508", "201508","201507","201508","201508", "201508", "201508")
tech = c("mobile", "tablet", "PC","mobile","mobile","tablet", "PC","tablet","PC","PC", "mobile", "tablet")
brand = c("apple", "samsung", "dell","apple","samsung","apple", "samsung","dell","samsung","dell", "dell", "dell")
revenue = c(20, 15, 10,25,20,20, 17,9,14,12, 9, 11)
df = data.frame(sub, month, brand, tech, revenue)
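I want to use sub and month as the key and get one row per subscriber per month, showing the sum of revenue for each unique value of tech and brand for that subscriber in that month. This example is simplified and has few columns; because my real dataset is huge, I decided to try doing this with data.table.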
I have managed to do this for a single categorical column, either tech or brand, using this:
df1 <- dcast(df, sub + month ~ tech, fun=sum, value.var = "revenue")
But I want to do this for two or more categorical columns. So far I have tried this:
df2 <- dcast(df, sub + month ~ tech+brand, fun=sum, value.var = "revenue")
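It just concatenates the unique values of the categorical columns and sums over those combinations, but that is not what I want. I want a separate column for each unique value of every categorical column.
I am new to R and would greatly appreciate any help.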
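One possible approach, sketched with data.table: first melt tech and brand into a single column so that each original row contributes one row per categorical column, then cast on that stacked column. The names long, category and level are made up for this example. The sketch also assumes the values of tech and brand never coincide; if they could, casting on category + level instead would keep them apart.

```r
library(data.table)

dt <- data.table(
  sub     = c("X001", "X002", "X001", "X003", "X002", "X001",
              "X001", "X003", "X002", "X003", "X003", "X002"),
  month   = c("201506", "201507", "201506", "201507", "201507", "201508",
              "201508", "201507", "201508", "201508", "201508", "201508"),
  tech    = c("mobile", "tablet", "PC", "mobile", "mobile", "tablet",
              "PC", "tablet", "PC", "PC", "mobile", "tablet"),
  brand   = c("apple", "samsung", "dell", "apple", "samsung", "apple",
              "samsung", "dell", "samsung", "dell", "dell", "dell"),
  revenue = c(20, 15, 10, 25, 20, 20, 17, 9, 14, 12, 9, 11)
)

# Stack tech and brand into one "level" column (two rows per original row).
long <- melt(dt, id.vars = c("sub", "month", "revenue"),
             measure.vars = c("tech", "brand"),
             variable.name = "category", value.name = "level")

# One column per unique tech or brand value, summing revenue within each cell.
wide <- dcast(long, sub + month ~ level, value.var = "revenue",
              fun.aggregate = sum)
print(wide)
```

Note that each revenue figure appears twice in the wide table by design, once under its tech column and once under its brand column.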
Trying to work this out. Suppose you have a data.table:
dt <- data.table (person=c('bob', 'bob', 'bob'),
door=c('front door', 'front door', 'front door'),
type=c('timeIn', 'timeIn', 'timeOut'),
time=c(
as.POSIXct('2016 12 02 06 05 01', format = '%Y %m %d %H %M %S'),
as.POSIXct('2016 12 02 06 05 02', format = '%Y %m %d %H %M %S'),
as.POSIXct('2016 12 02 06 05 03', format = '%Y %m %d %H %M %S') )
)
I want to pivot it into something like this:
person door timeIn timeOut
bob front door min(<date/time>) max(<date/time>)
I can't seem to get the syntax right for dcast.data.table. I tried
dcast.data.table(
dt, person + door ~ type,
value.var …
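For comparison, the target layout (earliest timeIn, latest timeOut per person and door) can also be reached without dcast. The following is a minimal sketch that swaps in a plain grouped aggregation; it is not the dcast.data.table syntax asked about. The timestamps are the same as above but parsed in UTC, which is an assumption made to keep the example reproducible.

```r
library(data.table)

dt <- data.table(
  person = c("bob", "bob", "bob"),
  door   = c("front door", "front door", "front door"),
  type   = c("timeIn", "timeIn", "timeOut"),
  time   = as.POSIXct(c("2016-12-02 06:05:01",
                        "2016-12-02 06:05:02",
                        "2016-12-02 06:05:03"), tz = "UTC")
)

# Within each person/door group, take the earliest timeIn and the latest timeOut.
res <- dt[, .(timeIn  = min(time[type == "timeIn"]),
              timeOut = max(time[type == "timeOut"])),
          by = .(person, door)]
print(res)
```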
I am trying to compute means, excluding NAs, for multiple columns of a data frame by multiple groups:
airquality <- data.frame(City = c("CityA", "CityA","CityA",
"CityB","CityB","CityB",
"CityC", "CityC"),
year = c("1990", "2000", "2010", "1990",
"2000", "2010", "2000", "2010"),
month = c("June", "July", "August",
"June", "July", "August",
"June", "August"),
PM10 = c(runif(3), rnorm(5)),
PM25 = c(runif(3), rnorm(5)),
Ozone = c(runif(3), rnorm(5)),
CO2 = c(runif(3), rnorm(5)))
airquality
So I build a numbered list of the column names, so that I know which columns to select:
nam<-names(airquality)
namelist <- data.frame(matrix(t(nam)));namelist
I want to get the mean of PM25, Ozone and CO2 by City and year (which means I need columns 1, 2, 4, 6:7).
acast(datadf, year ~ city, mean, na.rm=TRUE)
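But this is not really what I want, because it includes means of things I don't need, and the result is not in data frame format. I could convert it and then drop the extras, but that seems very inefficient.
Is there a better way?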
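One option that avoids casting altogether is a grouped lapply over just the columns of interest. The sketch below uses data.table and a small made-up table (aq, with fixed numbers and a few NAs instead of the random values above) so the result can be checked by hand.

```r
library(data.table)

# Hypothetical data with fixed values and some NAs.
aq <- data.table(
  City  = c("CityA", "CityA", "CityB", "CityB"),
  year  = c("1990", "1990", "2000", "2000"),
  PM10  = c(1, 2, 3, 4),
  PM25  = c(10, NA, 30, 50),
  Ozone = c(1, 3, NA, NA),
  CO2   = c(2, 4, 6, 8)
)

# Mean of the selected columns only, per City and year, ignoring NAs.
res <- aq[, lapply(.SD, mean, na.rm = TRUE),
          by = .(City, year),
          .SDcols = c("PM25", "Ozone", "CO2")]
print(res)
```

The result is a data.table, which is also a data.frame, and PM10 is left out because it is not listed in .SDcols. A group whose values are all NA yields NaN.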
I am using the dcast function from library(reshape2) to cast a simple three-column table
df = data.table(id = 1:1e6,
var = c('continent','subcontinent',...),
                val = c('America','Caribbean',...))
by dcast(df, id ~ var, value.var = 'val')
and it automatically converts the values to counts, i.e.
id continent subcontinent
1 1 1
2 1 1
However, if I reduce the size to 10000 rows, it outputs correctly
id continent subcontinent
1 America Caribbean
2 Europe West Europe
Is this a bug, or do I need to change the code somehow? Please help. Thanks!
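A possible explanation, assuming the full table contains repeated id/var pairs that the 10000-row subset happens not to contain: when a cell receives more than one value, dcast has to aggregate, and without an explicit aggregate function it falls back to counting with length. The sketch below uses a made-up five-row table to show how to look for such repeats and, if they are exact duplicates, drop them before casting.

```r
library(data.table)

# Hypothetical data in which id 2 has a repeated subcontinent row.
df <- data.table(
  id  = c(1, 1, 2, 2, 2),
  var = c("continent", "subcontinent", "continent",
          "subcontinent", "subcontinent"),
  val = c("America", "Caribbean", "Europe", "West Europe", "West Europe")
)

# Any id/var pair occurring more than once forces dcast to aggregate.
dups <- df[, .N, by = .(id, var)][N > 1]
print(dups)

# If the repeats are exact duplicates, removing them leaves one value per cell.
wide <- dcast(unique(df), id ~ var, value.var = "val")
print(wide)
```

If the repeated pairs carry different values, unique() will not help and an explicit fun.aggregate has to decide how to combine them.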
I am transposing the following table using dcast
date event user_id
25-07-2020 Create 3455
25-07-2020 Visit 3567
25-07-2020 Visit 3567
25-07-2020 Add 3567
25-07-2020 Add 3678
25-07-2020 Add 3678
25-07-2020 Create 3567
24-07-2020 Edit 3871
I am transposing with dcast to get my events as columns and count user_id
dae_summ <- dcast(ahoy_events, date ~ event, value.var="user_id")
But I am not getting unique user IDs; the same user_id is counted multiple times. What can I do so that a user_id is counted only once for the same date and event?
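One way to count each user_id only once per date and event is to give dcast an aggregate function that counts distinct values instead of relying on the default. A minimal sketch with data.table, rebuilding the table above by hand:

```r
library(data.table)

ahoy_events <- data.table(
  date    = c(rep("25-07-2020", 7), "24-07-2020"),
  event   = c("Create", "Visit", "Visit", "Add", "Add", "Add", "Create", "Edit"),
  user_id = c(3455, 3567, 3567, 3567, 3678, 3678, 3567, 3871)
)

# Count distinct user_ids in each date/event cell; empty cells become 0.
dae_summ <- dcast(ahoy_events, date ~ event, value.var = "user_id",
                  fun.aggregate = function(x) length(unique(x)),
                  fill = 0L)
print(dae_summ)
```

For 25-07-2020 this gives Visit = 1 (user 3567 visited twice) and Add = 2 (users 3567 and 3678).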
When using dcast, how do I specify the column order based on the column "Col"?
df <- dcast(x, ID ~ ColumnName, value.var = "Answer")
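I need the solution not to be specific to the data, because x can be the result of any question (so Col may be 1-3, or 1-2, and so on). Below are two dummy examples of x.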
ID Answer ColumnName Col
1 Anduin First Name 1
1 Wrynn Surname 2
1 Alliance Faction 3
2 Sylvanas First Name 1
2 Windrunner Surname 2
2 Horde Faction 3
ID Answer ColumnName Col
1 The Kirin Tor Quest 1
1 90 Level 2
2 Emissary Quest 1
2 38 Level 2
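A data-independent way to control the order, sketched with data.table's dcast: convert ColumnName to a factor whose levels are sorted by Col, since the cast columns follow factor level order rather than alphabetical order. The sketch uses the first dummy example and assumes each ColumnName always maps to the same Col value.

```r
library(data.table)

x <- data.table(
  ID         = c(1, 1, 1, 2, 2, 2),
  Answer     = c("Anduin", "Wrynn", "Alliance",
                 "Sylvanas", "Windrunner", "Horde"),
  ColumnName = rep(c("First Name", "Surname", "Faction"), times = 2),
  Col        = rep(1:3, times = 2)
)

# Make ColumnName a factor whose level order follows Col.
x[, ColumnName := factor(ColumnName,
                         levels = unique(ColumnName[order(Col)]))]

df <- dcast(x, ID ~ ColumnName, value.var = "Answer")
print(names(df))
```

Without the factor conversion the columns would come out alphabetically (Faction, First Name, Surname).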