I have a data.table dataset that looks like this:
ID time.s time.e
1 1 2
2 1 4
3 2 3
4 2 4
I want to check whether the value 1 lies between time.s and time.e, so that the final result is as follows:
[1] TRUE TRUE FALSE FALSE
How can I do this? I tried using
a[1 %in% seq(time.s, time.e)]
But all I get is TRUE for every row. Any suggestions?
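One possible approach, sketched in base R under the assumption that the table is named `a` as in the attempt above: compare the value against both bounds directly, because `<=` is vectorised over rows while `seq()` is not. The data is rebuilt here as a plain data.frame for illustration; with a data.table the same expression can be written as `a[, time.s <= 1 & 1 <= time.e]`.

```r
# Rebuild the example data as a plain data.frame (the question uses a data.table).
a <- data.frame(ID = 1:4,
                time.s = c(1, 1, 2, 2),
                time.e = c(2, 4, 3, 4))

# Row-wise check: is 1 inside the closed interval [time.s, time.e]?
in_range <- with(a, time.s <= 1 & 1 <= time.e)
print(in_range)
```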

I have a dataset that looks like this:
ID FromDate ToDate SiteID Cost
1 8/12/2014 8/31/2014 12 245.98
1 9/1/2014 9/7/2014 12 269.35
1 10/10/2014 10/17/2014 12 209.98
1 11/22/2014 11/30/2014 12 309.12
1 12/1/2014 12/11/2014 12 202.14
2 8/16/2014 8/21/2014 12 109.35
2 8/22/2014 8/24/2014 14 44.12
2 9/25/2014 9/29/2014 12 98.75
3 9/15/2014 9/30/2014 23 536.27
3 10/1/2014 10/31/2014 12 529.87
3 11/1/2014 11/30/2014 12 969.55
3 12/1/2014 12/12/2014 12 607.35
I would like it to look like this:
ID FromDate ToDate SiteID Cost
1 8/12/2014 9/7/2014 12 515.33
1 10/10/2014 10/17/2014 …

I have a vector of strings ranging from 3 to 59 characters in length. I am trying to truncate any string longer than 13 characters after the first 10 characters and append "...". For example, if
a <- c("AMS", "CCD", "TCGGCKGTPGPHOLKP", "NOK", "THIS IS A LONG STRING", "JSQU909LPPLU")
then I would like to get
"AMS" "CCD" "TCGGCKGTPG..." "NOK" "THIS IS A ..." "JSQU909LPPLU"
I believe it needs an if statement combined with gsub, and my problem is the gsub part. Any ideas?
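A minimal sketch of one way to do this without `gsub`, assuming the vector is named `a` as above: `nchar()` finds the long strings, `substr()` keeps the first 10 characters, and `ifelse()` leaves the short strings untouched. This swaps in a different technique from the one the question proposes.

```r
a <- c("AMS", "CCD", "TCGGCKGTPGPHOLKP", "NOK", "THIS IS A LONG STRING", "JSQU909LPPLU")

# Strings longer than 13 characters are cut to their first 10 characters
# and get "..." appended; all other strings are returned unchanged.
truncated <- ifelse(nchar(a) > 13, paste0(substr(a, 1, 10), "..."), a)
print(truncated)
```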

I have a dataset that looks like
City Score Count Returns
Dallas 2.9 61 21
Phoenix 2.6 52 14
Milwaukee 1.7 38 7
Chicago 1.2 95 16
Phoenix 5.9 96 16
Dallas 1.9 45 12
Dallas 2.7 75 45
Chicago 2.2 75 10
Milwaukee 2.6 12 2
Milwaukee 4.5 32 0
Dallas 1.9 65 12
Chicago 4.9 95 13
Chicago 5 45 5
Phoenix 5.2 43 5
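I want to build a report with R Markdown; however, I need to build a separate report for each city, because one city must not be able to see another city's report. How can I build a report for each city and save it as a PDF?
Each report needs the median Score, the mean Count, and the mean Returns. I know that with dplyr I can simply use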
finaldat <- dat %>%
group_by(City) %>%
summarise(Score …

I have a dataset with more than 100 columns, but as an example, suppose I have one that looks like
dput(tib)
structure(list(f_1 = c("A", "O", "AC", "AC", "AC", "O", "A", "AC", "O", "O"), f_2 = c("New", "New",
"New", "New", "Renewal", "Renewal", "New", "Renewal", "New",
"New"), first_dt = c("07-MAY-18", "25-JUL-16", "09-JUN-18", "22-APR-19",
"03-MAR-19", "10-OCT-16", "08-APR-19", "27-FEB-17", "02-MAY-16",
"26-MAY-15"), second_dt = c(NA, "27-JUN-16", NA, "18-APR-19",
"27-FEB-19", "06-OCT-16", "04-APR-19", "27-FEB-17", "25-APR-16",
NA), third_dt = c("04-APR-16", "21-JUL-16", "05-JUN-18", "18-APR-19",
"27-FEB-19", "06-OCT-16", "04-APR-19", "27-FEB-17", "25-APR-16",
"19-MAY-15"), fourth_dt = c("05-FEB-15", "25-JAN-16", "05-JUN-18",
"10-OCT-18", "08-JAN-19", "02-SEP-16", "24-OCT-18", "29-SEP-16",
"27-JAN-15", "14-MAY-15"), fifth_dt = structure(c(1459728000,
1469059200, 1528156800, 1555545600, …

I have a data.table named dt that looks like this:
ID V1 V2 V3 V4 V5 time color
1 F T F F T 1 red
1 F T T F T 2 red
2 T F T F F 1 blue
3 F F F F F 2 green
3 T T T F T 3 purple
In reality, dim(dt) = [1321221 123]. I know that, in general, TRUE and FALSE are stored as 1 and 0 in R, respectively. I also have an array l, which looks like
V1 V2 V3 V4 V5
1 2 1 3 4
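These are the weights assigned to V1, V2, V3, V4, V5. I want to multiply these weights by the TRUE values (since their numeric value is 1) and sum across each row, as we can with matrices. The output should look like this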
ID Total time color
1 6 …

I have two tables in R, as shown below:
library(data.table)

DT.Purchase <- data.frame( ID = c(1,1,1,2,2,3,3,3,3,3,4,4,4,4),
CDS = c("0389","0389", "3298", "4545", "1282", "4545",
"0389","0389", "5685", "4545", "1282", "0389",
"1282", "1282"),
Date = c("5/28/2016","5/26/2016","8/9/2016","2/2/2015",
"2/24/2015", "9/27/2015", "9/27/2015", "9/5/2015",
"3/3/2016", "4/9/2014", "5/1/2014", "5/4/2014",
"6/9/2014", "7/7/2014"),
JFK = c(T,F,F,F,T,T,F,F,T,F,T,T,T,F),
RFK = c(F,T,T,F,T,F,F,F,F,T,T,T,T,T),
RUG = c(T,F,T,F,T,F,F,F,F,T,F,F,T,T),
LPG = c(T,T,T,F,F,T,T,F,F,F,F,F,T,F))
DT.Purchase$Date <- as.Date(DT.Purchase$Date, format = "%m/%d/%Y")
DT.Purchase <- data.table(DT.Purchase)
ID CDS Date JFK RFK RUG LPG
1 0389 5/28/2016 T F T T
1 0389 5/26/2016 F T F T
1 3298 …

I am using REGEXP to filter a dataset with 10 rows, as shown below:
ID Product
1 "VENLAFAXINE HCL CAP ER 24HR 37.5 MG (BASE EQUIVALENT)"
2 "MINOXIDIL POWDER"
3 "MENTHOL LOZENGE 10 MG"
4 "ZINC CHLORIDE GRANULES"
5 "CLOPIDOGREL BISULFATE TAB 75 MG (BASE EQUIV)"
6 "METHYLPREDNISOLONE TAB THERAPY PACK 4 MG (21)"
7 "DEXAMETHASONE TAB THERAPY PACK 1.5 MG (7)"
8 "METHYLPREDNISOLONE DOSE P (16)"
9 "MILLIPRED DP (13)"
10 "ZONACORT 7 DAY"
and I would like it to look like
ID Product
6 "METHYLPREDNISOLONE TAB THERAPY PACK 4 MG (21)"
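7 "DEXAMETHASONE TAB …

I am a beginner in R. I have a data frame with two factor columns: one is a company column and the other is a product column. The product column has several missing values, so I want to count the number of values in the product column for each company (that is, for each level of the company variable). I tried the table function and the count function from the plyr package, but they seem to work only on numeric variables. Please help! Suppose the data frame looks like this: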
df <- data.frame(company= c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D"), product = c(1, 1, 2, 3, 4, 3, 3, NA, NA, NA))
So the output I am looking for is:
A 2 B 2 C 3 D 2
Thanks in advance!!
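The output shown above matches the number of distinct product values per company when NA is counted as a value of its own (C has 2, 3 and NA; D has 3 and NA). Below is a minimal base R sketch of that reading, next to a count of non-missing values for comparison. Which of the two is actually wanted is an assumption; the names `distinct_counts` and `non_missing` are introduced here for illustration only.

```r
df <- data.frame(company = c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D"),
                 product = c(1, 1, 2, 3, 4, 3, 3, NA, NA, NA))

# Reading 1 (assumed): distinct product values per company.
# unique() keeps NA as a single value, so NA contributes one to the count.
distinct_counts <- tapply(df$product, df$company, function(x) length(unique(x)))
print(distinct_counts)

# Reading 2 (assumed alternative): non-missing product values per company.
non_missing <- tapply(!is.na(df$product), df$company, sum)
print(non_missing)
```

`tapply()` works here regardless of whether the columns are factors, character vectors, or numeric, because it only uses `company` as a grouping index.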

I have a dataset that looks like
structure(list(ID = 1:100, A = c(1, 1, 1, 0, 0, 0, 1, 0, 1, 1,
0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0,
0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0,
1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0,
1, …