I have a column whose values consist of 3 strings separated by semicolons. I need to extract the first part of each string.
Type <- c("SNSR_RMIN_PSX150Y_CSH;SP_12;I0.00V50HX0HY3000")
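What I want: the first part of the string (everything up to the first semicolon).
Output: SNSR_RMIN_PSX150Y_CSH
I tried gsub but could not figure it out. Please tell me how to do this efficiently in R.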
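A minimal base R sketch of one way to do this, assuming the goal is simply everything before the first semicolon. The names `first_part` and `first_part_alt` are illustrative and not from the original post.

```r
Type <- c("SNSR_RMIN_PSX150Y_CSH;SP_12;I0.00V50HX0HY3000")

# Remove the first semicolon and everything after it.
first_part <- sub(";.*$", "", Type)

# Alternative: split on the literal semicolon and keep the first piece.
first_part_alt <- vapply(strsplit(Type, ";", fixed = TRUE), `[`, character(1), 1)

print(first_part)
```

`sub()` is enough here because only one match per string has to be replaced; the `strsplit()` variant is convenient when the other two parts are needed as well.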
I am working with time series data. I have 2 datetime columns and 1 fiscal week column. Below is an example of my situation, where I need to get the MAX of EditDate.
EditDate <- c("2015-04-01 11:40:13", "2015-04-03 02:54:45","2015-04-07 11:40:13")
ID <- c("DL1X8", "DL1X8","DL1X8")
Avg <- c(38.1517, 38.1517, 38.1517)
Sig <- c(11.45880000, 11.45880000, 11.45880000)
InsertDate <- c("2015-04-03 9:40:00", "2015-04-03 9:40:00", "2015-04-10 9:40:00")
FW <- c("39","39","40")
df1 <- data.frame(EditDate , ID, Avg, Sig, InsertDate, FW)
This returns
+---------------------+-------+---------+-------------+--------------------+----+
| EditDate | ID | Avg | Sig | InsertDate | FW |
+---------------------+-------+---------+-------------+--------------------+----+
| 2015-04-01 11:40:13 | DL1X8 | 38.1517 | 11.45880000 | 2015-04-03 9:40:00 | 39 |
| 2015-04-03 02:54:45 | DL1X8 | 38.1517 | …
I have a data frame like this
Datetime <- c("2015-12-31 08:30:13", "2015-12-31 12:45:00", "2016-01-01 02:53:20", "2016-01-01 03:22:18",
"2016-01-01 09:42:10", "2016-01-01 20:55:50", "2016-01-01 21:14:10", "2016-01-02 05:42:16",
"2016-01-02 08:31:15", "2016-01-02 09:13:10", "2016-01-03 00:45:14", "2016-01-03 05:56:00",
"2016-01-03 13:44:00", "2016-01-03 14:41:20", "2016-01-03 15:33:10", "2016-01-04 04:24:00",
"2016-01-04 17:24:12", "2016-01-04 17:28:16", "2016-01-04 18:22:34", "2016-01-05 02:34:31")
Measurement <- c("Length","Breadth","Height","Length",
"Breadth","Breadth","Breadth","Length",
"Length","Breadth","Height","Height",
"Height","Length","Height","Length",
"Length","Breadth","Breadth","Breadth")
df1 <- data.frame(Datetime,Measurement)
I am trying to subset the dates in this format
Day1 = December 31st, 2015 at 6:30AM to January 1st, 2016 6:30AM
Day2 = January 1st, 2016 at 6:30AM to January 2nd, 2016 6:30AM
etc.
While doing this, I also want to turn the "Measurement" column into individual columns that list the count for each category.
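One possible approach, sketched in base R under stated assumptions: shifting every timestamp back by 6.5 hours makes each 6:30AM-to-6:30AM window fall on a single calendar date, which can then be cross-tabulated against the measurement. The sample is a shortened copy of the data above, the time zone is assumed to be UTC, and `Day` and `counts` are illustrative names.

```r
Datetime <- c("2015-12-31 08:30:13", "2015-12-31 12:45:00",
              "2016-01-01 02:53:20", "2016-01-01 09:42:10")
Measurement <- c("Length", "Breadth", "Height", "Breadth")
df1 <- data.frame(Datetime, Measurement)

# Parse the timestamps (UTC is an assumption; use the real time zone).
dt <- as.POSIXct(df1$Datetime, tz = "UTC")

# Shift back by 6 hours 30 minutes so each window maps to one date.
shifted <- dt - (6 * 3600 + 30 * 60)
df1$Day <- as.Date(shifted, tz = "UTC")

# Count each measurement per day, one column per category.
counts <- as.data.frame.matrix(table(df1$Day, df1$Measurement))
print(counts)
```

Each row is labelled with the calendar date on which its window starts, so the 02:53 reading on January 1st is counted under 2015-12-31.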
My desired output is
Days …
I have a data frame like this
id <-c("1","2","3")
col <- c("CHB_len_SCM_max","CHB_brf_SCM_min","CHB_PROC_S_SV_mean")
df <- data.frame(id,col)
I want to create 2 columns by splitting "col" into Measurement and stat. stat is basically the text after the last underscore (max, min, mean, etc.).
My desired output is
id Measurement stat
1 CHB_len_SCM max
2 CHB_brf_SCM min
3 CHB_PROC_S_SV mean
I tried it this way, but the stat column comes back empty. I am not sure whether I am targeting the last underscore correctly.
library(tidyverse)
df1 <- df %>%
# Separate the sensors and the summary statistic
separate(col, into = c("Measurement", "stat"),sep = '\\_[^\\_]*$')
What am I missing here? Can someone point me in the right direction?
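One likely explanation, offered as a sketch rather than a definitive diagnosis: the pattern `\\_[^\\_]*$` matches the last underscore together with the text after it, so the whole suffix (for example `_max`) is consumed as the separator and nothing is left for `stat`. The base R version below sidesteps this by removing and extracting the suffix in two separate steps.

```r
id <- c("1", "2", "3")
col <- c("CHB_len_SCM_max", "CHB_brf_SCM_min", "CHB_PROC_S_SV_mean")
df <- data.frame(id, col, stringsAsFactors = FALSE)

# Everything before the last underscore.
df$Measurement <- sub("_[^_]*$", "", df$col)

# Everything after the last underscore (greedy .* runs to the last "_").
df$stat <- sub("^.*_", "", df$col)

# Drop the original combined column.
df$col <- NULL
print(df)
```

With tidyr, a separator that matches only the underscore itself, such as the lookahead `sep = "_(?=[^_]+$)"`, is a commonly suggested alternative; that variant is not exercised here.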
I have a data frame that looks like this
NUM <- c("45", "45", "45", "45", "48", "50", "66", "66", "66", "68")
Type <- c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D")
Points <- c(9.2,60.8,22.9,1012.7,18.7,11.1,67.2,63.1,16.7,58.4)
df1 <- data.frame(NUM,Type,Points)
DF1:
+-----+------+--------+
| NUM | TYPE | Points |
+-----+------+--------+
| 45 | A | 9.2 |
| 45 | F | 60.8 |
| 45 | C | 22.9 |
| 45 | B | 1012.7 |
| 48 | D | 18.7 |
| 50 | A …
I have a data frame
ID <- c("A","A","A","A","B","B","B","B")
Type <- c(45,45,46,46,45,45,46,46)
Point_A <- c(10,NA,30,40,NA,80,NA,100)
Point_B <- c(NA,32,43,NA,65,11,NA,53)
df <- data.frame(ID,Type,Point_A,Point_B)
ID Type Point_A Point_B
1 A 45 10 NA
2 A 45 NA 32
3 A 46 30 43
4 A 46 40 NA
5 B 45 NA 65
6 B 45 80 11
7 B 46 NA NA
8 B 46 100 53
From this post I learned how to aggregate the data by ID and a single column.
I am currently using sqldf to sum the rows grouped by ID and Type. Although this works for me, it is very slow on larger datasets.
df1 <- sqldf("SELECT ID, Type, Sum(Point_A) as Point_A, Sum(Point_B) as Point_B
FROM df
GROUP BY ID, …
I have read different posts like this one and this one, but my question is a slight variation. I have a df like this
ID <- c("DJ45","DJ46","DJ47","DJ48","DJ49","DJ53","DJ54","DJ55","DJ56","DJ57")
Tool <- c("Tool_A", "Tool_A", "Tool_A", "Tool_A", "Tool_A", "Tool_B", "Tool_B", "Tool_B", "Tool_B", "Tool_B")
Name <- c("CMP", "CMP", "CMP", "CMP", "CMP", "CMP", "CMP", "CMP", "CMP", "CMP")
MS1 <- c(51,55,50,59,50,47,48,42,43,46)
MS2 <- c(13,11,14,11,10,17,18,17,20,21)
MS3 <- c(2,3,2,5,6,4,9,6,4,4)
MS4 <- c(16,13,14,11,16,16,18,16,19,15)
MS5 <- c(3,6,3,6,3,4,4,8,5,4)
MS6 <- c(7,7,5,5,8,9,8,6,6,9)
df1 <- data.frame(ID,Tool,Name,MS1,MS2,MS3,MS4,MS5,MS6)
I am trying to find out statistically how the tools (Tool_A and Tool_B) differ across the measurement steps, so I ran a t-test.
t.test(MS1 ~ Tool, df1)
I use ggplot box plots for visualization, but here I have done it for only one of the steps.
library(ggplot2)
p <- ggplot(df1, aes(factor(Tool), MS6))
p + geom_boxplot(aes(fill = Tool)) + labs(title = "CMP")
I want to wrap everything under a common title (CMP) by placing the box plots for all 6 measurement steps side by side. Can facet_wrap do this? I just cannot get it right. Please advise.
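A sketch of one way this might be done, assuming the usual route of reshaping the six MS columns into long form so that `facet_wrap` has a single variable to facet on. `long`, `Step` and `Value` are illustrative names; the plotting part only runs when ggplot2 is installed.

```r
Tool <- rep(c("Tool_A", "Tool_B"), each = 5)
MS1 <- c(51, 55, 50, 59, 50, 47, 48, 42, 43, 46)
MS2 <- c(13, 11, 14, 11, 10, 17, 18, 17, 20, 21)
MS3 <- c(2, 3, 2, 5, 6, 4, 9, 6, 4, 4)
MS4 <- c(16, 13, 14, 11, 16, 16, 18, 16, 19, 15)
MS5 <- c(3, 6, 3, 6, 3, 4, 4, 8, 5, 4)
MS6 <- c(7, 7, 5, 5, 8, 9, 8, 6, 6, 9)
df1 <- data.frame(Tool, MS1, MS2, MS3, MS4, MS5, MS6)

# Stack the six step columns into one Value column with a Step label.
steps <- paste0("MS", 1:6)
long <- data.frame(
  Tool  = rep(df1$Tool, times = length(steps)),
  Step  = rep(steps, each = nrow(df1)),
  Value = unlist(df1[steps], use.names = FALSE)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One panel per step, all in a single row under a shared title.
  p <- ggplot(long, aes(x = Tool, y = Value, fill = Tool)) +
    geom_boxplot() +
    facet_wrap(~ Step, nrow = 1, scales = "free_y") +
    labs(title = "CMP")
  # print(p) draws the plot.
}
```

`scales = "free_y"` is a design choice: the steps have very different ranges (MS1 is around 50, MS3 below 10), so a shared y axis would flatten most panels.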
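I am trying to compute the median of each column in R and then subtract that median from every value in the column. The problem is that my columns contain NAs, which I do not want to remove; I just want them returned unchanged, without the median subtracted. For example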
ID <- c("A","B","C","D","E")
Point_A <- c(1, NA, 3, NA, 5)
Point_B <- c(NA, NA, 1, 3, 2)
df <- data.frame(ID,Point_A ,Point_B)
Is it possible to compute the median of a column that contains NAs? My expected result is
+----+---------+---------+
| ID | Point_A | Point_B |
+----+---------+---------+
| A | -2 | NA |
| B | NA | NA |
| C | 0 | -1 |
| D | NA | 1 |
| E | 2 | 0 |
+----+---------+---------+
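A minimal base R sketch of the calculation described above: `median()` accepts `na.rm = TRUE`, and subtracting a number from `NA` stays `NA`, so the missing entries pass through untouched. `num_cols` is an illustrative name.

```r
ID <- c("A", "B", "C", "D", "E")
Point_A <- c(1, NA, 3, NA, 5)
Point_B <- c(NA, NA, 1, 3, 2)
df <- data.frame(ID, Point_A, Point_B)

# Columns to centre; the ID column is left alone.
num_cols <- c("Point_A", "Point_B")

# Subtract each column's median, ignoring NAs when computing it.
df[num_cols] <- lapply(df[num_cols], function(x) x - median(x, na.rm = TRUE))
print(df)
```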
I have a sample data frame that I am working with
ID <- c("A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B")
TARG_AVG <- c(2.1,2.1,2.1,2.1,2.1,2.1,2.3,2.3,2.5,2.5,2.5,2.5,3.1,3.1,3.1,3.1,3.3,3.3,3.3,3.3,3.5,3.5)
Measurement <- c("Len","Len","Len","Wid","Ht","Ht","Dep","Brt","Ht","Ht","Dep","Dep"
,"Dep","Dep","Len","Len","Ht","Ht","Brt","Brt","Wid","Wid")
df1 <- data.frame(ID,TARG_AVG,Measurement)
I want to solve 3 different problems here
1) I want a summary of how many unique measurements each (ID & TARG_AVG) group has. I currently do it this way
library(doBy)
unique <- summaryBy(Measurement~ID+TARG_AVG, data=df1, FUN=function(x) { c(Count=length(x)) } )
This gives me the total (Measurement.Count), but I also want the count for each measurement. My desired output is
ID TARG_AVG Len Wid Ht Dep Brt Measurement.Count
1 A 2.1 3 1 2 0 0 6
2 A 2.3 0 0 0 1 1 2
3 A 2.5 0 0 2 2 0 4
4 B 3.1 2 0 0 2 0 4
5 B 3.3 0 0 2 0 2 …
I have a sample data frame that I am working with
Datetime <- c("2015-09-29 08:22:00", "2015-09-29 09:45:00", "2015-09-29 09:53:00", "2015-09-29 10:22:00", "2015-09-29 10:42:00",
"2015-09-29 11:31:00", "2015-09-29 11:47:00", "2015-09-29 12:45:00", "2015-09-29 13:11:00", "2015-09-29 13:44:00",
"2015-09-29 15:24:00", "2015-09-29 16:28:00", "2015-09-29 20:22:00", "2015-09-29 21:38:00", "2015-09-29 23:34:00")
Measurement <- c("Length","Length","Width","Height","Width","Height","Length","Width","Width","Height","Width","Length",
"Length","Height","Height")
PASSFAIL <- c("PASS","PASS","FAIL","PASS","PASS","FAIL_AVG_HIGH","FAIL#Pts","FAIL","FAIL_AVG_LOW","FAIL","PASS","PASS","FAIL#RNG#HIGH","PASS","FAIL")
df1 <- data.frame(Datetime,Measurement,PASSFAIL)
DF1
Datetime Measurement PASSFAIL
1 2015-09-29 08:22:00 Length PASS
2 2015-09-29 09:45:00 Length PASS
3 2015-09-29 09:53:00 Width FAIL
4 2015-09-29 10:22:00 Height PASS
5 2015-09-29 10:42:00 Width PASS
6 2015-09-29 11:31:00 Height FAIL_AVG_HIGH
7 2015-09-29 11:47:00 …
Here is a string that I have
"7MA_S_VE_MS_FB_MEASURE_P1_2013-08-21_17-42-19.BMP"
I am trying to extract the date this way:
library(stringr)
as.Date(str_extract(test,"[0-9]{4}/[0-9]{2}/[0-9]{2}"),"%Y-%m-%d")
I get NA for this.
The desired output is
2013-08-21
Can someone point me in the right direction?
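The likely cause of the NA is that the pattern looks for `/` between the date parts while the string uses `-`, so nothing matches. A minimal base R sketch with the separator corrected; `date_text` and `date_value` are illustrative names.

```r
test <- "7MA_S_VE_MS_FB_MEASURE_P1_2013-08-21_17-42-19.BMP"

# Find the first run shaped like YYYY-MM-DD.
date_text <- regmatches(test, regexpr("[0-9]{4}-[0-9]{2}-[0-9]{2}", test))

# Convert the matched text to a Date.
date_value <- as.Date(date_text, format = "%Y-%m-%d")
print(date_value)
```

The same corrected pattern should also work in the `str_extract()` call from the attempt above, since only the separator character changes.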
Continuing from my previous post, I now have one more column of ID values, which I need to use to convert rows into columns.
NUM <- c(1,2,3,1,2,3,1,2,3,1)
ID <- c("DJ45","DJ45","DJ45","DJ46","DJ46","DJ46","DJ47","DJ47","DJ47","DJ48")
Type <- c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D")
Points <- c(9.2,60.8,22.9,1012.7,18.7,11.1,67.2,63.1,16.7,58.4)
df1 <- data.frame(ID,NUM,Type,Points)
df1:
+------+-----+------+--------+
| ID | Num | Type | Points |
+------+-----+------+--------+
| DJ45 | 1 | A | 9.2 |
| DJ45 | 2 | F | 60.8 |
| DJ45 | 3 | C | 22.9 |
| DJ46 | 1 | B | 1012.7 |
| …
I have a data frame in which all columns are character, like this.
ID <- c("A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B")
ToolID <- c("CCP_A","CCP_A","CCQ_A","CCQ_A","IOT_B","CCP_B","CCQ_B","IOT_B",
"CCP_A","CCP_A","CCQ_A","CCQ_A","IOT_B","CCP_B","CCQ_B","IOT_B")
Step <- c("Step_A","Step_A","Step_B","Step_C","Step_D","Step_D","Step_E","Step_F",
"Step_A","Step_A","Step_B","Step_C","Step_D","Step_D","Step_E","Step_F")
Measurement <- c("Length","Breadth","Width","Height",NA,NA,NA,NA,
"Length","Breadth","Width","Height",NA,NA,NA,NA)
Passfail <- c("Pass","Pass","Fail","Fail","Pass","Pass","Pass","Pass",
"Pass","Pass","Fail","Fail","Pass","Pass","Pass","Pass")
Points <- as.character(c(7,5,3,4,0,0,0,0,17,15,13,14,0,0,0,0))
Average <- as.character(c(7.5,6.5,7.1,6.6,NA,NA,NA,NA,17.5,16.5,17.1,16.6,NA,NA,NA,NA))
Sigma <- as.character(c(2.5,2.5,2.1,2.6,NA,NA,NA,NA,12.5,12.5,12.1,12.6,NA,NA,NA,NA))
Tool <- c("ABC_1","ABC_2","ABD_1","ABD_2","COB_1","COB_2","COB_1","COB_2",
"ABC_1","ABC_2","ABD_1","ABD_2","COB_1","COB_2","COB_1","COB_2")
Dose <- as.character(c(NA,NA,NA,NA,17.1,NA,NA,17.3,NA,NA,NA,NA,117.1,NA,NA,117.3))
Machine <- c("CO2","CO6","CO3","CO6","CO2,CO6","CO2,CO3,CO4","CO2,CO3","CO2",
"CO2","CO6","CO3","CO6","CO2,CO6","CO2,CO3,CO4","CO2,CO3","CO2")
df <- data.frame(ID,ToolID,Step,Measurement,Passfail,Points,Average,Sigma,Tool,Dose,Machine)
I am trying to check these character vectors for numeric values and then convert the ones that are numeric to numeric type. I am using the "varhandle" package in R to do this
library(varhandle)
if(all(check.numeric(df$Machine, na.rm=TRUE))){
# convert the vector to numeric
df$Machine <- as.numeric(df$Machine)
}
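This works but is inefficient, because I have to type the column name manually as above. How can I do this more efficiently in a loop, or vectorized across multiple columns? My actual dataset has about 350 columns. Can someone point me in the right direction?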
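One way this could be applied to every column at once, sketched in base R. The helper `is_all_numeric` is a hypothetical stand-in for the `varhandle::check.numeric` test above, not that package's implementation, and the sample is a shortened version of the data frame in the question.

```r
df <- data.frame(
  ID      = c("A", "A", "B", "B"),
  Points  = as.character(c(7, 5, 17, 15)),
  Dose    = as.character(c(NA, 17.1, NA, 117.3)),
  Machine = c("CO2", "CO6", "CO2,CO6", "CO2"),
  stringsAsFactors = FALSE
)

# TRUE when every non-missing value parses as a number (hypothetical helper).
is_all_numeric <- function(x) {
  x <- as.character(x)
  x <- x[!is.na(x)]
  length(x) > 0 && !anyNA(suppressWarnings(as.numeric(x)))
}

# Convert only the columns that pass the check; leave the rest unchanged.
convert_numeric_columns <- function(data) {
  data[] <- lapply(data, function(column) {
    if (is_all_numeric(column)) as.numeric(as.character(column)) else column
  })
  data
}

converted <- convert_numeric_columns(df)
str(converted)
```

Because `lapply()` visits every column, no column names need to be typed by hand. Note that `as.numeric()` also accepts strings such as "Inf" or "1e3", so the check is slightly more permissive than a strict digits-only test.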