Posts by Sha*_*ath

Extract part of a string in R (up to the first semicolon)

I have a column whose values consist of 3 strings separated by semicolons. I need to extract the first part of each string.

Type <- c("SNSR_RMIN_PSX150Y_CSH;SP_12;I0.00V50HX0HY3000")

What I want: get the first part of the string (up to the first semicolon).

Output: SNSR_RMIN_PSX150Y_CSH

I tried gsub but could not figure it out. Please let me know how to do this efficiently in R.
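One possible approach, sketched in base R: `sub()` with the pattern `;.*` removes the first semicolon and everything after it. The names `first_part` and `first_part_split` are only illustrative.

```r
Type <- c("SNSR_RMIN_PSX150Y_CSH;SP_12;I0.00V50HX0HY3000")

# Drop the first semicolon and everything that follows it
first_part <- sub(";.*", "", Type)

# Alternative: split on semicolons and keep the first piece of each element
first_part_split <- vapply(strsplit(Type, ";", fixed = TRUE), `[`, character(1), 1)

print(first_part)
```

Both forms are vectorized, so they should work unchanged on a whole data frame column.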

string r gsub

Score: 7 | Answers: 2 | Views: 5587

R: How to get the max value of a datetime column in time series data

I am working with time series data. I have 2 datetime columns and 1 fiscal week column. Below is an example of my situation, where I need to get the MAX of EditDate.

EditDate <- c("2015-04-01 11:40:13", "2015-04-03 02:54:45","2015-04-07 11:40:13")
ID <- c("DL1X8", "DL1X8","DL1X8")
Avg <- c(38.1517, 38.1517, 38.1517)
Sig <- c(11.45880000, 11.45880000, 11.45880000)
InsertDate <- c("2015-04-03 9:40:00", "2015-04-03 9:40:00", "2015-04-10 9:40:00")
FW <- c("39","39","40")

df1 <- data.frame(EditDate , ID, Avg, Sig, InsertDate, FW)

This returns

+---------------------+-------+---------+-------------+--------------------+----+
|   EditDate          | ID    | Avg     |   Sig       |    InsertDate      | FW |
+---------------------+-------+---------+-------------+--------------------+----+
| 2015-04-01 11:40:13 | DL1X8 | 38.1517 | 11.45880000 | 2015-04-03 9:40:00 | 39 |
| 2015-04-03 02:54:45 | DL1X8 | 38.1517 | …
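The rest of this question is truncated, so the following is only a sketch of the basic step it names: parsing EditDate as a datetime and taking its maximum. The UTC time zone and the names `latest` and `latest_rows` are assumptions made for illustration.

```r
EditDate <- c("2015-04-01 11:40:13", "2015-04-03 02:54:45", "2015-04-07 11:40:13")
ID <- c("DL1X8", "DL1X8", "DL1X8")
df1 <- data.frame(EditDate, ID)

# Parse the text timestamps so that max() compares times rather than strings
# (tz = "UTC" is an assumption; use the time zone the data was recorded in)
df1$EditDate <- as.POSIXct(df1$EditDate, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Latest EditDate in the column
latest <- max(df1$EditDate)

# Rows that carry that latest EditDate
latest_rows <- df1[df1$EditDate == latest, ]
```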

datetime r time-series max dataframe

Score: 6 | Answers: 1 | Views: 7523

How to subset datetimes in R and pivot the Measurement column

I have a data frame like this

Datetime <- c("2015-12-31 08:30:13", "2015-12-31 12:45:00", "2016-01-01 02:53:20", "2016-01-01 03:22:18", 
              "2016-01-01 09:42:10", "2016-01-01 20:55:50", "2016-01-01 21:14:10", "2016-01-02 05:42:16",
              "2016-01-02 08:31:15", "2016-01-02 09:13:10", "2016-01-03 00:45:14", "2016-01-03 05:56:00", 
              "2016-01-03 13:44:00", "2016-01-03 14:41:20", "2016-01-03 15:33:10", "2016-01-04 04:24:00",
              "2016-01-04 17:24:12", "2016-01-04 17:28:16", "2016-01-04 18:22:34", "2016-01-05 02:34:31")

Measurement <- c("Length","Breadth","Height","Length",
                 "Breadth","Breadth","Breadth","Length",
                 "Length","Breadth","Height","Height",
                 "Height","Length","Height","Length",
                 "Length","Breadth","Breadth","Breadth")

df1 <- data.frame(Datetime,Measurement)

I am trying to subset the dates in this format

Day1 = December 31st,2015 at 6:30AM to January 1st 2016 6:30AM
Day2 = January 1st,2016 at 6:30AM to January 2nd 2016 6:30AM

etc..

While doing this, I also want to pivot the "Measurement" column into individual columns holding the count for each category.

My desired output is

Days …
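The desired output is cut off above, so this is only a sketch of one way to build 06:30-to-06:30 days and count measurements per day. It uses the first six rows of the sample data, assumes UTC timestamps, and the `Day` column and `counts` table are illustrative names.

```r
Datetime <- c("2015-12-31 08:30:13", "2015-12-31 12:45:00", "2016-01-01 02:53:20",
              "2016-01-01 03:22:18", "2016-01-01 09:42:10", "2016-01-01 20:55:50")
Measurement <- c("Length", "Breadth", "Height", "Length", "Breadth", "Breadth")
df1 <- data.frame(Datetime, Measurement)

# Parse the timestamps (tz = "UTC" is an assumption)
stamp <- as.POSIXct(df1$Datetime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Shift back by 6.5 hours so each 06:30-to-06:30 window maps onto one calendar date
df1$Day <- format(as.Date(stamp - 6.5 * 3600, tz = "UTC"))

# One row per day, one column per measurement, cells hold the counts
counts <- table(df1$Day, df1$Measurement)
print(counts)
```

Here the row labelled 2015-12-31 covers December 31st 6:30AM up to January 1st 6:30AM.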

datetime r subset reshape2

Score: 6 | Answers: 1 | Views: 91

Separate a column into 2 columns at the last underscore in R

I have a data frame like this

id <-c("1","2","3")
col <- c("CHB_len_SCM_max","CHB_brf_SCM_min","CHB_PROC_S_SV_mean")

df <- data.frame(id,col)

I want to create 2 columns by splitting "col" into Measurement and stat. stat is basically the text after the last underscore (max, min, mean, etc.).

Desired output

  id   Measurement stat
   1   CHB_len_SCM  max  
   2   CHB_brf_SCM  min   
   3 CHB_PROC_S_SV mean    

I tried it this way, but the stat column comes out empty. I am not sure whether I am targeting the last underscore correctly.

library(tidyverse)
df1 <- df %>%
  # Separate the sensors and the summary statistic
  separate(col, into = c("Measurement", "stat"),sep = '\\_[^\\_]*$')

What am I missing here? Can someone point me in the right direction?
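A likely explanation is that the pattern matches the last underscore together with the text after it, so that whole tail is consumed as the separator and nothing is left for stat. As a sketch, the same split can be done in base R with two `sub()` calls (this swaps `separate()` for base R; it is not the tidyr approach from the attempt above):

```r
id <- c("1", "2", "3")
col <- c("CHB_len_SCM_max", "CHB_brf_SCM_min", "CHB_PROC_S_SV_mean")
df <- data.frame(id, col)

# Everything before the last underscore
df$Measurement <- sub("_[^_]*$", "", df$col)

# Everything after the last underscore (the greedy .* runs up to the final "_")
df$stat <- sub("^.*_", "", df$col)

# Drop the original combined column
df$col <- NULL
print(df)
```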

r tidyr

Score: 6 | Answers: 1 | Views: 923

R: Pivot rows into columns, using N/A for missing values

I have a data frame that looks like this

NUM <- c("45", "45", "45", "45", "48", "50", "66", "66", "66", "68")
Type <- c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D")
Points <- c(9.2,60.8,22.9,1012.7,18.7,11.1,67.2,63.1,16.7,58.4)

df1 <- data.frame(NUM,Type,Points)

DF1:

+-----+------+--------+
| NUM | TYPE | Points |
+-----+------+--------+
|  45 | A    | 9.2    |
|  45 | F    | 60.8   |
|  45 | C    | 22.9   |
|  45 | B    | 1012.7 |
|  48 | D    | 18.7   |
|  50 | A …
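The desired output is truncated above. As a sketch of the pivot the title describes, base R's `reshape()` can spread Type into columns and leaves NA where a NUM has no value for a given Type. The result name `wide` is illustrative.

```r
NUM <- c("45", "45", "45", "45", "48", "50", "66", "66", "66", "68")
Type <- c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D")
Points <- c(9.2, 60.8, 22.9, 1012.7, 18.7, 11.1, 67.2, 63.1, 16.7, 58.4)
df1 <- data.frame(NUM, Type, Points)

# One row per NUM, one Points.<Type> column per Type; missing pairs become NA
wide <- reshape(df1, idvar = "NUM", timevar = "Type", direction = "wide")
print(wide)
```

This relies on each NUM/Type pair appearing at most once, which holds for the sample data.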

pivot r reshape dataframe melt

Score: 5 | Answers: 2 | Views: 5148

R: Summarize column values containing NA by sum while grouping by ID

I have a data frame

ID <- c("A","A","A","A","B","B","B","B") 
Type <- c(45,45,46,46,45,45,46,46)
Point_A <- c(10,NA,30,40,NA,80,NA,100) 
Point_B <- c(NA,32,43,NA,65,11,NA,53)
df <- data.frame(ID,Type,Point_A,Point_B)

    ID  Type    Point_A Point_B
1   A   45        10    NA
2   A   45        NA    32
3   A   46        30    43
4   A   46        40    NA
5   B   45        NA    65
6   B   45        80    11
7   B   46        NA    NA
8   B   46       100    53

I learned from this post how to summarize the data by ID and a single column.

I am currently using sqldf to sum the rows, grouping by ID and Type. While this works for me, it is very slow on larger datasets.

    library(sqldf)
    df1 <- sqldf("SELECT ID, Type, Sum(Point_A) as Point_A, Sum(Point_B) as Point_B 
                  FROM df 
                  GROUP BY ID, …
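As a sketch of a non-SQL alternative, base R's `aggregate()` can sum both point columns per ID and Type while skipping NA values. Whether it is faster than sqldf on a given dataset is not something this example establishes; the result name `totals` is illustrative.

```r
ID <- c("A", "A", "A", "A", "B", "B", "B", "B")
Type <- c(45, 45, 46, 46, 45, 45, 46, 46)
Point_A <- c(10, NA, 30, 40, NA, 80, NA, 100)
Point_B <- c(NA, 32, 43, NA, 65, 11, NA, 53)
df <- data.frame(ID, Type, Point_A, Point_B)

# Sum each point column within every ID/Type group, ignoring NA values
totals <- aggregate(df[c("Point_A", "Point_B")],
                    by = df[c("ID", "Type")],
                    FUN = sum, na.rm = TRUE)
print(totals)
```

Note that with `na.rm = TRUE` a group containing only NA values sums to 0 rather than NA.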

aggregate r plyr dplyr data.table

Score: 5 | Answers: 2 | Views: 998

Multiple boxplots placed side by side for different column values in ggplot

I have read various posts like this one and this one, but my problem has a slight variation. I have a df like this

ID <- c("DJ45","DJ46","DJ47","DJ48","DJ49","DJ53","DJ54","DJ55","DJ56","DJ57")
Tool <- c("Tool_A", "Tool_A", "Tool_A", "Tool_A", "Tool_A", "Tool_B", "Tool_B", "Tool_B", "Tool_B", "Tool_B")
Name <- c("CMP", "CMP", "CMP", "CMP", "CMP", "CMP", "CMP", "CMP", "CMP", "CMP")
MS1 <- c(51,55,50,59,50,47,48,42,43,46)
MS2 <- c(13,11,14,11,10,17,18,17,20,21)
MS3 <- c(2,3,2,5,6,4,9,6,4,4)
MS4 <- c(16,13,14,11,16,16,18,16,19,15)
MS5 <- c(3,6,3,6,3,4,4,8,5,4)
MS6 <- c(7,7,5,5,8,9,8,6,6,9)

df1 <- data.frame(ID,Tool,Name,MS1,MS2,MS3,MS4,MS5,MS6)

I am trying to find out statistically how the tools (Tool_A and Tool_B) differ across the measurement steps, so I ran a t-test.

t.test(MS1 ~ Tool, df1)

I use ggplot boxplots for visualization, but here I have only done it for one of the steps.

library(ggplot2)
p <- ggplot(df1, aes(factor(Tool), MS6))
p + geom_boxplot(aes(fill = Tool)) + labs(title = "CMP")

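I want to place the boxplots for all 6 measurement steps side by side, wrapped under a common title (CMP). Can facet_wrap do this? I just can't get it right. Please advise.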
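A minimal sketch of one way to get there: `facet_wrap()` needs the step in a single column, so the six MS columns are first stacked into long form (done here in base R), then faceted by step. The names `steps`, `long`, `Step` and `Value` are illustrative, and the plot is only built if ggplot2 is installed.

```r
Tool <- rep(c("Tool_A", "Tool_B"), each = 5)
MS1 <- c(51, 55, 50, 59, 50, 47, 48, 42, 43, 46)
MS2 <- c(13, 11, 14, 11, 10, 17, 18, 17, 20, 21)
MS3 <- c(2, 3, 2, 5, 6, 4, 9, 6, 4, 4)
MS4 <- c(16, 13, 14, 11, 16, 16, 18, 16, 19, 15)
MS5 <- c(3, 6, 3, 6, 3, 4, 4, 8, 5, 4)
MS6 <- c(7, 7, 5, 5, 8, 9, 8, 6, 6, 9)
df1 <- data.frame(Tool, MS1, MS2, MS3, MS4, MS5, MS6)

# Stack the six measurement columns into long form: one row per tool/step/value
steps <- paste0("MS", 1:6)
long <- data.frame(
  Tool  = rep(df1$Tool, times = length(steps)),
  Step  = rep(steps, each = nrow(df1)),
  Value = unlist(df1[steps], use.names = FALSE)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One panel per measurement step, all under the common title "CMP"
  p <- ggplot(long, aes(x = Tool, y = Value, fill = Tool)) +
    geom_boxplot() +
    facet_wrap(~ Step, nrow = 1, scales = "free_y") +
    labs(title = "CMP")
  # print(p) draws the plot
}
```

`scales = "free_y"` gives each step its own y-axis, which helps here because MS1 is on a much larger scale than MS3; drop it to compare all steps on one scale.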

r facet ggplot2 boxplot

Score: 5 | Answers: 2 | Views: ~20,000

Compute column medians with NA values

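I am trying to compute the median of each column in R and then subtract that median from every value in the column. The problem is that my columns contain N/A values, which I do not want to remove; I just want them returned as they are, without subtracting the median. For example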

ID <- c("A","B","C","D","E") 
Point_A <- c(1, NA, 3, NA, 5) 
Point_B <- c(NA, NA, 1, 3, 2)

df <- data.frame(ID,Point_A ,Point_B)

Is it possible to compute the median of a column that contains N/A values? My desired result is

+----+---------+---------+
| ID | Point_A | Point_B |
+----+---------+---------+
| A  | -2      | NA      |
| B  | NA      | NA      |
| C  | 0       | -1      |
| D  | NA      | 1       |
| E  | 2       | 0       |
+----+---------+---------+
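A short sketch of one way to get that result: `median()` accepts `na.rm = TRUE`, and subtracting a number from NA stays NA, so the missing values pass through untouched. The vector `value_cols` is an illustrative name.

```r
ID <- c("A", "B", "C", "D", "E")
Point_A <- c(1, NA, 3, NA, 5)
Point_B <- c(NA, NA, 1, 3, 2)
df <- data.frame(ID, Point_A, Point_B)

# Columns to center; ID is left alone
value_cols <- c("Point_A", "Point_B")

# Subtract each column's median (computed without the NAs) from its values
df[value_cols] <- lapply(df[value_cols], function(x) x - median(x, na.rm = TRUE))
print(df)
```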

r median na

Score: 4 | Answers: 1 | Views: 218

Pivot rows into columns holding the count for each measurement in R

I have a sample data frame that I am working with

ID <- c("A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B")
TARG_AVG <- c(2.1,2.1,2.1,2.1,2.1,2.1,2.3,2.3,2.5,2.5,2.5,2.5,3.1,3.1,3.1,3.1,3.3,3.3,3.3,3.3,3.5,3.5)
Measurement <- c("Len","Len","Len","Wid","Ht","Ht","Dep","Brt","Ht","Ht","Dep","Dep"
                 ,"Dep","Dep","Len","Len","Ht","Ht","Brt","Brt","Wid","Wid")
df1 <- data.frame(ID,TARG_AVG,Measurement)

I want to solve 3 different problems here

1) I want a summary of how many unique measurements there are per (ID & TARG_AVG) group. I currently do it like this

library(doBy)  # provides summaryBy()
unique <- summaryBy(Measurement~ID+TARG_AVG, data=df1, FUN=function(x) { c(Count=length(x)) } )

This gives me the total (Measurement.Count), but I also want the count for each measurement. My desired output

  ID TARG_AVG Len Wid Ht Dep Brt Measurement.Count
1  A      2.1   3   1  2   0   0                 6
2  A      2.3   0   0  0   1   1                 2
3  A      2.5   0   0  2   2   0                 4
4  B      3.1   2   0  0   2   0                 4
5  B      3.3   0   0  2   0   2 …
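For the first problem, a sketch in base R: cross-tabulating the (ID, TARG_AVG) group against Measurement gives the per-measurement counts, and the row sums give the total. This swaps `summaryBy()` for `table()`; the combined `group` key and the `counts` name are illustrative, and the groups end up as row names rather than as separate ID and TARG_AVG columns.

```r
ID <- c(rep("A", 12), rep("B", 10))
TARG_AVG <- c(2.1, 2.1, 2.1, 2.1, 2.1, 2.1, 2.3, 2.3, 2.5, 2.5, 2.5, 2.5,
              3.1, 3.1, 3.1, 3.1, 3.3, 3.3, 3.3, 3.3, 3.5, 3.5)
Measurement <- c("Len", "Len", "Len", "Wid", "Ht", "Ht", "Dep", "Brt", "Ht", "Ht", "Dep", "Dep",
                 "Dep", "Dep", "Len", "Len", "Ht", "Ht", "Brt", "Brt", "Wid", "Wid")
df1 <- data.frame(ID, TARG_AVG, Measurement)

# Build one key per (ID, TARG_AVG) group, e.g. "A 2.1"
group <- paste(df1$ID, df1$TARG_AVG)

# Count each measurement within each group; absent combinations become 0
counts <- as.data.frame.matrix(table(group, df1$Measurement))

# Total number of measurements per group
counts$Measurement.Count <- rowSums(counts)
print(counts)
```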

r plyr reshape2 dplyr data.table

Score: 4 | Answers: 1 | Views: 584

Calculate failure rate and datetime operations in R

I have a sample data frame that I am working with

Datetime <- c("2015-09-29 08:22:00", "2015-09-29 09:45:00", "2015-09-29 09:53:00", "2015-09-29 10:22:00", "2015-09-29 10:42:00",
                  "2015-09-29 11:31:00", "2015-09-29 11:47:00", "2015-09-29 12:45:00", "2015-09-29 13:11:00", "2015-09-29 13:44:00",
                  "2015-09-29 15:24:00", "2015-09-29 16:28:00", "2015-09-29 20:22:00", "2015-09-29 21:38:00", "2015-09-29 23:34:00")
Measurement <- c("Length","Length","Width","Height","Width","Height","Length","Width","Width","Height","Width","Length",
                     "Length","Height","Height")
PASSFAIL <- c("PASS","PASS","FAIL","PASS","PASS","FAIL_AVG_HIGH","FAIL#Pts","FAIL","FAIL_AVG_LOW","FAIL","PASS","PASS","FAIL#RNG#HIGH","PASS","FAIL")

df1 <- data.frame(Datetime,Measurement,PASSFAIL)

DF1

              Datetime Measurement      PASSFAIL
1  2015-09-29 08:22:00      Length          PASS
2  2015-09-29 09:45:00      Length          PASS
3  2015-09-29 09:53:00       Width          FAIL
4  2015-09-29 10:22:00      Height          PASS
5  2015-09-29 10:42:00       Width          PASS
6  2015-09-29 11:31:00      Height FAIL_AVG_HIGH
7  2015-09-29 11:47:00 …

r reshape2 dplyr data.table

Score: 3 | Answers: 1 | Views: 154

Extract the date from a given string in R

Here is a string I have

test <- "7MA_S_VE_MS_FB_MEASURE_P1_2013-08-21_17-42-19.BMP"

I am trying to extract the date this way:

library(stringr)
as.Date(str_extract(test,"[0-9]{4}/[0-9]{2}/[0-9]{2}"),"%Y-%m-%d")

I get NA for this.

Desired output

2013-08-21

Can someone point me in the right direction?
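The NA most likely comes from the pattern: it looks for slashes, while the date in the string is written with hyphens, so nothing matches. A sketch in base R with the separators changed to hyphens (`date_text` and `extracted` are illustrative names):

```r
test <- "7MA_S_VE_MS_FB_MEASURE_P1_2013-08-21_17-42-19.BMP"

# Match four digits, a hyphen, two digits, a hyphen, two digits
date_text <- regmatches(test, regexpr("[0-9]{4}-[0-9]{2}-[0-9]{2}", test))

# Convert the matched text to a Date
extracted <- as.Date(date_text, format = "%Y-%m-%d")
print(extracted)
```

The same hyphen pattern should also work as the second argument of `str_extract()` in the stringr version of the code.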

datetime r gsub

Score: 3 | Answers: 1 | Views: 4459

R: Pivoting with the "spread" function

Continuing from my previous post, I now have one more column of ID values that I need to use when converting rows into columns.

    NUM <- c(1,2,3,1,2,3,1,2,3,1)
    ID <- c("DJ45","DJ45","DJ45","DJ46","DJ46","DJ46","DJ47","DJ47","DJ47","DJ48")
    Type <- c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D")
    Points <- c(9.2,60.8,22.9,1012.7,18.7,11.1,67.2,63.1,16.7,58.4)

    df1 <- data.frame(ID,NUM,Type,Points)

df1:
    +------+-----+------+--------+
    | ID   | Num | Type | Points |
    +------+-----+------+--------+
    | DJ45 |   1 | A    | 9.2    |
    | DJ45 |   2 | F    | 60.8   |
    | DJ45 |   3 | C    | 22.9   |
    | DJ46 |   1 | B    | 1012.7 |
    | …

pivot r dataframe melt tidyr

Score: 1 | Answers: 1 | Views: 2119

Check whether all values are numeric across multiple columns and convert them to numeric

I have a data frame in which all columns are character, like this.

ID <- c("A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B")
ToolID <- c("CCP_A","CCP_A","CCQ_A","CCQ_A","IOT_B","CCP_B","CCQ_B","IOT_B",
            "CCP_A","CCP_A","CCQ_A","CCQ_A","IOT_B","CCP_B","CCQ_B","IOT_B")
Step <- c("Step_A","Step_A","Step_B","Step_C","Step_D","Step_D","Step_E","Step_F",
          "Step_A","Step_A","Step_B","Step_C","Step_D","Step_D","Step_E","Step_F")
Measurement <- c("Length","Breadth","Width","Height",NA,NA,NA,NA,
                 "Length","Breadth","Width","Height",NA,NA,NA,NA)
Passfail <- c("Pass","Pass","Fail","Fail","Pass","Pass","Pass","Pass",
              "Pass","Pass","Fail","Fail","Pass","Pass","Pass","Pass")
Points <- as.character(c(7,5,3,4,0,0,0,0,17,15,13,14,0,0,0,0))
Average <- as.character(c(7.5,6.5,7.1,6.6,NA,NA,NA,NA,17.5,16.5,17.1,16.6,NA,NA,NA,NA))
Sigma <- as.character(c(2.5,2.5,2.1,2.6,NA,NA,NA,NA,12.5,12.5,12.1,12.6,NA,NA,NA,NA))
Tool <- c("ABC_1","ABC_2","ABD_1","ABD_2","COB_1","COB_2","COB_1","COB_2",
          "ABC_1","ABC_2","ABD_1","ABD_2","COB_1","COB_2","COB_1","COB_2")
Dose <- as.character(c(NA,NA,NA,NA,17.1,NA,NA,17.3,NA,NA,NA,NA,117.1,NA,NA,117.3))
Machine <- c("CO2","CO6","CO3","CO6","CO2,CO6","CO2,CO3,CO4","CO2,CO3","CO2",
             "CO2","CO6","CO3","CO6","CO2,CO6","CO2,CO3,CO4","CO2,CO3","CO2")

df <- data.frame(ID,ToolID,Step,Measurement,Passfail,Points,Average,Sigma,Tool,Dose,Machine)

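I am trying to check whether these character vectors hold numeric values and, if so, convert them to numeric. I am using the "varhandle" package in R to do this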

library(varhandle)

if(all(check.numeric(df$Machine, na.rm=TRUE))){
  # convert the vector to numeric
  df$Machine <- as.numeric(df$Machine)
}

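This works but is inefficient, because I have to type the column names above by hand. How can I do this more efficiently, either in a loop or vectorized over multiple columns? My actual dataset has about 350 columns. Can someone point me in the right direction?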
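One way to avoid naming columns by hand, sketched in base R on a small stand-in data frame: test every column with a helper, then convert only the columns that pass. The helper `is_all_numeric()` is hypothetical and replaces `check.numeric()` from varhandle with a plain `as.numeric()` check.

```r
# Small illustrative data frame with all-character columns
df <- data.frame(
  ID      = c("A", "B"),
  Points  = c("7", "5"),
  Dose    = c(NA, "17.1"),
  Machine = c("CO2", "CO2,CO6"),
  stringsAsFactors = FALSE
)

# TRUE when every non-NA value in x can be read as a number
is_all_numeric <- function(x) {
  x <- as.character(x)
  converted <- suppressWarnings(as.numeric(x))
  !any(is.na(converted) & !is.na(x))
}

# Test all columns at once, then convert the ones that qualify
num_cols <- vapply(df, is_all_numeric, logical(1))
df[num_cols] <- lapply(df[num_cols], function(x) as.numeric(as.character(x)))
str(df)
```

A column made up entirely of NA passes this check and is converted too, which may or may not be what is wanted.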

r dataframe dplyr data.table tidyr

Score: 0 | Answers: 1 | Views: 319

Tag statistics

r ×13

data.table ×4

dataframe ×4

dplyr ×4

datetime ×3

reshape2 ×3

tidyr ×3

gsub ×2

melt ×2

pivot ×2

plyr ×2

aggregate ×1

boxplot ×1

facet ×1

ggplot2 ×1

max ×1

median ×1

na ×1

reshape ×1

string ×1

subset ×1

time-series ×1