I have timestamps indicating when an event starts and when it ends:
x <- "00:01:00.000 - 00:01:10.500"
I need to calculate the duration of the event. Using hms from the lubridate package together with lapply and strsplit gives me the expected output:
library(lubridate)
unlist(lapply(strsplit(x, split=" - "), function(x) as.numeric(hms(x))))[2] - unlist(lapply(strsplit(x, split=" - "), function(x) as.numeric(hms(x))))[1]
[1] 10.5
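But I find the code thoroughly inelegant and anything but concise. Is there a better way to get the duration?
EDIT:
If, as is actually the case, x contains more than just one value, such as: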
x <- c("00:01:00.000 - 00:01:10.500", "00:12:12.000 - 00:13:10.500")
I came up with this solution:
timepoints <- lapply(strsplit(x, split=" - "), function(x) as.numeric(hms(x)))
duration <- lapply(timepoints, function(x) x[2]-x[1])
duration
[[1]]
[1] 10.5
[[2]]
[1] 58.5
But, again, there surely is a better, shorter one.
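One shorter route, sketched here in base R only (so without lubridate): split each string into a start and an end column, convert both to seconds, and subtract. The helper to_seconds is a hypothetical name introduced for illustration, and the sketch assumes every timestamp has the form HH:MM:SS.sss.

```r
x <- c("00:01:00.000 - 00:01:10.500", "00:12:12.000 - 00:13:10.500")

# Hypothetical helper: convert "HH:MM:SS.sss" strings to seconds
to_seconds <- function(ts) {
  parts <- strsplit(ts, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# One row per event: column 1 = start, column 2 = end
times <- do.call(rbind, strsplit(x, " - ", fixed = TRUE))

# Vectorised subtraction gives one duration (in seconds) per element of x
duration <- to_seconds(times[, 2]) - to_seconds(times[, 1])
duration
```

If lubridate is preferred, the same idea should also work as sapply(strsplit(x, " - "), function(p) diff(as.numeric(hms(p)))), which reuses the calls from the question and returns a plain numeric vector instead of a list.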
I'm struggling to strip comma-separated strings down to their unique substrings in a clean way:
x <- c("Anna & x, Anna & x", #
"Alb, Berta 222, Alb",
"Al Pacino",
"Abb cd xy, Abb cd xy, C123, C123, B")
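I seem to be doing fairly well with a combination of a negated character class, a negative lookahead, and a backreference; what bothers me, however, is that many of the substrings carry unwanted whitespace: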
library(stringr)
str_extract_all(x, "([^,]+)(?!.*\\1)")
[[1]]
[1] " Anna & x"
[[2]]
[1] " Berta 222" " Alb"
[[3]]
[1] "Al Pacino"
[[4]]
[1] " Abb cd xy" " C123" " B"
How can the pattern be improved so that the unwanted whitespace is not extracted?
Desired result:
#> [[1]]
#> [1] "Anna & x"
#> [[2]]
#> [1] "Alb" "Berta 222"
#> [[3]]
#> [1] "Al Pacino"
#> …
I have utterances with annotation symbols:
utt <- c("↑hey girls↑ can I <join yo:u>", "((v: grunts))", "!damn shit! got it", 
"I mean /yeah we saw each other at a party:/↓ the other day"
)
I need to split utt into separate words, unless those words are enclosed by certain delimiters, including those in this class: [(/≈↑£<>°!]. I'm doing fairly well using a double negative lookahead for utts where only one such string occurs between delimiters; but when there are several such strings between delimiters, I fail to split correctly:
library(tidyr)
library(dplyr)
data.frame(utt2) %>%
  separate_rows(utt, sep = "(?!.*[(/≈↑£<>°!].*)\\s(?!.*[)/≈↑£<>°!])")
# A tibble: 9 × 1
  utt2 
  <chr> 
1 ↑hey girls↑ can I <join yo:u> 
2 ((v: grunts)) 
3 !damn shit! 
4 got 
5 it 
6 I …
I have an R data frame as follows -
df <- data.frame(
FDR = c (0.009, 0.007, 0.007),
Probe_ID = c("1555272_at", "1557203_at", "1557384_at"),
Gene.Symbol = c("RSPH10B2///RSPH10B","PABPC1L2B///PABPC1L2A","LOC100506639///ZNF131"),
Gene.ID = c("728194///222967","645974///340529","100506639///7690"))
df
FDR Probe_ID Gene.Symbol Gene.ID
1 0.009 1555272_at RSPH10B2///RSPH10B 728194///222967
2 0.007 1557203_at PABPC1L2B///PABPC1L2A 645974///340529
3 0.007 1557384_at LOC100506639///ZNF131 100506639///7690
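I want to split the data frame based on the row values of the column df$Gene.symbol and the pattern ///. The resulting data frame should look like this -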
FDR Probe_ID Gene.symbol Gene.ID
0.009 15111_at RSPH10B2 728194
0.009 15111_at RSPH10B 222967
0.007 15222_at PABPC1L2B 645974
0.007 15222_at PABPC1L2A 340529
0.007 15333_at LOC100506639 100506639
0.007 15333_at ZNF131 7690
I tried the following code, but it did not work and produced columns with duplicated elements - …
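A minimal base R sketch of this kind of split, assuming Gene.Symbol and Gene.ID always hold the same number of ///-separated parts in a given row. The names symbols, ids, n, and long are introduced here for illustration only.

```r
df <- data.frame(
  FDR = c(0.009, 0.007, 0.007),
  Probe_ID = c("1555272_at", "1557203_at", "1557384_at"),
  Gene.Symbol = c("RSPH10B2///RSPH10B", "PABPC1L2B///PABPC1L2A", "LOC100506639///ZNF131"),
  Gene.ID = c("728194///222967", "645974///340529", "100506639///7690"),
  stringsAsFactors = FALSE
)

# Split both columns on the literal separator
symbols <- strsplit(df$Gene.Symbol, "///", fixed = TRUE)
ids <- strsplit(df$Gene.ID, "///", fixed = TRUE)

# Assumption: both columns split into equally many parts in every row
stopifnot(identical(lengths(symbols), lengths(ids)))

# Repeat the untouched columns once per part, then flatten the split columns
n <- lengths(symbols)
long <- data.frame(
  FDR = rep(df$FDR, times = n),
  Probe_ID = rep(df$Probe_ID, times = n),
  Gene.Symbol = unlist(symbols),
  Gene.ID = unlist(ids),
  stringsAsFactors = FALSE
)
long
```

With tidyr available, separate_rows(df, Gene.Symbol, Gene.ID, sep = "///") should give the same long layout in one call, since it splits several columns in parallel.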
In a data frame like this:
df <- data.frame(
w1 = c("A","A","B","C","A"),
w2 = c("C","A","A","C","C"),
w3 = c("C","A","B","C","B")
)
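I need to compute the within-column proportions of the character values in all columns. Interestingly, the following code works on the large real data set but throws an error on the toy data above: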
df %>%
summarise(across(everything(), ~prop.table(table(.))*100))
What I'm looking for is a data frame with the exact proportions of all values in each column, plus a column indicating the values:
w1 w2 w3
1 A 60 40 20
2 B 20 0 40
3 C 20 60 40
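A likely cause of the error on the toy data is that w2 contains no B, so table() returns two values for w2 but three for the other columns. The base R sketch below works around that by tabulating every column over the same set of levels; lev, props, and result are names chosen here for illustration.

```r
df <- data.frame(
  w1 = c("A", "A", "B", "C", "A"),
  w2 = c("C", "A", "A", "C", "C"),
  w3 = c("C", "A", "B", "C", "B"),
  stringsAsFactors = FALSE
)

# All values occurring anywhere, so each column is counted over the same levels
lev <- sort(unique(unlist(df)))

# Percentages per column; a value that is absent from a column gets 0
props <- sapply(df, function(col) {
  as.vector(prop.table(table(factor(col, levels = lev)))) * 100
})

# Add the column that says which value each row refers to
result <- data.frame(value = lev, props, row.names = NULL, stringsAsFactors = FALSE)
result
```

Because every column yields a vector of the same length, sapply can bind the results into a matrix, which is what lets the columns sit side by side in one data frame.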
I have this kind of data, with Annotation ratings given by Subjects over a series of Trials:
df <- structure(list(Subject = c("A", "A", "A", "B", "B", "B"), 
                     Annotation = c("f", "n", "n", "f", "n", "f"), 
                     Trial = c(1L, 2L, 3L, 1L, 2L, 3L),
                     ID = c(1L, 2L, 3L, 1L, 2L, 3L),
                     Trial_time = c("00:00:00.001", 
                                    "00:00:00.002", "00:00:00.003", "00:00:00.001", 
                                    "00:00:00.002", "00:00:00.003")), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
I want to pivot_wider on Trial while keeping all the associated columns for each Subject. All I have managed so far is this, which loses the columns ID and Trial_time:
library(dplyr)
df %>%
  …
I have this type of data:
df <- data.frame(
Partcpt = c("B","A","B","C"),
aoi = c("ACA","CB","AA","AABC" )
)
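I want to replace the individual letters in aoi with consecutive numbers, unless a letter is repeated, in which case the number previously assigned to it should be repeated. Is there a regex solution to this? I'm open to other solutions as well.
The desired output is this: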
Partcpt aoi
1 B 121
2 A 12
3 B 11
4 C 1123
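A non-regex sketch of this recoding in base R: number each letter by the position of its first appearance within the string. recode_aoi is a hypothetical helper name; the sketch assumes at most nine distinct letters per string, since from ten onwards the pasted digits would become ambiguous.

```r
df <- data.frame(
  Partcpt = c("B", "A", "B", "C"),
  aoi = c("ACA", "CB", "AA", "AABC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number each letter by the order of its first appearance
recode_aoi <- function(s) {
  chars <- strsplit(s, "")[[1]]
  # match() returns, for each letter, its index among the distinct letters seen
  paste(match(chars, unique(chars)), collapse = "")
}

df$aoi <- vapply(df$aoi, recode_aoi, character(1), USE.NAMES = FALSE)
df
```

The numbering restarts for every string, which matches the desired output above, where CB becomes 12 rather than continuing from the previous row.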
How can I change a vector of strings into substrings derived by splitting the strings?
An example vector:
test <- c("1.folder/file1.csv","1.folder/file2.csv","1.folder/file3.csv")
Desired output:
"file1.csv","file2.csv","file3.csv"
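Since these strings are file paths, one option that avoids explicit splitting is base R's basename(), which drops everything up to and including the last path separator. The sketch assumes the wanted substring is always the final path component; files is a name chosen here for illustration.

```r
test <- c("1.folder/file1.csv", "1.folder/file2.csv", "1.folder/file3.csv")

# Keep only the last path component of each string
files <- basename(test)
files

# Regex alternative with the same result for these inputs:
# remove everything up to and including the last "/"
files_regex <- sub(".*/", "", test)
```

The regex variant is the more general of the two, because the separator can be swapped for any other delimiter.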
I have a conversation between several speakers recorded as a single string:
convers <- "Peter: hiya Mary: hi how wz your weekend Peter: ahh still got a headache An you party a lot Mary: nuh you know my kid s sick n stuff Peter: yeah i know thats erm al hamshi: hey guys how s it goin Peter: Great Mary: where ve you been last week al hamshi: ah you know camping with my girl friend"
I also have a vector of the speakers' names:
speakers <- c("Peter", "Mary", "al hamshi")
Using this vector as a component of my regex pattern, I'm doing relatively well with this extraction:
library(stringr)
str_extract_all(convers,
                paste("(?<=: )[\\w\\s]+(?= ", paste0(".*\\b(", paste(speakers, …
I have this type of data:
df <- structure(list(Utterance = c("(5.127)", ">like I don't understand< sorry like how old's your mom¿", 
                                   "(0.855)", "eh six:ty:::-one=", "(0.101)", "(0.487)", "[((v: gasps)) she said] ~no you're [not?]~", 
                                   "[((v: gasps)) she said] ~no you're [not?]~", "~<[NO YOU'RE] NOT (.) you can't go !in!>~", 
                                   "(0.260)", "show her [your boobs] next time"), 
                     Q = c(NA, "q_wh", "", "", NA, NA, "q_really", "", "", NA, NA), 
                     Sequ = c(NA, 1L, 1L, 1L, NA, NA, 0L, 0L, 0L, …