I have phonetic transcriptions of utterances:
str <- c("a? n?? ?ts ?ts ð? s?ks? ?v ?u?n",
"w?l ð? ?æp n?kst d??z ?fa?nd?? ?t ?v?ri ??mju?z??",
"l?vli bu(?)?ke? ?v ?fla??z f? mi w?l ðæts ?t",
"ðe? ra?t l?? ?n ð? li?g ??nt ðe?",
"k?? wi ???t wi w??t wi?d l?ft ???l?? na?",
"a? n?? s ð? bi? ð? b?g b?? ðe?l",
"je? b?t ?t s ? m??l a? k?n ????? ju?",
"?? ??st e? ha? a? ju?zd t? d? j??z ??g??",
"je? d??nt ?w?ri ??ba?t mi …
I am plotting geom_smooth lines in facets grouped by size:
library(ggplot2)
ggplot(df,
aes(x = pos, y = mean_ratio_f ))+
geom_smooth(aes(group = factor(size)), method = "lm", se = FALSE, linewidth = 0.5) +
# facets:
facet_wrap(. ~ size, scales = 'free_x')+
labs(x ="X",
y = "Y")
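Unfortunately, the last three facets (for size groups 23, 24 and 25) are aligned to the left margin, which leaves a gap to their right (and also creates the impression that the whole plot is skewed to the right!).
It seems to me that the issue could be solved by centering the three facets in question (but perhaps there are other solutions). How can I rearrange the facets so that the last three are centered?
Data: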
df <- structure(list(size = c(3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L,
8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 13L, 13L, 14L,
14L, …
Given data of this kind:
df <- data.frame(
ID = 1:10,
Sequ = c(NA, 44,44, NA, NA, 33,33,33, 5,5),
Q = c(NA, "q1","q1", NA, NA, "q2","q2","q2", "q2","q2")
)
How can I update the run-length IDs in Sequ more efficiently than this:
library(dplyr)
library(data.table)
left_join(df, df %>%
filter(!is.na(Sequ)) %>%
mutate(Sequ_0 = rleid(Sequ))) %>%
select(-Sequ)
   ID    Q Sequ_0
1   1 <NA>     NA
2   2   q1      1
3   3   q1      1
4   4 <NA>     NA
5   5 <NA>     NA
6   6   q2      2
7   7   q2      2
8   8   q2      2
9   9   q2 …
I have data with columns such as Area_bsl, which contains strings of comma-separated values, and a column diffr, which specifies the number of elements by which Area_bsl must be shortened:
df <- data.frame(
id = 1:3,
Area_bsl = c("155,199,198,195,100,112,177,199,188,144",
"100,99,98,95,100,112,111,99",
"131,166,155,111,100,117,166,188,101,101,105,166"),
diffr = c(3,0,6)
)
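So what I need to do is cut off...
- the last 3 elements of Area_bsl where id == 1
- 0 elements of Area_bsl where id == 2
- the last 6 elements of Area_bsl where id == 3
I have been approaching the task like this; the last part, which uses slice_head, throws an error: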
library(tidyverse)
df %>%
# separate comma-separated values into rows:
separate_rows(Area_bsl) %>%
# for each `id`...:
group_by(id) %>%
#... create a row counter:
mutate(rowid = row_number()) %>%
# ...create the cutoff point:
mutate(cutoff = …
I have a dataframe like this:
set.seed(12)
df <- data.frame(
v1 = sample(LETTERS, 10),
v2 = sample(LETTERS, 10),
v3 = sample(LETTERS, 10),
v4 = c(sample(LETTERS, 8), sample(letters, 2)),
v5 = c(sample(letters, 1), sample(LETTERS, 7), sample(letters, 2))
)
df
  v1 v2 v3 v4 v5
1  B  K  F  G  p
2  U  U  T  W  N
3  W  J  C  V  Y
4  G  I  Q  S  E
5  D  F  E  N  T
6  A  X  Z  T  C
7 …
I am trying to come up with a regex in R that matches strings in which two different characters are repeated.
x <- c("aaaaaaah" ,"aaaah","ahhhh","cooee","helloee","mmmm","noooo","ohhhh","oooaaah","ooooh","sshh","ummmmm","vroomm","whoopee","yippee")
This regex matches all of the above, including strings such as "mmmm" and "ohhhh", where the repeated letter is the same in the first and the second repetition:
grep(".*([a-z])\\1.*([a-z])\\2", x, value = T)
What I would like to match in x are only those strings in which the repeated letters are different:
"cooee","helloee","oooaaah","sshh","vroomm","whoopee","yippee"
How can I adjust the regex to make sure that the second repeated character is different from the first?
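One possible adjustment, offered as a sketch rather than as the accepted answer: put a negative lookahead (?!\\1) in front of the second capture group so that it cannot start with the letter captured by the first group. Lookaheads are not supported by R's default regex engine, so this assumes perl = TRUE. The name different_pairs is made up for illustration.

```r
x <- c("aaaaaaah", "aaaah", "ahhhh", "cooee", "helloee", "mmmm", "noooo",
       "ohhhh", "oooaaah", "ooooh", "sshh", "ummmmm", "vroomm", "whoopee", "yippee")

# ([a-z])\\1   : a letter immediately followed by itself
# .*           : anything in between
# (?!\\1)      : the next letter must not be the one captured in group 1
# ([a-z])\\2   : a second letter immediately followed by itself
different_pairs <- grep("([a-z])\\1.*(?!\\1)([a-z])\\2", x, value = TRUE, perl = TRUE)
different_pairs
```

Strings such as "mmmm" or "ohhhh" contain only one doubled letter, so the lookahead blocks every candidate for the second group and they drop out.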
I have a long string that I want to split into chunks of fixed size, for example, 10 words each:
x <- "Hrothgar, king of the Danes, or Scyldings, builds a great mead-hall, or palace, in which he hopes to feast his liegemen and to give them presents. The joy of king and retainers is, however, of short duration. Grendel, the monster, is seized with hateful jealousy. He cannot brook the sounds of joyance that reach him down in his fen-dwelling near the hall. Oft and anon he goes to the joyous building, bent on direful mischief. …
I have this character vector:
dput(t$line)
c("0304", "0305", "0306", "0308", "0311", "0313", "0314", "0316",
"0318", "0321", "0322", "0323", "0324", "0326", "0327", "0330",
"0333", "0337", "0338", "0339", "0342", "0341", "0344", "0346",
"0347", "0348", "0349", "0350", "0352", "0353", "0357", "0359",
"0360", "0362", "0363", "0364", "0365", "0367", "0371", "0370",
"0373", "0375", "0378", "0380", "0381", "0385", "0386", "0387",
"0391", "0395", "0394", "0397", "0398", "0399", "0400", "0402",
"0404", "0405", "0406", "0408", "0412", "0416", "0419", "0423",
"0424", "0425", "0426", "0428", "0429", "0432", "0433", "0436",
"0435", "0439", "0437", "0440", "0441")
The numbers it contains are not fully consecutive. I want to make them consecutive while keeping the leading zeros where needed. I came up with this solution: …
I have text strings like this:
u <- "she goes ~Wha::?~ and he's like ~?Yeah believe me!~ and she's etc."
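What I want to do is replace all characters that occur between pairs of ~ delimiters (including the delimiters themselves) with X.
This gsub approach replaces each substring between a pair of ~ delimiters with a single X: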
gsub("~[^~]+~", "X", u)
[1] "she goes X and he's like X and she's etc."
However, what I really want to do is replace each and every character between the delimiters (and the delimiters themselves) with X. The desired output is this:
"she goes XXXXXXXXX and he's like XXXXXXXXXXXXXXXXXXX and she's etc."
I have been experimenting with nchar, backreferences, and paste as follows, but the result is incorrect:
gsub("(~[^~]+~)", paste0("X{", nchar("\\1"),"}"), u)
[1] "she goes X{2} and he's like X{2} and she's etc."
Any help is appreciated.
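The attempt above returns X{2} because nchar("\\1") is evaluated before gsub runs: it measures the two-character literal \1, not the matched text. One possible alternative, sketched in base R and not taken from the question, is to locate the delimited spans with gregexpr and assign replacements of equal length through regmatches<-. The helper name mask_delimited is hypothetical.

```r
# Hypothetical helper (the name is illustrative): replace every character of
# each ~...~ span, delimiters included, with "X".
mask_delimited <- function(s, pattern = "~[^~]+~") {
  m <- gregexpr(pattern, s)
  # For each input string, build one run of X per match, as long as that match.
  regmatches(s, m) <- lapply(regmatches(s, m),
                             function(hit) strrep("X", nchar(hit)))
  s
}

u <- "she goes ~Wha::?~ and he's like ~?Yeah believe me!~ and she's etc."
masked <- mask_delimited(u)
masked
nchar(masked) == nchar(u)  # the overall string length is preserved
```

Because the replacement is computed per match, spans of different lengths in the same string each get their own number of X characters.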
I have transcripts of storytelling that contain many instances of overlapping speech, with square brackets enclosing the overlapped speech. I want to extract these instances of overlap. In the mock example below,
ovl <- c("well [yes right]", "let's go", "oh [ we::ll] i do n't (0.5) know", "erm [°well right° ]", "(3.2)")
this code works fine:
pattern <- "\\[(.*\\w.+])*"
grep(pattern, ovl, value=T)
matches <- gregexpr(pattern, ovl)
overlap <- regmatches(ovl, matches)
overlap_clean <- unlist(overlap); overlap_clean
[1] "[yes right]" "[ we::ll]" "[°well right° ]"
but in the larger file (a data frame) it does not. Is this due to a faulty pattern or to the structure of the data frame? The first six rows of the df look like this:
> head(df)
 Story
1 "Kar:\tMind you our Colin's getting more like your dad every day
2 June:\tI know he is.
3 Kar:\tblack welding glasses on, 
4 \tand he turned round and …
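Without the full data the cause cannot be pinned down, but two things may be worth checking, sketched here under stated assumptions: that gregexpr and regmatches receive the character column (df$Story) rather than the data frame itself, and that the pattern stops at the first closing bracket. The data frame demo below is a hypothetical stand-in; only the column name Story comes from the question, and the pattern is a simplified alternative, not the one used above.

```r
# Hypothetical stand-in for the larger data frame.
demo <- data.frame(
  Story = c("well [yes right]", "let's go",
            "oh [ we::ll] i do n't (0.5) know", "(3.2)"),
  stringsAsFactors = FALSE
)

# "[" followed by any run of characters other than "]", then the closing "]".
pattern <- "\\[[^]]*\\]"

# Work on the character column, not on the data frame as a whole.
hits <- regmatches(demo$Story, gregexpr(pattern, demo$Story))
overlap_clean <- unlist(hits)
overlap_clean
```

Rows without brackets contribute nothing to the result, and a row with several bracketed stretches contributes one element per stretch.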