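I have transcripts of storytelling that contain many instances of overlapping speech, marked by square brackets around the overlapping speech. I want to extract these instances of overlap. In the following mock example,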

    ovl <- c("well [yes right]", "let's go", "oh [ we::ll] i do n't (0.5) know", "erm [°well right° ]", "(3.2)")

this code works fine:

    pattern <- "\\[(.*\\w.+])*"
    grep(pattern, ovl, value = TRUE)
    matches <- gregexpr(pattern, ovl)
    overlap <- regmatches(ovl, matches)
    overlap_clean <- unlist(overlap); overlap_clean
    [1] "[yes right]"      "[ we::ll]"        "[°well right° ]"

But on a larger file (a data frame) it does not. Is this due to a faulty pattern or to the structure of the data frame? The first six rows of the df look like this:

    > head(df)
                                                                    Story
    1 "Kar:\tMind you our Colin's getting more like your dad every day
    2 June:\tI know he is.
    3 Kar:\tblack welding glasses on, 
    4 \tand he turned round and he made me jump
    5 \t“O:h, Colin”, 
    6 \tand then ( )

Although it may work in some cases, your pattern looks off to me. I think it should be this:

    pattern <- "(\\[.*?\\])"
    matches <- gregexpr(pattern, ovl)
    overlap <- regmatches(ovl, matches)
    overlap_clean <- unlist(overlap)
    overlap_clean

    [1] "[yes right]"      "[ we::ll]"        "[°well right° ]"

This matches and captures the bracketed terms, using a Perl-style lazy dot (`.*?`) to ensure that each match stops at the first closing bracket.
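
To illustrate why the lazy dot matters, and how the same approach might be applied to the data frame, here is a minimal sketch. The data frame below is hypothetical (the real df is not available); only the column name `Story` is taken from the `head(df)` output in the question. The question does not show the failing call, so passing the whole data frame instead of the character column is only one possible cause, not a confirmed one.

```r
# Hypothetical stand-in for the real data frame; only the column name
# "Story" comes from the question, the rows are made up for illustration.
df <- data.frame(
  Story = c("Kar:\twell [yes right] and then [oh well]",
            "June:\tI know he is.",
            "Kar:\term [ we::ll] i do n't (0.5) know"),
  stringsAsFactors = FALSE
)

lazy   <- "\\[.*?\\]"  # lazy: each match stops at the first closing bracket
greedy <- "\\[.*\\]"   # greedy: runs on to the last closing bracket in the string

# Pass the character column (df$Story), not the data frame itself.
overlap_lazy   <- unlist(regmatches(df$Story, gregexpr(lazy, df$Story)))
overlap_greedy <- unlist(regmatches(df$Story, gregexpr(greedy, df$Story)))

print(overlap_lazy)    # three separate overlaps
print(overlap_greedy)  # the two overlaps in row 1 are merged into one match
```

With two bracketed stretches in one row, the greedy version returns a single match spanning both, while the lazy version returns each overlap separately. Rows without brackets simply contribute nothing after `unlist()`.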