use*_*507 1 regex r pattern-matching matching string-matching
I have a dataset like the following:
> a_i
[1] "Our-Facebook-Page/td-p/3175990"
[2] "Our-Facebook-Page/td-p/3175990/page/2"
....
[17] "Data-duplicate-files/td-p/4743405"
[18] "Data-duplicate-files/td-p/4743405/page/2"
[19] "Subscription-Release-1-sucks/td-p/4556739"
[20] "Subscription-Release-1-sucks/td-p/4556739/page/2"
> b_i
[1] "Data-duplicate-files/td-p/4743405"
[2] "Subscription-Release-1-sucks/td-p/4556739"
[3] "Quick-fix/td-p/4556740"
My goal is to find the 7-digit numbers that appear in b_i (e.g. 4743405, 4556739, 4556740) and fetch the elements of a_i that contain those numbers. So the final output would look like this:
[1] "Data-duplicate-files/td-p/4743405"
[2] "Data-duplicate-files/td-p/4743405/page/2"
[3] "Subscription-Release-1-sucks/td-p/4556739"
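[4] "Subscription-Release-1-sucks/td-p/4556739/page/2"
I was able to extract the numbers with strsplit(b_i, "/"), but I am still stuck on grabbing the list elements that contain the matching numbers. Is there an elegant way to map these numbers and fetch the list?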
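# Extract the trailing number from each b_i element, join the numbers with "|",
# and keep the a_i elements that match any of them.
a_i[grep(paste(gsub("(^.+/)([[:digit:]]+)$", "\\2", b_i),
               collapse = "|"),
         a_i)]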
[1] "Data-duplicate-files/td-p/4743405"
[2] "Data-duplicate-files/td-p/4743405/page/2"
[3] "Subscription-Release-1-sucks/td-p/4556739"
[4] "Subscription-Release-1-sucks/td-p/4556739/page/2"
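This builds a string of the numbers separated by pipe characters, forming a grep-style OR pattern. If you want to enforce the 7-digit rule, you can add a {7} repetition quantifier after the digit class. As written, it accepts any number of digits.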
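A minimal sketch of the 7-digit variant, in case it helps. It assumes the sample vectors below, which are shortened copies of the a_i and b_i from the question. The names ids, pattern and result are only illustrative. It also anchors each number between slashes (or the end of the string), which is an extra precaution not present in the one-liner above.

```r
# Shortened copies of the question's vectors (assumed sample data).
a_i <- c("Our-Facebook-Page/td-p/3175990",
         "Our-Facebook-Page/td-p/3175990/page/2",
         "Data-duplicate-files/td-p/4743405",
         "Data-duplicate-files/td-p/4743405/page/2",
         "Subscription-Release-1-sucks/td-p/4556739",
         "Subscription-Release-1-sucks/td-p/4556739/page/2")
b_i <- c("Data-duplicate-files/td-p/4743405",
         "Subscription-Release-1-sucks/td-p/4556739",
         "Quick-fix/td-p/4556740")

# Pull the run of exactly 7 digits that sits between a slash and
# either another slash or the end of the string.
ids <- regmatches(b_i,
                  regexpr("(?<=/)[[:digit:]]{7}(?=/|$)", b_i, perl = TRUE))

# Join the IDs into one alternation, delimited by slashes so that a
# longer number containing one of the IDs does not match by accident.
pattern <- paste0("/(", paste(ids, collapse = "|"), ")(/|$)")

# Keep the a_i elements whose ID appears in b_i.
result <- a_i[grepl(pattern, a_i)]
print(result)
```

With a very long b_i the alternation can grow large. In that case, extracting the ID from each a_i element the same way and filtering with %in% may be a simpler alternative.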