rwr*_*wer 0 r pattern-matching match
I have two datasets.
data1 looks like this:
  id           name
1  1         toyota
2  2        walmart
3  3 fox ad company
data2 looks like this:
  id                      name
1  1             sales walmart
2  2 fox advertisement company
3  3              metro toyota
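In this example, we expect to find every name from data1 within the names in data2.
How can I do this matching? If we find a match between data1 and data2, we want to print the matching id next to each data1 row.
For example: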
  id           name data2
1  1         toyota     3
2  2        walmart     1
3  3 fox ad company     2
Assuming you have:
one <- c("toyota","walmart","fox ad company")
two <- c("sales walmart","fox advertisement company","metro toyota")
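You can extract the matches by taking the minimum string distance, computed with adist. This can be error-prone, but it gives you a starting point. See ?adist for how to adjust this so that only certain edits (insertions, deletions, or substitutions of characters) are counted.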
max.col(-adist(one,two))
#[1] 3 1 2
Matched up:
data.frame(one, two=two[max.col(-adist(one,two))])
#             one                       two
#1         toyota              metro toyota
#2        walmart             sales walmart
#3 fox ad company fox advertisement company
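To get the output the question asks for (the data1 rows with an extra data2 column), the same idea can be applied to data frames. This is a minimal sketch, assuming the data frames are named data1 and data2 as in the question and have the columns id and name. Setting ties.method = "first" is my own choice to make the result deterministic, since max.col breaks ties at random by default.

```r
# Rebuild the two data sets from the question as data frames.
data1 <- data.frame(id = 1:3,
                    name = c("toyota", "walmart", "fox ad company"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(id = 1:3,
                    name = c("sales walmart", "fox advertisement company", "metro toyota"),
                    stringsAsFactors = FALSE)

# Edit-distance matrix: rows are data1 names, columns are data2 names.
d <- adist(data1$name, data2$name)

# For each data1 row, take the data2 id with the smallest distance.
# ties.method = "first" is an assumption made here for reproducibility.
data1$data2 <- data2$id[max.col(-d, ties.method = "first")]

print(data1)
```

For this sample data the new column comes out as 3, 1, 2, the same order the answer's max.col call returns.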
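Because the minimum distance always returns some match, even for a name that has no real counterpart, one way to make this less error-prone is to also look at the distance itself and reject matches above a cutoff. The sketch below is only an illustration: max_dist is a hypothetical threshold, not something adist provides, and it needs tuning for real data.

```r
one <- c("toyota", "walmart", "fox ad company")
two <- c("sales walmart", "fox advertisement company", "metro toyota")

d <- adist(one, two)

# Column of the closest name in `two` for each element of `one`.
best <- max.col(-d, ties.method = "first")

# Distance to that closest name (matrix indexing by row/column pairs).
best_dist <- d[cbind(seq_along(one), best)]

# Hypothetical cutoff: matches further away than this are treated as "no match".
max_dist <- 8
matched <- ifelse(best_dist <= max_dist, two[best], NA_character_)

print(data.frame(one, two = matched, dist = best_dist))
```

With this sample data, "fox ad company" is 11 edits away from "fox advertisement company", so a cutoff of 8 rejects a match that is actually correct. A fixed threshold trades false matches for missed ones, so it is worth inspecting best_dist before settling on a value.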