How to generate a percentage for regex string matches in Ruby?

Question

Nic*_*k D 1 ruby regex arrays activerecord

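I'm trying to build a simple method that looks at the last names of about 100 entries in a database and pulls out all the ones that match above a certain percentage of letters. My current approach is: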

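1. Pull all 100 entries from the database into an array.
2. Iterate through them while performing the following actions.
3. Split the last name into an array of letters.
4. Subtract that array from another array containing the letters of the name I'm trying to match, which leaves only the letters that didn't match.
5. Take the size of the result and divide it by the original size of the array from step 3 to get a percentage.
6. If the percentage is above a predefined threshold, push that database object into a results array.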
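
For illustration only, the steps above might look roughly like the sketch below. This is not the asker's actual code: the method name `leftover_ratio`, the sample names, and the threshold value are all made up, and plain strings stand in for the database objects. Note that, as described, the ratio counts the letters that did not match.

```ruby
# Hypothetical helper; the name and the use of plain strings instead of
# ActiveRecord objects are assumptions made for this sketch.
def leftover_ratio(last_name, target)
  name_letters   = last_name.downcase.chars   # step 3
  target_letters = target.downcase.chars
  # Step 4: Array#- removes every occurrence of each letter found in name_letters.
  leftover = target_letters - name_letters
  # Step 5: size of the result divided by the size of the step 3 array.
  leftover.size / name_letters.size.to_f
end

last_names = ["Bond", "Bondy", "Smith"]   # stand-in for the 100 entries (steps 1 and 2)
threshold  = 0.5                          # illustrative value

# Step 6: keep the entries whose ratio is above the threshold.
selected = last_names.select { |name| leftover_ratio(name, "Bond") > threshold }
```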

This works, but I feel like there must be some cool Ruby/regex/Active Record way of doing this more efficiently. I've googled quite a bit but can't find anything.

Answer 1

Car*_*and 5

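Commenting on the merits of the measure you suggest would require speculation, which is out of bounds on SO. I will therefore only demonstrate how you might implement the approach you propose.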

Code

First define a helper method:

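class Array
  # Multiset difference: each element of other removes one matching
  # occurrence from self (Array#- would remove every occurrence).
  # Note: Ruby 2.6+ has a built-in Array#difference, which this overrides.
  def difference(other)
    # Count how many times each element appears in other.
    h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    # Drop an element only while its count is positive, decrementing as we go.
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end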

In short, if

a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]

then

a - b           #=> [1]

whereas

a.difference(b) #=> [1, 3, 2, 2]

I elaborate on this method in my answer to this SO question. I've found it to have so many uses that I've proposed it be added to the Ruby core.
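
To see why the distinction matters for letters, here is a small sketch with a repeated letter. It uses a standalone method rather than reopening Array; the name `multiset_difference` and the sample letters are assumptions for this illustration, not part of the answer's code.

```ruby
# Hypothetical standalone version of the helper (the name is an assumption),
# written as a plain method so that Array is not reopened.
def multiset_difference(a, b)
  # Count how many times each element appears in b.
  counts = b.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  # Remove one element of a per counted occurrence in b.
  a.reject { |e| counts[e] > 0 && counts[e] -= 1 }
end

target_letters = %w[j i m m y]
name_letters   = %w[t i m]

target_letters - name_letters
  #=> ["j", "y"]
multiset_difference(target_letters, name_letters)
  #=> ["j", "m", "y"]
```

With `-`, the single "m" in the name cancels both "m"s in the target, which would overstate the match; the multiset version cancels only one.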

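The following method produces a hash whose keys are the elements of names (which are strings) and whose values are the fractions of the letters in the string target that are contained in each string of names.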

def target_fractions(names, target)
  target_arr = target.downcase.scan(/[a-z]/)
  target_size = target_arr.size
  names.each_with_object({}) do |s,h|
    s_arr = s.downcase.scan(/[a-z]/)
    target_remaining = target_arr.difference(s_arr)
    h[s] = (target_size-target_remaining.size)/target_size.to_f
  end
end

Example

target = "Jimmy S. Bond"

and the names you are comparing are given by

names = ["Jill Dandy", "Boomer Asad", "Josefine Simbad"]

then

target_fractions(names, target)
  #=> {"Jill Dandy"=>0.5, "Boomer Asad"=>0.5, "Josefine Simbad"=>0.8}
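
To get from these fractions to the filtered list the question asks for, one option is to select the keys whose value meets a threshold. This is a minimal sketch: the hash literal is copied from the result above, and the threshold of 0.6 is an arbitrary value chosen for illustration.

```ruby
# Fractions copied from the example result above.
fractions = {"Jill Dandy"=>0.5, "Boomer Asad"=>0.5, "Josefine Simbad"=>0.8}
threshold = 0.6   # illustrative value, not taken from the question

# Keep only the names whose fraction reaches the threshold.
matches = fractions.select { |_name, fraction| fraction >= threshold }.keys
  #=> ["Josefine Simbad"]
```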

Explanation

For the above values of names and target,

target_arr = target.downcase.scan(/[a-z]/)
  #=> ["j", "i", "m", "m", "y", "s", "b", "o", "n", "d"] 
target_size = target_arr.size
  #=> 10

Now consider

s = "Jill Dandy"
h = {}

then

s_arr = s.downcase.scan(/[a-z]/)
  #=> ["j", "i", "l", "l", "d", "a", "n", "d", "y"]
target_remaining = target_arr.difference(s_arr)
  #=> ["m", "m", "s", "b", "o"]

h[s] = (target_size-target_remaining.size)/target_size.to_f
  #=> (10-5)/10.0 => 0.5
h #=> {"Jill Dandy"=>0.5}

The calculations are similar for Boomer and Josefine.
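
As a cross-check on those two values, the same fraction can also be computed from letter counts without the helper on Array. This is an alternative sketch, not the method used above: the name `overlap_fraction` is hypothetical, and it assumes Ruby 2.7 or later for `tally`.

```ruby
# Hypothetical alternative (the name is an assumption); needs Ruby 2.7+ for tally.
def overlap_fraction(name, target)
  target_counts = target.downcase.scan(/[a-z]/).tally
  name_counts   = name.downcase.scan(/[a-z]/).tally
  # A target letter is covered as many times as it appears in both strings.
  covered = target_counts.sum { |letter, n| [n, name_counts.fetch(letter, 0)].min }
  covered / target_counts.values.sum.to_f
end

overlap_fraction("Boomer Asad", "Jimmy S. Bond")
  #=> 0.5
overlap_fraction("Josefine Simbad", "Jimmy S. Bond")
  #=> 0.8
```

Summing the per-letter minimum of the two counts gives the number of target letters that are matched, which is the target size minus the size of target_remaining in the walkthrough.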
