如何使用vb.net比较字符串的百分比匹配?

Jad*_*ine 3 vb.net

我现在正在尝试不同的技术,将头撞在墙上一段时间.

他们都没有运作良好.

我有两个字符串.

我需要比较它们并获得确切的匹配百分比,

即."四分和七年前"为了"scor and sevn yeres ago"

好吧,我首先将每个单词与每个单词进行比较,跟踪每个单词,以及百分比= count \numOfWords.不,没有考虑拼写错误的单词.

("四"<>"为"即使它很近)

然后我开始尝试比较每个char中的每个char,如果不匹配则递增字符串char(计算拼写错误).但是,我会得到错误的命中,因为第一个字符串可能在第二个字符串中包含每个字符,但不是第二个字符串的确切顺序.("东西可用"<>"stu vail"(但它会回来,低百分比,但是命中.9\11 = 81%))

所以,我然后尝试比较每个字符串中的字符对.如果string1 [i] = string2 [k] AND string1 [i + 1] = string2 [k + 1],则递增计数,并在不匹配时递增"k"(以跟踪误导."for"和"四分"应该以75%的命中率回归.)这似乎也不起作用.它越来越近,但即使完全匹配,它也只会返回94%.当事情真的拼写错误时,它真的搞砸了.(底部代码)

有什么想法或指示吗?

count = 0
j = 0
k = 0
While j < strTempName.Length - 2 And k < strTempFile.Length - 2
    ' To ignore non letters or digits '
    If Not strTempName(j).IsLetter(strTempName(j)) Then
        j += 1
    End If

    ' To ignore non letters or digits '
    If Not strTempFile(k).IsLetter(strTempFile(k)) Then
        k += 1
    End If

    ' compare pair of chars '
    While (strTempName(j) <> strTempFile(k) And _ 
           strTempName(j + 1) <> strTempFile(k + 1) And _ 
           k < strTempFile.Length - 2)
        k += 1
    End While
    count += 1
    j += 1
    k += 1

End While

perc = count / (strTempName.Length - 1)
Run Code Online (Sandbox Code Playgroud)

Xav*_*eña 6

编辑:我一直在做一些研究,我想我最初从这里找到了代码,并在几年前将其翻译为vbnet.它使用Levenshtein字符串匹配算法.

这是我用于此的代码,希望它有所帮助:

Sub Main()
    Dim string1 As String = "four score and seven years ago"
    Dim string2 As String = "for scor and sevn yeres ago"
    Dim similarity As Single =
        GetSimilarity(string1, string2)
    ' RESULT : 0.8
End Sub

Public Function GetSimilarity(string1 As String, string2 As String) As Single
    Dim dis As Single = ComputeDistance(string1, string2)
    Dim maxLen As Single = string1.Length
    If maxLen < string2.Length Then
        maxLen = string2.Length
    End If
    If maxLen = 0.0F Then
        Return 1.0F
    Else
        Return 1.0F - dis / maxLen
    End If
End Function

Private Function ComputeDistance(s As String, t As String) As Integer
    Dim n As Integer = s.Length
    Dim m As Integer = t.Length
    Dim distance As Integer(,) = New Integer(n, m) {}
    ' matrix
    Dim cost As Integer = 0
    If n = 0 Then
        Return m
    End If
    If m = 0 Then
        Return n
    End If
    'init1

    Dim i As Integer = 0
    While i <= n
        distance(i, 0) = System.Math.Max(System.Threading.Interlocked.Increment(i), i - 1)
    End While
    Dim j As Integer = 0
    While j <= m
        distance(0, j) = System.Math.Max(System.Threading.Interlocked.Increment(j), j - 1)
    End While
    'find min distance

    For i = 1 To n
        For j = 1 To m
            cost = (If(t.Substring(j - 1, 1) = s.Substring(i - 1, 1), 0, 1))
            distance(i, j) = Math.Min(distance(i - 1, j) + 1, Math.Min(distance(i, j - 1) + 1, distance(i - 1, j - 1) + cost))
        Next
    Next
    Return distance(n, m)
End Function
Run Code Online (Sandbox Code Playgroud)