如何从Oracle表中获取几乎匹配的字符串?

sha*_*ath 4 sql oracle string-matching utl-match

我在Oracle中有一个包含四列的表。 Oracle中的表数据。

现在,用户可以向我的查询输入输入字符串作为“向右膝盖操作”(有效),并且我的查询应返回与DiagnosisName列中的大多数单词匹配的ICD代码(IKR123)。

以下是我当前的查询。(未提供正确的输出)

SELECT diagnosisname
FROM
  (SELECT diagnosisname,
    UTL_MATCH.jaro_winkler_similarity('%operation Knee right%',diagnosisname)
  FROM icd_code
  ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname) DESC
  )
WHERE ROWNUM<2;
Run Code Online (Sandbox Code Playgroud)

该查询给我的输出为“左膝盖操作”,但我的期望是“右膝盖操作”。

Lal*_*r B 5

关于您使用UTL_MATCH的注意事项

  • EDIT_DISTANCE_SIMILARITY:返回0到100之间的整数,其中0表示完全没有相似性,而100表示​​完美匹配。
  • JARO_WINKLER_SIMILARITY:返回0到100之间的整数,其中0表示完全没有相似性,而100表示​​完美匹配,但尝试考虑可能的数据输入错误。

ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname)DESC

这不会给您正确的结果。因为,您仅考虑可能的相似性,但是没有考虑数据输入错误。因此,您必须使用JARO_WINKLER_SIMILARITY

右膝手术

您需要牢记输入的CASE和要比较的列值。它们必须在类似情况下才能正确匹配。您正在LOWERCASE中传递输入,但是,列值在INITCAP中。最好将列值和输入都转换为类似的情况。

让我们看下面的演示以了解:

SQL> WITH DATA AS(
  2  SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
  3  SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
  4  SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
  5  SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
  6  SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
  7  )
  8  SELECT t.*,
  9    utl_match.edit_distance_similarity(upper(diagnosis_name),upper('operation Knee right')) eds,
 10    UTL_MATCH.jaro_winkler_similarity (upper(diagnosis_name),upper('operation Knee right')) jws
 11  FROM DATA t
 12  ORDER BY jws DESC
 13  /

DIAGNOSIS_NAME       ICD_CO        EDS        JWS
-------------------- ------ ---------- ----------
Right Knee Operation IKR123         20         72
Knee Operation       IK123          20         70
Heart Operation      IH123          25         68
Left Knee Operation  IKL123         25         64
Fever                IF123          15         47

SQL>
Run Code Online (Sandbox Code Playgroud)

因此,您将看到两者之间的差异。jaro_winkler_similarity在识别数据输入错误并提供最接近的匹配方面做得更好。基于此,只需按降序对第一行进行排序即可:

SQL> WITH DATA AS(
  2  SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
  3  SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
  4  SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
  5  SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
  6  SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
  7  )
  8  SELECT diagnosis_name
  9  FROM
 10    (SELECT t.*,
 11      utl_match.edit_distance_similarity(upper(diagnosis_name),upper('operation Knee right')) eds,
 12      UTL_MATCH.jaro_winkler_similarity (upper(diagnosis_name),upper('operation Knee right')) jws
 13    FROM DATA t
 14    ORDER BY jws DESC
 15    )
 16  WHERE rownum = 1
 17  /

DIAGNOSIS_NAME
--------------------
Right Knee Operation

SQL>
Run Code Online (Sandbox Code Playgroud)