NLTagger tags every word as OtherWord, and the name type scheme tags everything as Other

Cod*_*rew 5 nlp ios swift

I tried Apple's own example:

import NaturalLanguage

let text = "The American Red Cross was established in Washington, D.C., by Clara Barton."

let tagger = NLTagger(tagSchemes: [.nameType])
tagger.string = text

let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
let tags: [NLTag] = [.personalName, .placeName, .organizationName]

tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameType, options: options) { tag, tokenRange in
    // Get the most likely tag, and print it if it's a named entity.
    if let tag = tag, tags.contains(tag) {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }

    // Get multiple possible tags with their associated confidence scores.
    let (hypotheses, _) = tagger.tagHypotheses(at: tokenRange.lowerBound, unit: .word, scheme: .nameType, maximumCount: 1)
    print(hypotheses)

    return true
}

But it returns Other for every name tag. I also tried another example that tags a sentence by lexical class, and it likewise tags every word as OtherWord:

import NaturalLanguage

let text = "The American Red Cross was established in Washington, D.C., by Clara Barton."

let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = text

let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]

// dominantLanguage is optional; print its raw value, or "unknown" if none was detected.
print("language", tagger.dominantLanguage?.rawValue ?? "unknown")

tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
    // Print the most likely lexical class for each word.
    if let tag = tag {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }

    return true
}

I tried the answer from this question, which sets the language orthography explicitly, but it did not help:

//tagger.setOrthography(NSOrthography(dominantScript: "Latn", languageMap: ["Latn": ["en"]]), range: text.startIndex..<text.endIndex)
tagger.setOrthography(NSOrthography.defaultOrthography(forLanguage: "en-US"), range: text.startIndex..<text.endIndex)

Does anyone know why this happens?

By the way, my Xcode version is 14.3, the latest as of today.

Cod*_*rew 2

This appears to be a regression in Xcode 14.3. I downloaded Xcode 14.2, and there NLTagger tags correctly with both .nameType and .lexicalClass.
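To compare two Xcode versions quickly, a small sanity check along these lines may help. It is only a sketch: `looksDegenerate` and `rawTags` are hypothetical helper names, and the catch-all strings "Other" and "OtherWord" are taken from the output reported in the question. It tags the sample sentence under each scheme and flags the case where every word comes back with a catch-all tag. It also prints what `NLTagger.availableTagSchemes(for:language:)` reports, though an incomplete list there would not by itself prove what causes the problem.

```swift
#if canImport(NaturalLanguage)
import NaturalLanguage
#endif

/// Hypothetical helper: true when every tag is a catch-all value.
/// "Other" and "OtherWord" are the values reported in the question.
func looksDegenerate(_ tags: [String]) -> Bool {
    let catchAll: Set<String> = ["Other", "OtherWord"]
    return !tags.isEmpty && tags.allSatisfy { catchAll.contains($0) }
}

#if canImport(NaturalLanguage)
/// Hypothetical helper: collects the raw tag of every word in `text` under `scheme`.
func rawTags(in text: String, scheme: NLTagScheme) -> [String] {
    let tagger = NLTagger(tagSchemes: [scheme])
    tagger.string = text
    var result: [String] = []
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: scheme,
                         options: [.omitPunctuation, .omitWhitespace]) { tag, _ in
        if let tag = tag {
            result.append(tag.rawValue)
        }
        return true
    }
    return result
}

let sample = "The American Red Cross was established in Washington, D.C., by Clara Barton."

// Schemes the current runtime reports as available for English words.
print(NLTagger.availableTagSchemes(for: .word, language: .english).map(\.rawValue))

// Flag any scheme whose output consists only of catch-all tags.
for scheme in [NLTagScheme.lexicalClass, .nameType] {
    let tags = rawTags(in: sample, scheme: scheme)
    print(scheme.rawValue, looksDegenerate(tags) ? "degenerate" : "ok")
}
#endif
```

Running the same file under both Xcode versions makes the difference visible without reading through the per-word output.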

This regression in Xcode 14.3 also affects NLEmbedding. For example, the following code correctly retrieves word neighbors in 14.2, but gets a nil embedding in Xcode 14.3:

if let embedding = NLEmbedding.wordEmbedding(for: .english) {
  print("found embedding")
  print("embeddings for family: \(embedding.neighbors(for: "family", maximumCount: 3))")
  print("embeddings for science: \(embedding.neighbors(for: "science", maximumCount: 3))")
} else {
  print("no embedding found")
}