The following link shows how to add custom entity rules when an entity spans more than one token. The code to do that is below:
import spacy
from spacy.pipeline import EntityRuler
nlp = spacy.load('en_core_web_sm', parse=True, tag=True, entity=True)
animal = ["cat", "dog", "artic fox"]
# Register each animal name as an entity pattern labelled "animal"
ruler = EntityRuler(nlp)
for a in animal:
    ruler.add_patterns([{"label": "animal", "pattern": a}])
nlp.add_pipe(ruler)
doc = nlp("There is no cat in the house and no artic fox in the basement")
# Merge each entity span into a single token
with doc.retokenize() as retokenizer:
    for ent in doc.ents:
        retokenizer.merge(doc[ent.start:ent.end])
from spacy.matcher import Matcher
matcher = Matcher(nlp.vocab)
pattern = [{'lower': 'no'}, {'ENT_TYPE': {'REGEX': 'animal', 'OP': '+'}}]
matcher.add('negated animal', None, pattern)
matches = matcher(doc)
for …
How to make spaCy case-insensitive?
Is there a code snippet or anything else I should add? I can't get the entities that aren't capitalized.
import spacy
import pandas as pd
from spacy.pipeline import EntityRuler
nlp = spacy.load('en_core_web_sm', disable = ['ner'])
ruler = nlp.add_pipe("entity_ruler")
# Register flower and animal names as entity patterns
flowers = ["rose", "tulip", "african daisy"]
for f in flowers:
    ruler.add_patterns([{"label": "flower", "pattern": f}])
animals = ["cat", "dog", "artic fox"]
for a in animals:
    ruler.add_patterns([{"label": "animal", "pattern": a}])
# Collect the matched text per label (a later match overwrites an earlier one with the same label)
result={}
doc = nlp("CAT and Artic fox, plant african daisy")
for ent in doc.ents:
    result[ent.label_]=ent.text
df = pd.DataFrame([result])
print(df)
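One possible approach, sketched here rather than taken from the question: the `entity_ruler` component accepts a `phrase_matcher_attr` setting, and setting it to `"LOWER"` makes string patterns match on the lowercased token text. The sketch assumes spaCy v3 and uses `spacy.blank("en")` instead of `en_core_web_sm` only so that it runs without a model download. The `entities` list is a name introduced for illustration.

```python
import spacy

# Assumption: a blank English pipeline stands in for en_core_web_sm,
# since only the tokenizer and the entity ruler are needed here.
nlp = spacy.blank("en")

# phrase_matcher_attr="LOWER" compares lowercased token text,
# so "CAT" and "Artic fox" match the lowercase patterns.
ruler = nlp.add_pipe("entity_ruler", config={"phrase_matcher_attr": "LOWER"})

flowers = ["rose", "tulip", "african daisy"]
animals = ["cat", "dog", "artic fox"]
ruler.add_patterns(
    [{"label": "flower", "pattern": f} for f in flowers]
    + [{"label": "animal", "pattern": a} for a in animals]
)

doc = nlp("CAT and Artic fox, plant african daisy")

# Keep every match as a (text, label) pair so that two entities
# with the same label do not overwrite each other.
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

Collecting pairs in a list, instead of a dict keyed by label, keeps both "CAT" and "Artic fox" even though they share the `animal` label.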
I'm trying to use pandas to select the first 2 columns and the last 2 columns of a data frame by index and save the result in the same data frame.
Is there a way to do it in one step?
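A minimal sketch of one way to do this, assuming positional selection with `DataFrame.iloc` is acceptable. The six-column frame and its column names are hypothetical, made up only for illustration.

```python
import pandas as pd

# Hypothetical six-column frame, used only for illustration.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    columns=list("abcdef"),
)

# Positional selection: 0 and 1 are the first two columns,
# -2 and -1 are the last two. Reassigning keeps the result in `df`.
df = df.iloc[:, [0, 1, -2, -1]]

print(df.columns.tolist())
```

Note that if the frame has fewer than four columns, the positive and negative positions overlap and some columns are selected twice.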
I have a data frame that looks like this:
  Name  rent  sale
0    A   180     2
1    B     1     4
2    M    12     1
3    O    10     1
4    A   180     5
5    M     2    19
I want to apply a condition for the case where I have duplicate rows and duplicate values in a column field => example:
Expected output:
  Name  rent  sale
0    A   180     7
1    B     1     4
2    M    14    20
3 …
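The question is cut off, so the exact rule is not stated. Judging only from the visible rows of the expected output, one reading is: group by `Name`, add up all `sale` values, and add up `rent` values but count a repeated `rent` value only once. A minimal sketch under that assumption follows. The helper `sum_distinct` is a hypothetical name introduced here.

```python
import pandas as pd

# Input frame copied from the question.
df = pd.DataFrame({
    "Name": ["A", "B", "M", "O", "A", "M"],
    "rent": [180, 1, 12, 10, 180, 2],
    "sale": [2, 4, 1, 1, 5, 19],
})


def sum_distinct(values: pd.Series) -> int:
    """Sum a group's values, counting each repeated value once (assumed rule)."""
    return int(values.drop_duplicates().sum())


# sort=False keeps groups in order of first appearance;
# as_index=False keeps Name as a regular column.
result = (
    df.groupby("Name", as_index=False, sort=False)
      .agg(rent=("rent", sum_distinct), sale=("sale", "sum"))
)
print(result)
```

With this rule, `A` keeps `rent` 180 because both of its rows carry the same value, while `M` gets 12 + 2 = 14. If the intended condition is different, only `sum_distinct` needs to change.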