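I am trying to extract location names, country names, city names, and tourist places from a txt file using an NLP library in Python, such as spaCy or NLTK.
This is what I have tried so far: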
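import spacy

# 'en' is a spaCy 2.x shortcut; spaCy 3 needs a full model name such as 'en_core_web_sm'
en = spacy.load('en')
with open('subtitle.txt') as f:
    sents = en(f.read())
# Collects every entity found, whatever its label
place = [ee for ee in sents.ents]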
The output I get:
[1,
, three, London,
,
,
,
, first,
,
, 00:00:20,520,
,
, London, the
4
00:00:20,520, 00:00:26,130
, Buckingham Palace,
,
I only want location names, country names, city names, and places within a city.
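For reference, here is a rough sketch of the kind of filtering I am after. It rests on two assumptions that are mine: that subtitle.txt is in SRT format (cue numbers plus "start --> end" timing lines, which would explain the numbers and timestamps in the output), and that the model in use applies the GPE, LOC, and FAC labels, as spaCy's English models such as en_core_web_sm do. The names clean_srt, keep_places, and extract_places are made up for illustration.

```python
import re

# Assumed label set: GPE covers countries, cities, and states; LOC covers
# other geographic locations; FAC covers buildings and landmarks.
PLACE_LABELS = {"GPE", "LOC", "FAC"}

# Matches an SRT timing line such as "00:00:20,520 --> 00:00:26,130".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")


def clean_srt(text):
    """Drop blank lines, cue numbers, and timing lines; join the rest."""
    kept = []
    for line in text.splitlines():
        line = line.strip()
        # Caveat: a dialogue line made only of digits is dropped as well.
        if not line or line.isdigit() or TIMING.match(line):
            continue
        kept.append(line)
    return " ".join(kept)


def keep_places(ents):
    """Return unique entity texts carrying a place label, in first-seen order."""
    seen = []
    for ent in ents:
        if ent.label_ in PLACE_LABELS and ent.text not in seen:
            seen.append(ent.text)
    return seen


def extract_places(path, model="en_core_web_sm"):
    """Run spaCy NER over a cleaned subtitle file (model name is an assumption)."""
    import spacy  # imported here so the helpers above work without spaCy

    nlp = spacy.load(model)
    with open(path, encoding="utf-8") as f:
        doc = nlp(clean_srt(f.read()))
    return keep_places(doc.ents)
```

Calling extract_places('subtitle.txt') should then return only the entities tagged as places, provided the model is installed (for example with python -m spacy download en_core_web_sm). How many real places it catches still depends on the model's accuracy.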
I also tried using NLTK:
import nltk
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('stopwords')
with open('subtitle.txt', 'r') as f:
    sample = f.read()
sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) …