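I am trying to apply textacy's "extract.subject_verb_object_triples" function to my dataset. However, the code I have written is very slow and uses a lot of memory. Is there a more efficient implementation?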
import spacy
import textacy

def extract_SVO(text):
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    tuples = textacy.extract.subject_verb_object_triples(doc)
    tuples_to_list = list(tuples)
    if tuples_to_list != []:
        tuples_list.append(tuples_to_list)

tuples_list = []
sp500news['title'].apply(extract_SVO)
print(tuples_list)
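One direction that may help, offered as a sketch rather than a definitive fix: the function above reloads the spaCy model for every row, so loading it once and streaming the titles through `nlp.pipe` should cut much of the overhead. The helper name `extract_svo` and its parameters are hypothetical; it takes the loaded pipeline and the extraction function as arguments, and it assumes every title is a string (no missing values).

```python
from typing import Callable, Iterable, List


def extract_svo(
    texts: Iterable[str],
    nlp,
    extract_triples: Callable,
    batch_size: int = 256,
) -> List[list]:
    """Collect the non-empty lists of SVO triples found in `texts`.

    `nlp` is an already-loaded spaCy pipeline (load it once, outside
    this function). `extract_triples` is a callable that takes a Doc
    and yields triples, e.g. textacy.extract.subject_verb_object_triples.
    """
    results = []
    # nlp.pipe streams Doc objects and processes the texts in batches,
    # instead of making one separate nlp(text) call per row.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        triples = list(extract_triples(doc))
        if triples:  # skip titles that yield no triples
            results.append(triples)
    return results
```

With the names from the question, the call would look like `tuples_list = extract_svo(sp500news['title'], spacy.load('en_core_web_sm'), textacy.extract.subject_verb_object_triples)`. Returning the list instead of appending to a global also avoids relying on `apply` for its side effects.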
date_publish \
0 2013-05-14 17:17:05
1 2014-05-09 20:15:57
4 2018-07-19 10:29:54
6 2012-04-17 21:02:54
8 2012-12-12 20:17:56
9 2018-11-08 10:51:49
11 2013-08-25 07:13:31
12 2015-01-09 00:54:17
title
0 Italy will not dismantle Montis labour reform minister
1 Exclusive US agency FinCEN rejected veterans in bid to hire …
I have a list of roughly 20k word vectors ('tuple_vectors') with no labels; each vector looks like this:
[-2.84658718e+00 -7.74899840e-01 -2.24296474e+00 -8.69364500e-01
3.90927410e+00 -2.65316987e+00 -9.71897244e-01 -2.40408254e+00
1.16272974e+00 -2.61649752e+00 -2.87350488e+00 -1.06603658e+00
2.93374014e+00 1.07194626e+00 -1.86619771e+00 1.88549474e-01
-1.31901133e+00 3.83382154e+00 -3.46174908e+00 ...
Is there a quick, concise way to visualize these with t-SNE?
I tried the following:
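import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

n_sne = 21060
# Newer scikit-learn releases call the n_iter argument max_iter
tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(tuple_vectors)
# plt is a module, not a callable: plot the two t-SNE columns as a scatter
plt.scatter(tsne_results[:, 0], tsne_results[:, 1], s=3)
plt.show()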
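For around 20k vectors, a common way to keep t-SNE manageable is to reduce the vectors with PCA first and then plot the two resulting coordinates as a scatter. The following is a minimal sketch under that assumption, not a tuned recipe; `tsne_embed` and `plot_embedding` are hypothetical helper names, and the iteration count is left at scikit-learn's default because the argument name differs between versions.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def tsne_embed(vectors, perplexity=40.0, pca_dims=50, random_state=0):
    """Project vectors to 2-D: optional PCA reduction, then t-SNE."""
    X = np.asarray(vectors, dtype=np.float64)
    n_samples, n_features = X.shape
    # PCA cannot keep more components than samples or features.
    k = min(pca_dims, n_features, n_samples)
    if k < n_features:
        X = PCA(n_components=k, random_state=random_state).fit_transform(X)
    tsne = TSNE(
        n_components=2,
        perplexity=perplexity,  # must be smaller than the number of samples
        init="pca",
        random_state=random_state,
    )
    return tsne.fit_transform(X)


def plot_embedding(points, path=None):
    """Scatter-plot an (n, 2) array; save to `path` if given, else show."""
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.scatter(points[:, 0], points[:, 1], s=3, alpha=0.5)
    ax.set_xlabel("t-SNE 1")
    ax.set_ylabel("t-SNE 2")
    if path is not None:
        fig.savefig(path, dpi=150)
    else:
        plt.show()
    return fig
```

With the list from the question this would be called as `plot_embedding(tsne_embed(tuple_vectors))`. Since the vectors have no labels, the plot is a single-colour scatter; small, semi-transparent markers keep dense regions readable.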