Posts by W.R*_*W.R

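A more efficient implementation of Textacy / spaCy 'subject_verb_object_triples'

I am trying to apply textacy's 'extract.subject_verb_object_triples' function to a dataset. However, the code I have written is very slow and uses a lot of memory. Is there a more efficient way to implement this?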

import spacy
import textacy

def extract_SVO(text):

    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    tuples = textacy.extract.subject_verb_object_triples(doc)
    tuples_to_list = list(tuples)
    if tuples_to_list != []:
        tuples_list.append(tuples_to_list)

tuples_list = []          
sp500news['title'].apply(extract_SVO)
print(tuples_list)

Sample data (sp500news)

    date_publish  \
0       2013-05-14 17:17:05   
1       2014-05-09 20:15:57   
4       2018-07-19 10:29:54   
6       2012-04-17 21:02:54   
8       2012-12-12 20:17:56   
9       2018-11-08 10:51:49   
11      2013-08-25 07:13:31   
12      2015-01-09 00:54:17   

 title  
0       Italy will not dismantle Montis labour reform  minister                            
1       Exclusive US agency FinCEN rejected veterans in bid to hire …
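One possible direction, offered as a sketch rather than a verified answer: the function in the question calls spacy.load for every row, so the model is loaded again for each title. Loading the pipeline once and streaming the titles through nlp.pipe avoids that repeated work. The helper names extract_svo_batch and run_on_titles and the batch_size value are illustrative assumptions, not part of the original post. The spaCy and textacy imports sit inside run_on_titles so that the batching helper can be exercised without the model installed.

```python
from typing import Any, Callable, Iterable, List


def extract_svo_batch(
    texts: Iterable[str],
    nlp: Any,
    extract_triples: Callable[[Any], Iterable[Any]],
    batch_size: int = 256,  # assumed value; tune for your data
) -> List[list]:
    """Return the non-empty triple lists for `texts`, using one shared pipeline."""
    results = []
    # nlp.pipe streams the texts through the pipeline in batches
    # instead of making one separate nlp(text) call per row.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        triples = list(extract_triples(doc))
        if triples:
            results.append(triples)
    return results


def run_on_titles(titles: Iterable[str]) -> List[list]:
    """Load the model once, then extract SVO triples from every title."""
    import spacy
    from textacy import extract

    nlp = spacy.load("en_core_web_sm")  # loaded once, not once per row
    return extract_svo_batch(titles, nlp, extract.subject_verb_object_triples)
```

With the DataFrame from the question, this would be called as tuples_list = run_on_titles(sp500news['title'].tolist()).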

python nlp pandas spacy textacy

Votes: 3, Answers: 1, Views: 2345

t-SNE visualization of a list of word vectors

I have a list of about 20k word vectors ('tuple_vectors') with no labels. Each vector looks like this:

[-2.84658718e+00 -7.74899840e-01 -2.24296474e+00 -8.69364500e-01
  3.90927410e+00 -2.65316987e+00 -9.71897244e-01 -2.40408254e+00
  1.16272974e+00 -2.61649752e+00 -2.87350488e+00 -1.06603658e+00
  2.93374014e+00  1.07194626e+00 -1.86619771e+00  1.88549474e-01
 -1.31901133e+00  3.83382154e+00 -3.46174908e+00 ...

Is there a quick, concise way to visualize these with t-SNE?

I have tried the following:

from sklearn.manifold import TSNE

n_sne = 21060


tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(tuple_vectors)
plt(tsne_results)
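A minimal sketch of one way to get a plot, assuming 'tuple_vectors' can be stacked into a 2-D array of shape (n_samples, n_features). In the attempt above, plt is never imported, and a module cannot be called like a function, so this sketch draws a scatter plot of the two embedded columns instead. The function names embed_2d and plot_embedding and the output file name are hypothetical. The perplexity of 40 is taken from the question, and the iteration count is left at the library default because the name of that keyword has changed between scikit-learn releases.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE


def embed_2d(vectors, perplexity=40.0, random_state=0):
    """Project vectors of shape (n_samples, n_features) to 2-D with t-SNE."""
    data = np.asarray(vectors, dtype=np.float32)
    tsne = TSNE(
        n_components=2,
        perplexity=perplexity,      # must be smaller than n_samples
        init="pca",                 # PCA initialisation of the embedding
        random_state=random_state,  # fixed seed for repeatable layouts
    )
    return tsne.fit_transform(data)


def plot_embedding(embedding, path="tsne.png"):
    """Scatter-plot the two t-SNE columns and save the figure to `path`."""
    fig, ax = plt.subplots(figsize=(8, 8))
    # Small, semi-transparent markers keep ~20k points readable.
    ax.scatter(embedding[:, 0], embedding[:, 1], s=2, alpha=0.5)
    ax.set_xlabel("t-SNE 1")
    ax.set_ylabel("t-SNE 2")
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

With the list from the question, this would be used as plot_embedding(embed_2d(tuple_vectors)).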

python nlp data-visualization scikit-learn word-embedding

Votes: 3, Answers: 1, Views: 1669