I am trying this code:
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

train_data = ["football is the sport", "gravity is the movie", "education is imporatant"]
vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5,
                             stop_words='english')

# fit_transform learns the vocabulary from the first training set
print("Applying first train data")
X_train = vectorizer.fit_transform(train_data)
print(vectorizer.get_feature_names())

# transform reuses the vocabulary that was already learned
print("\n\nApplying second train data")
train_data = ["cricket", "Transformers is a film", "AIMS is a college"]
X_train = vectorizer.transform(train_data)
print(vectorizer.get_feature_names())

# fit_transform learns a new vocabulary from the second training set
print("\n\nApplying fit transform onto second train data")
X_train = vectorizer.fit_transform(train_data)
print(vectorizer.get_feature_names())
The output of this is:
Applying first train data
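[u'education', u'football', u'gravity', u'imporatant', u'movie', u'sport'] …

I am trying to get the base English word for an English word that has been modified from its base form. This question has been asked here before, but I did not see a correct answer, so I am asking it this way. I tried two stemmers and one lemmatizer from the NLTK package: the Porter stemmer, the Snowball stemmer, and the WordNet lemmatizer.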
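To make the stemmer/lemmatizer distinction above concrete, here is a toy sketch. It does not reproduce the NLTK algorithms: `SUFFIXES`, `LEMMA_TABLE`, `toy_stem` and `toy_lemmatize` are hypothetical names and hand-made data invented for illustration. It only assumes that a stemmer strips suffixes by rule, while a lemmatizer looks a word up for a given part of speech.

```python
# Toy illustration only: these are NOT the NLTK algorithms.
# SUFFIXES and LEMMA_TABLE are hypothetical, hand-made examples.

SUFFIXES = ("ion", "al", "ing", "ed", "s")

# (word, part of speech) -> dictionary form
LEMMA_TABLE = {
    ("ate", "v"): "eat",
    ("arrived", "v"): "arrive",
    ("conclusions", "n"): "conclusion",
}


def toy_stem(word):
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word, pos="n"):
    """Look the word up for the given part of speech; fall back to the word itself."""
    return LEMMA_TABLE.get((word, pos), word)


for word in ["arrival", "conclusion", "ate"]:
    print(word, "->", toy_stem(word), "|", toy_lemmatize(word), "|", toy_lemmatize(word, pos="v"))
```

In this sketch no suffix rule can turn "ate" into "eat", and the lookup only finds it when the word is treated as a verb. NLTK's `WordNetLemmatizer.lemmatize` likewise takes a `pos` argument that defaults to noun, so that may be worth checking when a verb form comes back unchanged.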
I tried this code:
from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer
from nltk.stem.wordnet import WordNetLemmatizer

words = ['arrival', 'conclusion', 'ate']

# Create each stemmer/lemmatizer once instead of on every iteration
porter_stemmer = PorterStemmer()
snowball_stemmer = SnowballStemmer("english")
lemmatizer = WordNetLemmatizer()

for word in words:
    print("\n\nOriginal Word => %s" % word)
    print("porter stemmer=> %s" % porter_stemmer.stem(word))
    print("snowball stemmer=> %s" % snowball_stemmer.stem(word))
    print("WordNet Lemmatizer=> %s" % lemmatizer.lemmatize(word))
This is the output I get:
Original Word => arrival
porter stemmer=> arriv
snowball stemmer=> arriv
WordNet Lemmatizer=> arrival
Original Word => conclusion
porter stemmer=> conclus
snowball stemmer=> conclus
WordNet Lemmatizer=> conclusion
Original Word => ate
porter stemmer=> ate
snowball stemmer=> ate
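WordNet …

I am working through Prolog and want to use it for natural language processing. I came across this article on natural language processing with Prolog in the IBM Watson system, and I would like to try something similar to what it describes. Now I want to know which Prolog implementation to use. I have seen comparisons of all of them on Priki and on the wiki. Which of these implementations is suitable for NLP on Ubuntu? It should also integrate easily with Python and be fast. Has anyone worked with these implementations? Is SWI-Prolog a good choice?

Any help is appreciated. Thanks :)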