Related questions (0)

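Python: split text into sentences

I have a text file. I need a list of sentences.

How can this be implemented? There are many subtleties, such as periods used in abbreviations.

My old regular expression works badly.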

re.compile(r'(\. |^|!|\?)([A-Z][^;?\.<>@\^&/\[\]]*(\.|!|\?) )', re.M)
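One possible way to handle the abbreviation problem with the standard library alone is sketched below. It treats `.`, `!` or `?` followed by whitespace and an uppercase letter as a candidate boundary, then rejects the candidate when the preceding word is a known abbreviation. The `ABBREVIATIONS` set and the `split_sentences` name are assumptions made for illustration, not part of the question, and the list is far from complete.

```python
import re

# Hypothetical abbreviation list for illustration; extend it for real text.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "prof.", "e.g.", "i.e.", "etc.", "vs."}


def split_sentences(text):
    """Split text into sentences, skipping periods that end known abbreviations."""
    sentences = []
    start = 0
    # Candidate boundary: ., ! or ? followed by whitespace and an uppercase letter.
    for match in re.finditer(r'[.!?]\s+(?=[A-Z])', text):
        candidate = text[start:match.start() + 1]
        last_word = candidate.split()[-1].lower()
        if last_word in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, keep scanning
        sentences.append(candidate.strip())
        start = match.end()
    # Whatever remains after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Smith went home. He was tired! Was it late? Yes."))
# ['Dr. Smith went home.', 'He was tired!', 'Was it late?', 'Yes.']
```

A fixed abbreviation list will still miss cases such as initials or sentences that start with a lowercase letter, so a trained sentence tokenizer may be a better fit for messy input.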


python text split

Art*_*yom

2011-01-02

Votes: 85
Answers: 9
Views: 110k

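Testing an NLTK classifier on a specific file

The following code runs a Naive Bayes movie review classifier. It generates a list of the most informative features.

Note: the **movie review** folder is inside nltk.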

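from itertools import chain
import string  # needed for string.punctuation below

from nltk.corpus import stopwords
from nltk.probability import FreqDist
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews
stop = stopwords.words('english')

# Pair each review's filtered tokens with its label, taken from the file ID prefix ('pos' or 'neg').
documents = [([w for w in movie_reviews.words(i) if w.lower() not in stop and w.lower() not in string.punctuation], i.split('/')[0]) for i in movie_reviews.fileids()]

# Count every token across all reviews and keep 100 of them as features.
word_features = FreqDist(chain(*[i for i,j in documents]))
word_features = list(word_features.keys())[:100]  # list() also works where keys() returns a view (Python 3)

# Use the first 90% of the documents for training.
numtrain = int(len(documents) * 90 / 100)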
train_set = [({i:(i in tokens) for i in …
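The code above is cut off, so the following is only a sketch of how the text of one specific file might be turned into the same kind of `{word: bool}` feature dictionary the training set uses. The helper names `tokenize` and `document_features`, the three-word vocabulary and the sample review are all assumptions for illustration; a real run would reuse the `word_features` list built above and read the text from the file.

```python
def tokenize(text):
    """Very rough whitespace tokenizer, a stand-in for a real one (assumption)."""
    return text.split()


def document_features(tokens, word_features):
    """Build a {word: bool} dict marking which feature words occur in the tokens."""
    present = set(tokens)
    return {word: (word in present) for word in word_features}


# Hypothetical feature vocabulary and review text, for illustration only.
word_features = ["great", "boring", "plot"]
review_text = "a great film with a clever plot"

features = document_features(tokenize(review_text), word_features)
print(features)
# {'great': True, 'boring': False, 'plot': True}
```

Assuming `classifier` is the result of `NaiveBayesClassifier.train(train_set)`, passing this dictionary to `classifier.classify(features)` should return the predicted label for the file.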


nlp classification nltk python-2.7 text-classification

ZaM*_*ZaM

2017-05-23

Votes: 8
Answers: 1
Views: 2396

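What is the difference between a corpus and a lexicon in NLTK (Python)?

Can someone tell me the difference between corpora, a corpus, and a lexicon in NLTK?

What is the movie data set?

What is WordNet?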

nlp machine-learning corpus nltk lexical

Kum*_*mar

2015-07-21

Votes: 4
Answers: 1
Views: 7721

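How can I use the hashing trick to vectorize bigrams in scikit-learn?

I have some bigrams, let's say: [('word','word'),('word','word'),...,('word','word')]. How can I use scikit-learn's HashingVectorizer to create a feature vector that will subsequently be passed to some classification algorithm, e.g. SVC, Naive Bayes, or any other kind of classifier?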
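To make the idea behind the hashing trick concrete, here is a minimal pure-Python sketch that maps a list of bigram tuples to a fixed-length count vector. It is not scikit-learn's implementation: `hash_bigrams` is a hypothetical helper, and `zlib.crc32` stands in for the hash function scikit-learn uses internally, so the bucket indices will not match what HashingVectorizer produces.

```python
import zlib


def hash_bigrams(bigrams, n_features=16):
    """Map a list of (word, word) tuples to a fixed-length count vector."""
    vector = [0] * n_features
    for first, second in bigrams:
        # Join the pair into one token so each bigram hashes as a unit.
        token = f"{first} {second}".encode("utf-8")
        # The hash picks the bucket; collisions simply add up in the same slot.
        index = zlib.crc32(token) % n_features
        vector[index] += 1
    return vector


bigrams = [("new", "york"), ("york", "city"), ("new", "york")]
vector = hash_bigrams(bigrams, n_features=8)
print(vector)
```

The vector length stays fixed no matter how many distinct bigrams appear, which is the point of the trick. A larger `n_features` lowers the chance that two different bigrams share a bucket.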

python nlp machine-learning scipy scikit-learn

tum*_*eed

lucky-day

Votes: 3
Answers: 1
Views: 3801