Forgive me if this is very basic, but I have Python 2.7 and Elasticsearch 2.1.1, and I am simply trying to delete an index using
es.delete(index='researchtest', doc_type='test')
but this gives me
return func(*args, params=params, **kwargs)
TypeError: delete() takes at least 4 arguments (4 given)
I also tried
es.delete_by_query(index='researchtest', doc_type='test',body='{"query":{"match_all":{}}}')
but I get
AttributeError: 'Elasticsearch' object has no attribute 'delete_by_query'
Any idea why? Did the Python API change in 2.1.1?
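A hedged note on what is likely happening here: in the elasticsearch-py client, es.delete() removes a single document and therefore also expects an id (self, index, doc_type, id are the "at least 4 arguments"), while deleting an index lives under the indices namespace. delete_by_query was dropped from the core API in Elasticsearch 2.x (it moved to a plugin), hence the AttributeError. A minimal sketch:

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Index deletion goes through the indices namespace, not es.delete().
# ignore=[404] keeps a missing index from raising an exception.
es.indices.delete(index='researchtest', ignore=[404])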
I am working on a keyword extraction problem. Consider the general case
tfidf = TfidfVectorizer(tokenizer=tokenize, stop_words='english')
t = """Two Travellers, walking in the noonday sun, sought the shade of a widespreading tree to rest. As they lay looking up among the pleasant leaves, they saw that it was a Plane Tree.
"How useless is the Plane!" said one of them. "It bears no fruit whatever, and only serves to litter the ground with leaves."
"Ungrateful creatures!" said a voice from the Plane Tree. "You lie here in my cooling shade, and …
I am attempting a multi-label classification problem with scikit-learn 0.17 on Anaconda Python 2.7. Here is my code
import numpy as np  # needed for the dtype argument below
import pandas as pd
import pickle
import re
from sklearn.cross_validation import train_test_split
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB as MNB
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV
traindf = pickle.load(open("train.pkl","rb"))
X, y = traindf['colC'], traindf['colB'].as_matrix()
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.7)
pip = Pipeline([
    ('vect', TfidfVectorizer(
        analyzer='word',
        binary=False,
        decode_error='ignore',
        dtype=np.int64,
        encoding=u'utf-8',
        input=u'content',
        lowercase=True,
        max_df=0.25,
        max_features=None,
        min_df=1,
        ngram_range=(1, 1),
        norm=u'l2',
        preprocessor=None,
        smooth_idf=True,
        stop_words='english', …
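The snippet is cut off above, but since MultinomialNB on its own cannot consume lists of labels, here is a hedged sketch of how such a pipeline is commonly completed for multi-label data, reusing the imports from the question (the column names follow the question; everything else is assumed):

from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(traindf['colB'])  # assumes colB holds lists of labels
Xtr, Xte, Ytr, Yte = train_test_split(traindf['colC'], Y, train_size=0.7)

pip = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english', max_df=0.25)),
    ('clf', OneVsRestClassifier(MNB())),
])
pip.fit(Xtr, Ytr)
print(pip.predict(Xte)[:5])  # rows of 0/1 indicators, one column per label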
Note: I am using Python 2.7 as part of the Anaconda distribution. I hope this is not a problem with nltk 3.1.
I am trying to use nltk for NER
import nltk
from nltk.tag.stanford import StanfordNERTagger
#st = StanfordNERTagger('stanford-ner/all.3class.distsim.crf.ser.gz', 'stanford-ner/stanford-ner.jar')
st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
print st.tag(str)
but I get
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
        at edu.stanford.nlp.io.IOUtils.<clinit>(IOUtils.java:41)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1117)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1076)
        at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1057)
        at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:3088)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 5 more
Traceback (most recent call last):
  File "X:\jnk.py", line 47, in <module>
    print st.tag(str)
  File "X:\Anaconda2\lib\site-packages\nltk\tag\stanford.py", line 66, in tag
    return sum(self.tag_sents([tokens]), [])
  File "X:\Anaconda2\lib\site-packages\nltk\tag\stanford.py", line 89, in tag_sents
    stdout=PIPE, …
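A hedged note on the Java error: NoClassDefFoundError: org/slf4j/LoggerFactory means the Stanford NER jar (3.6+) was launched without the slf4j classes on the Java classpath. nltk consults the CLASSPATH environment variable when locating jars, so one workaround people use is to point it at both jars before constructing the tagger; all paths below are placeholders for wherever the Stanford NER zip was unpacked:

import os

# Placeholder paths: point these at your actual Stanford NER download.
os.environ['CLASSPATH'] = os.pathsep.join([
    r'X:\stanford-ner\stanford-ner.jar',
    r'X:\stanford-ner\lib\slf4j-api.jar',  # supplies org.slf4j.LoggerFactory
])
os.environ['STANFORD_MODELS'] = r'X:\stanford-ner\classifiers'

from nltk.tag.stanford import StanfordNERTagger
st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
print st.tag('John lives in New York'.split())  # tag() expects a list of tokens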
I am using EmbeddedCassandraServerHelper for unit testing. Here is my pom
<dependencies>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-api</artifactId>
        <version>1.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-utils</artifactId>
        <version>1.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.nifi</groupId>
        <artifactId>nifi-mock</artifactId>
        <version>1.4.0</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>1.7.25</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.8.2</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>com.jcraft</groupId>
        <artifactId>jsch</artifactId>
        <version>0.1.54</version>
    </dependency>
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.8.2</version>
    </dependency>
    <dependency>
        <groupId>org.mockito</groupId>
        <artifactId>mockito-core</artifactId>
        <version>1.10.8</version>
    </dependency>
    <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>cassandra-driver-core</artifactId>
        <version>3.4.0</version>
    </dependency>
    <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>cassandra-driver-mapping</artifactId>
        <version>3.4.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.datastax.cassandra/cassandra-driver-extras -->
    <dependency>
        <groupId>com.datastax.cassandra</groupId>
        <artifactId>cassandra-driver-extras</artifactId>
        <version>3.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.cassandraunit</groupId>
        <artifactId>cassandra-unit</artifactId>
        <version>3.3.0.2</version>
        <scope>test</scope>
    </dependency>
</dependencies>
I am referencing the documentation at
https://github.com/jsevellec/cassandra-unit/wiki/How-to-use-it-in-your-code
First, I tried
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
private …
I am just trying to parse an XML file:
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
but this gives me:
import xml.etree.ElementTree as ET
ImportError: No module named 'xml.etree'
I am using Python 3.5. I have tried the same code with Python 2.7 and 3.4, but I always get this error. I thought the XML library was part of the standard library. Also, I can see it in my Lib folder:
So why can't it pick up the module? I am really confused. Do I have to change some environment variable?
Please help.
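Since xml.etree ships with every standard CPython install, the usual culprit for this error is a local file or folder named xml that shadows the standard-library package. A quick diagnostic sketch:

import xml

# If this prints a path inside your own project rather than Python's Lib
# directory, a local xml.py file (or xml/ folder) is shadowing the
# standard-library package, and renaming it fixes the ImportError.
print(xml.__file__)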
I have a field like this:
author
Jason Pete
Jason Paul
Mike Yard
Jason Voorhies
In Kibana 4.4 I query with
author:/Jason.*/
and I get all the matching records
Jason Pete
Jason Paul
Jason Voorhies
OK, now I want to do
author:/Jason P.*/
I expected
Jason Pete
Jason Paul
but I get
No Records found :(
What is wrong with my regex? Is there another way to specify the space character after Jason? I even tried
author:/Jason\sP.*/
but still no results.
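For context, a likely explanation: on an analyzed string field the analyzer splits "Jason Pete" into separate tokens, and a regex query must match a single indexed token in its entirety, so no pattern containing a space can ever match. Assuming the index has a Logstash-style not_analyzed sub-field (often named .raw), querying that instead should behave as expected:

author.raw:/Jason P.*/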
I have an array representing a max-heap. For example
84 81 41 79 17 38 33 15 61 6
So the root is the largest. Each internal node at index i can have at most two children, which live at indices 2*i+1 and 2*i+2.
How can I print this heap level by level? Like
84(0)
81(1) 41(2)
79(3) 17(4) 38(5) 33(6)
15(7) 61(8) 6(9)
The index of each element is shown in parentheses for clarity. I do not have to print the indices. I think this is similar to printing a BST in level order, but here the heap is stored in an array rather than as linked nodes, which makes it a bit tricky!
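Since the array stores the tree breadth-first, level k occupies indices 2**k - 1 through 2**(k+1) - 2, so printing level by level is just slicing with a width that doubles each line. A minimal sketch:

def print_heap_by_level(heap):
    # Level k of a binary heap occupies heap[2**k - 1 : 2**(k+1) - 1],
    # so walk the array in slices whose width doubles each time.
    i, width = 0, 1
    while i < len(heap):
        print(' '.join(str(x) for x in heap[i:i + width]))
        i += width
        width *= 2

print_heap_by_level([84, 81, 41, 79, 17, 38, 33, 15, 61, 6])
# 84
# 81 41
# 79 17 38 33
# 15 61 6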
In NLP there is the concept of a Gazetteer, which can be very useful for creating annotations. From what I understand,
A gazetteer consists of a set of lists containing names of entities such as cities, organisations, days of the week, etc. These lists are used to find occurrences of these names in text, e.g. for the task of named entity recognition.
So it is essentially a lookup. Isn't that cheating? If we use a Gazetteer to detect named entities, there isn't much Natural Language Processing going on. Ideally, I would want to detect named entities using NLP techniques. Otherwise, how is it any better than a regex pattern matcher?
Does that make sense?
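To make the "essentially a lookup" point concrete, here is a toy sketch of what a gazetteer-based tagger boils down to (the lists and labels are invented for illustration):

GAZETTEER = {
    'city': {'London', 'Paris', 'Mumbai'},
    'day': {'Monday', 'Tuesday'},
}

def gazetteer_tag(tokens):
    # Pure set membership: no context, no statistics. A token gets the label
    # of the first list it appears in, or 'O' for "outside any entity".
    return [(tok, next((label for label, names in GAZETTEER.items()
                        if tok in names), 'O'))
            for tok in tokens]

print(gazetteer_tag('I flew to Paris on Monday'.split()))
# [('I', 'O'), ('flew', 'O'), ('to', 'O'), ('Paris', 'city'), ('on', 'O'), ('Monday', 'day')]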
I am trying to parse a PDF and create some kind of hierarchy. Consider the input
Title 1
some text some text some text some text some text some text some text
some text some text some text some text some text some text some text
Title 1.1
some more text some more text some more text some more text
some more text some more text some more text some more text
some more text some more text
Title 2
some final text some final text
some final text some final text some …
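The question is truncated here, but as a hedged sketch of one common approach, assuming pdfminer.six: treat a line as a title when its largest glyph exceeds a size threshold, and attach the following body lines to the most recent title. The file name and the threshold below are illustrative, not the poster's code, and real PDFs often need extra cues (boldness, numbering patterns like "1.1") on top of font size:

from pdfminer.high_level import extract_pages
from pdfminer.layout import LTTextContainer, LTChar

def outline(path, title_size=12.0):
    # title_size is an assumed threshold; inspect your PDF to pick a real one.
    sections = []
    for page in extract_pages(path):
        for element in page:
            if not isinstance(element, LTTextContainer):
                continue
            for line in element:
                sizes = [ch.size for ch in line if isinstance(ch, LTChar)]
                if not sizes:
                    continue
                text = line.get_text().strip()
                if max(sizes) > title_size:
                    sections.append((text, []))   # a new title starts a section
                elif sections:
                    sections[-1][1].append(text)  # body line under the last title
    return sections

for title, body in outline('document.pdf'):
    print(title, '->', len(body), 'body lines')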
Run Code Online (Sandbox Code Playgroud) python ×6
nlp ×3
java ×2
nltk ×2
scikit-learn ×2
cassandra ×1
heap ×1
kibana-4 ×1
maven ×1
parsing ×1
pdf ×1
pdfminer ×1
python-2.7 ×1
python-3.x ×1
regex ×1
stanford-nlp ×1
tf-idf ×1
windows ×1
xml ×1