小编Abt*_*Pst的帖子

Elasticsearch:如何使用python删除索引

请原谅我,如果这是非常基本的,但我有Python 2.7和Elasticsearch 2.1.1,我只是试图删除索引使用

es.delete(index='researchtest', doc_type='test')
Run Code Online (Sandbox Code Playgroud)

但这给了我

return func(*args, params=params, **kwargs)
TypeError: delete() takes at least 4 arguments (4 given)
Run Code Online (Sandbox Code Playgroud)

我也试过了

es.delete_by_query(index='researchtest', doc_type='test',body='{"query":{"match_all":{}}}')
Run Code Online (Sandbox Code Playgroud)

但我明白了

AttributeError: 'Elasticsearch' object has no attribute 'delete_by_query'
Run Code Online (Sandbox Code Playgroud)

知道为什么吗?对于python,api是否改为2.1.1?

https://elasticsearch-py.readthedocs.org/en/master/api.html#elasticsearch.client.IndicesClient.delete

python python-2.7 elasticsearch

53
推荐指数
2
解决办法
3万
查看次数

Scikit Learn TfidfVectorizer:如何获得具有最高tf-idf分数的前n个术语

我正在研究关键字提取问题.考虑一般情况

tfidf = TfidfVectorizer(tokenizer=tokenize, stop_words='english')

t = """Two Travellers, walking in the noonday sun, sought the shade of a widespreading tree to rest. As they lay looking up among the pleasant leaves, they saw that it was a Plane Tree.

"How useless is the Plane!" said one of them. "It bears no fruit whatever, and only serves to litter the ground with leaves."

"Ungrateful creatures!" said a voice from the Plane Tree. "You lie here in my cooling shade, and …
Run Code Online (Sandbox Code Playgroud)

python nlp nltk tf-idf scikit-learn

28
推荐指数
2
解决办法
2万
查看次数

Scikit Learn Multilabel分类:ValueError:您似乎正在使用传统的多标签数据表示

我正在尝试使用scikit学习0.17与anaconda 2.7的多标签分类问题.这是我的代码

import pandas as pd
import pickle
import re
from sklearn.cross_validation import train_test_split
from sklearn.metrics.metrics import classification_report, accuracy_score, confusion_matrix
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB as MNB
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV

traindf = pickle.load(open("train.pkl","rb"))

X, y = traindf['colC'], traindf['colB'].as_matrix()

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.7)

pip = Pipeline([
('vect', TfidfVectorizer(
                        analyzer='word',
                        binary=False,
                        decode_error='ignore',
                        dtype=<type 'numpy.int64'>,
                        encoding=u'utf-8',
                        input=u'content',
                        lowercase=True,
                        max_df=0.25,
                        max_features=None,
                        min_df=1,
                        ngram_range=(1, 1),
                        norm=u'l2',
                        preprocessor=None,
                        smooth_idf=True,
                        stop_words='english', …
Run Code Online (Sandbox Code Playgroud)

python machine-learning scikit-learn multilabel-classification

13
推荐指数
1
解决办法
8676
查看次数

nltk StanfordNERTagger:NoClassDefFoundError:org/slf4j/LoggerFactory(在Windows中)

注意:我使用Python 2.7作为Anaconda发行版的一部分.我希望这不是nltk 3.1的问题.

我正在尝试使用nltk作为NER

import nltk
from nltk.tag.stanford import StanfordNERTagger 
#st = StanfordNERTagger('stanford-ner/all.3class.distsim.crf.ser.gz', 'stanford-ner/stanford-ner.jar')
st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz') 
print st.tag(str)
Run Code Online (Sandbox Code Playgroud)

但我明白了

Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
    at edu.stanford.nlp.io.IOUtils.<clinit>(IOUtils.java:41)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1117)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1076)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1057)
    at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:3088)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 5 more

Traceback (most recent call last):
  File "X:\jnk.py", line 47, in <module>
    print st.tag(str)
  File "X:\Anaconda2\lib\site-packages\nltk\tag\stanford.py", line 66, in tag
    return sum(self.tag_sents([tokens]), []) 
  File "X:\Anaconda2\lib\site-packages\nltk\tag\stanford.py", line 89, in tag_sents
    stdout=PIPE, …
Run Code Online (Sandbox Code Playgroud)

python windows nlp nltk stanford-nlp

11
推荐指数
1
解决办法
5390
查看次数

EmbeddedCassandra:无法运行单元测试

我正在使用EmbeddedCassandraServerHelper单元测试.这是我的pom

<dependencies>
   <dependency>
      <groupId>org.apache.nifi</groupId>
      <artifactId>nifi-api</artifactId>
      <version>1.4.0</version>
   </dependency>
   <dependency>
      <groupId>org.apache.nifi</groupId>
      <artifactId>nifi-utils</artifactId>
      <version>1.4.0</version>
   </dependency>
   <dependency>
      <groupId>org.apache.nifi</groupId>
      <artifactId>nifi-mock</artifactId>
      <version>1.4.0</version>
      <scope>test</scope>
   </dependency>
   <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-simple</artifactId>
      <version>1.7.25</version>
      <scope>test</scope>
   </dependency>
   <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.8.2</version>
      <scope>test</scope>
   </dependency>
   <dependency>
      <groupId>com.jcraft</groupId>
      <artifactId>jsch</artifactId>
      <version>0.1.54</version>
   </dependency>
   <dependency>
      <groupId>com.google.code.gson</groupId>
      <artifactId>gson</artifactId>
      <version>2.8.2</version>
   </dependency>
   <dependency>
      <groupId>org.mockito</groupId>
      <artifactId>mockito-core</artifactId>
      <version>1.10.8</version>
   </dependency>
   <dependency>
      <groupId>com.datastax.cassandra</groupId>
      <artifactId>cassandra-driver-core</artifactId>
      <version>3.4.0</version>
   </dependency>
   <dependency>
      <groupId>com.datastax.cassandra</groupId>
      <artifactId>cassandra-driver-mapping</artifactId>
      <version>3.4.0</version>
   </dependency>
   <!-- https://mvnrepository.com/artifact/com.datastax.cassandra/cassandra-driver-extras -->
   <dependency>
      <groupId>com.datastax.cassandra</groupId>
      <artifactId>cassandra-driver-extras</artifactId>
      <version>3.4.0</version>
   </dependency>
   <dependency>
      <groupId>org.cassandraunit</groupId>
      <artifactId>cassandra-unit</artifactId>
      <version>3.3.0.2</version>
      <scope>test</scope>
   </dependency>
</dependencies>
Run Code Online (Sandbox Code Playgroud)

我正在引用文档

https://github.com/jsevellec/cassandra-unit/wiki/How-to-use-it-in-your-code

首先,我试过了

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

private …
Run Code Online (Sandbox Code Playgroud)

java cassandra maven datastax-java-driver cassandraunit

10
推荐指数
1
解决办法
1384
查看次数

Python错误:ImportError:没有名为'xml.etree'的模块

我只是想解析一个XML文件:

import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
Run Code Online (Sandbox Code Playgroud)

但这给了我:

import xml.etree.ElementTree as ET
ImportError: No module named 'xml.etree'
Run Code Online (Sandbox Code Playgroud)

我使用的是Python 3.5.我试过用Python 2.7和3.4编写相同的代码,但我总是得到这个错误.我认为XML库是标准的.另外,我可以在我的Lib文件夹中看到:

在此输入图像描述

那为什么不能拿起模块呢?我真的很困惑.我是否必须在某个环境变量中进行一些更改?

请帮忙.

python xml python-3.x

9
推荐指数
3
解决办法
2万
查看次数

Kibana:无法使用有空间的正则表达式进行查询

我有一个字段作为

author
Jason Pete
Jason Paul
Mike Yard
Jason Voorhies
Run Code Online (Sandbox Code Playgroud)

在 kibana 4.4 中我查询为

author:/Jason.*/
Run Code Online (Sandbox Code Playgroud)

所以我得到了所有记录

Jason Pete
Jason Paul
Jason Voorhies
Run Code Online (Sandbox Code Playgroud)

好吧,现在我想做

author:/Jason P.*/
Run Code Online (Sandbox Code Playgroud)

我预计

Jason Pete
Jason Paul
Run Code Online (Sandbox Code Playgroud)

但我明白了

No Records found :(
Run Code Online (Sandbox Code Playgroud)

我的正则表达式出了什么问题?Jason 之后还有其他方法指定空格字符吗?我什至尝试过

author:/Jason\sP.*/
Run Code Online (Sandbox Code Playgroud)

但仍然没有结果

regex elasticsearch kibana-4 elasticsearch-2.0

9
推荐指数
1
解决办法
1万
查看次数

Java:如何逐级打印存储为数组的堆

我有一个代表最大堆的数组。例如

84 81 41 79 17 38 33 15 61 6
Run Code Online (Sandbox Code Playgroud)

所以根是最大的。索引 i 处的每个中间层节点最多可以有两个子节点。它们将在 2*i+1 和 2*i+2 处。

如何以逐级方式打印此堆?喜欢

                             84(0)

                 81(1)                  41(2)

            79(3)        17(4)     38(5)       33(6) 

       15(7)    61(8)         6(9)
Run Code Online (Sandbox Code Playgroud)

数组中每个元素的索引显示在括号中以供澄清。我不必打印索引。我认为这类似于按级别顺序打印 BST,但在这里,堆存储在数组中而不是列表中,这使得它有点棘手!

java heap

9
推荐指数
2
解决办法
1万
查看次数

NLP:Gazetteer是骗子

在NLP中,有一个概念Gazetteer对于创建注释非常有用.据我所理解,

A gazetteer consists of a set of lists containing names of entities such as cities, organisations, days of the week, etc. These lists are used to ?nd occurrences of these names in text, e.g. for the task of named entity recognition.

所以它本质上是一个查找.这不是骗子吗?如果我们使用a Gazetteer来检测命名实体,那么就没有太多Natural Language Processing了.理想情况下,我想要使用NLP技术检测命名实体.否则它如何比正则表达式模式匹配器更好.

那有意义吗?

nlp named-entity-recognition

8
推荐指数
1
解决办法
1737
查看次数

Python PDFMiner:如何将大纲链接到基础文本

我正在尝试解析 PDF 并创建某种层次结构。考虑输入

Title 1
some text some text some text some text some text some text some text 
some text some text some text some text some text some text some text 

Title 1.1
some more text some more text some more text some more text 
some more text some more text some more text some more text 
some more text some more text 

Title 2
some final text some final text 
some final text some final text some …
Run Code Online (Sandbox Code Playgroud)

python pdf parsing pdfminer

8
推荐指数
1
解决办法
1418
查看次数