I've started using the Python library elasticsearch-dsl.
I'm trying to implement a parent-child relationship, but it doesn't work properly:
class Location(DocType):
    name = String(analyzer='snowball', fields={'raw': String(index='not_analyzed')})
    latitude = String(analyzer='snowball')
    longitude = String(analyzer='snowball')
    created_at = Date()

class Building(DocType):
    parent = Location()

We use the AWS managed Elasticsearch service and recently upgraded from 1.5 to 2.3. We use the elasticsearch-dsl package in Python to build our queries and managed to migrate most of them, but geo_distance is broken no matter what I try.
Mapping:
{
    'company': {
        'properties': {
            'id': {'type': 'integer'},
            'company_number': {'type': 'string'},
            'addresses': {
                'type': 'nested',
                'properties': {
                    'postcode': {'type': 'string', 'index': 'not_analyzed'},
                    'location': {'type': 'geo_point'}
                }
            }
        }
    }
}
Python code using elasticsearch-dsl == 0.0.11:
test_location = '53.5411062377, -2.11485504709'
test_distance = "3miles"
location_filter = F("geo_distance",
                    location=test_location,
                    distance=test_distance)
query = query.filter("nested",
                     path="addresses",
                     filter=location_filter)
Query generated by the library:
{'query': {'filtered': {'filter': {'nested': {'filter': {'geo_distance': {'distance': u'3miles', 'location': '53.5411062377, -2.11485504709'}}, 'path': 'addresses'}}, 'query': {'match_all': {}}}}}
We created a brand-new index on the new 2.3 cluster using the same mapping.
After updating to elasticsearch-dsl == 2.1.0 and trying to convert the filter into a query:
geo_query = Q({"bool": {
    "must": [
        { …

I'm trying to implement a multi-index approach using elasticsearch-dsl. There are basically two steps:
1. Create the aliases:
PUT /tweets_1/_alias/tweets_search
PUT /tweets_1/_alias/tweets_index
2. Change the aliases when necessary:
POST /_aliases
{
  "actions": [
    { "add": { "index": "tweets_2", "alias": "tweets_search" }},
    { "remove": { "index": "tweets_1", "alias": "tweets_index" }},
    { "add": { "index": "tweets_2", "alias": "tweets_index" }}
  ]
}
I was only able to implement step 1, and only with elasticsearch-py (not the DSL):
from elasticsearch.client import IndicesClient
IndicesClient(client).put_alias("tweets_1", "tweets_search")
IndicesClient(client).put_alias("tweets_1", "tweets_index")
I don't know how to do step 2. So what is the equivalent in elasticsearch-dsl (or at least in elasticsearch-py)?
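
For step 2, one option that should work is the `update_aliases` call on `client.indices` in elasticsearch-py, which maps to `POST /_aliases`. The sketch below is only an illustration under some assumptions: `client` is an `elasticsearch.Elasticsearch` instance, the helper names `build_alias_swap` and `swap_aliases` are made up for this example, and the `body=` keyword is the 7.x-style call (newer clients may prefer `actions=`).

```python
def build_alias_swap(old_index, new_index, search_alias, index_alias):
    """Build the request body for one atomic POST /_aliases call."""
    return {
        "actions": [
            {"add": {"index": new_index, "alias": search_alias}},
            {"remove": {"index": old_index, "alias": index_alias}},
            {"add": {"index": new_index, "alias": index_alias}},
        ]
    }


def swap_aliases(client, old_index, new_index,
                 search_alias="tweets_search", index_alias="tweets_index"):
    """Send the alias actions through the client's indices API.

    `client` is assumed to expose `indices.update_aliases(body=...)`,
    as elasticsearch-py's Elasticsearch object does.
    """
    body = build_alias_swap(old_index, new_index, search_alias, index_alias)
    return client.indices.update_aliases(body=body)


# Hypothetical usage, assuming a reachable cluster:
#   from elasticsearch import Elasticsearch
#   client = Elasticsearch(["localhost"])
#   swap_aliases(client, "tweets_1", "tweets_2")
```

All actions in one `_aliases` request are applied together, so searches never see a moment where `tweets_index` points at nothing.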

How complex can queries on an integer array field be? Here is my Python class, used to inject data into elasticsearch:
class Paragraph(DocType):
    body = Text(analyzer="standard")
    published_from = Date()
    lines = Integer()
    n_paragraph = Integer()
    capture = Integer()

    class Meta:
        index = "my_index"

    def save(self, **kwargs):
        self.lines = len(self.body.split())
        return super(Paragraph, self).save(**kwargs)
I inject an array of integers into capture. Here is the line of interest:
paragraph.capture = [1, 0, 5, 7]
I managed to query whether a number is in the list:
cnx = Search().using(client)
s = cnx.query("match", capture=5)
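As @Val said, we can add another field containing the sum in order to query on the sum.
How can I query a specific position, e.g. paragraph.capture[1] >= 1?
We saw that the question "Elasticsearch query on array index" is related, but I couldn't make the connection.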
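
As far as I know, ordinary Elasticsearch queries treat an array as an unordered multi-valued field and do not expose element positions. One possible workaround, offered here as an assumption rather than as something taken from the linked question, is to also store each element under its own positional field (`capture_0`, `capture_1`, ...) and run a `range` query on the position you need. The helper names below are hypothetical.

```python
def positional_fields(prefix, values):
    """Expand a list into one field per position, e.g. capture_0, capture_1."""
    return {"%s_%d" % (prefix, i): value for i, value in enumerate(values)}


def range_on_position(prefix, position, gte):
    """Build a range query body fragment for a single positional field."""
    return {"range": {"%s_%d" % (prefix, position): {"gte": gte}}}


# Extra fields to index alongside `capture` for the document in the question
extra = positional_fields("capture", [1, 0, 5, 7])

# Query fragment equivalent to "capture[1] >= 1"
query = range_on_position("capture", 1, 1)
```

The positional fields would have to be declared (or dynamically mapped) as integers and filled in at save time, for example inside the overridden `save` method.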
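
I'm using django-elasticsearch-dsl in one of our projects, and after creating a cluster in AWS Elasticsearch I started seeing the following error: Root certificates are missing for certificate validation. Either pass them in using the ca_certs parameter or install certifi to use it automatically. Some solutions to this problem are suggested at https://elasticsearch-py.readthedocs.io (this link [link][1]), but they apply to elasticsearch-py used directly, not to django-elasticsearch-dsl, which uses elasticsearch-py internally. I can only set the endpoint through settings.py, like this: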
ELASTICSEARCH_DSL = {
    'default': {
        'hosts': 'https://my-aws-elasticsearch-endpoint.eu-central-1.es.amazonaws.com'
    }
}
How can I add or enable this certificate in django-elasticsearch-dsl?
django amazon-web-services elasticsearch elasticsearch-dsl elasticsearch-dsl-py
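
As far as I understand, django-elasticsearch-dsl hands each entry of `ELASTICSEARCH_DSL` to elasticsearch-dsl's connection setup, so extra keys should reach the underlying elasticsearch-py client. Under that assumption, a settings sketch could look like the following. The helper name `build_es_dsl_settings` is made up for this example, the `use_ssl`/`verify_certs`/`ca_certs` keys are the 7.x-era elasticsearch-py options, and the CA bundle path is expected to come from something like `certifi.where()`.

```python
def build_es_dsl_settings(host, ca_bundle):
    """Build an ELASTICSEARCH_DSL dict that enables certificate validation.

    `ca_bundle` is the path to a CA certificate file; with certifi
    installed, `certifi.where()` returns such a path.
    """
    return {
        "default": {
            "hosts": host,
            "use_ssl": True,
            "verify_certs": True,
            "ca_certs": ca_bundle,
        }
    }


# Hypothetical use in settings.py, assuming certifi is installed:
#   import certifi
#   ELASTICSEARCH_DSL = build_es_dsl_settings(
#       "https://my-aws-elasticsearch-endpoint.eu-central-1.es.amazonaws.com",
#       certifi.where(),
#   )
```

Simply installing certifi in the environment may already be enough, since the error message says it is picked up automatically.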

I'm trying to set a timeout for a specific request using elasticsearch_dsl. I tried the following:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, F
...
def do_stuff(self, ids):
    client = Elasticsearch(['localhost'], timeout=30)
    s = Search(using=client,
               index='my_index',
               doc_type=['my_type'])
    s = s[0:100]
    f = F('terms', my_field=list(ids))
    # Search objects are immutable: filter() returns a new Search, so reassign it
    s = s.filter(f)
    response = s.execute()
    return response.hits.hits
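Notes:
- When I change doc_type to a type that contains one million entities, the query runs fine.
- When I target a doc_type with billions of entities, I get a timeout error showing that the default timeout is 10 seconds.
Following the elasticsearch_dsl documentation, I even tried setting the default connection timeout: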
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, F
from elasticsearch_dsl import connections
connections.connections.create_connection(hosts=['localhost'], timeout=30)
I still get the 10-second timeout error.
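
One thing worth trying is a per-request timeout: `Search.params()` forwards extra keyword arguments to the underlying elasticsearch-py call, and `request_timeout` is that client's per-request timeout option. The sketch below is a minimal illustration; the helper name `with_request_timeout` is made up, and `search` is assumed to be an `elasticsearch_dsl.Search` object.

```python
def with_request_timeout(search, seconds):
    """Return a copy of `search` that sends `request_timeout` with its request.

    Search objects are immutable, so the returned object must be used;
    calling search.params(...) without keeping the result has no effect.
    """
    return search.params(request_timeout=seconds)


# Hypothetical usage, assuming a reachable cluster:
#   from elasticsearch import Elasticsearch
#   from elasticsearch_dsl import Search
#   client = Elasticsearch(["localhost"])
#   s = Search(using=client, index="my_index")
#   s = with_request_timeout(s, 30)
#   response = s.execute()
```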

I have a class in which I'm trying to set student_id as the _id field in Elasticsearch. I'm following the persistence example from the elasticsearch-dsl documentation.
from elasticsearch_dsl import DocType, String

ELASTICSEARCH_INDEX = 'student_index'

class StudentDoc(DocType):
    '''
    Define mapping for Student type
    '''
    student_id = String(required=True)
    name = String(null_value='')

    class Meta:
        # id = student_id
        index = ELASTICSEARCH_INDEX
I tried binding id by setting it in Meta, but that doesn't work.
I found a solution that overrides the save method, and I implemented it:
    def save(self, **kwargs):
        '''
        Override to set metadata id
        '''
        self.meta.id = self.student_id
        return super(StudentDoc, self).save(**kwargs)
I'm creating the object like this:
>>> a = StudentDoc(student_id=1, tags=['test'])
>>> a.save()
Is there a direct way to set it from Meta without overriding …

I have several indices in my Elasticsearch database, as follows:
Index_2019_01
Index_2019_02
Index_2019_03
Index_2019_04
.
.
Index_2019_12
Suppose I only want to search the first 3 indices. I mean something like this regular expression:
select count(*) from Index_2019_0[1-3] where LanguageId="English"
What is the correct way to do this in Elasticsearch?
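
As far as I know, the index part of a request accepts comma-separated names and `*` wildcards but not character classes such as `[1-3]`, so one approach is to list the three indices explicitly. The sketch below assumes `client` is an `elasticsearch.Elasticsearch` instance and uses a `match` query on `LanguageId` (a `term` query may fit better depending on the mapping); the helper names are hypothetical.

```python
def monthly_indices(prefix, year, first_month, last_month):
    """List index names like Index_2019_01 for an inclusive month range."""
    return ["%s_%d_%02d" % (prefix, year, month)
            for month in range(first_month, last_month + 1)]


def count_language(client, indices, language):
    """Count documents matching LanguageId across several indices.

    The indices are joined with commas, which Elasticsearch accepts
    as a multi-index target.
    """
    body = {"query": {"match": {"LanguageId": language}}}
    response = client.count(index=",".join(indices), body=body)
    return response["count"]


# Hypothetical usage, assuming a reachable cluster:
#   from elasticsearch import Elasticsearch
#   client = Elasticsearch(["localhost"])
#   count_language(client, monthly_indices("Index", 2019, 1, 3), "English")
```

A wildcard such as `Index_2019_0*` would also work but matches months 01 through 09, not just the first three.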

We use elasticsearch in our Django project, through the elasticsearch-dsl Python library.
We're getting the following error in production:
ConflictError(409, '{"took":7,"timed_out":false,"total":1,"deleted":0,"batches":1,"version_conflicts":1,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[{"index":"events","type":"_doc","id":"KJ7SpWsBZnen1jNBRWWM","cause":{"type":"version_conflict_engine_exception","reason":"[KJ7SpWsBZnen1jNBRWWM]: version conflict, required seqNo [1418], primary term [1]. current document has seqNo [1419] and primary term [1]","index_uuid":"2-fSZILVQzuJE8KVmpLFXQ","shard":"0","index":"events"},"status":409}]}')
And with better formatting:
{
  "took": 7,
  "timed_out": false,
  "total": 1,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 1,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "failures": [
    {
      "index": "events",
      "type": "_doc",
      "id": "KJ7SpWsBZnen1jNBRWWM",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[KJ7SpWsBZnen1jNBRWWM]: version conflict, required seqNo [1418], primary term [1]. current document has …

python django elasticsearch elasticsearch-dsl elasticsearch-py

I've set up my query string search with "AND" as the default operator. My query is as follows:
{
  "query": {
    "query_string" : {
      "query" : "Adam KT2 7AJ",
      "default_operator" : "AND"
    }
  }
}
I expected this to give the same results as the query below... but it doesn't.
{
  "query": {
    "query_string" : {
      "query" : "Adam AND KT2 AND 7AJ",
      "default_operator" : "OR"
    }
  }
}
Although I can see that default_operator does affect my search results, it doesn't work the way I hoped.
For example:
Query 1:
Adam AND KT2 AND 7AJ, default_operator: or
Query 2:
Adam KT2 7AJ, default_operator: and (the and default_operator produces different results from AND written in the query)
Query 3:
KT2 7AJ, default_operator: and
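
One way to see why the two forms differ is to ask Elasticsearch how it parsed each query, using the validate API with `explain`. The sketch below is only a debugging aid under some assumptions: `client` is an `elasticsearch.Elasticsearch` instance, the call uses the 7.x-style `body=` keyword, and the helper names and the index name `my_index` are made up for this example.

```python
def query_string_body(text, default_operator):
    """Build a query_string request body like the ones in the question."""
    return {
        "query": {
            "query_string": {
                "query": text,
                "default_operator": default_operator,
            }
        }
    }


def explain_query(client, index, body):
    """Return the parsed form of a query as reported by the validate API."""
    response = client.indices.validate_query(index=index, body=body, explain=True)
    return [item.get("explanation") for item in response.get("explanations", [])]


# The two bodies being compared
implicit_and = query_string_body("Adam KT2 7AJ", "AND")
explicit_and = query_string_body("Adam AND KT2 AND 7AJ", "OR")

# Hypothetical usage, assuming a reachable cluster:
#   from elasticsearch import Elasticsearch
#   client = Elasticsearch(["localhost"])
#   print(explain_query(client, "my_index", implicit_and))
#   print(explain_query(client, "my_index", explicit_and))
```

Comparing the two explanations side by side should show how each term was expanded across fields and which clauses were marked as required.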