Posts by Luc*_*ano

Elasticsearch 5 stuck reading from disk

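I have a 6-node cluster running ES 5.4 with 4B small documents already indexed.
The documents are organized in ~9K indices, for a total of 2TB. Index sizes range from a few KB to hundreds of GB, and the indices are sharded so that each shard stays under 20GB.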

Cluster health query response:

{
    cluster_name: "##########",
    status: "green",
    timed_out: false,
    number_of_nodes: 6,
    number_of_data_nodes: 6,
    active_primary_shards: 9014,
    active_shards: 9034,
    relocating_shards: 0,
    initializing_shards: 0,
    unassigned_shards: 0,
    delayed_unassigned_shards: 0,
    number_of_pending_tasks: 0,
    number_of_in_flight_fetch: 0,
    task_max_waiting_in_queue_millis: 0,
    active_shards_percent_as_number: 100
}
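
As a rough back-of-the-envelope check, the figures quoted above can be turned into per-node and per-index averages. This is only a sketch: the index and document counts are the approximate values from the question, and reading "2TB" as 2 TiB is an assumption.

```python
# Figures quoted in the question and in the cluster health response.
nodes = 6
active_shards = 9034
indices = 9000               # "~9K indices" (approximate)
documents = 4_000_000_000    # "4B documents" (approximate)
total_bytes = 2 * 1024**4    # "2TB", read here as 2 TiB (assumption)

# Average number of shards each node has to host.
shards_per_node = active_shards / nodes
# Average number of documents per index.
docs_per_index = documents / indices
# Average on-disk size of one index, in MiB.
avg_index_mib = total_bytes / indices / 1024**2

print(f"shards per node: {shards_per_node:.0f}")
print(f"documents per index: {docs_per_index:.0f}")
print(f"average index size: {avg_index_mib:.0f} MiB")
```

Averages hide the skew described above (indices from a few KB to hundreds of GB), so these numbers only indicate the order of magnitude: roughly 1,500 shards per node.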


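Before any queries are sent to the cluster, it is stable: it receives one bulk indexing request per second, each containing between ten and a few thousand documents, without any problem.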

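Everything is fine until I redirect some traffic to this cluster. As soon as it starts responding, most of the servers start reading from disk at 250 MB/s, which makes the cluster unresponsive.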

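The strange thing is that I cloned this ES configuration on AWS (same hardware, same Linux kernel, but a different Linux distribution version) and I have no problem there.

NB: 40 MB/s of disk reads is what I have always seen on the servers that are serving traffic.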

How to speed up Elasticsearch recovery?

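I am working on an ES cluster with 6B small documents, organized in 6.5K indices, for a total of 6TB. The indices are replicated and sharded across 7 servers. Index sizes range from a few KB to hundreds of GB.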

Before using ES, I used Lucene with the same document organization.

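Recovery of the Lucene-based application was very quick. Indices were lazy-loaded when a query arrived, and the IndexReader was then cached to speed up future replies.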

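Now, with Elasticsearch, recovery is very slow (tens of minutes). Note that, usually, before a crash all the indices are open and most of them frequently receive documents to index.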

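Is there any good pattern to reduce ES recovery time? I am also interested in anything related to index management, not only configuration. For example, I would like to recover the most important indices first and then load all the others; by doing so, I can reduce the perceived downtime for most users.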
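
One mechanism that matches the "most important indices first" idea is the per-index `index.priority` setting, which Elasticsearch uses to order the recovery of unallocated shards (higher values first). The sketch below only builds the settings-update requests and does not send them; the index names and priority values are hypothetical, and whether this setting is available and sufficient for the version in use should be checked against its documentation.

```python
import json


def priority_requests(priorities):
    """Build one settings-update request per index.

    `priorities` maps an index name to an integer priority; higher values
    are meant to be recovered first. Returns (method, path, body) tuples,
    ordered from highest to lowest priority. Nothing is sent over the network.
    """
    out = []
    for index, priority in sorted(priorities.items(), key=lambda kv: -kv[1]):
        body = json.dumps({"index.priority": priority})
        out.append(("PUT", f"/{index}/_settings", body))
    return out


# Hypothetical index names and priority values, for illustration only.
plan = priority_requests(
    {"customers-hot": 10, "customers-archive": 1, "logs-old": 0}
)
for method, path, body in plan:
    print(method, path, body)
```

Each tuple can be sent with any HTTP client. Because the setting is stored per index, it has to be applied before the crash or restart it is meant to help with.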

I am using the following configuration:

# Max number of indices concurrently loaded at startup
indices.recovery.concurrent_streams: 80

# Max number of bytes concurrently read at startup for loading the indices
indices.recovery.max_bytes_per_sec: 250mb

# Allow to control specifically the number of initial recoveries of primaries that are allowed per node
cluster.routing.allocation.node_initial_primaries_recoveries: 20

# Max number of indices concurrently loaded at startup
cluster.routing.allocation.node_concurrent_recoveries: 80

#the number of streams to open (on a node level) for small files (under 5mb) to …


lucene performance elasticsearch

Luc*_*ano

2017 03-21

Score: 7

Answers: 1

Views: 6074

Training a multi-input Keras NN with a batch of training data

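I want to train a multi-input NN with Keras using a batch of training data, but I cannot work out how to pass a set of input and output samples to fit or train_on_batch on the model.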

My NN is defined as follows:

    import keras
    import numpy as np

    i1 = keras.layers.Input(shape=(2,))
    i2 = keras.layers.Input(shape=(2,))
    i3 = keras.layers.Input(shape=(2,))
    i_layer = keras.layers.Dense(2, activation='sigmoid')
    embedded_i1 = i_layer(i1)
    embedded_i2 = i_layer(i2)
    embedded_i3 = i_layer(i3)

    middle_concatenation = keras.layers.concatenate([embedded_i1, embedded_i2, embedded_i3], axis=1)

    out = keras.layers.Dense(1, activation='sigmoid')(middle_concatenation)

    model = keras.models.Model(inputs=[i1, i2, i3], outputs=out)
    model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])


For example, an instance of the input (successfully used to predict an output) looks like this:

[array([[0.1, 0.2]]), array([[0.3, 0.5]]), array([[0.1, 0.3]])]

But when I try to train my model:

    inputs = [
        [np.array([[0.1, 0.2]]), np.array([[0.3, 0.5]]), np.array([[0.1, 0.3]])],
        [np.array([[0.2, 0.1]]), np.array([[0.5, 0.3]]), np.array([[0.3, 0.1]])],
    ]
    outputs = np.ones(len(inputs))
    model.fit(inputs, outputs)

Run Code Online (Sandbox Code Playgroud)

I get this error:

ValueError: Error when checking …
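
The error message is truncated here, so the following is only a likely explanation. A Keras model with several inputs expects a list with one array per input, each shaped (batch_size, features), whereas the training call above passes one list per sample. A minimal sketch of the regrouping, using NumPy only (the helper name `to_model_inputs` is made up for this example):

```python
import numpy as np

# Samples in the layout used in the question: one entry per sample,
# each entry holding one (1, 2) array per model input.
samples = [
    [np.array([[0.1, 0.2]]), np.array([[0.3, 0.5]]), np.array([[0.1, 0.3]])],
    [np.array([[0.2, 0.1]]), np.array([[0.5, 0.3]]), np.array([[0.3, 0.1]])],
]


def to_model_inputs(samples):
    """Regroup per-sample input lists into one (batch, features) array per input."""
    # zip(*samples) transposes "sample -> inputs" into "input -> samples";
    # concatenating along axis 0 stacks the (1, 2) rows into a batch.
    return [np.concatenate(per_input, axis=0) for per_input in zip(*samples)]


inputs = to_model_inputs(samples)
outputs = np.ones(len(samples))

for x in inputs:
    print(x.shape)

# With the model defined in the question, training would then be called as:
# model.fit(inputs, outputs)
```

After regrouping, `inputs` holds three arrays (one each for i1, i2 and i3) whose first dimension is the number of samples, which matches the length of `outputs`.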


keras

Luc*_*ano

2017 08-01

Score: 6

Answers: 1

Views: 2578

How to specify different analyzers at query time with Elasticsearch?

I would like to use different analyzers at query time to compose my query.

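I read this in the "Controlling Analysis" documentation:

[...] the full sequence at search time:

The analyzer defined in the query itself, else

The search_analyzer defined in the field mapping, else

The analyzer defined in the field mapping, else

The analyzer named default_search in the index settings, which defaults to

The analyzer named default in the index settings, which defaults to

The standard analyzer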

But I don't know how to write the query so that different clauses use different analyzers:

"query"  => [
    "bool" => [
        "must"   => [
            {
                "match": ["my_field": "My query"]
                "<ANALYZER>": <ANALYZER_1>
            }
        ],
        "should" => [
            {
                "match": ["my_field": "My query"]
                "<ANALYZER>": <ANALYZER_2>    
            }
        ]
    ]
]
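
For reference, the match query accepts an `analyzer` parameter when the field is given in its expanded form (`{"query": ..., "analyzer": ...}`), which is one way to fill the placeholders above. The sketch below builds such a request body as a Python dict; the built-in `standard` and `whitespace` analyzers are stand-ins chosen for this example, not the analyzers the question has in mind.

```python
import json


def match_clause(field, text, analyzer):
    """Build a match clause that names its own search-time analyzer."""
    return {"match": {field: {"query": text, "analyzer": analyzer}}}


# "standard" and "whitespace" are built-in analyzers, used here as stand-ins
# for ANALYZER_1 and ANALYZER_2 from the question.
query = {
    "query": {
        "bool": {
            "must": [match_clause("my_field", "My query", "standard")],
            "should": [match_clause("my_field", "My query", "whitespace")],
        }
    }
}

print(json.dumps(query, indent=2))
```

Both clauses target the same indexed field, so nothing is indexed twice; only the analysis applied to the query text differs per clause.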


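I know I could index two or more different fields, but I have strong secondary memory constraints and I cannot index the same information N times.

Thank you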

elasticsearch

Luc*_*ano

lucky-day

Score: 5

Answers: 1

Views: 3285