Modeling a Cassandra table for upsert and select queries

kin*_*jou 9 cassandra cassandra-2.0

I designed the following table to store server alerts:

create table IF NOT EXISTS host_alerts(
    unique_key text,
    host_id text,
    occur_time timestamp,
    clear_time timestamp,
    last_occur timestamp,
    alarm_name text,
    primary key (unique_key,host_id,clear_time)
);

Let's insert some data:

truncate host_alerts;

insert into host_alerts(unique_key,host_id,alarm_name,
    clear_time,occur_time,last_occur
) 
values('1','server-1','disk failure',
'1970-01-01 00:00:00+0530','2015-07-01 00:00:00+0530','2015-07-01 00:01:00+0530');

insert into host_alerts(unique_key,host_id,alarm_name,
    clear_time,occur_time,last_occur
) 
values('1','server-1','disk failure',
'1970-01-01 00:00:00+0530','2015-07-01 00:00:00+0530','2015-07-01 00:02:00+0530');

insert into host_alerts(unique_key,host_id,alarm_name,
    clear_time,occur_time,last_occur
) 
values('1','server-1','disk failure',
'2015-07-01 00:02:00+0530','2015-07-01 00:00:00+0530','2015-07-01 00:02:00+0530');

The queries my application will run are:

// All alarms which are not cleared for host_id
select * from host_alerts where host_id = 'server-1' and clear_time = '1970-01-01 00:00:00+0530';

// All alarms which are cleared for host_id
select * from host_alerts where host_id = 'server-1' and clear_time > '2015-07-01 00:00:00+0530';

// All alarms whose first occurrence falls within a time range
select * from host_alerts where host_id = 'server-1'
and occur_time > '2015-07-01 00:02:00+0530' and occur_time < '2015-07-01 00:05:00+0530';

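I don't know whether I should create more tables, for example host_alerts_by_hostname or host_alerts_by_cleartime, or just add a clustering index. unique_key is the only unique column, but I need to retrieve data by the other columns.

Uncleared alarms have a clear_time of '1970-01-01 00:00:00+0530'; cleared events have an actual date value.

host_id is the server name.

occur_time is the time the event occurred.

last_occur is the time the event occurred again.

alarm_name is what happened on the system.

How can I model my table(s) so that I can run these queries and update based on unique_key? With what I have tried, the selects are not possible, and an upsert creates a new row for the same unique_key.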

Jim*_*yer 6

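I think you will probably need three tables to support your three query types.

The first table would support time range queries over the history of when alerts happened on each host: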

CREATE TABLE IF NOT EXISTS host_alerts_history (
    host_id text,
    occur_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, occur_time)
);

SELECT * FROM host_alerts_history WHERE host_id = 'server-1' AND occur_time > '2015-08-16 10:05:37-0400';
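The question's third query bounds occur_time on both sides. As a sketch against the host_alerts_history table above (the timestamps are copied from the question; the LIMIT value is an arbitrary assumption), both a bounded range and a "most recent first" read work because host_id is the partition key and occur_time is the clustering column:

```cql
// Alarms on one host whose occurrence falls inside a time window.
SELECT * FROM host_alerts_history
WHERE host_id = 'server-1'
  AND occur_time > '2015-07-01 00:02:00+0530'
  AND occur_time < '2015-07-01 00:05:00+0530';

// The ten most recent alarms on one host (10 is an illustrative value).
SELECT * FROM host_alerts_history
WHERE host_id = 'server-1'
ORDER BY occur_time DESC
LIMIT 10;
```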

The second table would track the uncleared alarms for each host:

CREATE TABLE IF NOT EXISTS host_uncleared_alarms (
    host_id text,
    occur_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, alarm_name)
);

SELECT * FROM host_uncleared_alarms WHERE host_id = 'server-1';

The last table would track when alarms were cleared for each host:

CREATE TABLE IF NOT EXISTS host_alerts_by_cleartime (
    host_id text,
    clear_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, clear_time)
);

SELECT * FROM host_alerts_by_cleartime WHERE host_id = 'server-1' AND clear_time > '2015-08-16 10:05:37-0400';

When a new alarm event arrives, you would execute this batch:

BEGIN BATCH
INSERT INTO host_alerts_history (host_id, occur_time, alarm_name) VALUES ('server-1', dateof(now()), 'disk full');
INSERT INTO host_uncleared_alarms (host_id, occur_time, alarm_name) VALUES ('server-1', dateof(now()), 'disk full');
APPLY BATCH;

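Note that the insert into the uncleared table is an upsert, because the timestamp is not part of the key. So for each host and alarm name, the table holds only one entry, containing the timestamp of the last occurrence.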
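To make the upsert behaviour concrete, here is a small sketch against the host_uncleared_alarms table above. The literal timestamps are illustrative values, not from the answer. Both inserts share the primary key ('server-1', 'disk full'), so the second overwrites the first instead of adding a row:

```cql
// First occurrence of the alarm.
INSERT INTO host_uncleared_alarms (host_id, occur_time, alarm_name)
VALUES ('server-1', '2015-07-01 00:01:00+0530', 'disk full');

// Same host_id and alarm_name: this overwrites occur_time in the existing row.
INSERT INTO host_uncleared_alarms (host_id, occur_time, alarm_name)
VALUES ('server-1', '2015-07-01 00:02:00+0530', 'disk full');

// Returns a single row for 'disk full', carrying the later occur_time.
SELECT * FROM host_uncleared_alarms WHERE host_id = 'server-1';
```

This is also why the original host_alerts table produced a new row on update: clear_time was part of its primary key, so writing a different clear_time addressed a different row.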

When an alarm clear event arrives, you would execute this batch:

BEGIN BATCH
DELETE FROM host_uncleared_alarms WHERE host_id = 'server-1' AND alarm_name = 'disk full';
INSERT INTO host_alerts_by_cleartime (host_id, clear_time, alarm_name) VALUES ('server-1', dateof(now()), 'disk full');
APPLY BATCH;

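I don't really understand what your "unique_key" is or where it comes from. I'm not sure it is needed, since the combination of host_id and alarm_name should be the level of granularity you want to work with. Adding another unique key into the mix could give rise to a lot of unmatched alarm/clear events. If unique_key is an alarm ID, then use it as the key in place of alarm_name in my example, and keep alarm_name as a data column.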
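If unique_key really is an alarm ID, the uncleared table might look like the sketch below. The table name host_uncleared_alarms_by_id and the column name alarm_id are hypothetical, chosen here only for illustration; alarm_id stands in for the question's unique_key:

```cql
// Hypothetical variant: alarm_id replaces alarm_name as the clustering key,
// and alarm_name becomes an ordinary data column.
CREATE TABLE IF NOT EXISTS host_uncleared_alarms_by_id (
    host_id text,
    alarm_id text,
    alarm_name text,
    occur_time timestamp,
    PRIMARY KEY (host_id, alarm_id)
);

// Repeated inserts for the same (host_id, alarm_id) upsert the same row.
INSERT INTO host_uncleared_alarms_by_id (host_id, alarm_id, alarm_name, occur_time)
VALUES ('server-1', '1', 'disk failure', dateof(now()));

// Clearing the alarm removes exactly that row.
DELETE FROM host_uncleared_alarms_by_id WHERE host_id = 'server-1' AND alarm_id = '1';
```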

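To keep your tables from filling up with old data over time, you could use the TTL feature to automatically delete rows after a number of days.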
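A minimal sketch of the two usual ways to set a TTL, assuming a retention period of 7 days (604800 seconds; the period is an arbitrary choice for illustration). It is applied here to the history and clear-time tables only, since expiring rows in host_uncleared_alarms would silently drop alarms that are still active:

```cql
// Per-write TTL: this row expires 604800 seconds (7 days) after it is written.
INSERT INTO host_alerts_by_cleartime (host_id, clear_time, alarm_name)
VALUES ('server-1', dateof(now()), 'disk full')
USING TTL 604800;

// Table-level default: applies to later writes that do not set their own TTL.
ALTER TABLE host_alerts_history WITH default_time_to_live = 604800;
```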