slm*_*ers 18 timeout cassandra cqlsh
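I am working on a student project that involves building and querying a Cassandra data cluster.
When my cluster was lightly loaded (about 30 GB), my queries ran without problems, but now that it is considerably larger (1/2 TB), my queries time out.
I suspected this problem might come up, so before I started generating and loading test data, I changed this value in the cassandra.yaml file:
request_timeout_in_ms (default: 10000): the default timeout for other, miscellaneous operations.
However, when I changed that value to 1000000, Cassandra seemed to hang on startup, but that may just have been the large timeout at work.
My data generation target is 2 TB. How can I query that much data without running into timeouts?
Queries: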
SELECT huntpilotdn
FROM project.t1
WHERE (currentroutingreason, orignodeid, origspan,
origvideocap_bandwidth, datetimeorigination)
> (1,1,1,1,1)
AND (currentroutingreason, orignodeid, origspan,
origvideocap_bandwidth, datetimeorigination)
< (1000,1000,1000,1000,1000)
LIMIT 10000
ALLOW FILTERING;
SELECT destcause_location, destipaddr
FROM project.t2
WHERE datetimeorigination = 110
AND num >= 11612484378506
AND num <= 45880092667983
LIMIT 10000;
SELECT origdevicename, duration
FROM project.t3
WHERE destdevicename IN ('a','f', 'g')
LIMIT 10000
ALLOW FILTERING;
I have a demo keyspace with the same schema but a much smaller data size (~10 GB), and these queries run fine in that keyspace.
All of the tables being queried have millions of rows, with about 30 columns per row.
gca*_*lli 50
If you are using the DataStax cqlsh, you can specify the client timeout in seconds as a command-line argument. The default is 10.
$ cqlsh --request-timeout=3600
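I am guessing you are also using secondary indexes. You are finding out firsthand why secondary index queries and ALLOW FILTERING queries are not recommended: those design patterns do not scale to large datasets. Rebuild your model with query tables that support primary key lookups, because that is how Cassandra is designed to work.
Edit
"The variables that are constrained are cluster keys."
Right... which means they are not partition keys. Without constraining your partition key(s), you are essentially scanning the entire table, because clustering keys are only valid (they cluster data) within their partition key.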
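To make the partition key point concrete, here is a toy in-memory model in Python. It is only a sketch and not how Cassandra is implemented: the `ToyTable` class, the day-number partition key, and the `num` clustering key are all made up for illustration. It just counts how many partitions each kind of query has to touch.

```python
from bisect import bisect_left, bisect_right


class ToyTable:
    """Toy stand-in for a table: rows are grouped by partition key and
    kept sorted by clustering key inside each partition (hypothetical model)."""

    def __init__(self):
        # partition key -> (sorted clustering keys, parallel list of values)
        self.partitions = {}

    def insert(self, pk, ck, value):
        keys, values = self.partitions.setdefault(pk, ([], []))
        i = bisect_left(keys, ck)
        keys.insert(i, ck)
        values.insert(i, value)

    def range_in_partition(self, pk, lo, hi):
        """Partition key restricted: one partition, sliced by clustering key."""
        keys, values = self.partitions.get(pk, ([], []))
        rows = values[bisect_left(keys, lo):bisect_right(keys, hi)]
        return rows, 1  # (matching rows, partitions touched)

    def scan_range(self, lo, hi):
        """No partition key: the clustering range must be checked in every partition."""
        rows, touched = [], 0
        for keys, values in self.partitions.values():
            touched += 1
            rows.extend(values[bisect_left(keys, lo):bisect_right(keys, hi)])
        return rows, touched


table = ToyTable()
for day in range(100):       # hypothetical partition key
    for num in range(50):    # hypothetical clustering key
        table.insert(day, num, f"row-{day}-{num}")

rows, touched = table.range_in_partition(7, 10, 19)
all_rows, all_touched = table.scan_range(10, 19)
print(len(rows), touched)          # 10 1
print(len(all_rows), all_touched)  # 1000 100
```

The restricted query touches one partition no matter how big the table gets, while the unrestricted one grows with the number of partitions, which is roughly why the same queries that worked at 10 GB time out at 1/2 TB.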
小智 8
To change the client timeout limit in Apache Cassandra, there are two techniques:
Technique 1: This is a good technique:
1. Navigate to the following hidden directory under the home folder: (Create the hidden directory if not available)
$ pwd
~/.cassandra
2. Modify the file cqlshrc in it to an appropriate time in seconds: (Create the file if not available)
Original Setting:
$ more cqlshrc
[connection]
client_timeout = 10
# Can also be set to None to disable:
# client_timeout = None
$
New Setting:
$ vi cqlshrc
$ more cqlshrc
[connection]
client_timeout = 3600
# Can also be set to None to disable:
# client_timeout = None
$
Note: The time is in seconds. Since we wanted to increase the timeout to one hour, we set it to 3600 seconds.
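If you would rather script the cqlshrc change than edit it by hand, something like the following Python sketch should work. The function name `set_cqlsh_client_timeout` and the `cassandra_dir` parameter are my own, not part of cqlsh. It assumes your cqlsh version reads `client_timeout` from the `[connection]` section as shown above; I believe newer releases use a different option name, so check your version. Note that `configparser` drops comments when it rewrites the file.

```python
import configparser
import tempfile
from pathlib import Path


def set_cqlsh_client_timeout(seconds, cassandra_dir=None):
    """Set client_timeout under [connection] in cqlshrc, keeping other options.

    cassandra_dir defaults to ~/.cassandra; the parameter exists so the
    function can be tried against a scratch directory first.
    """
    cassandra_dir = Path(cassandra_dir) if cassandra_dir else Path.home() / ".cassandra"
    cassandra_dir.mkdir(parents=True, exist_ok=True)  # create the hidden directory if missing
    rc_path = cassandra_dir / "cqlshrc"

    # interpolation=None so values containing '%' (e.g. time formats) are left alone
    config = configparser.ConfigParser(interpolation=None)
    config.read(rc_path)  # a missing file is simply skipped
    if not config.has_section("connection"):
        config.add_section("connection")
    config.set("connection", "client_timeout", str(seconds))

    with open(rc_path, "w") as handle:
        config.write(handle)  # note: comments in the old file are not preserved
    return rc_path


# Try it against a temporary directory instead of the real ~/.cassandra
with tempfile.TemporaryDirectory() as scratch:
    path = set_cqlsh_client_timeout(3600, cassandra_dir=scratch)
    print(path.read_text())
```

Calling `set_cqlsh_client_timeout(3600)` with no directory would write to `~/.cassandra/cqlshrc`, so back up an existing file first.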
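Technique 2: This is not a good technique, because you are changing a setting in the client program (cqlsh) itself. Note: If you have already made the change using Technique 1, it will override the time specified using Technique 2, because the configuration file setting has the highest priority.
1. Navigate to the directory where the cqlsh program is located. You can find it using the which command: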
$ which cqlsh
/opt/apache-cassandra-2.1.9/bin/cqlsh
$ pwd
/opt/apache-cassandra-2.1.9/bin
$ ls -lrt cqlsh
-rwxr-xr-x 1 abc abc 93002 Nov 5 12:54 cqlsh
2. Open the program cqlsh and modify the time specified using the client_timeout variable. Note that time is specified in seconds.
$ vi cqlsh
In __init__ function:
def __init__(self, hostname, port, color=False,
username=None, password=None, encoding=None, stdin=None, tty=True,
completekey=DEFAULT_COMPLETEKEY, use_conn=None,
cqlver=DEFAULT_CQLVER, keyspace=None,
tracing_enabled=False, expand_enabled=False,
display_time_format=DEFAULT_TIME_FORMAT,
display_float_precision=DEFAULT_FLOAT_PRECISION,
max_trace_wait=DEFAULT_MAX_TRACE_WAIT,
ssl=False,
single_statement=None,
client_timeout=10,
connect_timeout=DEFAULT_CONNECT_TIMEOUT_SECONDS):
In options.client_timeout setting:
options.client_timeout = option_with_default(configs.get, 'connection', 'client_timeout', '10')
You can modify the value in both of these places. The second line picks up the client_timeout value from the cqlshrc file.
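To see why the cqlshrc value wins over the hard-coded one, here is a small Python sketch of the lookup. The `option_with_default` below is my approximation of the cqlsh helper with that name, not a verbatim copy, and `resolve_timeout` is a made-up wrapper for the demo.

```python
import configparser


def option_with_default(getter, section, option, default=None):
    """Return the config-file value if it exists, otherwise the default.
    Approximation of the cqlsh helper of the same name (assumption)."""
    try:
        return getter(section, option)
    except configparser.Error:  # covers a missing section or a missing option
        return default


def resolve_timeout(cqlshrc_text, hard_coded_default="10"):
    """Hypothetical wrapper: parse cqlshrc text and resolve client_timeout."""
    configs = configparser.ConfigParser(interpolation=None)
    configs.read_string(cqlshrc_text)
    return option_with_default(configs.get, "connection", "client_timeout",
                               hard_coded_default)


# cqlshrc sets the value: the file wins over the edited default
print(resolve_timeout("[connection]\nclient_timeout = 3600\n", "600"))  # 3600
# cqlshrc has no client_timeout: the edited default in the script is used
print(resolve_timeout("[connection]\n", "600"))                         # 600
# no cqlshrc content at all: same fallback
print(resolve_timeout("", "600"))                                       # 600
```

So editing the default inside the cqlsh script only has an effect when cqlshrc does not set `client_timeout`.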