What are the byte sizes of common Cassandra data types, for use when calculating partition disk usage?

Question

nic*_*gul 8 cql cassandra datastax

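I am trying to calculate the partition size for each row in a table with an arbitrary number of columns and types, using a formula from the Datastax Academy Data Modeling course.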

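To do that, I need to know the "size in bytes" of some common Cassandra data types. I tried to google this, but I got a lot of conflicting suggestions, so I am puzzled.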

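The data types whose byte size I would like to know are:

A single Cassandra TEXT character (my searches turned up answers ranging from 2 to 4 bytes)
A Cassandra DECIMAL
A Cassandra INT (I suppose it is 4 bytes)
A Cassandra BIGINT (I suppose it is 8 bytes)
A Cassandra BOOLEAN (I suppose it is 1 byte, or is it a single bit?)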

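Any other considerations regarding data type sizes in Cassandra would of course also be appreciated.

Adding some more information, since this seems to have caused confusion: I am only trying to estimate the "worst-case disk usage" the data would occupy, without any compression or other optimizations done by Cassandra behind the scenes.

I am following the Datastax Academy course DS220 (see the link at the end) and implementing the formula, and I will use the information from the answers here as variables in that formula.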

https://academy.datastax.com/courses/ds220-data-modeling/physical-partition-size

Answer 1

Jam*_*men 11

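I think that, from a pragmatic point of view, it is wise to get a back-of-the-envelope worst-case estimate up front at design time, using the formulae in the ds220 course. The effect of compression often varies depending on the algorithm and on patterns in the data. From ds220 and http://cassandra.apache.org/doc/latest/cql/types.html: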

uuid: 16 bytes
timeuuid: 16 bytes
timestamp: 8 bytes
bigint: 8 bytes
counter: 8 bytes
double: 8 bytes
time: 8 bytes
inet: 4 bytes (IPv4) or 16 bytes (IPv6)
date: 4 bytes
float: 4 bytes
int: 4 bytes
smallint: 2 bytes
tinyint: 1 byte
boolean: 1 byte (hopefully; no source for this)
ascii: requires an estimate of average # chars * 1 byte/char
text/varchar: requires an estimate of average # chars * (avg. # bytes/char for language)
map/list/set/blob: an estimate
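
As a rough illustration of how the sizes above could be turned into variables for a size formula, here is a minimal Python sketch. It is not taken from ds220; the function names, the example columns, and the sample values are all hypothetical. It assumes text values are stored as UTF-8 (so the byte count of a representative sample stands in for "avg. # bytes/char"), and it ignores compression and any per-cell or per-row overhead.

```python
# Serialized value sizes in bytes for fixed-width CQL types,
# copied from the list above.
FIXED_SIZES = {
    "uuid": 16, "timeuuid": 16, "timestamp": 8, "bigint": 8,
    "counter": 8, "double": 8, "time": 8, "date": 4, "float": 4,
    "int": 4, "smallint": 2, "tinyint": 1, "boolean": 1,
}


def text_size(sample: str) -> int:
    """Byte length of a representative string when encoded as UTF-8."""
    return len(sample.encode("utf-8"))


def column_size(cql_type: str, sample: str = "") -> int:
    """Estimate the value size in bytes for one column.

    `sample` is a representative value; it is only needed for
    variable-width types (ascii, text, varchar) and for inet.
    """
    cql_type = cql_type.lower()
    if cql_type in FIXED_SIZES:
        return FIXED_SIZES[cql_type]
    if cql_type in ("ascii", "text", "varchar"):
        return text_size(sample)
    if cql_type == "inet":
        # A colon in the address is taken to mean IPv6.
        return 16 if ":" in sample else 4
    # decimal, collections, and blob need a case-by-case estimate.
    raise ValueError(f"no size rule for type {cql_type!r}")


# Hypothetical row: (column name, CQL type, representative sample value).
example_columns = [
    ("sensor_id", "uuid", ""),
    ("reading", "double", ""),
    ("active", "boolean", ""),
    ("note", "text", "héllo"),  # 'é' takes 2 bytes in UTF-8
]

row_value_bytes = sum(column_size(t, s) for _, t, s in example_columns)
print(row_value_bytes)
```

For text columns, feeding in a sample that is typical for your language and average length gives a more realistic figure than assuming a fixed number of bytes per character, since UTF-8 uses between 1 and 4 bytes per character.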


Hope that helps.
