Functions to count uniqs

Functions to count uniqs.

Functions to count uniqs

FunctionFunction(State)StateSizeResultQPS
uniqExactuniqExactState160000310000059.23
uniquniqState20080410031585.55
uniqCombineduniqCombinedState98505100314108.09
uniqCombined(12)uniqCombinedState(12)329198160151.64
uniqCombined(15)uniqCombinedState(15)24783100768110.18
uniqCombined(18)uniqCombinedState(18)196805100332101.56
uniqCombined(20)uniqCombinedState(20)78662110008865.05
uniqCombined64(12)uniqCombined64State(12)329198160164.96
uniqCombined64(15)uniqCombined64State(15)24783100768133.96
uniqCombined64(18)uniqCombined64State(18)196805100332110.85
uniqCombined64(20)uniqCombined64State(20)78662110008866.48
uniqHLL12uniqHLL12State2651101344177.91
uniqThetauniqThetaState3279598045144.05
uniqUpTo(100)uniqUpToState(100)1101222.93

Stats collected via script below on 22.2

funcname=( "uniqExact" "uniq" "uniqCombined" "uniqCombined(12)" "uniqCombined(15)" "uniqCombined(18)" "uniqCombined(20)" "uniqCombined64(12)" "uniqCombined64(15)" "uniqCombined64(18)" "uniqCombined64(20)" "uniqHLL12" "uniqTheta" "uniqUpTo(100)")
funcname2=( "uniqExactState" "uniqState" "uniqCombinedState" "uniqCombinedState(12)" "uniqCombinedState(15)" "uniqCombinedState(18)" "uniqCombinedState(20)" "uniqCombined64State(12)" "uniqCombined64State(15)" "uniqCombined64State(18)" "uniqCombined64State(20)" "uniqHLL12State" "uniqThetaState" "uniqUpToState(100)")

length=${#funcname[@]}
 

for (( j=0; j<length; j++ ));
do
  f1="${funcname[$j]}"
  f2="${funcname2[$j]}"
  size=$( clickhouse-client -q "select ${f2}(toString(number)) from numbers_mt(100000) FORMAT RowBinary" | wc -c )
  result="$( clickhouse-client -q "select ${f1}(toString(number)) from numbers_mt(100000)" )"
  time=$( rm /tmp/clickhouse-benchmark.json; echo "select ${f1}(toString(number)) from numbers_mt(100000)" | clickhouse-benchmark -i200 --json=/tmp/clickhouse-benchmark.json &>/dev/null; cat /tmp/clickhouse-benchmark.json | grep QPS  )

  printf "|%s|%s,%s,%s,%s\n" "$f1" "$f2" "$size" "$result" "$time"
done

groupBitmap

Use Roaring Bitmaps underneath. Return amount of uniq values.

Can be used with Int* types Works really great when your values quite similar. (Low memory usage / state size)

Example with blockchain data, block_number is monotonically increasing number.

SELECT groupBitmap(block_number) FROM blockchain;

┌─groupBitmap(block_number)─┐
                  48478157 
└───────────────────────────┘

MemoryTracker: Peak memory usage (for query): 64.44 MiB.
1 row in set. Elapsed: 32.044 sec. Processed 4.77 billion rows, 38.14 GB (148.77 million rows/s., 1.19 GB/s.)

SELECT uniqExact(block_number) FROM blockchain;

┌─uniqExact(block_number)─┐
                48478157 
└─────────────────────────┘

MemoryTracker: Peak memory usage (for query): 4.27 GiB.
1 row in set. Elapsed: 70.058 sec. Processed 4.77 billion rows, 38.14 GB (68.05 million rows/s., 544.38 MB/s.)