ClickHouse: RabbitMQ engine

Today we will explore the new RabbitMQ engine in ClickHouse, introduced in version 20.8. Those of you familiar with Kafka engine might already know what to expect from this engine. In any case, if you want to see it in action, keep reading.

IMPORTANT: Although the RabbitMQ engine has been available since version 20.8, I only recommend using it in versions 20.11 and higher, since CPU consumption was excessive before that. Note that there isn’t an official Altinity release for those versions yet, which is something I always recommend waiting for. (Check this issue for more information.)

TL;DR

Clone the demo at https://github.com/pachico/demo-clickhouse-rabbitmq-engine and follow the instructions, or keep reading for some notes.

Features we will use

  • ClickHouse RabbitMQ Engine
  • ClickHouse JSONEachRow format
  • Materialized views
  • RabbitMQ fanout exchanges

Use case

Our use case is simple: our application publishes events to RabbitMQ, and ClickHouse needs to fetch and store them, whatever the purpose might be.
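
To make things concrete, assume each event is a one-line JSON document shaped roughly like this (the field names are my choice and match the tables we will create below):

{"timestamp":"1605978394123456789","id":1,"body":"my body is 1"}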

How we do it

We need to set up 3 elements for this:

  • the final table where the messages are going to be stored, named event
  • the table with the RabbitMQ engine that will connect to RabbitMQ and fetch messages from it, named rabbitmq_entry
  • the materialized view that will move data from rabbitmq_entry into event, named event_view

Let’s get to it

CREATE TABLE IF NOT EXISTS event (
    `timestamp` DateTime,
    `id` UInt32,
    `body` String
) Engine = Memory;

CREATE TABLE IF NOT EXISTS rabbitmq_entry
(
    `timestamp` UInt64,
    `id` UInt32,
    `body` String
) ENGINE = RabbitMQ SETTINGS
    rabbitmq_host_port = 'rabbitmq:5672',
    rabbitmq_exchange_name = 'clickhouse-exchange',
    rabbitmq_routing_key_list = 'myqueue',
    rabbitmq_format = 'JSONEachRow',
    rabbitmq_exchange_type = 'fanout',
    rabbitmq_num_consumers = 1;

CREATE MATERIALIZED VIEW IF NOT EXISTS event_view
TO event AS
SELECT
    toDateTime(toUInt64(divide(timestamp, 1000000000))) AS timestamp,
    id AS id,
    body AS body
FROM rabbitmq_entry;

Some notes:

  • in the event table we’re using the Memory engine, which you wouldn’t use in a real production environment (there’s a minimal MergeTree sketch after this list)
  • in rabbitmq_entry we’re using the fanout exchange type, which might be the ideal setting for pub/sub scenarios. If you don’t know the different kinds of exchanges, please check the RabbitMQ documentation
  • the rabbitmq_entry table declares that the messages coming from RabbitMQ are in JSON format (check the rabbitmq_format = 'JSONEachRow' part) so rows can be imported directly into a (semi-)normalized table
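
As a reference, here is a minimal sketch of what event could look like with a MergeTree engine instead of Memory; the ORDER BY key is my own choice and not part of the demo:

CREATE TABLE IF NOT EXISTS event (
    `timestamp` DateTime,
    `id` UInt32,
    `body` String
) ENGINE = MergeTree
ORDER BY (timestamp, id);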

We’re publishing events with the following script:

# publish 100 JSON events, each with a nanosecond-precision timestamp
for i in {1..100}
do
    TIMESTAMP=$(($(date +%s%N)))
    echo "{\"timestamp\":\"$TIMESTAMP\",\"id\":$i,\"body\":\"my body is $i\"}" \
        | rabbitmqadmin --username=admin --password=admin publish exchange=clickhouse-exchange routing_key=myqueue &
done

If you pay attention, the timestamp field is in nanoseconds, but in ClickHouse I want to store it as a regular DateTime field. For this reason, the materialized view converts the nanosecond-precision timestamp with toDateTime(toUInt64(divide(timestamp, 1000000000))).
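
If you want to see that conversion in isolation, you can run it on a hardcoded value (the nanosecond timestamp below is just a made-up example) and then check that rows actually land in the final table:

SELECT toDateTime(toUInt64(divide(1605978394123456789, 1000000000)));

SELECT count(), min(timestamp), max(timestamp) FROM event;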

Wrapping up

I created a scripted demo that will spawn this scenario using docker-compose so you can play with it.

Find it at https://github.com/pachico/demo-clickhouse-rabbitmq-engine, follow the README, and run make help for guidance.

I hope you find it useful.