ClickHouse: RabbitMQ engine
Today we will explore the new RabbitMQ engine in ClickHouse, introduced in version 20.8.
Those of you familiar with the Kafka engine might already know what to expect from this one.
In any case, if you want to see it in action, keep reading.
IMPORTANT: Although the RabbitMQ engine has been available since version 20.8, I only recommend using it in versions 20.11 and higher, since CPU consumption was excessive until then. Note that there isn't an official Altinity release covering it yet, which is what I always recommend using. (Check this issue for more information.)
TL;DR
Clone the demo at https://github.com/pachico/demo-clickhouse-rabbitmq-engine and follow the instructions, or keep reading for some notes.
Features we will use
- ClickHouse RabbitMQ engine
- ClickHouse JSONEachRow format
- Materialized views
- RabbitMQ fanout exchanges
Use case
Our use case is simple: our application publishes events to RabbitMQ, and ClickHouse needs to fetch them from there, whatever the purpose may be.
How we do it
We need to set up 3 elements for this:
- the final table where the messages will be stored, named event
- the table with the RabbitMQ engine that will connect to RabbitMQ to fetch messages, named rabbitmq_entry
- the materialized view that will move data from rabbitmq_entry to event, named event_view
Let’s get to it
CREATE TABLE IF NOT EXISTS event (
`timestamp` DateTime,
`id` UInt32,
`body` String
) Engine = Memory;
CREATE TABLE IF NOT EXISTS rabbitmq_entry
(
`timestamp` UInt64,
`id` UInt32,
`body` String
) ENGINE = RabbitMQ SETTINGS
rabbitmq_host_port = 'rabbitmq:5672',
rabbitmq_exchange_name = 'clickhouse-exchange',
rabbitmq_routing_key_list = 'myqueue',
rabbitmq_format = 'JSONEachRow',
rabbitmq_exchange_type = 'fanout',
rabbitmq_num_consumers = 1
;
CREATE MATERIALIZED VIEW IF NOT EXISTS event_view
TO event AS
SELECT
toDateTime(toUInt64(divide(timestamp, 1000000000))) AS timestamp,
id AS id,
body AS body
FROM rabbitmq_entry;
Some notes:
- in the event table we're using the Memory engine, which you wouldn't use in a real production environment
- in rabbitmq_entry we're using the fanout exchange type, which might be the ideal setting for pub/sub scenarios. If you don't know the different kinds of exchanges, please check the RabbitMQ documentation
- the rabbitmq_entry table defines that the messages coming from RabbitMQ are in JSON format (check the rabbitmq_format = 'JSONEachRow' part), so I can import rows directly into a (semi) normalized table
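To illustrate what JSONEachRow means in practice, here is a small Python sketch (not ClickHouse's actual parser, just an illustration of the format, with made-up sample values) showing how a payload where each line is one JSON object maps to rows:

```python
import json

# A JSONEachRow payload: one JSON object per line, one object per row.
payload = (
    '{"timestamp": 1600000000000000000, "id": 1, "body": "my body is 1"}\n'
    '{"timestamp": 1600000000000000001, "id": 2, "body": "my body is 2"}\n'
)

# Each non-empty line decodes independently into one row.
rows = [json.loads(line) for line in payload.splitlines() if line]

for row in rows:
    print(row["id"], row["body"])
```

Since every line stands alone, the messages our script publishes can be consumed row by row without any framing beyond newlines.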
We’re publishing events with the following script:
for i in {1..100}
do
TIMESTAMP=$(($(date +%s%N)))
echo "{\"timestamp\":\"$TIMESTAMP\",\"id\":$i,\"body\":\"my body is $i\"}" \
| rabbitmqadmin --username=admin --password=admin publish exchange=clickhouse-exchange routing_key=myqueue &
done
If you pay attention, the timestamp field is in nanoseconds, but in ClickHouse I want to store it as a regular DateTime field.
For this reason, the materialized view definition converts the nanosecond-precision timestamp with toDateTime(toUInt64(divide(timestamp, 1000000000))).
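For clarity, here is a rough Python equivalent of that conversion (using a made-up sample timestamp and assuming UTC): divide the nanosecond epoch value by 10^9, keep the integer seconds, and build a wall-clock timestamp from it. Sub-second precision is dropped, just as in the materialized view.

```python
from datetime import datetime, timezone

# A nanosecond-precision epoch timestamp, as produced by `date +%s%N`.
ns_timestamp = 1_600_000_000_123_456_789

# Mirrors toDateTime(toUInt64(divide(timestamp, 1000000000))):
# integer seconds since the epoch, fractional part discarded.
seconds = ns_timestamp // 1_000_000_000
dt = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(dt)  # 2020-09-13 12:26:40+00:00
```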
Wrapping up
I created a scripted demo that spawns this scenario using docker-compose so you can play with it.
Find it at https://github.com/pachico/demo-clickhouse-rabbitmq-engine; follow the README and run make help for guidance.
I hope you find it useful.