ClickHouse: RabbitMQ engine
Today we will explore the new RabbitMQ engine in ClickHouse, introduced in version 20.8.
Those of you familiar with the Kafka engine might already know what to expect from this one.
In any case, if you want to see it in action, keep reading.
IMPORTANT: Although the RabbitMQ engine has been available since version 20.8, I only recommend using it in
versions 20.11 and higher, since CPU consumption was excessive before that. Note that there isn't an
official Altinity release covering it yet, which is what I would normally recommend waiting for. (Check this issue for more information.)
TL;DR
Clone the demo at https://github.com/pachico/demo-clickhouse-rabbitmq-engine and follow the instructions, or keep reading for some notes.
Features we will use
- ClickHouse RabbitMQ Engine
- ClickHouse JSONEachRow format
- Materialized views
- RabbitMQ fanout exchanges
Use case
Our use case is simple. Our application publishes events to RabbitMQ, and ClickHouse needs to fetch them from there, whatever the purpose might be.
How we do it
We need to set up 3 elements for this:
- the final table where the messages are going to be stored, named event
- the table with the RabbitMQ engine that will connect to RabbitMQ and fetch the messages, named rabbitmq_entry
- the materialized view that will move rows from the second into the first, named event_view
Let’s get to it
CREATE TABLE IF NOT EXISTS event (
`timestamp` DateTime,
`id` UInt32,
`body` String
) Engine = Memory;
CREATE TABLE IF NOT EXISTS rabbitmq_entry
(
`timestamp` UInt64,
`id` UInt32,
`body` String
) ENGINE = RabbitMQ SETTINGS
rabbitmq_host_port = 'rabbitmq:5672',
rabbitmq_exchange_name = 'clickhouse-exchange',
rabbitmq_routing_key_list = 'myqueue',
rabbitmq_format = 'JSONEachRow',
rabbitmq_exchange_type = 'fanout',
rabbitmq_num_consumers = 1
;
CREATE MATERIALIZED VIEW IF NOT EXISTS event_view
TO event AS
SELECT
toDateTime(toUInt64(divide(timestamp, 1000000000))) AS timestamp,
id AS id,
body AS body
FROM rabbitmq_entry;
Some notes:
- in the event table we're using the Memory engine, which you wouldn't use in a real production environment (see the sketch after this list)
- in rabbitmq_entry we're using the fanout exchange type, which might be the ideal setting for pub/sub scenarios. If you don't know the different kinds of exchanges, please check the RabbitMQ documentation
- the rabbitmq_entry table defines that the messages coming from RabbitMQ are in JSON format (check the rabbitmq_format = 'JSONEachRow' part) so I can import rows directly into a (semi) normalized table
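For reference, in production you would typically replace the Memory table with a MergeTree one. The following is only a minimal sketch, assuming ordering by timestamp fits your queries; the partitioning key is an assumption and should be adapted to your workload:
-- Hypothetical production-oriented alternative to the Memory table above
CREATE TABLE IF NOT EXISTS event (
    `timestamp` DateTime,
    `id` UInt32,
    `body` String
) ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, id);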
We’re publishing events with the following script:
for i in {1..100}
do
TIMESTAMP=$(($(date +%s%N)))
echo "{\"timestamp\":\"$TIMESTAMP\",\"id\":$i,\"body\":\"my body is $i\"}" \
| rabbitmqadmin --username=admin --password=admin publish exchange=clickhouse-exchange routing_key=myqueue &
done
If you pay attention, the timestamp field is in nanoseconds, but in ClickHouse I want to store it as a regular DateTime field.
For this reason, the materialized view definition converts the nanosecond-precision timestamp with toDateTime(toUInt64(divide(timestamp, 1000000000))).
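If you want to see what that expression does, you can run it on a sample value (the nanosecond timestamp below is arbitrary, and the rendered date depends on your server's timezone); once the publisher script has run, a quick count on event confirms the pipeline works end to end:
-- Convert an arbitrary nanosecond epoch value to a DateTime
SELECT toDateTime(toUInt64(divide(1600000000000000000, 1000000000))) AS ts;
-- With a UTC server timezone this returns 2020-09-13 12:26:40

-- After publishing, check that the rows made it into the final table
SELECT count() FROM event;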
Wrapping up
I created a scripted demo that spins up this scenario using docker-compose so you can play with it.
Find it at https://github.com/pachico/demo-clickhouse-rabbitmq-engine; follow the README and run make help for guidance.
I hope you find it useful.