Using Vector to store Kong logs in ClickHouse
Today we will explore how to store Kong logs in ClickHouse using Vector.
TL;DR
Clone the demo at https://github.com/pachico/demo-kong-vector-clickhouse and follow the instructions, or keep reading for some notes.
Overview
Kong is the de-facto standard for API gateway solutions (sometimes also called API management). It’s fast and has a number of plugins that should cover the most common use cases. In our demo, we will use two of these: key-auth, to only allow authorized consumers to use our API endpoints, and tcp-log, to store details about every request handled by Kong.
To store these request events, we will point the tcp-log plugin to Vector, which will receive the events, do a number of transformations and, ultimately, send them to ClickHouse to be stored.
Before you say anything: depending on the use case, you might want a metrics-oriented solution rather than an event-log one, but this is a demo about features, not a set of instructions to solve all use cases.
If you are curious about the difference between the two, you can read this short but useful article about the subject.
Let’s see what we can do.
Features we will use
- Kong’s key-auth plugin
- Kong’s tcp-log plugin
- Vector’s socket source module
- Vector’s remap transform module
- Vector’s unit testing feature
- Vector’s ClickHouse sink module
- ClickHouse
- Vegeta HTTP load testing tool
Let’s get things together
This demo assumes you have either cloned or are following the demo project at https://github.com/pachico/demo-kong-vector-clickhouse.
Let’s start with Kong, which is the front service our users will hit. We will configure Kong to resolve a route that serves a dummy NGINX endpoint (a vanilla Docker instance).
Fortunately, Kong can now work without a database by starting it with a declarative configuration.
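For context, here is a minimal, hypothetical sketch of how Kong can be started in DB-less mode with docker-compose (the demo repo contains the full working version; the image tag and paths here are illustrative):
kong:
  image: kong:2.3
  environment:
    KONG_DATABASE: "off"                              # run without a database
    KONG_DECLARATIVE_CONFIG: /usr/local/kong/kong.yaml
  volumes:
    - ./kong.yaml:/usr/local/kong/kong.yaml:ro        # the declarative config shown below
  ports:
    - "8000:8000"                                     # Kong's proxy port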
We will use the following configuration:
_format_version: "1.1"

services:
  - name: my-service
    url: http://nginx:80
    tags:
      - mytags
    routes:
      - name: my-route
        methods:
          - GET
        paths:
          - /mypath
        strip_path: true

consumers:
  - username: my-user
    keyauth_credentials:
      - key: my-key

plugins:
  - name: tcp-log
    enabled: true
    config:
      host: vector
      port: 9000
  - name: key-auth
    enabled: true
    config:
      key_names: ["apikey"]
From this, pay attention to the plugins we’re using and how they are set up.
We will send all logs (request events) through the tcp-log plugin to a host named vector on port 9000.
Additionally, we will require consumers (API users) to send an API key in order to consume our services. This key must be sent with each request in the apikey header.
We have also created a consumer that has my-user as its name and, more importantly, my-key as its API key value.
Since both plugins are set at global scope, no request can go through without being authenticated, and all requests will be logged.
To make everything work, we have defined a service named my-service with a route named my-route that resolves GET requests to /mypath by sending them to http://nginx:80.
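Once the stack is up, you can verify the key-auth behaviour with a quick, hypothetical smoke test (it assumes Kong’s proxy port 8000 is published on localhost, as the sample payloads below suggest):
# Without the API key: key-auth should reject the request with a 401
curl -i http://localhost:8000/mypath

# With the API key in the apikey header: the request is proxied to NGINX
curl -i -H 'apikey: my-key' http://localhost:8000/mypath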
So far, very easy.
At the other end of the pipeline, we will set up the table in ClickHouse where we want all our logs to be stored. This is a simple demo and our table will only contain a few columns, but enough to show what can be done.
Our table definition is the following:
CREATE TABLE IF NOT EXISTS `kong`.`log` (
    `timestamp` DateTime,
    `consumer_username` Nullable(String),
    `request_method` LowCardinality(String),
    `request_size` UInt32,
    `request_uri` String,
    `request_url` String,
    `service_name` String,
    `response_status` UInt16,
    `latency_kong` UInt32,
    `latency_proxy` UInt32,
    `latency_request` UInt32
) ENGINE = MergeTree()
ORDER BY toYYYYMMDD(timestamp);
All Kong request events will be stored here, which means this table can quickly fill up with hundreds of millions of rows.
Although ClickHouse can handle billions of rows even on a laptop (yes, I have done it), it would be wise to exploit ClickHouse’s materialized view feature to aggregate data and get blazing fast, real-time results.
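As a purely illustrative sketch (kong.log_per_minute is a hypothetical name, not part of the demo), such a materialized view could look like this:
-- Hypothetical sketch: pre-aggregate request counts per service per minute
CREATE MATERIALIZED VIEW IF NOT EXISTS kong.log_per_minute
ENGINE = SummingMergeTree()
ORDER BY (service_name, minute)
AS SELECT
    toStartOfMinute(timestamp) AS minute,
    service_name,
    count() AS requests
FROM kong.log
GROUP BY minute, service_name;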
Once again, this is just a demo.
Now, those familiar with ClickHouse already know that it doesn’t excel at many small inserts; instead, it likes infrequent, massive ones, quite the opposite of how logs arrive.
To solve this, many solutions have been put in place: its Kafka engine (you publish data to Kafka so that ClickHouse can retrieve it from there), its RabbitMQ engine, its Buffer engine (as the name says, it buffers inserts on each node before flushing them to a table), but also external solutions like clickhouse-bulk or kittenhouse.
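To give an idea, a hypothetical Buffer engine setup for our table could look like the following (writes would then target kong.log_buffer and ClickHouse would flush to kong.log in the background; the numbers are the engine’s flush thresholds):
-- Hypothetical sketch, not used in this demo
CREATE TABLE kong.log_buffer AS kong.log
ENGINE = Buffer(kong, log, 16, 10, 100, 10000, 1000000, 10000000, 100000000);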
Today we won’t use any of these, but Vector’s ClickHouse sink module, which acts as a buffer too.
As we said before, logs are sent from Kong to Vector’s socket source module. Let’s see that:
[sources.kong_logs]
type = "socket"
address = "0.0.0.0:9000"
max_length = 102400
mode = "tcp"
Again, quite clear: it accepts TCP connections from anywhere on port 9000.
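If you want to poke at this source by hand (assuming, hypothetically, that port 9000 is published to your host), you can pipe a line to it with netcat; each newline-delimited line becomes an event whose message field holds the raw text:
# Netcat flags vary by implementation; -q 1 tells GNU/OpenBSD nc to exit after EOF
echo '{"hello": "world"}' | nc -q 1 localhost 9000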
Now is when the fun starts. Kong sends each event as a JSON message, which is quite big (you can see a full sample in the unit tests below).
Once it’s received, Vector adds a timestamp field marking when the event was received.
We don’t want that (neither should you). We want to use the started_at field, which marks when Kong received the request, instead.
Unfortunately, this field is a Unix epoch in milliseconds, so we can’t use it out of the box and we’ll have to transform it. Additionally, we’re not interested in much of that JSON, and certain things should be redacted/deleted for security reasons, like the API key, which can be found in .request.headers.apikey.
To do that, we will use the remap transform module, which uses Vector Remap Language (VRL) to manipulate the event.
This is the first transformation:
[transforms.kong_logs_shortened]
type = "remap"
inputs = ["kong_logs"]
source = '''
# The raw TCP line lands in .message: parse the JSON and make it the event itself
., _ = parse_json(.message)
# started_at is a Unix epoch in milliseconds: convert it to seconds and
# overwrite Vector's own .timestamp with Kong's request start time
ts, _ = .started_at / 1000
.timestamp = to_timestamp(floor(ts))
# Keep only the headers we care about; everything else (the apikey included) is dropped
.request.headers = {
    "user-agent": .request.headers."user-agent",
    "host": .request.headers.host,
    "x-forwarded-for": .request.headers."x-forwarded-for",
    "x-consumer-id": .request.headers."x-consumer-id",
    "x-consumer-username": .request.headers."x-consumer-username"
}
.response.headers = {
    "content-length": .response.headers."content-length",
    "content-type": .response.headers."content-type",
    "x-kong-proxy-latency": .response.headers."x-kong-proxy-latency",
    "x-kong-upstream-latency": .response.headers."x-kong-upstream-latency"
}
.route = {
    "https_redirect_status_code": .route.https_redirect_status_code,
    "methods": .route.methods,
    "name": .route.name,
    "paths": .route.paths
}
.service = {
    "name": .service.name,
    "tags": .service.tags
}
'''
See how we first parse the entire .message field (which is where the original message is stored after being received by Vector) and create a new .timestamp field, which is Kong’s started_at converted to seconds, floored, and turned into Vector’s timestamp type.
Then we rewrite the .request.headers object to include only those headers we want, deleting every trace of potentially compromising data (such as the API key). We continue doing the same with the other fields.
This creates a smaller object with the proper timestamp.
Still, it’s not something we want to throw at ClickHouse. (We could, and then use materialized views to normalize the data into a new table, but this way is safer, and we’ll get to why.)
So, we will apply a new transformation that takes the result of the previous one and flattens the object into a more table-ish shape so it can be stored in ClickHouse.
Here’s our second transformation:
[transforms.kong_logs_clickhouse_ready]
type = "remap"
inputs = ["kong_logs_shortened"]
source = '''
# Render the timestamp the way ClickHouse's DateTime type expects it
ts, _ = format_timestamp(.timestamp, "%F %T")
# Replace the event with a flat object that matches the kong.log columns
. = {
"timestamp": ts,
"consumer_username": .request.headers."x-consumer-username",
"request_method": .request.method,
"request_size": .request.size,
"request_uri": .request.uri,
"request_url": .request.url,
"service_name": .service.name,
"response_status": .response.status,
"latency_kong": .latencies.kong,
"latency_proxy": .latencies.proxy,
"latency_request": .latencies.request
}
'''
If you paid attention, you will notice that the resulting object matches the definition of our ClickHouse table (see above).
Now I want to show you why I prefer to transform the object in Vector rather than using a materialized view to normalize data.
Vector has an amazing feature that totally convinced me to go for it: unit tests.
Once your transformation pipelines become complex, you want to know that you are not breaking anything when you apply changes (those using Logstash know what I’m talking about).
This is why we have put two tests in place: one that tests only the first transformation, and another that tests the first and the second together.
This is what the tests look like:
[[tests]]
## This tests the first transformation which takes Kong's raw log
name = "Kong logs shortened"
[[tests.inputs]]
insert_at = "kong_logs_shortened"
type = "log"
log_fields.message = '{"consumer":{"created_at":1615809852,"id":"4b1b701d-de2b-5588-9aa2-3b97061d9f52","username":"my-user"},"latencies":{"request":3,"kong":2,"proxy":1},"service":{"host":"nginx","created_at":1615809852,"connect_timeout":60000,"id":"0855b320-0dd2-547d-891d-601e9b38647f","protocol":"http","name":"my-service","read_timeout":60000,"port":80,"updated_at":1615809852,"write_timeout":60000,"retries":5,"ws_id":"0dc6f45b-8f8d-40d2-a504-473544ee190b","tags":["mytags"]},"request":{"querystring":{},"size":100,"uri":"/mypath","url":"http://localhost:8000/mypath","headers":{"host":"localhost:8000","x-consumer-username":"my-user","user-agent":"curl/7.68.0","accept":"*/*","x-consumer-id":"4b1b701d-de2b-5588-9aa2-3b97061d9f52","x-credential-identifier":"487ab43c-b2c9-51ec-8da5-367586ea2b61","apikey":"my-key"},"method":"GET"},"client_ip":"172.20.0.1","tries":[{"balancer_latency":0,"port":80,"balancer_start":1615809924112,"ip":"172.20.0.3"}],"authenticated_entity":{"id":"487ab43c-b2c9-51ec-8da5-367586ea2b61"},"upstream_uri":"/","response":{"headers":{"content-type":"text/html; charset=UTF-8","date":"Mon, 15 Mar 2021 12:05:24 GMT","connection":"close","via":"kong/2.3.3","content-length":"612","x-kong-proxy-latency":"2","server":"nginx/1.19.8","x-kong-upstream-latency":"1","accept-ranges":"bytes","etag":"\"60479ce6-264\"","last-modified":"Tue, 09 Mar 2021 16:05:58 GMT"},"status":200,"size":935},"route":{"strip_path":true,"methods":["GET"],"ws_id":"0dc6f45b-8f8d-40d2-a504-473544ee190b","request_buffering":true,"service":{"id":"0855b320-0dd2-547d-891d-601e9b38647f"},"name":"my-route","path_handling":"v0","id":"481a9539-f49c-51b6-b2e2-fe99ee68866c","preserve_host":false,"regex_priority":0,"paths":["/mypath"],"response_buffering":true,"updated_at":1615809852,"https_redirect_status_code":426,"protocols":["http","https"],"created_at":1615809852},"started_at":1615809924110}'
[[tests.outputs]]
extract_from = "kong_logs_shortened"
[[tests.outputs.conditions]]
type = "remap"
source = '''
## Make sure time elements are OK
.started_at == 1615809924110
.timestamp == to_timestamp!("2021-03-15T12:05:24Z")
## Test entire payload after transformation
encode_json(.) == "{\"authenticated_entity\":{\"id\":\"487ab43c-b2c9-51ec-8da5-367586ea2b61\"},\"client_ip\":\"172.20.0.1\",\"consumer\":{\"created_at\":1615809852,\"id\":\"4b1b701d-de2b-5588-9aa2-3b97061d9f52\",\"username\":\"my-user\"},\"latencies\":{\"kong\":2,\"proxy\":1,\"request\":3},\"request\":{\"headers\":{\"host\":\"localhost:8000\",\"user-agent\":\"curl/7.68.0\",\"x-consumer-id\":\"4b1b701d-de2b-5588-9aa2-3b97061d9f52\",\"x-consumer-username\":\"my-user\",\"x-forwarded-for\":null},\"method\":\"GET\",\"querystring\":{},\"size\":100,\"uri\":\"/mypath\",\"url\":\"http://localhost:8000/mypath\"},\"response\":{\"headers\":{\"content-length\":\"612\",\"content-type\":\"text/html; charset=UTF-8\",\"x-kong-proxy-latency\":\"2\",\"x-kong-upstream-latency\":\"1\"},\"size\":935,\"status\":200},\"route\":{\"https_redirect_status_code\":426,\"methods\":[\"GET\"],\"name\":\"my-route\",\"paths\":[\"/mypath\"]},\"service\":{\"name\":\"my-service\",\"tags\":[\"mytags\"]},\"started_at\":1615809924110,\"timestamp\":\"2021-03-15T12:05:24Z\",\"tries\":[{\"balancer_latency\":0,\"balancer_start\":1615809924112,\"ip\":\"172.20.0.3\",\"port\":80}],\"upstream_uri\":\"/\"}"
'''
[[tests]]
## This tests the first and the second transformations. The output is the log object ready
## to be inserted into ClickHouse
name = "Kong logs ClickHouse ready"
[[tests.inputs]]
insert_at = "kong_logs_shortened"
type = "log"
log_fields.message = '{"consumer":{"created_at":1615809852,"id":"4b1b701d-de2b-5588-9aa2-3b97061d9f52","username":"my-user"},"latencies":{"request":3,"kong":2,"proxy":1},"service":{"host":"nginx","created_at":1615809852,"connect_timeout":60000,"id":"0855b320-0dd2-547d-891d-601e9b38647f","protocol":"http","name":"my-service","read_timeout":60000,"port":80,"updated_at":1615809852,"write_timeout":60000,"retries":5,"ws_id":"0dc6f45b-8f8d-40d2-a504-473544ee190b","tags":["mytags"]},"request":{"querystring":{},"size":100,"uri":"/mypath","url":"http://localhost:8000/mypath","headers":{"host":"localhost:8000","x-consumer-username":"my-user","user-agent":"curl/7.68.0","accept":"*/*","x-consumer-id":"4b1b701d-de2b-5588-9aa2-3b97061d9f52","x-credential-identifier":"487ab43c-b2c9-51ec-8da5-367586ea2b61","apikey":"my-key"},"method":"GET"},"client_ip":"172.20.0.1","tries":[{"balancer_latency":0,"port":80,"balancer_start":1615809924112,"ip":"172.20.0.3"}],"authenticated_entity":{"id":"487ab43c-b2c9-51ec-8da5-367586ea2b61"},"upstream_uri":"/","response":{"headers":{"content-type":"text/html; charset=UTF-8","date":"Mon, 15 Mar 2021 12:05:24 GMT","connection":"close","via":"kong/2.3.3","content-length":"612","x-kong-proxy-latency":"2","server":"nginx/1.19.8","x-kong-upstream-latency":"1","accept-ranges":"bytes","etag":"\"60479ce6-264\"","last-modified":"Tue, 09 Mar 2021 16:05:58 GMT"},"status":200,"size":935},"route":{"strip_path":true,"methods":["GET"],"ws_id":"0dc6f45b-8f8d-40d2-a504-473544ee190b","request_buffering":true,"service":{"id":"0855b320-0dd2-547d-891d-601e9b38647f"},"name":"my-route","path_handling":"v0","id":"481a9539-f49c-51b6-b2e2-fe99ee68866c","preserve_host":false,"regex_priority":0,"paths":["/mypath"],"response_buffering":true,"updated_at":1615809852,"https_redirect_status_code":426,"protocols":["http","https"],"created_at":1615809852},"started_at":1615809924110}'
[[tests.outputs]]
extract_from = "kong_logs_clickhouse_ready"
[[tests.outputs.conditions]]
type = "remap"
source = '''
## Test entire payload after transformation
encode_json(.) == "{\"consumer_username\":\"my-user\",\"latency_kong\":2,\"latency_proxy\":1,\"latency_request\":3,\"request_method\":\"GET\",\"request_size\":100,\"request_uri\":\"/mypath\",\"request_url\":\"http://localhost:8000/mypath\",\"response_status\":200,\"service_name\":\"my-service\",\"timestamp\":\"2021-03-15 12:05:24\"}"
'''
Isn’t this beautiful?
We want to know that our transformations are fine, so we run the tests:
docker run -v `pwd`/docker/vector/:/etc/vector/ timberio/vector:0.12.1-alpine test /etc/vector/*.toml
Running tests
test Kong logs shortened ... passed
test Kong logs ClickHouse ready ... passed
If, by any chance, we broke the tests, we would see something like this (to make it fail, I simply changed the final 4 in the expected timestamp into a 5):
docker run -v `pwd`/docker/vector/:/etc/vector/ timberio/vector:0.12.1-alpine test /etc/vector/*.toml
Running tests
test Kong logs shortened ... passed
test Kong logs ClickHouse ready ... failed
failures:
test Kong logs ClickHouse ready:
check transform 'kong_logs_clickhouse_ready' failed conditions:
condition[0]: source execution resolved to false
payloads (events encoded as JSON):
input: {"authenticated_entity":{"id":"487ab43c-b2c9-51ec-8da5-367586ea2b61"},"client_ip":"172.20.0.1","consumer":{"created_at":1615809852,"id":"4b1b701d-de2b-5588-9aa2-3b97061d9f52","username":"my-user"},"latencies":{"kong":2,"proxy":1,"request":3},"request":{"headers":{"host":"localhost:8000","user-agent":"curl/7.68.0","x-consumer-id":"4b1b701d-de2b-5588-9aa2-3b97061d9f52","x-consumer-username":"my-user","x-forwarded-for":null},"method":"GET","querystring":{},"size":100,"uri":"/mypath","url":"http://localhost:8000/mypath"},"response":{"headers":{"content-length":"612","content-type":"text/html; charset=UTF-8","x-kong-proxy-latency":"2","x-kong-upstream-latency":"1"},"size":935,"status":200},"route":{"https_redirect_status_code":426,"methods":["GET"],"name":"my-route","paths":["/mypath"]},"service":{"name":"my-service","tags":["mytags"]},"started_at":1615809924110,"timestamp":"2021-03-15T12:05:24Z","tries":[{"balancer_latency":0,"balancer_start":1615809924112,"ip":"172.20.0.3","port":80}],"upstream_uri":"/"}
output: {"consumer_username":"my-user","latency_kong":2,"latency_proxy":1,"latency_request":3,"request_method":"GET","request_size":100,"request_uri":"/mypath","request_url":"http://localhost:8000/mypath","response_status":200,"service_name":"my-service","timestamp":"2021-03-15 12:05:24"}
But this is not the case and everything is correct.
We then configure the ClickHouse sink module as follows:
[sinks.kong_logs_clickhouse]
type = "clickhouse"
inputs = ["kong_logs_clickhouse_ready"]
compression = "gzip"
database = "kong"
endpoint = "http://clickhouse:8123"
table = "log"
Please note that the ClickHouse sink module offers many configuration parameters, among them several related to batching and buffer capacity.
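As a purely illustrative sketch (option names and defaults vary between Vector versions, so double-check them against the documentation for your release), tuning could look like this:
# Hypothetical tuning sketch; verify these options for your Vector version
[sinks.kong_logs_clickhouse.batch]
max_bytes = 1049000    # flush a batch once it reaches roughly 1 MB...
timeout_secs = 10      # ...or after 10 seconds, whichever comes first

[sinks.kong_logs_clickhouse.buffer]
type = "memory"        # hold events in memory while batches are built
max_events = 50000     # cap the in-flight buffer
when_full = "block"    # apply backpressure instead of dropping events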
Beautiful, let’s see if it all works.
We will use Vegeta to send requests at a rate of 1,000 rps for 15 seconds and see whether Vector’s buffering worked and the data was stored correctly.
"echo 'GET http://kong:8000/mypath' | vegeta attack -rate=1000 -duration=15s -header=apikey:my-key | vegeta report"
Requests [total, rate, throughput] 15000, 1000.06, 999.88
Duration [total, attack, wait] 15.002s, 14.999s, 2.762ms
Latencies [min, mean, 50, 90, 95, 99, max] 1.202ms, 11.348ms, 3.355ms, 4.538ms, 5.508ms, 335.361ms, 466.81ms
Bytes In [total, mean] 9180000, 612.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:15000
Error Set:
(See how we send the API key using -header=apikey:my-key.)
1,000 requests per second for 15 seconds means 15,000 events that should now be stored in ClickHouse. Let’s check it out:
docker-compose exec clickhouse bash -c "echo 'SELECT consumer_username AS u, count() c, quantile(0.95)(latency_kong) AS lk_p95, quantile(0.95)(latency_proxy) AS lp_p95 FROM kong.log GROUP BY u FORMAT PrettyCompactMonoBlock' | clickhouse-client -m -n"
┌─u───────┬─────c─┬─lk_p95─┬─lp_p95─┐
│ my-user │ 15000 │ 4 │ 1 │
└─────────┴───────┴────────┴────────┘
We didn’t lose a single event and life is good.
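From here you can slice the data however you like. For instance, a hypothetical follow-up query to watch throughput and tail latency over time could be:
-- Hypothetical example: request counts and p99 request latency per minute
SELECT
    toStartOfMinute(timestamp) AS minute,
    count() AS requests,
    quantile(0.99)(latency_request) AS lr_p99
FROM kong.log
GROUP BY minute
ORDER BY minute;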
Closing notes
We now know how to store data coming from Kong in ClickHouse. If, by any chance, you prefer Elasticsearch as a storage solution, you could simply use the Elasticsearch sink module and you’d be fine (or you could use both!).
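A minimal, hypothetical sketch of such a sink follows (option names differ between Vector releases, so treat this as an outline rather than a drop-in config; the elasticsearch host is assumed):
# Hypothetical sketch; check the Elasticsearch sink docs for your Vector version
[sinks.kong_logs_elasticsearch]
type = "elasticsearch"
inputs = ["kong_logs_clickhouse_ready"]
endpoint = "http://elasticsearch:9200"
index = "kong-logs-%F"    # daily indices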
Vector is powerful and, from what we can see, you can expect a vibrant community and a number of new features coming soon.
Again, if you want to run this demo yourself, just clone the repo: https://github.com/pachico/demo-kong-vector-clickhouse.